Deeply Trivial (http://www.deeplytrivial.com/), by Sara<br /><br />Words, Words: From the Desk of a Psychometrician<br />Mon, 18 Sep 2017<br /><br />I've decided to start writing more about my job and the types of things I'm doing as a psychometrician. Obviously, I can't share enough detail for you to know <i>exactly</i> what I'm working on, but I can at least discuss the types of tasks I encounter and the types of problems I'm called on to solve. (And if you're curious what it takes to be a psychometrician, I'm working on a few posts on that topic as well.)<br /><br />This week's puzzle: examining the readability of exam items. Knowing as much as we do about the education level of our typical test-taker - and also keeping in mind that our exams are supposed to measure knowledge of a specific subject matter, as opposed to reading ability - it's important to know how readable the exam questions are.
This information can be used when we revise the exam, and could also be used to update our exam item writing guides (creating a new guide is one of my ongoing projects).<br /><br />Anyone who has looked at the readability statistics in Word knows how to get Flesch-Kincaid statistics: reading ease and grade level. Reading ease, which was developed by Rudolf Flesch, is a continuous value based on the average number of words per sentence and the average number of syllables per word; higher scores mean the text is easier to read. The U.S. Navy, in research led by J. Peter Kincaid, created grade levels based on the reading ease metric. So the grade level you receive through your analysis reflects the level of education necessary to comprehend that text.<br /><br />And to help put things in context, the average American reads at about a <a href="http://www.clearlanguagegroup.com/readability/">7th grade level</a>.<br /><br />The thing about Flesch-Kincaid is that it isn't always well-suited for texts on specific subject matters, especially those that have to use some level of jargon. In dental assisting, people will encounter words that refer to anatomy or devices used in dentistry. These multisyllabic words may not be familiar to the average person, and may result in higher Flesch-Kincaid grade levels (and lower reading ease), but in the context of practicing dental assistants - who would learn these terms in training or on the job - they're not as difficult. And as others have pointed out, there are common multisyllabic words that aren't difficult. Many people - even people with low reading ability - probably know words like "interesting" (a 4-syllable word).<br /><br />So my puzzle is to select readability statistics that are unlikely to be "tricked" by jargon, or at least find some way to take that inflation into account.
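Since the formulas themselves are public, here's a rough sketch of both Flesch-Kincaid statistics in Python. The syllable counter is a crude vowel-run heuristic of my own (real tools use pronunciation dictionaries), so treat the numbers as approximate:

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels, dropping a silent
    # trailing 'e'. Approximate only - real implementations use
    # pronunciation dictionaries or more careful rules.
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_kincaid(text):
    # Published formulas: reading ease and grade level are both
    # functions of words-per-sentence and syllables-per-word.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    wps = len(words) / sentences      # average words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

# Jargon-heavy text scores much "harder" even though the terms are
# routine for a practicing dental assistant (made-up example text):
ease, grade = flesch_kincaid(
    "The dental assistant sterilized the instruments. "
    "Radiographic anatomy requires specialized terminology."
)
```

Running the same function on a plain sentence like "The cat sat on the mat." yields a far higher reading ease and a far lower grade level, which is exactly the jargon inflation described above.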
I've been reading about some of the other readability statistics - such as the Gunning FOG index, where FOG stands for (I'm not kidding) "Frequency of Gobbledygook." Gunning FOG is very similar to Flesch-Kincaid: it also takes into account average words per sentence but, instead of average syllables per word, looks at the average number of complex (3+ syllable) words. But there are other potential readability statistics to explore. One thing I'd love to do is generate a readability index for each item in our exam pools. That information, along with the difficulty of the item and how it maps onto exam blueprints, could become part of item metadata. But that's a long-term goal.<br /><br />To analyze the data, I've decided to use R (though Python and its Natural Language Processing tools are another possibility). Today I discovered the <a href="https://cran.r-project.org/web/packages/koRpus/vignettes/koRpus_vignette.pdf">koRpus package</a> (R package developers love capitalizing the r's in package names). And I've found the <a href="https://github.com/kbenoit/readtext">readtext</a> package, which can be used to pull in and clean text from a variety of formats (not just txt, but JSON, XML, PDF, and so on). I may have to use these tools for a text analysis side project I've been thinking of doing.<br /><br />Completely by coincidence, I also just started reading <i><a href="https://www.amazon.com/Nabokovs-Favorite-Word-Mauve-Bestsellers/dp/1501105388">Nabokov's Favorite Word is Mauve</a></i>, in which author Ben Blatt uses different text analysis approaches on classic and contemporary literature and popular bestsellers. In the first chapter, he explored whether avoidance of adverbs (specifically the -ly adverbs, which are despised by authors from Ernest Hemingway to Stephen King) actually translates to better writing.
In subsequent chapters, he's explored differences in voice by author gender, whether great writers follow their own advice, and how patterns of word use can be used to identify authors. I'm really enjoying it.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://www.amazon.com/Nabokovs-Favorite-Word-Mauve-Bestsellers/dp/1501105388" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1055" height="320" src="https://3.bp.blogspot.com/-bwFzExg21qU/WcAaOHbbF_I/AAAAAAAAJQg/03-8mDVXYEouaK3zYHkyahOATNyxAbjIgCLcBGAs/s320/Nabokov.jpg" width="211" /></a></div><br />Edit: I realized I didn't say more about getting Flesch-Kincaid information from Word. Go to Options, then Proofing, and select "Show readability statistics." You'll receive a dialog box with this information after you run Spelling and Grammar Check on a document.<br /><br />Statistics Sunday: What Are Degrees of Freedom? (Part 2)<br />Sun, 17 Sep 2017<br /><br />Last week, in <a href="http://www.deeplytrivial.com/2017/09/statistics-sunday-what-are-degrees-of.html">part 1</a>, I talked about degrees of freedom as the number of values that are free to vary. This is where the name comes from, of course, and this is still true in part 2, but there's more to it than that, which I'll talk about today.<br /><br />In the part 1 example, I talked about why the degrees of freedom for a <a href="http://www.deeplytrivial.com/2017/04/t-is-for-t-test.html">t-test</a> is smaller than the sample size – 2 fewer, to be exact. It's because all but the last value in each group is free to vary.
Once you get to that last value in determining the group mean, that value is now determined – from a statistical standpoint, that is. But that's not all there is to it. If it were, we wouldn't really need the concept of degrees of freedom. We could just set up the table of t critical values by sample size instead of degrees of freedom.<br /><br />And in fact, I've seen that suggested before. It could work in simple cases, but as many statisticians can tell you, real datasets are messy, rarely simple, and often require more complex approaches. So instead, we teach concepts that become relevant in complex cases using simple cases. A good way to get your feet wet, yes, but perhaps a poor demonstration of why these concepts are important. And confusion about these concepts - even among statistics professors - remains, because some of these concepts just aren't intuitive.<br /><br />Degrees of freedom can be thought of as the number of independent values that can be used for estimation.<br /><br />Statistics is all about estimation, and as statistics become more and more complex, the estimation process also becomes more complex. Doing all that estimating requires some inputs. The number of inputs places a limit on how many things we can estimate, our outputs. That's what your degrees of freedom tells you – it's how many things you can estimate (output) based on the amount of data you have to work with (input). It keeps us from double-dipping - you can't reuse the same information to estimate a different value. Instead, you have to slice up the data in a different way.<br /><br /><b>Degrees of freedom measures the statistical fuel available for the analysis.</b><br /><br />For analyses like a t-test, we don't need to be too concerned with degrees of freedom. Sure, it costs us 1 degree of freedom for each group mean we calculate, but as long as we have a reasonable sample size, those 2 degrees of freedom we lose won't cause us much worry.
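That "last value is determined" bookkeeping is easy to see with a few made-up numbers in Python:

```python
# With a known mean, all but one value in a group are free to vary;
# the final value is forced. The numbers here are made up.
group = [4.0, 7.0, 5.0, 8.0]
mean = sum(group) / len(group)          # 6.0

free = [4.0, 7.0, 5.0]                  # chosen freely
forced = mean * len(group) - sum(free)  # must come out to 8.0
assert forced == group[-1]

# Each mean we estimate costs one degree of freedom, so an
# independent-samples t-test with two group means has:
n1, n2 = 30, 25
df = n1 + n2 - 2                        # 53
```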
We need to know degrees of freedom, of course, so we know which row to check in our table of critical values – but even that has become an unnecessary step thanks to computer analysis. Even when you’re doing a different t-test approach that alters your degrees of freedom (like Welch’s t, which is used when the variances between your two groups aren’t equal – more on that test later, though I've mentioned it <a href="http://www.deeplytrivial.com/2017/05/statistics-sunday-getting-started-with-r.html">once before</a>), it’s not something statisticians really pay attention to.<br /><br />But when we start adding in more variables, we see our degrees of freedom decrease as we begin using those degrees of freedom to estimate values. We start using up our statistical fuel.<br /><br />And if you venture into even more complex approaches, like <a href="http://www.deeplytrivial.com/2017/04/g-is-for-goodness-of-fit.html">structural equation modeling</a> (one of my favorites), you’ll notice your degrees of freedom can get used up very quickly – in part because your input for SEM is not the individual data but a matrix derived from the data (specifically a covariance matrix, which I should also blog about sometime). That was the first time I remember being in a situation where my degrees of freedom didn't seem limitless, where I had to simplify my analysis because I had used up all my degrees of freedom, and not just once. Even very simple models could be impossible to estimate based on the available degrees of freedom. 
I learned that degrees of freedom isn’t just some random value that comes along with my analysis.<br /><br />It’s a measure of resources for estimation and those resources are limited.<br /><br />For my fellow SEM nerds, I might have to start referring to saturated models – models where you’ve used up every degree of freedom – as “out of gas.” <br /><br />Perhaps the best way to demonstrate degrees of freedom as statistical fuel is by showing how degrees of freedom are calculated for the <a href="http://www.deeplytrivial.com/2017/05/statistics-sunday-analysis-of-variance.html">analysis of variance (ANOVA)</a>. In fact, it was <a href="http://www.deeplytrivial.com/2017/04/f-is-for-ronald-fisher.html">Ronald Fisher</a> who came up with both the concept of degrees of freedom and the ANOVA (and the independent samples t-test referenced in part 1 and again above). Fisher also came up with the correct way to determine degrees of freedom for Pearson’s <a href="http://www.deeplytrivial.com/2017/06/statistics-sunday-chi-square-anova-for.html">chi-square</a> – much to the chagrin of Karl Pearson, who was using the wrong degrees of freedom for his own test. <br /><br />First, remember that in ANOVA, we’re comparing our values to the grand mean (the overall mean of everyone in the sample, regardless of which group they fall in). Under the <a href="http://www.deeplytrivial.com/2017/07/statistics-sunday-null-and-alternative.html">null hypothesis</a>, this is our <a href="http://www.deeplytrivial.com/2017/08/statistical-sins-regression-to-mean.html">expected value</a> for all groups in our analysis. That by itself uses 1 degree of freedom – the last value is no longer free to vary, as discussed in part 1 and reiterated above. (Alternatively, you could think of it as spending 1 degree of freedom to calculate that grand mean.) So our total degrees of freedom for ANOVA is N-1. That's always going to be our starting point. 
Now, we take that quantity and start partitioning it out to each part of our analysis. <br /><br />Next, remember that in ANOVA, we’re looking for effects by partitioning variance – variance due to group differences (our <i>between groups</i> effect) and variance due to chance or error (our <i>within group</i> differences). Our degrees of freedom for looking at the between group effect is determined by how many groups we have, usually called <i>k</i>, minus 1.<br /><br />Let’s revisit the movie theatre example from the ANOVA post.<br /><br />Review all the specifics <a href="http://www.deeplytrivial.com/2017/05/statistics-sunday-analysis-of-variance.html">here</a>, but the TL;DR is that you're at the movie theatre with 3 friends who argue about where to sit in the theatre: front, middle, or back. You offer to do a survey of people in these different locations to see which group best enjoyed the movie, because you're that kind of nerd.<br /><br />If we want to find out who had the best movie-going experience of people sitting in the front, middle, or back of the theatre, we would use a one-way ANOVA comparing 3 groups. If <i>k</i> is 3, our between groups degrees of freedom is 2. (We only need two because we have the grand mean, and if we have two of the three group means - the between groups effect - we can figure out that third value.)<br /><br />We subtract those 2 degrees of freedom from our total degrees of freedom. If we don’t have another variable we’re testing – another between groups effect – the remaining degrees of freedom can all go toward estimating within group differences (error). We want our error degrees of freedom to be large, because we take the within group (error) variation and divide it by the within group degrees of freedom to get our error term. The more degrees of freedom we have here, the smaller our error, meaning our statistic is more likely to be significant.<br /><br />But what if we had another variable?
What if, in addition to testing the effect of seat location (front, middle, or back), we also decided to test the effect of gender? We could even test an interaction between seat location and gender to see if men and women have different preferences on where to sit in the theatre. We can do that, but adding those estimates in is going to cost us more degrees of freedom. We can't take any degrees of freedom from the seat location analysis - they're already spoken for. So we take more degrees of freedom from the leftover that goes toward error.<br /><br />For gender, where <i>k</i> equals 2, we would need 1 degree of freedom. And for the interaction, seat location X gender, we would multiply the seat location degrees of freedom by the gender degrees of freedom, so we need 2 more degrees of freedom to estimate that effect. Whatever is left goes in the error estimate. Sure, our leftover degrees of freedom is smaller than it was before we added the new variables, but the error variance is also probably smaller. We’re paying for it with degrees of freedom, but we’re also moving more variance from the error row to the systematic row.<br /><br />This is part of the trade-off we have to make in analyzing data – trade-offs between simplicity and explaining as much variance as possible. In this regard, degrees of freedom can become a reminder of that trade-off in action: what you’re using to run your planned analysis. 
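The degrees-of-freedom bookkeeping above can be sketched in a few lines of Python; the total sample size of 90 is made up for illustration:

```python
# Two-way ANOVA df accounting: seat location (3 levels) x gender (2).
N = 90                                 # made-up total sample size
df_total = N - 1                       # 1 df spent on the grand mean

k_seat, k_gender = 3, 2
df_seat = k_seat - 1                   # 2
df_gender = k_gender - 1               # 1
df_interaction = df_seat * df_gender   # 2

# Whatever is left over is the fuel for estimating error:
df_error = df_total - (df_seat + df_gender + df_interaction)
print(df_error)  # 84
```

Adding the gender main effect and the interaction spends 3 more degrees of freedom that would otherwise have gone to the error term, which is the trade-off described above.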
<br /><br />It's all fun and games until someone runs out of degrees of freedom.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-sLV38uXViBc/Wbb5no9z7vI/AAAAAAAAJNs/1JJRrU20tQofUjXhbMeyfufJ4e4AgeBOwCLcBGAs/s1600/fuel-gauge-163728.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="837" data-original-width="1600" height="167" src="https://3.bp.blogspot.com/-sLV38uXViBc/Wbb5no9z7vI/AAAAAAAAJNs/1JJRrU20tQofUjXhbMeyfufJ4e4AgeBOwCLcBGAs/s320/fuel-gauge-163728.jpg" width="320" /></a></div><br /><br />Great Minds in Statistics: Paul Lévy<br />Fri, 15 Sep 2017<br /><br />For today's Great Minds in Statistics post, I'd like to introduce you to French mathematician Paul Lévy (happy 131st birthday!), who contributed so many concepts to mathematics and statistics that his Wikipedia article is basically just his name followed by math terms over and over again.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/--zXV-WsPdUE/WbrlePz48oI/AAAAAAAAJPc/f1lK7e1C67Y0bLQGRpjpv9pgkp7KC-9FACLcBGAs/s1600/Paul_Pierre_Levy_1886-1971.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="400" data-original-width="290" height="320" src="https://2.bp.blogspot.com/--zXV-WsPdUE/WbrlePz48oI/AAAAAAAAJPc/f1lK7e1C67Y0bLQGRpjpv9pgkp7KC-9FACLcBGAs/s320/Paul_Pierre_Levy_1886-1971.jpg" width="232" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">By Konrad Jacobs, Erlangen - <a class="external free" href="http://owpdb.mfo.de/detail?photo_id=2531" rel="nofollow">http://owpdb.mfo.de/detail?photo_id=2531</a>, <a href="http://creativecommons.org/licenses/by-sa/2.0/de/deed.en" title="Creative Commons Attribution-Share Alike 2.0 de">CC BY-SA 2.0 de</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=7824258">Link</a></td></tr></tbody></table>Lévy came from a family of mathematicians. He excelled in math early, and received his education at the École Polytechnique and then the École des Mines. He became a professor at the École des Mines in 1913, then returned to the École Polytechnique as a professor in 1920.<br /><br />Much of his work was on sequences of random events, what we call "stochastic processes": each event in the string of events has an associated probability, and you study the value of each event (the outcome) across time and/or space. This mathematical concept is frequently used in studying things like the behavior of the stock market or the growth of bacteria.<br /><br />One type of stochastic process is called a random walk; random walks describe a path within a mathematical space. A random walk can describe literal movement, such as the path of an animal looking for food, or more figurative movement, such as the financial gains and losses of a gambler. Though the term <i>random walk</i> was coined by Karl Pearson, Lévy did a great deal of research into this concept, identifying special cases of random walks (such as the so-called Lévy flight).<br /><br />Lévy also identified an interesting probability puzzle known as a martingale: a random process where the expected value of the next observation is equal to the previously observed value.
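To make that definition concrete, here's a quick simulation sketch in Python (my own toy example, not anything from Lévy's work): a symmetric random walk is a martingale, so the average final position across many walks should stay near the starting point.

```python
import random

random.seed(42)  # fixed seed so the toy example is reproducible

def random_walk(steps):
    # Symmetric random walk: step +1 or -1 with equal probability.
    position = 0
    for _ in range(steps):
        position += random.choice((-1, 1))
    return position

# The expected value of the next observation equals the current
# value, so the mean final position across many walks should
# hover near the starting point (0 here).
finals = [random_walk(100) for _ in range(5000)]
mean_final = sum(finals) / len(finals)
```

Any single walk can wander far from 0, but averaged over thousands of walks the drift washes out.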
For example, the best guess of what an interest rate will be tomorrow is what it is today.<br /><br />There are some interesting stories about where the name "martingale" comes from, with some arguing that it comes from the device used with horses; a martingale hooks around the horse's head and connects to a strap on the neck, to keep the horse from moving its head too far up or down. But wherever the name comes from, when Lévy described it, he drew on a betting system made popular in 18th-century France: doubling one's bet after each loss, with the goal of recouping the lost money while also turning a profit.<br /><br />In theory, this strategy wins, because if the game is fair, the gambler won't lose every time and will eventually get the lost money back. The problem in practice is that the gambler could go broke before getting far enough to win anything. Sure, one good hand would turn everything around. But each bad hand gets the gambler deeper into debt. (Of course, the gambler could also bankrupt the house in the meantime.) This isn't so much a strategy as a bet that the game is fair and that probability will, eventually, do its thing. The martingale is one of the reasons casinos place limits on how much you can bet.<br /><br />To tie these two concepts together, a random walk is a martingale if it has no trend. That is, if steps in one direction are balanced, in expectation, by steps in the opposite direction, the trend line will be flat, so the expected value is always the same as the starting point: 0.<br /><br />A few more facts on Lévy:<br /><br />Like many statisticians, he was called upon to assist with the war effort during World War I.<br /><br />In addition to the work above, he contributed to functional analysis, differential equations, and partial differential equations. And though today he is considered the forefather of many modern concepts, he wasn't viewed as very important in his own time.
(There was quite a bit of snobbery from pure mathematicians about statistics. It was considered glorified arithmetic, inspired by such low-brow activities as gambling.)<br /><br />During World War II, he was fired from his job as professor at the École Polytechnique because of laws discriminating against Jews. His job was reinstated, though, and he remained there until retiring in 1959.<br /><br />Both his daughter (Marie-Hélène Schwartz) and son-in-law (Laurent Schwartz) were also mathematicians.<br /><br />Some of his research, which was considered esoteric at the time, has turned out to have incredibly important applications. This is why I argue with people when they say we should fund applied rather than basic research - you never know when or how basic research will end up being useful.<br /><br />Why Statistics on Airline Safety Are Out of Date<br />Thu, 14 Sep 2017<br /><br />Every time you fly, you hear the airplane safety demonstration: what to do if you lose cabin pressure, the location of your life vest, and so on. Research and statistical modeling have been conducted to ensure that, in an emergency, people have a high chance of getting out safely, and to know exactly what that chance is in different scenarios.<br /><br />What you may not know is that the reduction of coach legroom is not only annoying - it's dangerous, <a href="http://www.thedailybeast.com/flying-coach-is-so-cramped-it-could-be-a-death-trap">because it nullifies that research and modeling</a>:<br /><blockquote>As airlines pack seats tighter than ever, the tests supposed to show that passengers can get out alive in a crash are woefully out of date.
The FAA won’t make the results public, and a court warns there is “a plausible life-and-death safety concern.”<br /><br />The tests carried out to ensure that all the passengers can safely exit a cabin in an emergency are dangerously outdated and do not reflect how densely packed coach class seating has become—or how the size of passengers has simultaneously increased.<br /><br />No coach class seat meets the Department of Transportation’s own standard for the space required to make a flight attendant’s seat safe in an emergency.<br /><br />Neither Boeing nor the Federal Aviation Administration will disclose the evacuation test data for the newest (and most densely seated) versions of the most widely used jet, the Boeing 737.</blockquote>For instance, you've probably seen the picture in the safety card showing the "crash position":<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-I9gj7MIN5W8/WbqkWbP8ndI/AAAAAAAAJPM/ScVFr6VyNJECUMcJ_QXXfCdys3AjNLX-QCLcBGAs/s1600/united-airlines-safety-card.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="241" data-original-width="732" height="105" src="https://2.bp.blogspot.com/-I9gj7MIN5W8/WbqkWbP8ndI/AAAAAAAAJPM/ScVFr6VyNJECUMcJ_QXXfCdys3AjNLX-QCLcBGAs/s320/united-airlines-safety-card.jpg" width="320" /></a></div><br />That position, also known as the "brace position," is intended to reduce head and spine trauma during a crash. But to keep you from hitting your head on the seat in front, it requires about 35 inches of seat pitch (the front-to-back spacing between rows). The average pitch today is more like 31 inches, and on some planes, it's as low as 28 inches.<br /><br />More passengers in a small space also means evacuations take longer, which can be the difference between life and death if, for instance, the plane catches on fire.<br /><br />Yes, crashes are rare. <i>Very</i> rare.
The problem is that, if a crash occurs, the probability that everyone can get out alive is unknown, at least to the public.<br /><br />But we may know soon:<br /><blockquote>In a case brought by the non-profit activist group Flyers Rights and heard by the U.S. Court of Appeals for the District of Columbia Circuit, a judge said there was “a plausible life-and-death safety concern” about what is called the “densification” of seats in coach. The court ordered the Federal Aviation Administration to respond to a petition filed by Flyers Rights to promulgate new rules to deal with safety issues created by shrinking seat sizes and space in coach class cabins.<br /><br />The court gave the FAA until Dec. 28 to respond.</blockquote><br /><br />Statistical Sins: Smoking, E-Cigarettes, and Contamination<br />Wed, 13 Sep 2017<br /><br />A few years ago, it seemed like every smoker I knew "quit" smoking by taking up vaping. I held my tongue and didn't point out that they hadn't actually quit <i>smoking</i>; they quit smoking cigarettes and started smoking something else. And despite assurances that vaping was perfectly safe, I was skeptical.<br /><br />Since then, I've waited for research to show whether vaping is in fact safer. I've heard about the potential risks of vaping, such as the potential for bacterial infections from failing to clean or change the filter, as well as potential carcinogens. But I haven't seen anything more definitive - no large-scale clinical studies.<br /><br />Today, I learned some of the reasons for the dearth of research into the safety of e-cigarettes. As reported in the Guardian by Dr.
Rebecca Richmond and Jasmine Khouja, a lack of e-cigarette-<i>only</i> smokers, as well as propaganda, has <a href="https://www.theguardian.com/science/sifting-the-evidence/2017/sep/13/e-cigarette-science-is-scaremongering-hampering-research-opportunities">made it difficult to recruit for their research</a>:<br /><blockquote>A recent detailed study of over 60,000 UK 11-16 year olds has found that young people who experiment with e-cigarettes are usually those who already smoke cigarettes, and even then experimentation mostly doesn’t translate to regular use. Not only that, but smoking rates among young people in the UK are still declining. Studies conducted to date investigating the gateway hypothesis that vaping leads to smoking have tended to look at whether having ever tried an e-cigarette predicts later smoking. But young people who experiment with e-cigarettes are going to be different from those who don’t in lots of other ways – maybe they’re just more keen to take risks, which would also increase the likelihood that they’d experiment with cigarettes too, regardless of whether they’d used e-cigarettes.<br /><br />But e-cigarettes have really divided the public health community, with researchers who have the common aim of reducing the levels of smoking and smoking-related harm suddenly finding themselves on opposite sides of the debate. This is concerning, and partly because in a relative dearth of research on the devices the same findings are being used by both sides to support and criticise e-cigarettes. And all this disagreement is playing out in the media, meaning an unclear picture of what we know (and don’t know) about e-cigarettes is being portrayed, with vapers feeling persecuted and people who have not yet tried to quit mistakenly believing that there’s no point in switching, as e-cigarettes might be just as harmful as smoking.</blockquote>So the statistical sin here isn't really something the researchers have done (or didn't do).
It's an impossibility created by confounds. How does one recruit people who have only smoked e-cigarettes, or who at least have very little experience with regular cigarettes? What's happening here is really an issue of contamination - a threat to validity that occurs when the treatment of one group works its way into another group. Specifically, it's a threat to <i>internal</i> validity - the degree to which our study can show that our independent variable causes our dependent variable. In smoking research, internal validity is already lowered, because we can't randomly assign our independent variable. We can't assign certain people to smoke; that would be unethical. Years and years of correlational research into smoking have provided enough evidence that we now say "smoking causes cancer." But technically, we would need randomized controlled trials to say that definitively.<br /><br />That's not to say I don't believe there is a causal link between smoking and negative health outcomes like cancer. But the low level of internal validity has provided fuel for people with an agenda to push (i.e., people who have ties to the tobacco industry or who otherwise financially benefit from smoking). Are we going to see the same debate play out regarding e-cigarettes? Will we have to wait just as long for enough evidence to accrue before we can say something definitive?<br /><br />And for the here and now, how can one control for the contamination of experience with regular cigarettes in the vaping group?
And if this area of research really has become such a hot-button issue, what kind of research would people on either side of the issue be willing to accept?<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-_VmYn6PXGkQ/WblXMTHMJKI/AAAAAAAAJOI/dFoUMWa0f10RttfQxGFGZV9W5OO5DVuxACLcBGAs/s1600/smoke-2592482.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="862" data-original-width="1600" height="172" src="https://4.bp.blogspot.com/-_VmYn6PXGkQ/WblXMTHMJKI/AAAAAAAAJOI/dFoUMWa0f10RttfQxGFGZV9W5OO5DVuxACLcBGAs/s320/smoke-2592482.jpg" width="320" /></a></div><br /><br />Is Anybody Listening?<br />Tue, 12 Sep 2017<br /><br />As a follow-up to my <a href="http://www.deeplytrivial.com/2017/09/two-new-books-to-check-out.html">post earlier today</a> about Hillary Clinton's new book, <i><a href="https://www.amazon.com/What-Happened-Hillary-Rodham-Clinton/dp/1501175564/ref=pd_cp_14_1?_encoding=UTF8&pd_rd_i=1501175564&pd_rd_r=B61KFYVXZ34ZAN3EQD55&pd_rd_w=Lm3rR&pd_rd_wg=qkqwi&psc=1&refRID=B61KFYVXZ34ZAN3EQD55">What Happened</a></i>, here's an article from FiveThirtyEight, in which Walt Hickey <a href="https://fivethirtyeight.com/features/politicians-write-lots-of-books-heres-how-far-into-them-people-read/">examines whether people actually read (and finish) these books</a> using data from Audiobooks.com:<br /><blockquote>I was curious how far readers typically make it through these books. I couldn’t get any reading data, but I reached out to <a href="https://www.audiobooks.com/">Audiobooks.com</a> for listening data on political memoirs by presidential candidates going back to the 2000 election.
We focused on books by every “serious” candidate published before or shortly after each presidential election — “serious” as defined by my colleague Harry Enten <a href="https://fivethirtyeight.com/datalab/if-hillary-clinton-runs-for-president-when-might-she-announce/">back before the 2016 election</a>. (Basically, any candidate who held a major political office before running or got at least a bit of the vote in Iowa or New Hampshire.) Audiobooks.com was able to find the books with more than 10 downloads and sent over the average percentage of the book that listeners sat through.<br /><br />Before we get to the data, there are some caveats! If you don’t see a book on here, remember that not all books have an audio version widely available. Moreover, sales for some of these books peaked long before Audiobooks.com began collecting data. Second, if a completion rate number seems low, keep in mind that most people don’t finish reading most things. Most likely, less than half of the people who started this article made it to this sentence. It’s the nature of the game.</blockquote>The book with the highest proportion listened to was John McCain's <i>Faith of My Fathers</i>, for which the average completion rate was 74.7%. But part of that high completion rate reflects the book's relatively short length: the audiobook could be completed in 4.8 hours, and the average user listened to 3.6 hours. The book with the longest time listened to was George W. Bush's <i>Decision Points</i>, which had a 59.7% completion rate that corresponded to 12 (of 20.2) hours. <br /><br />The difference between proportion and time can also be seen in the books with the lowest averages. 
While Rand Paul's <i>Taking a Stand </i>had the lowest completion rate at 30.5%, Hillary Clinton's <i>It Takes a Village </i>had the lowest listen time at 1.3 hours (which corresponded to 48.2% of the book).<br /><br />It raises the question, "What are words for if no one listens anymore?"<br /><br /><div style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/IasCZL072fQ" width="560"></iframe></div>http://www.deeplytrivial.com/2017/09/is-anybody-listening.htmlnoreply@blogger.com (Sara)2tag:blogger.com,1999:blog-4594832939334410220.post-5583700149160881888Tue, 12 Sep 2017 16:13:00 +00002017-09-12T11:13:30.128-05:00booksTwo New Books to Check OutThis morning, as I run queries in our exam results database, I've been listening to a few podcasts, through which I found out about two new books - one that came out a week ago and the other that came out today:<br /><br /><i><a href="https://www.amazon.com/Fantasyland-America-Haywire-500-Year-History/dp/1400067219">Fantasyland: How America Went Haywire: A 500-Year History</a></i> by Kurt Andersen explores our current fake news/alternative facts culture and how we got here. You can hear more about the book on the <a href="https://www.nypl.org/blog/2017/09/10/podcast-181-kurt-andersen">New York Public Library podcast</a>, and in a <a href="https://www.theatlantic.com/notes/all/2017/07/postcards-from-fantasyland/534552/">cover story from the Atlantic</a> that came out a little over a month ago.<br /><br /><i><a href="https://www.amazon.com/What-Happened-Hillary-Rodham-Clinton/dp/1501175564/ref=pd_cp_14_1?_encoding=UTF8&pd_rd_i=1501175564&pd_rd_r=B61KFYVXZ34ZAN3EQD55&pd_rd_w=Lm3rR&pd_rd_wg=qkqwi&psc=1&refRID=B61KFYVXZ34ZAN3EQD55">What Happened</a></i> by Hillary Rodham Clinton just came out today. 
You can hear more about the book (from Hillary Clinton herself) in today's NPR Up First podcast:<br /><br /><iframe frameborder="0" height="290" scrolling="no" src="https://www.npr.org/player/embed/550359250/550362362" title="NPR embedded audio player" width="100%"></iframe>And here's a <a href="https://www.washingtonpost.com/outlook/clintons-account-of-how-she-was-shivved-in-the-2016-presidential-election/2017/09/11/f6740438-957f-11e7-89fa-bb822a46da5b_story.html?utm_term=.a6888d11e6f2">review of the book</a> from the Washington Post.http://www.deeplytrivial.com/2017/09/two-new-books-to-check-out.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-1255006815780188345Mon, 11 Sep 2017 13:00:00 +00002017-09-11T08:00:13.637-05:00BtVSliststrivialTrivial Only Post: "Friends"-Style Names for "Buffy" EpisodesSomeone has gone through every BtVS (Buffy the Vampire Slayer) episode and <a href="https://www.buzzfeed.com/bprofitt/if-buffy-episodes-were-titled-like-friends-epi-v954">renamed them Friends style</a>. Here are a few of my favorites:<br /><br />2. The One Where Cordelia Presses Deliver<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/25/20/enhanced/webdr03/original-17693-1424915198-8.jpg?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="540" data-original-width="715" height="242" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/25/20/enhanced/webdr03/original-17693-1424915198-8.jpg?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Admit it: This made you love Willow even more.</td></tr></tbody></table>19. 
The One Where Everyone Knows How Vampires Dress<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/25/22/enhanced/webdr12/enhanced-4020-1424920170-2.jpg?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="238" data-original-width="358" height="212" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/25/22/enhanced/webdr12/enhanced-4020-1424920170-2.jpg?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></div><br />36. The One Where They Kill the Cat<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/10/enhanced/webdr09/anigif_enhanced-19513-1424966006-19.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="238" data-original-width="500" height="152" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/10/enhanced/webdr09/anigif_enhanced-19513-1424966006-19.gif" width="320" /></a></div><br />81. The One Where Xander Does the Snoopy Dance<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/13/enhanced/webdr12/anigif_enhanced-27976-1424973956-27.gif?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="217" data-original-width="300" height="231" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/13/enhanced/webdr12/anigif_enhanced-27976-1424973956-27.gif?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></div><br />98. 
The One Where They Chase the Knights of Byzantine<br />Alternate title: "The One Where They Almost Jumped the Shark" <br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/13/enhanced/webdr05/enhanced-8780-1424975126-2.jpg?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="232" data-original-width="406" height="182" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/13/enhanced/webdr05/enhanced-8780-1424975126-2.jpg?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></div><br />116. The One Where Buffy Juggles Oranges<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/16/enhanced/webdr10/original-17217-1424984434-8.jpg?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="401" data-original-width="715" height="179" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/16/enhanced/webdr10/original-17217-1424984434-8.jpg?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></div><br />133. 
The One Where Here Endeth the Lesson<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/16/enhanced/webdr12/anigif_enhanced-22033-1424985876-13.gif?downsize=715:*&output-format=auto&output-quality=auto" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="160" data-original-width="245" height="209" src="https://img.buzzfeed.com/buzzfeed-static/static/2015-02/26/16/enhanced/webdr12/anigif_enhanced-22033-1424985876-13.gif?downsize=715:*&output-format=auto&output-quality=auto" width="320" /></a></div>http://www.deeplytrivial.com/2017/09/trivial-only-post-friends-style-names.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-8869391371469919661Sun, 10 Sep 2017 13:00:00 +00002017-09-18T12:59:35.087-05:00grad schoolmath and statisticssocial psychologyStatistics SundayStatistics Sunday: What Are Degrees of Freedom? (Part 1)I've had a few requests to write a post on degrees of freedom, an essential but not always well-understood concept in statistics. Degrees of freedom are used in many statistical tests; they help you identify which row of a table you should use to find critical values, and you probably had to calculate your degrees of freedom for multiple tests in introductory statistics. But there's rarely an explanation of what degrees of freedom are and why this concept is so important.<br /><br />There are a couple of ways you can think of degrees of freedom. For today's post, I'll present the typical way. But there's an additional way to understand degrees of freedom that I'll write about next week. (I'm mostly splitting it up because of concerns about length, but I also find the idea of a cliffhanger statistics post quite entertaining! Edit: And now you don't even have to wait. 
You can find part 2 <a href="http://www.deeplytrivial.com/2017/09/statistics-sunday-what-are-degrees-of_17.html">here</a>.)<br /><br />First up, the most literal definition of degrees of freedom - they are the number of values that are free to vary in calculating statistics. Let's say I tell you I have a sample of 100 people:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-y89vnZ0Vw3Y/WbQJmwS7ERI/AAAAAAAAJL0/CIx0OWeYrFgyZCc1lbftnaa-nbmPiMZlgCLcBGAs/s1600/lego_crowd.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="832" data-original-width="1600" height="166" src="https://2.bp.blogspot.com/-y89vnZ0Vw3Y/WbQJmwS7ERI/AAAAAAAAJL0/CIx0OWeYrFgyZCc1lbftnaa-nbmPiMZlgCLcBGAs/s320/lego_crowd.jpg" width="320" /></a></div><br />For each one, I have their score on the final exam of a traditional statistics course. Let's say I also tell you the <a href="http://www.deeplytrivial.com/2017/04/d-is-for-descriptive-statistics.html">mean</a> of those 100 scores is 80.5. Can you recreate my sample based on that information alone?<br /><br />You can't. Because there are many different configurations that could produce a mean of 80.5. It could be that half the sample had a score of 80 and the other half had a score of 81. And that's just one possibility.<br /><br />So then I ask you, "How many scores do I need to give you before you can recreate my sample?"<br /><br />The answer is 99. If you have 99 of the scores from the sample, you can figure out the last one, because that one is now determined. It can't be just any number; it has to be a number that results in a mean of 80.5 when combined with the 99 scores I gave you. That last value is no longer free to vary.<br /><br />Let's keep going with that sample of test scores. I have 100 scores from people who took a basic statistics course. 
Now, let's say I have an additional 100 scores from people who took a combined statistics and methods course.<br /><br />This is one of my pedagogical soapboxes: statistics should never be taught in isolation, but in the context of the type of research related to one's major. So I would hypothesize that my psychology majors who took a course that combined stats and methods will understand statistics better, and do better on the test, than students who took only statistics.<br /><br /><b>Full disclosure:</b> My undergrad did a 2-semester combined stats and methods course with a lab component. I'm probably biased when I say that's how it should be done. I certainly wasn't an expert when I got out of that course, or even when I finished grad school; my current level of knowledge comes from years of practice, reading, and thinking. But I feel I had a much better understanding of statistics when I got to grad school than many of my classmates, so I had a solid foundation on which to build.<br /><br />We'll keep the mean for the traditional stats course as 80.5. For the stats + methods group, let's say their mean is 87.5. I would compare these two means with each other using a <a href="http://www.deeplytrivial.com/2017/04/t-is-for-t-test.html">t-test</a>. But first, let's figure out our degrees of freedom. You already know that the degrees of freedom for the traditional course group are 99. How many degrees of freedom do we have for the stats + methods group? Also 99. All but the last score I give you are free to vary. So that gives us a total of 198 degrees of freedom.<br /><br />"But wait!" you say. "When I took introductory statistics, there was a formula to determine degrees of freedom for a t-test." And you would be right. That formula is N-2. I have 100 in each group, for a total of 200, and 200-2 is 198. 
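The "free to vary" idea is easy to check numerically. Here's a minimal sketch (in Python, purely as an illustration - the made-up scores and variable names are mine, not from the post):

```python
# Illustration of degrees of freedom: once the mean is known, the last
# score in a sample is determined by the other n - 1 scores.
# (The 99 given scores below are made up for the example.)

n = 100
known_mean = 80.5

# Suppose we've been handed 99 of the 100 scores...
first_99 = [80] * 50 + [81] * 49  # any 99 values would do

# ...then the 100th score is forced: the total must equal n * mean.
last_score = n * known_mean - sum(first_99)
print(last_score)  # 81.0 - determined, not free to vary

# Degrees of freedom for a two-group t-test: each group "loses" one
# score, so df = (n1 - 1) + (n2 - 1) = N - 2.
n1, n2 = 100, 100
df = (n1 - 1) + (n2 - 1)
print(df)  # 198
```

Change any of the 99 given scores and the forced value of the last score changes with them; a fixed mean pins down exactly one value.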
You'll find that for many statistics involving two-group comparisons (t-test, <a href="http://www.deeplytrivial.com/2017/04/r-is-for-r-correlation.html">correlation coefficient</a>), the degrees of freedom are N-2. And that's because 99 of the scores in each group are free to vary.<br /><br />But there's another way, a more conceptual way that gets at why degrees of freedom are important. That way of thinking becomes very helpful when determining degrees of freedom for <a href="http://www.deeplytrivial.com/2017/05/statistics-sunday-analysis-of-variance.html">ANOVA</a>. Tune in next week for the <a href="http://www.deeplytrivial.com/2017/09/statistics-sunday-what-are-degrees-of_17.html">exciting conclusion</a>!<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.deeplytrivial.com/search/label/Statistics%20Sunday" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1086" data-original-width="1600" height="217" src="https://1.bp.blogspot.com/-vqodBgX1BcQ/WbQO_QZUWZI/AAAAAAAAJMI/baxeQjuDuQYAWZrqvkHtM2wC8UtXsuqRgCLcBGAs/s320/statistics_sunday.jpg" width="320" /></a></div>http://www.deeplytrivial.com/2017/09/statistics-sunday-what-are-degrees-of.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-9090298087679995860Fri, 08 Sep 2017 14:51:00 +00002017-09-08T14:38:41.124-05:00cognitive biasesgenderreviewsciencesocial psychologyIs Networking Overrated?This morning, I received my Friday email from the <a href="https://www.psychologicalscience.org/">Association for Psychological Science</a>, which includes links to media coverage of psychological science. The very first headline caught my eye: <a href="https://www.nytimes.com/2017/08/24/opinion/sunday/networking-connections-business.html?_r=0">Good News for Young Strivers: Networking Is Overrated</a>. 
As someone who dislikes networking, I was pleased to read in the first paragraph that many people find it distasteful - so distasteful that research shows <a href="http://journals.sagepub.com/doi/abs/10.1177/0001839214554990?journalCode=asqa">people actually feel physically dirty</a> after visualizing themselves networking. In fact, the column - written by Adam Grant, a professor at the Wharton School - argues that networking doesn't help you accomplish great things. Rather, accomplishing great things helps you build a network:<br /><blockquote>Look at big breaks in entertainment. For George Lucas, a turning point was when Francis Ford Coppola hired him as a production assistant and went on to mentor him. Mr. Lucas didn’t schmooze his way into the relationship, though. As a film student he’d won first prize at a national festival and a scholarship to be an apprentice on a Warner Bros. film — he picked one of Mr. Coppola’s.<br /><br />Networks help, of course. In a study of <a href="http://journals.sagepub.com/doi/abs/10.2189/asqu.53.4.685">internet security start-ups</a>, having a previous connection to an investor increased the odds of getting funded by that investor in the first year. But it was pretty much irrelevant afterward. Accomplishments were the dominant driver of who invested over time.</blockquote>But as I got farther and farther along in the article, I couldn't help but think that he was making it sound too easy. I don't mean that working hard and being successful is easy. But it felt like he was brushing off all the people who lack the connections and resources that some are born into by saying, "Well, just be successful and the network will come to you." I couldn't completely figure out why I was so bothered by his article. Then he said this:<br /><blockquote>I don’t mean to suggest that success in any field is meritocratic. 
It’s dramatically easier to get credit for achievements and break into the elite if you’re <a href="https://www.nytimes.com/2015/01/11/opinion/sunday/speaking-while-female.html?_r=0">male</a> and <a href="http://www.annualreviews.org/doi/abs/10.1146/annurev.soc.32.061604.123127">white</a>, your pedigree is full of <a href="http://journals.sagepub.com/doi/abs/10.2189/asqu.51.2.169">fancy degrees</a> and <a href="http://onlinelibrary.wiley.com/doi/10.1002/smj.2272/full">prestigious employers</a>, you come from a family with <a href="http://journals.sagepub.com/doi/abs/10.2189/asqu.2010.55.2.278">wealth and connections</a>, and you speak <a href="http://psycnet.apa.org/record/2013-28924-001">without a foreign accent</a>. (Unless it’s a British accent, which has the uncanny ability to make you <a href="http://www.sciencedirect.com/science/article/pii/S0388000183800217">sound smart</a> regardless of what words come out of your mouth.) But if you lack these status signals, it’s even more critical to produce a portfolio that proves your potential.</blockquote>And that's when I realized what was bothering me. Sure, it's easy to say that if you don't have any of these privileged characteristics, you just need to work harder - something minorities and women have been hearing for a very long time. The problem is that 1) "success" and "achievement" are very subjective terms, and people may evaluate your achievements differently depending on your characteristics, and 2) getting your achievements noticed also depends on your network. It's as though Professor Grant thinks there's a place where the powerful can go and peruse all the portfolios of young and successful people.<br /><br />And sure, there are situations like that - the national festival from his George Lucas anecdote is one place where you can see young people's work in the hopes of finding an up-and-coming director. 
But getting into film school, having the resources and training to create a product that gets attention, and then getting that attention from the judges are influenced by a person's background and privilege.<br /><br />It feels as though Professor Grant is himself falling prey to the "myth of the self-made man." Anyone who says he (or she) is "self-made" is completely downplaying the influence of an environment conducive to success. Just like Donald Trump likes to downplay the financial help he received from his father to start his business.<br /><br />In fact, here's a great example of how people evaluate success differently for young and hopeful entrepreneurs. Penelope Gazin and Kate Dwyer <a href="https://www.fastcompany.com/40456604/these-women-entrepreneurs-created-a-fake-male-cofounder-to-dodge-startup-sexism">created a fake male cofounder</a> to help launch their startup:<br /><blockquote>“When we were getting started, we were immediately faced with ‘Are you sure? Does this sound like a good idea?’,” says Dwyer. “I think because we’re young women, a lot of people looked at what we were doing like, ‘What a cute hobby!’ or ‘That’s a cute idea.'” <br /><br />Regardless, the concept seems to be paying off. <a href="http://witchsy.com/">Witchsy</a>, the alternative, curated marketplace for bizarre, culturally aware, and dark-humored art, celebrated its one-year anniversary this summer. The site, born out of frustration with the excessive clutter and limitations of bigger creative marketplaces like Etsy, peddles enamel pins, shirts, zines, art prints, handmade crafts and other wares from a stable of hand-selected artists. Witchsy eschews the “Live Laugh Love” vibe of knickknacks commonly found on sites like Etsy in favor of art that is at once darkly nihilistic and lightheartedly funny, ranging in spirit from fiercely feminist to obscene just for the fun of it. 
<br /><br />After setting out to build Witchsy, it didn’t take long for them to notice a pattern: In many cases, the outside developers and graphic designers they enlisted to help often took a condescending tone over email. These collaborators, who were almost always male, were often short, slow to respond, and vaguely disrespectful in correspondence. In response to one request, a developer started an email with the words “Okay, girls…” <br /><br />That’s when Gazin and Dwyer introduced a third cofounder: Keith Mann, an aptly named fictional character who could communicate with outsiders over email. <br /><br />“It was like night and day,” says Dwyer. “It would take me days to get a response, but Keith could not only get a response and a status update, but also be asked if he wanted anything else or if there was anything else that Keith needed help with.”</blockquote>It wasn't enough to have a good idea. It wasn't enough to prove they had the ability to execute it. It literally took emails with a man's name attached to get their business going. 
Success is never achieved in a vacuum, and even if it were, those characteristics Professor Grant highlights as making it "easier to get credit" can influence whether that vacuum is beneficent or hostile.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-gtdDasi2eM0/WbKuQ3jaAuI/AAAAAAAAJKk/WmRbfAOka1U0_ivqVbp6O9GDZU6d0sKigCLcBGAs/s1600/to-reach-2697951.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="681" data-original-width="1600" height="136" src="https://3.bp.blogspot.com/-gtdDasi2eM0/WbKuQ3jaAuI/AAAAAAAAJKk/WmRbfAOka1U0_ivqVbp6O9GDZU6d0sKigCLcBGAs/s320/to-reach-2697951.jpg" width="320" /></a></div>http://www.deeplytrivial.com/2017/09/is-networking-overrated.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-7316696985586460887Thu, 07 Sep 2017 13:00:00 +00002017-09-07T08:00:28.255-05:00politicssocial psychologyGame Theory and the Situation in North Korea: Time To Rethink Our Models?A <a href="https://fivethirtyeight.com/features/how-to-win-a-nuclear-standoff/">recent article by Oliver Roeder</a> from FiveThirtyEight starts off with a game:<br /><blockquote>Imagine that a crisp $100 bill lies on a table between us. We both want it, of course, but there’s no chance of splitting it — our wallets are empty. So we vie for it according to a few simple rules. We’ll each write down a secret number — between 0 and 100 — and stick that number in an envelope. When we’re both done, we’ll open the envelopes. Whichever of us wrote down the higher number pockets the $100. But here’s the catch: There’s a percentage chance that we’ll each have to burn $10,000 of our own money, and that chance is equal to the lower of the two numbers. <br /><br />So, for example, if you wrote down 10 and I wrote down 20, I’d win the $100 … but then we’d both run a 10 percent risk of losing $10,000. 
This is a competition in which, no matter what, we both end up paying a price — the risk of disaster. <br /><br />Now imagine that you’re playing the same game, but for much more than $100. You’re a head of state facing off against another, and the risk you run is a small chance of nuclear war. The $100 prize becomes the concession of some international demand — a piece of disputed territory, say — while the $10,000 potential cost becomes untold death and destruction, nuclear winter and the very fate of our species and planet. <br /><br />How would you play the game then?</blockquote>This kind of game has been played before. In fact, this particular game is one example used in game theory, an approach to understanding conflicts and cooperation. And the application of game theory to war situations (and especially to nuclear war) comes from Thomas Schelling, a Nobel laureate and economist who worked for the Truman administration - Truman, as you may recall, was the President who made the decision to drop two nuclear weapons on Japan, the only President to drop nuclear bombs on another country as an act of war.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://espnfivethirtyeight.files.wordpress.com/2017/09/nuke_abomb_truman_combo_f.jpg?w=1024&quality=100&strip=info" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="379" data-original-width="800" height="152" src="https://espnfivethirtyeight.files.wordpress.com/2017/09/nuke_abomb_truman_combo_f.jpg?w=1024&quality=100&strip=info" width="320" /></a></div><br />Schelling even wrote a book on his economic analysis of incentives, behaviors, and consequences - for which you can find <a href="http://elcenia.com/iamapirate/schelling.pdf">full text here</a>.<br /><br />Nuclear weapons change the nature of the game. 
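The arithmetic in Roeder's example is easy to verify. Here's a minimal sketch (in Python, purely as an illustration; the function and its name are my own construction, not from the article):

```python
# Expected payoffs in the $100 "burn" game from Roeder's example:
# the higher number wins $100, and both players face a chance of
# losing $10,000 equal to the LOWER of the two numbers (as a percent).

def expected_payoffs(yours, mine, prize=100, burn=10_000):
    """Return (your expected payoff, my expected payoff)."""
    p_disaster = min(yours, mine) / 100  # the lower number sets the risk
    win_you = prize if yours > mine else 0  # ties ignored for simplicity
    win_me = prize if mine > yours else 0
    return (win_you - p_disaster * burn, win_me - p_disaster * burn)

# You write 10, I write 20: I pocket the $100, but we both carry a
# 10 percent chance of burning $10,000.
print(expected_payoffs(10, 20))  # (-1000.0, -900.0)
```

Note that even the winner's expected payoff is negative here, which is the point of the example: in this game, both players pay the price of the risk they jointly create.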
Before, the size and skill of a nation's army had a strong impact on whether it would win a battle, and eventually the war. There isn't a perfect relationship, of course, and there is a lot to say for strategy and the selection of allies. But if two countries have nuclear weapons, they can do immense damage to each other, even if they are smaller and weaker in terms of manpower.<br /><br />So playing the game of war with nuclear weapons becomes less about the accuracy of the archers, and more about convincing the other country's leader that you're a risk taker who will not hesitate to launch some nukes. What happens when both leaders do this?<br /><blockquote>Game theory isn’t a perfect vessel. There are more than two players in today’s nuclear standoff, for example. China, South Korea, Japan and Russia all have their own $100 bills to gain. And, of course, the people involved in any game aren’t necessarily always rational. We often hear that Kim Jong Un, for example, is a madman — “We can’t let a madman with nuclear weapons let on the loose like that,” Trump once <a href="https://theintercept.com/2017/05/23/read-the-full-transcript-of-trumps-call-with-philippine-president-rodrigo-duterte/">told</a> the president of the Philippines. <br /><br />But theorists do not begin their work from the premise that people or states are hyper-calculating rationality machines. Rather, they start from the reasonable notion that people respond to incentives. 
And madman or not, Kim will certainly respond to the disincentive of his country being blown to kingdom come.</blockquote>Let's hope for all our sakes that both Kim and Trump will respond to those disincentives.http://www.deeplytrivial.com/2017/09/game-theory-and-situation-in-north.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-6461673985399179425Wed, 06 Sep 2017 16:21:00 +00002017-09-06T11:21:17.281-05:00math and statisticspoliticsstatistical sinsStatistical Sins: Bad Polling, Cherry-Picking, and TrustOpinion polling has been used in politics for nearly two centuries. In fact, the first known example dates from 1824, when Andrew Jackson went up against John Quincy Adams for the Presidency. The poll correctly predicted that Jackson would win the popular vote (though the House of Representatives ultimately handed the Presidency to Adams), and as a result, opinion polling became more popular. Today, it is a huge industry that many have gotten into - and that's not necessarily a good thing.<br /><br />Two weeks ago, FiveThirtyEight’s <a href="https://fivethirtyeight.com/features/fake-polls-are-a-real-problem/">Harry Enten discussed</a> a highly publicized, though likely fake, poll that showed musician Kid Rock as a strong contender for US Senator from Michigan. (By the way, the link to the original poll results is <a href="https://delphianalytica.org/2017/07/24/kid-rock-ahead-in-hypothetical-matchup-with-debbie-stabenow-large-number-of-voters-are-undecided/">no longer available</a>.)<br /><br />Fake polls are certainly problematic when they're used to frame and support arguments. As Enten points out, poll results can influence donors and voters, making polling results a sort of self-fulfilling prophecy. 
And as internet surveys become easier to conduct, <a href="http://nymag.com/daily/intelligencer/2017/08/the-rising-tide-of-shoddy-polls.html">the number of poorly conducted polls is likely to increase</a>:<br /><blockquote>There is no power on earth that can or will keep politicians from cherry-picking polling data to get press, raise donations, or simply boost internal campaign morale. The bad news Enten shares is that the number of “pollsters” generating questionable data is rising, and could rise even more in the very near future. <br /><br />Some political observers will prove to be suckers for phony or slanted polls, while others will throw the baby out with the bathwater and refuse to look at polling data altogether, relying instead on disingenuous “insider” assessments of elections or, worse yet, making anecdotes a substitute for analysis (anyone can sound superficially informed by talking to preselected groups of voters and gleaning predictable insights).</blockquote>The quote above comes from a New York Magazine article discussing Enten's work. And the conclusion they draw from the rise of bad polling seems likely to come true: people will respond to poll results that turn out to be inaccurate by simply disbelieving any poll results. 
In fact, we've probably already begun to see this, in part because <a href="http://www.deeplytrivial.com/2016/11/on-polls-and-probability.html">polls had Hillary Clinton winning the Presidency, yet Trump won instead</a>.<br /><br />Yes, it's happened before:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-pJ7T6Gz8c2Y/WbAduSapUNI/AAAAAAAAJHQ/BO0bvr0h8-8GoGuOtYo9M3v7i2SttcqIACLcBGAs/s1600/o-DEWEY-DEFEATS-TRUMAN.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="419" data-original-width="570" height="235" src="https://2.bp.blogspot.com/-pJ7T6Gz8c2Y/WbAduSapUNI/AAAAAAAAJHQ/BO0bvr0h8-8GoGuOtYo9M3v7i2SttcqIACLcBGAs/s320/o-DEWEY-DEFEATS-TRUMAN.jpg" width="320" /></a></div><br />And yes, we've always had a problem with politicians cherry-picking poll results - paying attention to and presenting only the polls that are favorable to them. And, oh by the way, yes, we have a <a href="http://www.deeplytrivial.com/2015/11/the-importance-of-scientific-literacy.html">numerical and scientific literacy</a> problem in this country that goes back quite a ways - certainly before Trump won the Presidency.<br /><br />The problem is that now we have a person in the White House who not only cherry-picks poll results - which has to be challenging considering how <a href="http://www.gallup.com/poll/217346/trump-job-approval-stabilizing-lower-level.aspx?utm_source=alert&utm_medium=email&utm_content=morelink&utm_campaign=syndication">poorly he does in most polls</a> - but has actually said he only believes results that are favorable to him. His verbalizing that attitude, combined with bad polls popping up more often, is likely to give people who don't know what makes a poll good or bad license to do the same as Trump. 
At the very least, bad polling is going to hurt the public's trust in polling in general, even outside of politics.<br /><br />It appears some are recognizing that Trump's attitude toward polls reflects <a href="https://www.washingtonpost.com/news/monkey-cage/wp/2017/09/01/even-republicans-are-starting-to-doubt-trumps-competence/?utm_term=.26fd06d12938">his lack of competence</a> as opposed to problems with the nature of polling. But among his base - that is, the people who go along with him the majority of the time - we're likely to continue to see a lack of trust in polling, cherry-picking of results, and a blurred line between opinion and fact.http://www.deeplytrivial.com/2017/09/statistical-sins-bad-polling-cherry.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-8433298104977331165Tue, 05 Sep 2017 19:37:00 +00002017-09-05T14:37:50.374-05:00bookswritingShow Your Library Some LoveSeptember is <a href="http://www.ala.org/conferencesevents/celebrationweeks/card">Library Card Sign-Up Month</a>!<br /><blockquote>This September, crimefighting DC Super Heroes, the Teen Titans, will team up with the American Library Association (ALA) to promote the value of a library card.
As honorary chairs, DC’s Teen Titans will remind parents, caregivers and students that signing up for a library card is the first step towards academic achievement and lifelong learning.</blockquote><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-p81Ro3wdAuI/Wa78Q3Mr0VI/AAAAAAAAJGg/IW0kQhA9XwANJFB2GPLLJ8IdBOPPok5agCLcBGAs/s1600/library-card-sign-up-month-facebook-share.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-p81Ro3wdAuI/Wa78Q3Mr0VI/AAAAAAAAJGg/IW0kQhA9XwANJFB2GPLLJ8IdBOPPok5agCLcBGAs/s320/library-card-sign-up-month-facebook-share.jpg" width="320" height="168" data-original-width="1200" data-original-height="630" /></a></div><br />The focus is really on getting teens to use their library and helping them build important skills for academics and life, but anyone can join in on the fun. Recently, I picked up an alumni card from Loyola, where I attended grad school, that lets me check out books and also access electronic resources while I'm on campus. And Chicago's <a href="https://www.chipublib.org/locations/34/">Harold Washington Library</a> is one of my favorite places to write. Though I already have all the library cards I need/am eligible to get, I'll definitely be showing my libraries some love this month.<br /><br />How are you planning to celebrate?http://www.deeplytrivial.com/2017/09/show-your-library-some-love.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-7284284301570846867Tue, 05 Sep 2017 16:10:00 +00002017-09-05T12:16:30.729-05:00math and statisticsNumbers in Your LifeLast week, I took to Facebook and LinkedIn to make a small request of my friends and followers in order to get some inspiration for a writing project: What are the numbers you encounter on a regular basis? Not the numbers themselves; rather, the types of numbers people encounter regularly.
I gave the examples of credit score and calories, and told people a great piece of advice I learned from a friend in my dance class: There are no wrong answers in brainstorming.<br /><br />I feared I'd only get a few responses and that most would consist of, "Yeah, credit score, calories." I ended up getting responses from almost 60 friends, each with great lists of numbers from their personal lives, profession, child-raising, health, and so on.<br /><br />Some examples by category:<br /><ul><li>Finance: Credit score, account balance, interest rate, stock prices</li><li>Health and well-being: Weight, blood glucose, cholesterol, dosages, hours slept, number of minutes of meditation</li><li>Library: Dewey decimal system, number of stacks, linear feet of storage space, barcodes galore</li><li>Alcohol: Beverages consumed, alcohol by volume, IBUs, gravity (degrees Plato), blood alcohol content</li><li>Fitness tracking: Steps, miles, heart rate, pace, laps, reps</li><li>Healthcare: Number of meds passed, patient satisfaction score, Medicare diagnosis-related group</li><li>Childcare: Number of minutes on iPad, hours slept, diapers, formula measurements</li></ul><div>The descriptions ranged from numbers that count things to numbers that categorize things to numbers that fall on a continuum. More than one person commented that this was a fun brainstorming exercise. </div><div><br /></div><div>So now, readers, I look to you. What numbers do you encounter regularly? It doesn't have to be every day, just the types of numbers that are part of your life - whether at home, work, the gym, or the bar. And remember: There are no wrong answers in brainstorming!</div><div><br /></div><div>(Note: I've checked my comment settings, so you should be able to add your numbers below. You shouldn't need a Google login to comment. 
You're also welcome to provide your numbers as a reply to this post in Facebook groups.)</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-Us_w_XX4tBs/Wa7MYeohxjI/AAAAAAAAJGM/6SfaDOOwexQMazcfm-WqfUJKrn3Us_WGgCLcBGAs/s1600/numbers-2614133.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1200" height="320" src="https://2.bp.blogspot.com/-Us_w_XX4tBs/Wa7MYeohxjI/AAAAAAAAJGM/6SfaDOOwexQMazcfm-WqfUJKrn3Us_WGgCLcBGAs/s320/numbers-2614133.jpg" width="240" /></a></div><div><br /></div>http://www.deeplytrivial.com/2017/09/numbers-in-your-life.htmlnoreply@blogger.com (Sara)4tag:blogger.com,1999:blog-4594832939334410220.post-3535405353663212716Sun, 03 Sep 2017 13:00:00 +00002017-09-03T09:33:27.407-05:00math and statisticsStatistics SundaytrivialStatistics Sunday: Everyone Loves a Log (Odds Ratio)A couple weeks ago, I introduced the concept of the <a href="http://www.deeplytrivial.com/2017/08/statistics-sunday-odds-ratios.html">odds ratio</a>, the odds of one outcome relative to another. Odds ratios are often used to present and understand dichotomous outcome data, and researchers using logistic regression - which, like <a href="http://www.deeplytrivial.com/2017/06/statistics-sunday-linear-regression.html">linear regression</a>, uses one or more variables to predict an outcome, but unlike linear regression, predicts a dichotomous (not continuous) outcome - will often present results in terms of odds ratios. And odds ratios are used a lot in news stories because they're a bit easier for us to understand: e.g., people who do X are twice as likely to have this outcome as people who do Z. 
We're naive statisticians with a rudimentary understanding of gambling, so we have <i>some</i> understanding of odds.<br /><br />The thing about odds ratios - and this issue becomes more pronounced when you're working with a bunch of odds ratios - is that the distribution is not symmetrical, which creates some very interesting results when we look at the inverse odds. For instance, X being twice as likely as Y (odds ratio = 2.0) makes sense. Y being half as likely as X (odds ratio = 0.5) might not make as much sense. But they're the same comparison. And it gets trickier with other odds ratios. Because something with 50/50 odds will have an odds ratio of 1.0, a more likely outcome A will be greater than 1 and a less likely outcome A will be the inverse, a fraction between 0 and 1. An odds ratio > 1.0 has no upper bound (∞). An odds ratio < 1.0, on the other hand, is bounded below by 0, which it approaches asymptotically (as 1/∞) but never reaches.<br /><br />Most people I know who work with odds ratios regularly will - instead of presenting a fractional odds ratio - simply switch the order of the variables in the analysis. But what if you're working with a bunch of odds ratios around an outcome and you know that some will be greater than 1.0 and some will be less than 1.0?<br /><br />It's not an unusual situation. A logistic regression may have multiple predictors, some of which will have a negative coefficient, meaning a less likely outcome A. And if you wanted to do a <a href="http://www.deeplytrivial.com/2017/04/m-is-for-meta-analysis.html">meta-analysis</a> on something with a binary outcome, your effect size will be the odds ratio. Some of the analyses you would run in a meta-analysis - such as a special type of regression frequently called meta-regression - won't work so well with variables that have an asymmetric distribution. 
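That asymmetry is easy to see with a few lines of arithmetic. Here's a quick sketch in Python (the numbers are made up for illustration; they're not from any real study):

```python
# A quick numeric look at the asymmetry of the odds ratio scale.
# (Illustrative values only.)
import math

or_ab = 2.0          # outcome A is twice as likely as outcome B
or_ba = 1 / or_ab    # the same comparison, flipped: B is half as likely as A

# On the raw scale, the two versions of the same comparison sit at unequal
# distances from the "no difference" value of 1.0:
print(or_ab - 1.0)   # 1.0 above the midpoint
print(1.0 - or_ba)   # only 0.5 below it

# Taking the natural log makes the two versions symmetric around 0:
print(math.log(or_ab))   # about +0.693
print(math.log(or_ba))   # about -0.693
```

The log transform in the last two lines is exactly the trick discussed next.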
Meta-regression, which is similar to linear regression but adds a weight for each outcome, assumes a continuous outcome.<br /><br />But have no fear! There's a solution: the log odds ratio. Here are the basic equations you need:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-mY6jZTVfDSc/WauQgfQgaiI/AAAAAAAAJD0/EWsvNxXUYA0caOIpURDJN5iWvE-saMqGgCLcBGAs/s1600/log_odds_ratio.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="334" data-original-width="646" height="165" src="https://1.bp.blogspot.com/-mY6jZTVfDSc/WauQgfQgaiI/AAAAAAAAJD0/EWsvNxXUYA0caOIpURDJN5iWvE-saMqGgCLcBGAs/s320/log_odds_ratio.jpg" width="320" /></a></div><br />It's a really easy correction. You're simply doing a log-transform of your odds ratio - more specifically, you're taking the natural log of your odds ratio.<br /><br />You can do this in Excel with =LN(oddsratio). And many statistics programs are capable of log-transformations.<br /><br />In SPSS, the syntax is very similar to Excel: COMPUTE log_oddsratio = LN(oddsratio). (Or, if you prefer to use the GUI, go to the Transform menu, and click Compute Variable. You can type that text directly in the box, or find the LN function in the Arithmetic function group.)<br /><br />The syntax in R is simply: dataframe$log_oddsratio <- log(dataframe$oddsratio)<br /><br />And if you're working in SQL to interact with a relational database, natural log is a mathematical function, usually LOG or LN, depending on which vendor you're using. (For instance, I just wrapped an online course where I learned PostgreSQL, for which the syntax is LN.)<br /><br />Because this is probably better seen than described, I've done the following. I referred back to the 2x2 contingency table from the odds ratio post and decided to test out some different frequencies. This table has 4 cells, so I simplified things a bit. 
I made it so the placebo group had 50/50 odds of being in remission, so those two cells are both 250. The only thing I changed is the drug group, where I tested values of 1 in the 'in remission' group and 499 in the 'not in remission' group all the way to 499 in remission and 1 not. The odds ratios for those combinations ranged from 0.002 to 499.0. (Note: these extremes would be exceedingly rare - it's very unlikely you'll see an odds ratio over 10, let alone in the 100s. This is purely for demonstration purposes.) When you graph it, it looks like this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-y7uBFBEOd6I/WaghZHBChYI/AAAAAAAAJAs/zeQWu6CWixoQiVUPVd-1bmyKCbBG_DfUQCLcBGAs/s1600/odds_ratio.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="752" data-original-width="1057" height="228" src="https://3.bp.blogspot.com/-y7uBFBEOd6I/WaghZHBChYI/AAAAAAAAJAs/zeQWu6CWixoQiVUPVd-1bmyKCbBG_DfUQCLcBGAs/s320/odds_ratio.png" width="320" /></a></div><br />When I took the natural log of that array, I had a perfectly symmetrical (though not completely linear - but close enough for the typical range of log odds ratios) range of -6.21 to +6.21:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-CW9v4xgk3NQ/Waghd5YujsI/AAAAAAAAJAw/t7CTNxMV-GItstwJGZnk4iDtqKn2_eEIwCLcBGAs/s1600/log_odds_ratio.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="749" data-original-width="1052" height="228" src="https://2.bp.blogspot.com/-CW9v4xgk3NQ/Waghd5YujsI/AAAAAAAAJAw/t7CTNxMV-GItstwJGZnk4iDtqKn2_eEIwCLcBGAs/s320/log_odds_ratio.png" width="320" /></a></div><br />The natural log gets rid of the inverse property of odds ratios less than 1.0, and sets the bounds from -∞ to +∞. 
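If you'd rather see the endpoints of that demo computed directly, here's a short Python sketch (the odds_ratio helper is hypothetical code written for this post, not part of any library):

```python
# Reproducing the extremes of the 2x2 demo above: placebo fixed at 250/250
# (50/50 odds of remission), drug group varied from 1/499 up to 499/1.
import math

def odds_ratio(drug_yes, drug_no, placebo_yes, placebo_no):
    """Odds ratio for a 2x2 table: (drug odds) / (placebo odds)."""
    return (drug_yes / drug_no) / (placebo_yes / placebo_no)

lo = odds_ratio(1, 499, 250, 250)    # most extreme "drug worse" table
hi = odds_ratio(499, 1, 250, 250)    # most extreme "drug better" table
print(round(lo, 3), hi)              # 0.002 and 499.0 -- wildly asymmetric

# Their natural logs are mirror images, roughly -6.21 and +6.21:
print(round(math.log(lo), 2), round(math.log(hi), 2))

# And exp() undoes the transformation when you want odds ratios back:
print(math.exp(math.log(hi)))
```

The same two tables that look lopsided on the raw odds ratio scale land at equal distances from zero once you take the log.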
Now you can analyze those values and, if you want to present summary statistics as odds ratios instead of log odds ratios, you can just convert them back. To do that, you raise <i>e</i>, the base of the natural logarithm, to the power of the log odds ratio.<br /><br />The syntax is EXP and the number you're converting:<br /><br />Excel: =EXP(log_oddsratio)<br /><br />SPSS: COMPUTE oddsratio = EXP(log_oddsratio) or select Exp from the Arithmetic functions in the Transform->Compute Variable dialog box<br /><br />R: dataframe$oddsratio <- exp(dataframe$log_oddsratio)<br /><br />SQL: EXP(log_oddsratio)<br /><br />So that you can play too, here's the <a href="https://www.dropbox.com/s/yyof6shbx2f4fhi/OR_LOR_demo.xlsx?dl=0">Excel file containing the raw data</a> - I've left the functions in, as well as the two charts, so you can change the numbers if you'd like to play around.<br /><br />Meta-analysis isn't the only analysis that uses (log) odds ratios. The Rasch measurement model (the psychometric approach I use) is built on log odds ratios. That's part of the magic behind its ability to turn ordinal scales into interval scales of measurement. (More on that later.)<br /><br />BTW, for anyone wondering why I named this post as I did: Hopefully I'm not the only one who remembers the great Slinky parody seen on Ren & Stimpy. 
(In fact, when I typed "Ren and Stimpy" into Google, it auto-completed with "log," so clearly I'm not.)<br /><br />For review:<br /><br /><div style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/-fQGPZTECYs" width="560"></iframe></div>http://www.deeplytrivial.com/2017/09/statistics-sunday-everyone-loves-log.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-5632225709082652253Fri, 01 Sep 2017 14:17:00 +00002017-09-01T09:39:10.696-05:00great minds in statisticsmath and statisticsscienceGreat Minds in Statistics: Jerzy Neyman's Confidence IntervalsWednesday, August 30th was the 80-year anniversary of the publication of Jerzy Neyman's article, <i><a href="http://static.stevereads.com/papers_to_read/outline_of_a_theory_of_statistical_estimation_based_on_the_classical_theory_of_probability.pdf">Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability</a></i>. The classical theory refers, in this case, to Neyman's work with <a href="http://www.deeplytrivial.com/2017/08/great-minds-in-statistics-happy.html">E. Pearson</a> on <a href="http://www.deeplytrivial.com/2017/07/statistics-sunday-null-and-alternative.html">null hypothesis significance testing</a> and the concepts of <a href="http://www.deeplytrivial.com/2017/04/a-is-for-alpha.html">Type I</a> and <a href="http://www.deeplytrivial.com/2017/04/b-is-for-beta.html">Type II</a> error. 
But this paper was groundbreaking, not just in connecting to these concepts, but in telling how we, as statisticians and scientists, should be dealing with uncertainty in the presentation of our results.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://3.bp.blogspot.com/-SNAuCBQuyO4/WajQyyOtbPI/AAAAAAAAJDA/PAajEFE5nqw_HWnCEjOcCgmWFEL8hdBiQCLcBGAs/s1600/Jerzy_Neyman2.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="800" data-original-width="532" height="320" src="https://3.bp.blogspot.com/-SNAuCBQuyO4/WajQyyOtbPI/AAAAAAAAJDA/PAajEFE5nqw_HWnCEjOcCgmWFEL8hdBiQCLcBGAs/s320/Jerzy_Neyman2.jpg" width="212" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Jerzy Neyman, photographed while he was at UC-Berkeley.<br />By Konrad Jacobs, Erlangen, Copyright is MFO - Mathematisches Forschungsinstitut Oberwolfach, <a class="external free" href="http://owpdb.mfo.de/detail?photo_id=3044" rel="nofollow">http://owpdb.mfo.de/detail?photo_id=3044</a>, <a href="http://creativecommons.org/licenses/by-sa/2.0/de/deed.en" title="Creative Commons Attribution-Share Alike 2.0 de">CC BY-SA 2.0 de</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=12356228">Link</a></td></tr></tbody></table>All of this starts with the basic assumption that we are trying to estimate a population parameter, which is unknown. In theoretical work, such as Neyman's paper, this value is often represented as θ (theta). We use a sample to attempt to estimate theta; we can call that estimate T. If we're estimating a mean, we have one measure of precision already included in our analysis - our standard deviation can be thought of as a measure of precision of the estimate, in that it expresses the typical variation we see in scores. 
And in fact, as Neyman notes in his article, prior to his introduction of confidence intervals, people would often present estimates as + or - standard deviation. But, Neyman states, it probably makes more sense conceptually to use a multiple of standard deviation, if you want to express an interval with a high likelihood of containing the actual population value.<br /><br />Why? Because the standard deviation tells us the typical spread of <i>scores </i>(the individual units that make up the mean) but that doesn't tell us the typical (expected) spread of means. Confidence intervals allow you to do that, not just for means but for a variety of aggregate statistics.<br /><br />But I'm perhaps getting ahead of myself. I dug into Neyman's paper to try to summarize it for you. I've only read Neyman's work summarized in the past. In <i><a href="http://www.deeplytrivial.com/2017/08/statistical-sins-in-history-handling.html">Fisher, Neyman, and the Creation of Classical Statistics</a></i>, I've read some of his early correspondence with E. Pearson, when Neyman was still learning English. So I wasn't completely sure what to expect when I read his confidence interval article from 1937. I highly recommend reading it, as Neyman excellently summarizes complicated mathematical concepts in plain language. His work is highly approachable and he uses lots of examples to help drive home his points.<br /><br />Neyman argued that, unless we have access to the full population we are studying, and are capable of measuring each individual in that population, there will be probabilities associated with our work; both the estimation process <i>and </i>the estimate itself should be expressed in probability. In fact, his classical approach to statistics includes probability in the estimation process, through the use of significance testing. 
He acknowledges that there are many different approaches to estimating values, and that while some are more right than others, none are likely to get you the exact population value, θ. They will all be estimates within a certain margin of error. Confidence intervals communicate that margin of error.<br /><br />That is, he essentially says there is disagreement on the process of estimating population values from samples, and disagreement on the use of different estimation techniques (such as maximum likelihood, developed by <a href="http://www.deeplytrivial.com/2017/04/f-is-for-ronald-fisher.html">R.A. Fisher</a>). Though some approaches may be superior - and some of Neyman's footnotes feel very directed at Fisher - we are still trying to estimate an unknown parameter, so there is really no way to <i>prove </i>one is superior. But we can perhaps identify an interval surrounding the true population value.<br /><br />That interval - the confidence interval - will be based on probability - the confidence coefficient - which is greater than 0 and less than 1. The usual convention is 0.95 (95%), though he uses a variety of confidence coefficients in his paper and doesn't really settle on one as the gold standard. The 95% convention came later, probably because of its connection to the probability we use in <a href="http://www.deeplytrivial.com/2017/04/p-is-for-p-value.html">significance testing</a>, where the convention for alpha is 0.05.<br /><br />Without knowing the precise shape of the distribution of population values, we would instead use values with "intuitive" (his word) appeal, such as, for instance, the normal distribution (either the standard normal distribution or the t-distribution). He offers a variety of equations for different scenarios, but this one - the one that works with the normal distribution, which came at the end of the paper - is probably the approach most statistics students are familiar with. 
That is, we can use the values associated with different proportions of the curve around the mean to generate our confidence interval. We use these values from the z or t distribution (Neyman recommends t) as the multiple for the standard deviation. The exact procedure for confidence intervals varies depending on what type of estimate you're working with. For instance, some confidence intervals use <a href="http://www.deeplytrivial.com/2017/08/statistics-sunday-introduction-to.html">standard error</a> instead. But the basic procedure of choosing a probability and combining it with results of your analysis and values from a known distribution remains.<br /><br />As the size of the sample used to estimate the population value increases, the bias (difference between the estimate and actual value) reduces toward 0. So estimates based on larger samples are more likely to be close to the true population value, and confidence intervals generated from that estimate will be narrower while still being likely to contain the actual value.<br /><br />How likely? We don't actually know. As Neyman points out continuously in his paper, the probability that a range <i>actually </i>contains the population value is 0 or 1. There is no in between when it comes to real probability; it's either there or it isn't. But when we generate a confidence interval around our estimate, we don't know if it truly contains the actual value or not. So we draw upon the <a href="http://www.deeplytrivial.com/2017/04/l-is-for-law-of-large-numbers.html">law of large numbers</a>, that over time, with repeated estimates of the population value, we'll have the real population value in our ranges a certain proportion of the time, with that proportion equal to the confidence coefficient we choose.<br /><br />Say we always use 95% confidence intervals in a certain area of study. (That's the convention, anyway.) 
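That long-run property is easy to check with a small simulation. Here's a sketch in Python (all of the numbers are invented for illustration, and it uses the normal critical value 1.96 rather than the t value Neyman recommends - close enough at this sample size):

```python
# Simulating repeated studies: how often does a 95% confidence interval
# around a sample mean actually contain the true population value?
import math
import random
import statistics

random.seed(2017)

TRUE_MEAN = 100.0   # the unknown population parameter, theta
TRUE_SD = 15.0
N = 50              # observations per study
STUDIES = 2000      # number of repeated studies

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    est = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)  # standard error of the mean
    lower, upper = est - 1.96 * se, est + 1.96 * se
    if lower <= TRUE_MEAN <= upper:
        covered += 1

# Any single interval either contains theta or it doesn't; it's the
# long-run proportion of intervals containing theta that hovers near 0.95:
print(covered / STUDIES)
```

Each run gives a slightly different proportion, which is the point: the confidence coefficient describes the procedure, not any one interval.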
With repeated research (conducted in an unbiased way), we'll have the real population value in our range much of the time, with the actual percentage of the time approaching 95% as the number of studies approaches infinity. As has been shown repeatedly, chance is lumpy, and a 95% chance of something doesn't mean it will happen exactly 95% of the time, just like you won't have a perfect 50% heads in your coin flips.<br /><br />A glance at Neyman's reference section shows many of the greats of statistics: Fisher, Hotelling, Kolmogorov, Lévy, Markoff... Many names you'll hear again and again in these GMIS posts.http://www.deeplytrivial.com/2017/09/great-minds-in-statistics-jerzy-neymans.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-8595811813751825853Thu, 31 Aug 2017 16:19:00 +00002017-08-31T11:19:09.400-05:00scienceweatherClimate Change and the Behavior of a StormHurricane Harvey has been more devastating than most of us expected. As I stopped to grab breakfast on my way to work this morning, I saw an infographic on the front page of USA Today detailing just how bad things are in Texas in terms of the cost of the damage (to say nothing of the loss of human life):<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-I74gLHL5zMU/Wag0PJWbyuI/AAAAAAAAJBI/gyxsK5E0WMQBcgIYySkDb152b4-3lMA6gCKgBGAs/s1600/IMG_6627.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1352" height="320" src="https://3.bp.blogspot.com/-I74gLHL5zMU/Wag0PJWbyuI/AAAAAAAAJBI/gyxsK5E0WMQBcgIYySkDb152b4-3lMA6gCKgBGAs/s320/IMG_6627.JPG" width="270" /></a></div><br />Part of the reason Harvey has been so devastating is that its behavior has been different from many previous hurricanes, and <a href="http://news.nationalgeographic.com/2017/08/hurricane-harvey-climate-change-global-warming-weather/">climate change may be to 
blame</a>:<br /><blockquote>In the case of Harvey, which is dumping rivers of rain in and around Houston and threatening millions of people with catastrophic flooding (<a href="http://www.nationalgeographic.com/photography/proof/2017/08/hurricane-harvey-texas-flooding/">see photos</a>), at least three troubling factors converged. The storm intensified rapidly, it has stalled out over one area, and it is expected to continue <a href="http://news.nationalgeographic.com/2017/08/hurricane-harvey-floods-historic-rainfall/">dumping record rains for days and days</a>.<br /><br />Hurricanes tend to weaken as they approach land because they are losing access to the hot, wet ocean air that gives the storms their energy. Harvey's wind speeds, on the other hand, intensified by about 45 miles per hour in the last 24 hours before landfall, according to National Hurricane Center data.<br /><br />[Kerry Emanuel, an atmospheric sciences professor at the Massachusetts Institute of Technology,] analyzed the evolution of 6,000 simulated storms, comparing how they evolved under historical conditions of the 20th century, with how they could evolve at the end of the 21st century if greenhouse gas emissions keep rising. The result: A storm that increases its intensity by 60 knots in the 24 hours before landfall may have been likely to occur once a century in the 1900s. By late in this century, they could come every five to 10 years.</blockquote>As the article points out, the big reason for all the damage is the amount of rainfall, resulting in flooding. That too is likely due to climate change. In fact:<br /><blockquote>Every scientist contacted by National Geographic was in agreement that the volume of rain from Harvey was almost certainly driven up by temperature increases from human carbon-dioxide emissions.</blockquote>This is of course exacerbated by the fact that Harvey has stalled over land. Most hurricanes break apart or move off. 
Interestingly enough, the article notes that most climate scientists don't think this particular stall can be attributed to climate change, just bad luck; more research is needed, though, because some say climate change could result in changes in pressure fronts, which would impact how long a storm stalls in one place.http://www.deeplytrivial.com/2017/08/climate-change-and-behavior-of-storm.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-3902062455994837666Wed, 30 Aug 2017 15:34:00 +00002017-09-02T10:16:44.726-05:00great minds in statisticsmath and statisticsstatistical sinsStatistical Sins in History: Handling and Understanding CriticismToday's <a href="http://www.deeplytrivial.com/search/label/statistical%20sins">Statistical Sins</a> will be a little bit different, using an example from history of statistics to talk about an aspect of research publication. I'm currently reading <i><a href="https://www.amazon.com/Fisher-Neyman-Creation-Classical-Statistics/dp/1441994998/ref=sr_1_1?ie=UTF8&qid=1504103864&sr=8-1&keywords=Fisher%2C+Neyman%2C+and+the+Creation+of+Classical+Statistics">Fisher, Neyman, and the Creation of Classical Statistics</a></i> by Erich L. Lehmann. I've talked before about <a href="http://www.deeplytrivial.com/2017/08/great-minds-in-statistics-happy.html">Egon Pearson</a>, who was Jerzy Neyman's long-time collaborator. The feud between Neyman and <a href="http://www.deeplytrivial.com/2017/04/f-is-for-ronald-fisher.html">Ronald Fisher</a> is legendary in statistical history, as is the feud between Karl Pearson and Fisher. But not as much attention has been given to the feud between E. Pearson and Fisher. The start of that feud - though arguably mostly caused by Fisher and K. Pearson's ongoing competition of who could be more petty - can probably be traced to a review, authored by E. 
Pearson, about Fisher's book, <i>Statistical Methods for Research Workers</i>.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://3.bp.blogspot.com/-_E8jl-FWVoM/WabQ0oNieMI/AAAAAAAAI_o/LKMf3dk2KrUQ_lyYpZpB5ZmexT3pd6PHwCLcBGAs/s1600/TitlePage.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1186" data-original-width="750" height="320" src="https://3.bp.blogspot.com/-_E8jl-FWVoM/WabQ0oNieMI/AAAAAAAAI_o/LKMf3dk2KrUQ_lyYpZpB5ZmexT3pd6PHwCLcBGAs/s320/TitlePage.jpg" width="202" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The title page from the <a href="http://psychclassics.yorku.ca/Fisher/Methods/">first edition</a></td></tr></tbody></table>The review in question was regarding Fisher's second edition of the book. It was positive overall, except for this (note: this is quoted from <i>Fisher, Neyman, and the Creation of Classical Statistics</i>; I didn't track down the original):<br /><blockquote>There is one criticism, however, which must be made from the statistical point of view. A large number of the tests developed are based... on the assumption that the population sampled is of the "normal" form. That this is the case may be gathered from a careful reading of the text, but the point is not sufficiently emphasized. It does not appear reasonable to lay stress on the "exactness" of the tests when no means whatever are given of appreciating how rapidly they become inexact as the population sampled diverges from normality... [N]o clear indication of the need for caution in their application is given.</blockquote>The issues E. 
Pearson is addressing here are 1) the robustness of a test and 2) determining how far a dataset needs to <a href="http://www.deeplytrivial.com/2017/05/statistics-sunday-whats-normal-anyway.html">diverge from normal</a> before it no longer satisfies the requirements of the test. These are legitimate questions, and further, there is a very good reason E. Pearson raised them. But first, the fallout.<br /><br />Fisher was pissed. He was so pissed he wrote a response to the journal that originally published the review (<i>Nature</i>). We don't know exactly what this letter said, but based on later correspondence, it appears Fisher believed the question of normality was irrelevant to the content of the book (and I'm sure there was some name-calling as well). As often occurs with a letter to the editor regarding a published paper, the editor sent it to E. Pearson and asked if he would like to respond. He wrote his response but before sending it off, showed it to <a href="http://www.deeplytrivial.com/2017/04/t-is-for-t-test.html">William Sealy Gosset</a>.<br /><br />Gosset, who had a good working relationship with Fisher, decided to serve as mediator, and wrote a letter to Fisher to try to settle the dispute. Apparently that approach worked, because Fisher decided to withdraw his letter to <i>Nature</i> (which is why we don't know what it said) and suggested Gosset should instead write a letter (on Fisher's behalf) responding to E. Pearson's review. Of course, Fisher did end up writing a response... to Gosset's letter, because Gosset agreed with E. Pearson's comment about normality, saying that, though he believed the Student distribution (which he created) could withstand "small departures from normality," we needed more research into this topic, and in the meantime, experts in statistical distributions (like Fisher) could help guide us on how to respond when our data aren't normal. 
Gosset knew Fisher was a better mathematician, and likely saw this as a way of asking Fisher for help in answering these questions.<br /><br />Fisher, instead, brought up the possibility of <a href="http://www.deeplytrivial.com/2017/06/statistics-sunday-parametric-versus.html">distribution-free tests</a>.<br /><br />The thing Fisher never really considered is why E. Pearson was so fixated on this issue of robustness and normality. Do you know what two of E. Pearson's contributions to the field of statistics were? Exploration into determining the best goodness of fit test (that is, the best way to determine if a set of data matches a theoretical distribution, like the normal distribution - part of his collaboration with Neyman) and the concept of robustness. In fact, he was already working on much of this when he wrote that review in 1929.<br /><br />E. Pearson was not trying to make Fisher look bad or call him dumb. On the contrary: E. Pearson was trying to connect what he was working on to Fisher's work and set the stage for his own contributions. In fact, this is often the reason researchers will criticize another researcher's work in a paper or letter to the editor: they're setting the stage for the contribution they're about to make. They're taking the opportunity to say "we need X," only to turn around and deliver X soon after.<br /><br />This is done all the time. People even do it in their own papers, when they highlight a certain shortcoming of their research in the discussion section; they're probably highlighting a flaw that they've already figured out how to fix and may already be testing in a new study. (Or they added it to make a reviewer happy.)<br /><br />Fisher responded the way he did because he couldn't see why E. Pearson was criticizing him. He just saw the criticism and went into rage mode. It's easy to do. Hearing criticism sucks.
And while, as researchers, we frequently have to deal with criticism of our work in dissertation defenses and <a href="http://www.deeplytrivial.com/2015/08/peer-into-world-of-academic-publishing_22.html">peer reviews</a>, that criticism is rarely as public as it is in a published book review or letter to the editor.<br /><br />I'll admit, when I received an email from a journal that someone had written a letter to the editor in response to one of my articles (and asking if I'd like to write a response), I made that sound kids make when they have a skinned knee:<br /><br /><div style="text-align: center;"><iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/Y9j3heYZAk8" width="560"></iframe><br /></div><br />It took some courage to open the file and read the letter. I was amazed to see it was incredibly positive. I can only imagine what my reaction would have been if it hadn't been positive.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-GbQv5k0wj38/WabY0YRwKuI/AAAAAAAAI_4/ZFP9M5tMsGk1v7qDc-osHN2AF1rVSg2QgCLcBGAs/s1600/pillow_fort.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="197" data-original-width="500" height="126" src="https://2.bp.blogspot.com/-GbQv5k0wj38/WabY0YRwKuI/AAAAAAAAI_4/ZFP9M5tMsGk1v7qDc-osHN2AF1rVSg2QgCLcBGAs/s320/pillow_fort.gif" width="320" /></a></div><br />But if we can take a step back and realize why a researcher might be leveling a particular criticism, it might make it a bit easier to handle the hurt feelings. Who knows how different things would have been for the field of statistics if - instead of throwing a tantrum and writing a pissed off letter to the editor - Fisher had written E. Pearson a letter directly saying, "I think this issue of normality is irrelevant to what I was trying to do. Why do you think it's important?"
Maybe we would be talking today about the amazing collaboration between E. Pearson and Fisher. (Probably not, but a girl can dream, right?)<br /><br />Permalink: http://www.deeplytrivial.com/2017/08/statistical-sins-in-history-handling.html<br /><br /><h3>The Unpopularity of Presidential Pardons</h3><i>Tue, 29 Aug 2017 | math and statistics, politics</i><br /><br />Following up on <a href="http://www.deeplytrivial.com/2017/08/arpaio-and-shifting-survey-responses.html">yesterday's post</a> about the Arpaio pardon, here's an <a href="https://fivethirtyeight.com/features/the-arpaio-pardon-has-plenty-of-precedents-that-got-other-presidents-in-trouble/">article from FiveThirtyEight</a> examining Presidential pardons over the years, highlighting not only the unpopularity of these pardons, but what makes Trump's pardon of Arpaio so unconventional:<br /><blockquote>Several political <a href="http://www.politico.com/story/2017/08/26/schumer-mccain-trump-arpaio-pardon-242065">allies and foes</a> <a href="http://time.com/4917014/joe-arpaio-pardon-reaction-john-mccain-jeff-flake/">immediately condemned</a> the move as inappropriate and an insult to the justice system. But most of the criticized characteristics of Arpaio’s pardon have at least some parallels to previous ones. <br /><br />The number of controversial characteristics of the Arpaio pardon, however, is unusual and raises questions about the political fallout that Trump will face.
The Arpaio pardon, in other words, does have historical precedents (as Trump <a href="http://www.foxnews.com/politics/2017/08/28/trump-holds-press-conference-with-finnish-president-sauli-niinisto-live-blog.html">said on Monday</a>) — just not good ones.</blockquote><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-EXtQZkQuPaQ/WaW357PlGWI/AAAAAAAAI-4/-YK-WLuYj5keGdg2jGyKpIVcFifFleJiACLcBGAs/s1600/hiprofile_pardons.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="647" data-original-width="891" height="232" src="https://1.bp.blogspot.com/-EXtQZkQuPaQ/WaW357PlGWI/AAAAAAAAI-4/-YK-WLuYj5keGdg2jGyKpIVcFifFleJiACLcBGAs/s320/hiprofile_pardons.png" width="320" /></a></div><blockquote>“A pardon is a judgment call that the president makes, and we get to police that through the political process,” [Michigan State University law professor Brian] Kalt said. Noah Feldman, a professor at Harvard Law School, said that the fact that Arpaio was convicted for deliberately ignoring a court’s order to stop violating individuals’ constitutional rights places him in a category of his own. The only recourse for such a dramatic abuse of presidential power, according to Feldman, is impeachment. 
Or, short of impeachment, Kalt pointed to Ford’s pardon of Nixon: “Ford decided it was the right thing to do, and he lost the election as a result.”</blockquote><br /><br />Permalink: http://www.deeplytrivial.com/2017/08/the-unpopularity-of-presidential-pardons.html<br /><br /><h3>Social Learning and Amazon Reviews</h3><i>Tue, 29 Aug 2017 | behaviorism, cognitive biases, nonconscious processes, review, science, social psychology</i><br /><br />In my inbox this morning was a <a href="http://journals.sagepub.com/doi/full/10.1177/0956797617711291">new article from <i>Psychological Science</i></a> exploring how people use statistical and social information. And a great way to examine that is through Amazon reviews.<br /><br />Social learning - also called vicarious learning - is when we learn by watching others. One of the famous social learning studies, <a href="http://www.deeplytrivial.com/2016/04/b-is-for-bandura.html">Bandura's "bobo doll" study</a>, found that kids could learn vicariously by watching a recording, showing us that it isn't necessary for the learner to be in the same room as the model. The internet has exponentially increased our access to social information. But Amazon reviews provide not only social information but also numerical information:<br /><blockquote>One can learn in detail about the outcomes of others’ decisions by reading their reviews and can also learn more generally from average scores. However, making use of this information demands additional skills: notably, the ability to make intuitive statistical inferences from summary data, such as average review scores, and to integrate summary data with prior knowledge about the distribution of review scores across products.</blockquote>To generate material for their studies, they examined data from 15 million Amazon reviews (15,655,439 reviews of 356,619 products, each with at least 5 reviews, to be exact).
They don't provide a lot of detail in the article, instead referring to other sources, one of which is available <a href="https://arxiv.org/pdf/1506.08839.pdf">here</a>, to describe how these data were collected and analyzed. (tl;dr is that they used data mining and machine learning.)<br /><br />For experiment 1, people had to make 33 forced choices between two products, which were presented along with an average rating and number of reviews. Overall, the most reviewed product had 150 reviews and the least reviewed product had 25, with options falling between those two extremes. An example was shown in the article:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-YwHlVa2E3QY/WaWJt_tGO1I/AAAAAAAAI-Y/hhqSaGIwv-cyMSyO42kiImCCLgPPNQRsQCLcBGAs/s1600/example_product.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="232" data-original-width="500" height="148" src="https://2.bp.blogspot.com/-YwHlVa2E3QY/WaWJt_tGO1I/AAAAAAAAI-Y/hhqSaGIwv-cyMSyO42kiImCCLgPPNQRsQCLcBGAs/s320/example_product.gif" width="320" /></a></div><br />They found that people tended to prefer the product with more reviews more frequently than their statistical model (which factored in both number of reviews and rating) predicted. In short, they were drawn more to the large numbers than to the information the ratings were communicating.<br /><br />Experiment 2 replicated the first experiment, except this time, they had participants make 25 forced choices, and decreased the spread of number of reviews: the minimum was 6 and the maximum was 26. Once again, people were drawn more to the number of reviews than the ratings. When they pooled results from the two experiments and examined them using meta-analysis techniques, they found that people were unaffected by the drastic differences in number of reviews between experiment 1 and experiment 2.
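One way to build intuition for why a rating should sometimes outweigh a review count is a toy shrinkage model (my own illustration, not the model the authors fit; the `prior_mean` and `prior_strength` values are made-up numbers): treat the site-wide average rating as a handful of "pseudo-reviews," so each product's average gets pulled toward it less and less as real reviews accumulate.

```python
def shrunk_rating(avg, n_reviews, prior_mean=4.2, prior_strength=10):
    """Pull a product's average rating toward an overall prior mean;
    prior_strength acts like a count of pseudo-reviews at that mean."""
    return (prior_strength * prior_mean + n_reviews * avg) / (prior_strength + n_reviews)

# A mediocre product with many reviews vs. a good product with few:
popular = shrunk_rating(3.2, n_reviews=150)  # ample evidence it's mediocre
obscure = shrunk_rating(4.5, n_reviews=25)   # less evidence, but clearly better
print(round(popular, 2), round(obscure, 2))  # the 25-review product still wins
```

Even after discounting the 25-review average for its smaller sample, the better-rated product comes out ahead. The participants in these studies behaved as if more reviews were themselves a virtue, even when the extra reviews only confirmed a product was bad.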
As the authors state in their discussion:<br /><blockquote>In many conditions, participants actually expressed a reliable preference for more-reviewed products even when the larger sample of reviews served to statistically confirm that a poorly rated product was indeed poor.</blockquote>Obviously, crowd-sourcing information is a good thing, because, as we understand from the <a href="http://www.deeplytrivial.com/2017/04/l-is-for-law-of-large-numbers.html">law of large numbers</a>, data from a larger sample is expected to more closely reflect the true population value.<br /><br />The problem is that people fixate on the amount of information and use that <a href="http://www.deeplytrivial.com/2016/04/q-is-for-quick-v-slow-processing.html">heuristic</a> to guide their decision, rather than using what the information is telling them about quality. And there's a point of diminishing returns on sample size and amount of information. A statistic derived from 50 people is likely closer to the true population than a statistic derived from 5 people. But doubling your sample from 50 to 100 doesn't double the accuracy. There comes a point where more is not necessarily better, just, well, more. This is a more complex side of statistical inference, one the average layperson doesn't really get into.<br /><br />And while we're on the subject of Amazon reviews, there's this hilarious trend where people write joke reviews on Amazon. 
You can read some of them <a href="http://www.boredpanda.com/funny-amazon-reviews/">here</a>.<br /><br />Permalink: http://www.deeplytrivial.com/2017/08/social-learning-and-amazon-reviews.html<br /><br /><h3>Arpaio and Shifting Survey Responses</h3><i>Mon, 28 Aug 2017 | cognitive biases, math and statistics, politics, science, social psychology</i><br /><br />On Friday, the President issued a <a href="http://www.phoenixnewtimes.com/news/joe-arpaio-26-days-from-conviction-to-pardon-by-donald-trump-9634570">pardon for former Maricopa County Sheriff Joe Arpaio</a>. Since that announcement, stories <a href="https://static.currentaffairs.org/2017/08/wait-do-people-actually-know-just-how-evil-this-man-is">about Arpaio and his history</a> have filled my Facebook newsfeed. Perry Bacon, Jr., at FiveThirtyEight argues that this pardon is <a href="https://fivethirtyeight.com/features/the-arpaio-pardon-encapsulates-trumps-identity-politics/">motivated by conservative identity politics</a>. In his article, he links to a <a href="https://today.yougov.com/news/2017/08/25/politics-pardoning-sheriff-joe/">YouGov poll</a>, citing - as evidence of his argument about conservative identity politics - the fact that opinions about the pardon fall along party lines. But there's something even more interesting in this YouGov poll: an exploration of framing and its impact on opinions:<br /><blockquote>On Thursday and Friday, before President Trump pardoned former Maricopa county Sheriff Joe Arpaio, YouGov polled 1,000 Americans about what they thought should be done. Before supplying any information about the details of the Arpaio case, 24% said they were in favor of a pardon and 37% were opposed. <br /><br />However, this is the type of question where opinion can change quickly as the public learns more about the issue.
Despite widespread media coverage and Trump's hint of a pardon on Tuesday, a majority of the public said they knew "little" or "nothing at all" about the Arpaio case. To see what might happen if people were exposed to arguments for and against the pardon--as will inevitably happen--we asked our sample whether they agreed or disagreed with pro and con arguments. <br /><br />The pro-pardon wording was based on White House talking points. The anti-pardon statement mirrored language used by Arpaio’s opponents.</blockquote><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-TQP834Z6wYM/WaQ0uDscxfI/AAAAAAAAI9o/7RudV-y3ikwpDMgB2wFNa402NWc6eCkFQCLcBGAs/s1600/arpaio_statements.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="829" data-original-width="1079" height="246" src="https://4.bp.blogspot.com/-TQP834Z6wYM/WaQ0uDscxfI/AAAAAAAAI9o/7RudV-y3ikwpDMgB2wFNa402NWc6eCkFQCLcBGAs/s320/arpaio_statements.png" width="320" /></a></div>After hearing one of the two arguments, respondents were then exposed to the other, so that by the end of the poll, everyone had heard both sides. This is when the most pronounced party differences in opinion appeared:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-FySXJtOwIV8/WaQ127_8J4I/AAAAAAAAI90/hvNxAg8anlEihz-1vzuQNMvsj6T3LT0cQCLcBGAs/s1600/arpaio_final.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="338" data-original-width="696" height="155" src="https://3.bp.blogspot.com/-FySXJtOwIV8/WaQ127_8J4I/AAAAAAAAI90/hvNxAg8anlEihz-1vzuQNMvsj6T3LT0cQCLcBGAs/s320/arpaio_final.png" width="320" /></a></div><br />Specifically, most of the movement was among respondents who had selected "Not Sure" in their initial opinion. Among Democrats and, to a lesser extent, Independents, these individuals moved to "Oppose." 
The opposite trend is observed among Republicans, though some people who were initially "Oppose" also appear to have moved to different columns.<br /><br />This presents a problem with regard to surveying on these issues. When addressing issues that are not well known, or where limited facts are available, it makes sense to include some background in opinion polling. But this highlights an important methodological issue - the way an issue is framed will certainly have an impact on responses (<a href="http://www.deeplytrivial.com/2016/03/writing-good-survey-questions-or-why-im.html">we've known this for a while</a>), but including the "whole story" with two sides of an argument could also impact opinions, by leading to a group polarization effect. Notice what pushed many respondents to the poles of the continuum (a continuum with Oppose on one end and Favor on the other) was not that this was an issue addressed by the current administration - which is in itself very divisive - but rather the use of a partisan issue (illegal immigration) in the background information. <br /><br />As more and more <a href="http://www.deeplytrivial.com/2017/05/what-do-americans-think-about-science.html">issues are politicized</a>, we're likely to see more and more of this <a href="http://www.deeplytrivial.com/2016/12/commentary-on-american-divide-from.html">group polarization effect</a>. And that will make it even harder to find common ground.<br /><br />Permalink: http://www.deeplytrivial.com/2017/08/arpaio-and-shifting-survey-responses.html<br /><br /><h3>Statistics Sunday: Dealing with Missing Data</h3><i>Sun, 27 Aug 2017 | grad school, math and statistics, science, social psychology, Statistics Sunday</i><br /><br />For my dissertation, participants read and completed a large packet. It included a voir dire questionnaire, abbreviated trial transcript, and post-trial questionnaire.
Because I didn't have a grant or really any kind of externally contributed budget for the project, I copied and assembled the packets myself. To save paper (and money), I copied the materials two-sided. I put page numbers on the materials so that participants would (hopefully) notice the materials were front and back.<br /><br />Sadly, not everyone did.<br /><br />When I noticed after one of my sessions that people were not completing the back of the questionnaire, I added in arrows on the first page, to let them know there was material on the back. After that, the number of people skipping pages decreased, but still, some people would miss the back side of the pages.<br /><br />Sometimes, despite your best efforts, you end up with missing data. Fortunately, there are things you can do about it.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-nC3ISUcfB6k/WaLlFhQXIOI/AAAAAAAAI9Y/cKzVvd8SjTwzDKIVAXF-CaT25gGdRWULwCLcBGAs/s1600/got_missing.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="793" data-original-width="1600" height="158" src="https://3.bp.blogspot.com/-nC3ISUcfB6k/WaLlFhQXIOI/AAAAAAAAI9Y/cKzVvd8SjTwzDKIVAXF-CaT25gGdRWULwCLcBGAs/s320/got_missing.jpg" width="320" /></a></div><br />What you can do about missing data depends in part on what kind of missingness we're talking about. There are three types of missing data:<br /><br /><h3>Missing Completely at Random</h3>In this case, missing information is not related to any other variables. It's rare to have this type of missing data - and that's actually okay, because there's not a lot you can do in this situation. Not only do you have missing data, there's no relationship between the data that is missing and the data that is not missing, meaning you can't use what data you have to fill in missing values. But you're also statistically justified in proceeding with what data you have. 
Your complete data is, in a sense, a random sample of all data from your group (which includes those missing values you didn't get to measure).<br /><br /><h3>Missing at Random</h3>‘Missing at random’ occurs when the missing information is related to observed variables. My dissertation data would fall in this category - at least, on the full pages that were skipped. This is because people were skipping those questions by accident, but since those questions were part of a questionnaire on a specific topic, the items are correlated with each other.<br /><br />This means that I could use my complete data to fill in missing values. There are many methods for filling in missing values in this situation, though it should be kept in mind that any imputation method will artificially decrease <a href="http://www.deeplytrivial.com/2017/04/d-is-for-descriptive-statistics.html">variability</a>. You want to use this approach sparingly. I shouldn't use it to fill in entire pages worth of questions, but could use it if a really important question or two was skipped. (By luck alone, all of the questions I had planned to include in analyses were on the front sides, and were as a result very rarely skipped.)<br /><br /><h3>Missing Not at Random</h3>The final situation occurs when the missing information is related to the missing values themselves or to another, unobserved variable. This is when people skip questions because they don't want to share their answer.<br /><br />This is why I specified above that my data is only missing at random for those full pages. In those cases, people skipped the questions because they didn't realize they were there. But if I had a skipped question here and there (and I had a few), it could be because people didn't see it OR it could be because they don't want to share their answer. 
Without any data to justify one or the other, I have to assume it's the latter - if I'm being conservative, that is; lots of researchers with no data to justify it will assume data is missing at random and analyze away.<br /><br />If I ask you about something very personal or controversial (or even illegal), you might skip that question. The people who do respond are generally the people with nothing to hide. They're going to be qualitatively different from people who don't want to share their answer. Methods to replace missing values will not be very accurate in this situation. The only thing you can do here is to try to prevent missing data from the beginning, such as with language in the consent document about how participants' data will be protected. If you can make the study completely anonymous (so that you don't even know <i>who </i>participated) that would be best. When that's not possible, you need strong assurances of confidentiality.<br /><br /><h3>How Do You Solve a Problem Like Missing Data?</h3><div>First off, you can solve your missing data problems with imputation methods. Some are better than others, but I generally don't recommend these approaches because, as I said above, they artificially decrease variance. The simplest imputation method is mean replacement - you replace each missing value with the mean derived from non-missing values on that variable. This is based on the idea that <a href="http://www.deeplytrivial.com/2017/08/statistical-sins-regression-to-mean.html">"the expected value is the mean"</a>; in fact, it's the most literal interpretation of that aspect of statistical inference. </div><div><br /></div><div>Another method, which is a more nuanced interpretation of "the expected value is the mean" is to use <a href="http://www.deeplytrivial.com/2017/06/statistics-sunday-linear-regression.html">linear regression</a> to predict scores on the variable with missingness using one or more variables with more complete data. 
So you conduct the analysis with people who have complete data, then use the regression equation you derived from those participants to predict what the score will be for someone with incomplete data. But regression is still built on means - it's just a more complex combination of means. Regression coefficients are simply the effect of one variable on another averaged across all participants. And outcomes are simply the mean of the <a href="http://www.deeplytrivial.com/2017/04/y-is-for-y-dependent-variables.html">y variable</a> for people with a specific combination of scores on the <a href="http://www.deeplytrivial.com/2017/04/x-is-for-x-independent-or-predictor.html">x variables</a>. Fortunately, in this case, you aren't using a one-size-fits-all approach, and you're introducing some variability into your imputed scores. But you're still artificially controlling your variance by, in a sense, creating a copy of another participant.</div><div><br /></div><div>Of course, you're better off using an analysis approach that can handle missing data. Some analyses can be set up to remove people with missing data "pairwise." This means that for a portion of analysis using two variables, the program uses anyone with complete data on those two variables. People are not removed completely if they have missing data; they're just only included in the parts of the analysis for which they have complete data and dropped from parts of the analysis where they don't. This will work for simpler analyses like <a href="http://www.deeplytrivial.com/2017/04/r-is-for-r-correlation.html">correlations</a> - it just means that your correlation matrix will be based on a varying number of people, depending on which specific pair of variables you're referring to.</div><div><br /></div><div>More complex, iterative analyses can also handle some missing data, by changing which estimation method it uses. 
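Backing up to the imputation methods above: the variance shrinkage that makes me wary of them is easy to demonstrate with invented data (a toy sketch, not my dissertation data):

```python
import random
import statistics

random.seed(1)

# Invent a complete variable, then knock out roughly 30% of values at random.
complete = [random.gauss(50, 10) for _ in range(200)]
observed = [x if random.random() > 0.3 else None for x in complete]

# Mean imputation: every hole is filled with the observed mean.
obs_only = [x for x in observed if x is not None]
obs_mean = statistics.mean(obs_only)
imputed = [x if x is not None else obs_mean for x in observed]

# The mean is untouched, but the spread artificially shrinks.
print(f"SD of observed values: {statistics.stdev(obs_only):.2f}")
print(f"SD after mean imputation: {statistics.stdev(imputed):.2f}")
```

Regression imputation shrinks variance less dramatically, but for the same underlying reason: every imputed value sits exactly on a mean (marginal or conditional) rather than scattering around it the way real responses do.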
(This is a more advanced concept, but I'm planning on writing about some of the estimation methods in the future - stay tuned!) Structural equation modeling analyses, for instance, can handle missing data, as long as the proportion of missing data in the dataset doesn't get too high.</div><div><br /></div><div>And if you can use psychometric techniques with your data - that is, if your data examines measures of a latent variable - you're in luck, because my favorite psychometric technique, Rasch, can handle missing data beautifully. (To be fair, item response theory models can as well.) In fact, the assumption in many applications of the Rasch model is that you're going to have missing data, because it's often used on adaptive tests - adaptive meaning people are going to respond to different combinations of questions depending on their ability. </div><div><br /></div><div>I have a series of posts planned on Rasch, so I'll revisit this idea about missing data and adaptive tests later on. And I'm working on an article on how to determine if Rasch is right for you. The journal I'm shooting for is (I believe) open access, but I'm happy to share the article, even in draft form, to anyone who wants it. 
Just leave a comment below and I'll follow up with you on how to share it.</div><br /><br />Permalink: http://www.deeplytrivial.com/2017/08/statistics-sunday-dealing-with-missing.html<br /><br /><h3>Statistical Sins Late Edition: Three Things We Love</h3><i>Fri, 25 Aug 2017 | math and statistics, statistical sins</i><br /><br />The <a href="http://www.deeplytrivial.com/2017/08/a-little-cloudy-but.html">eclipse</a> was amazing, but after missing 2 days of work this week, playing catch-up Wednesday, and attending an all-day meeting yesterday, I was unable to get myself together and write a <a href="http://www.deeplytrivial.com/search/label/statistical%20sins">Statistical Sins</a> post for Wednesday (or even yesterday). (I did, however, get around to posting a Great Minds in Statistics post on the amazing <a href="http://www.deeplytrivial.com/2017/08/great-minds-in-statistics-fn-david.html">F.N. David</a>. I've had that post scheduled for a while now.)<br /><br />I'll admit, part of the problem - compounded by lack of time - was not knowing what to write about.
But a story that is making the rounds again and made its way into my news feed is a <a href="http://www.biostat.jhsph.edu/courses/bio621/misc/Chocolate%20consumption%20cognitive%20function%20and%20nobel%20laurates%20(NEJM).pdf">study from the New England Journal of Medicine</a> regarding a country's overall chocolate consumption and its number of Nobel Prize laureates.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-2KMY6xcM5Pc/WaBaDNbIipI/AAAAAAAAI8E/6WzAfPyYyScbFrp6t8JFeQyeaN4GsvzLACLcBGAs/s1600/chocolate-consumption-and-nobel-prize.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="417" data-original-width="480" height="278" src="https://4.bp.blogspot.com/-2KMY6xcM5Pc/WaBaDNbIipI/AAAAAAAAI8E/6WzAfPyYyScbFrp6t8JFeQyeaN4GsvzLACLcBGAs/s320/chocolate-consumption-and-nobel-prize.png" width="320" /></a></div><br />Apparently the <a href="http://www.deeplytrivial.com/2017/04/r-is-for-r-correlation.html">correlation</a> is a highly significant 0.791. While the authors get that this doesn't imply a causal relationship, they sort of miss the boat here:<br /><blockquote>Of course, a correlation between X and Y does not prove causation but indicates that either X influences Y, Y influences X, or X and Y are influenced by a common underlying mechanism. </blockquote>So that's three possibilities: A causes B, B causes A, or C causes both A and B (what is known as the <a href="http://www.deeplytrivial.com/2017/08/stats-note-third-variable-problem.html">third variable problem</a>). But they miss the fourth possibility: A and B are two random variables that by chance alone have a significant relationship.
There might not be a meaningful C variable.<br /><br />To clarify, when I say "random variable," I mean a variable that is allowed to vary naturally - we're not actively introducing any interventions to increase the number of Nobel laureates in any country (which in light of this study would probably involve airlifting chocolate in). And when we allow variables to vary naturally, we'll sometimes find relationships between them. That could occur just by chance. In my correlation post linked above, I generated 20 random samples of 30 pairs of variables, and found 3 significant correlations (all close to r = 0.4) by chance alone.<br /><br />Sure, this is a significant relationship - a highly significant one at that - but there isn't some level of significance where a relationship suddenly goes from being potentially due to chance alone to being absolutely systematic or real. To argue that a relationship of 0.7 can't be due to chance makes no more sense than saying a relationship of 0.1 can't be due to chance. There's a chance I could create two random variables and have them correlate at 1.0, a perfect relationship. It's a small chance, but the chance is never 0. There's no magic cutoff value where we throw out the possibility of <a href="http://www.deeplytrivial.com/2017/04/a-is-for-alpha.html">Type I error</a>. And the <a href="http://www.deeplytrivial.com/2017/04/p-is-for-p-value.html">p-value</a> generated by an analysis is not the chance that a result is spurious; it's the chance we would find a relationship of that size by chance alone given what we know about the potential distribution of the variables of interest - and what we know about the distribution comes from the very sample data we're speculating about. It's possible the distributions look completely different from what we expect, making the probability of Type I error higher than we realize.
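That fourth possibility - unrelated variables correlating "significantly" by chance alone - is easy to demonstrate by simulation (a quick sketch of my own; for n = 30 pairs, |r| above roughly .361 clears the two-tailed .05 threshold):

```python
import random
import statistics

random.seed(2)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Correlate 1,000 pairs of completely unrelated variables and count how
# many clear the conventional significance cutoff by chance alone.
n, trials, crit = 30, 1000, 0.361  # crit: two-tailed .05 critical r for n = 30
hits = sum(
    abs(pearson_r([random.gauss(0, 1) for _ in range(n)],
                  [random.gauss(0, 1) for _ in range(n)])) > crit
    for _ in range(trials)
)
print(f"{hits} of {trials} unrelated pairs were 'significant'")
```

With alpha = .05 you'd expect roughly 50 of the 1,000 pairs to come up "significant," and every so often one of those chance correlations will be far larger than .4 - none of which makes them real.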
(In fact, see this <a href="http://www.deeplytrivial.com/2017/07/statistics-sunday-no-really-whats-bayes.html">post on Bayes theorem</a> about how the false positive rate is likely <i>much </i>higher than alpha.)<br /><br />It occurs to me that there are three consumables we love so much that we keep looking for data to justify our love of them. Those three things are coffee, chocolate, and bacon.<br /><br />And the greatest of these is bacon.<br /><br />It's true though. When we're not publishing stories about how <a href="http://www.healthline.com/nutrition/7-health-benefits-dark-chocolate">chocolate</a> or <a href="https://www.caffeineinformer.com/7-good-reasons-to-drink-coffee">coffee</a> benefits your health, we're attempting to disprove those evil scientists who try to convince us <a href="http://www.deeplytrivial.com/2015/11/the-importance-of-scientific-literacy.html">bacon</a> is harmful.<br /><br />Loving these things likely motivates us to study them. And sometimes that involves looking for a relationship - any relationship - with a positive outcome. Observational studies can very easily uncover spurious relationships. Increasing the distance (e.g., looking at country-level data) between the cause (e.g., consumption of chocolate) and the outcome (e.g., Nobel prizes) can drastically increase the probability that we find a false positive.<br /><br />I bet you can find many significant relationships - even highly significant relationships - when looking at two variables from the altitude of country-level data. More complicated relationships get washed out when viewing the relationship so far away from individual-level data.
In fact, when we remove variance - either by aggregating data across many people (as occurs in country-level data) or by recoding continuous variables into <a href="http://www.deeplytrivial.com/2017/08/statistics-sunday-odds-ratios.html">dichotomies</a> - we may miss confounds or other variables that provide a much better explanation of the findings. We miss the signs that we're barking up the wrong tree.http://www.deeplytrivial.com/2017/08/statistical-sins-late-edition-three.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-7286448026146713784Wed, 23 Aug 2017 20:30:00 +00002017-08-28T09:38:32.817-05:00gendergreat minds in statisticsmath and statisticsGreat Minds in Statistics: F.N. David versus the PatriarchyHappy 108th birthday to Florence Nightingale David! F.N. David, as she is often known, was a British statistician, combinatorialist, author, and general mathematical bad ass who regularly took on the patriarchy's "This is a man's world" nonsense.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-bUS8g4_4Ah8/WZ3a2DpERpI/AAAAAAAAI7I/7Nmve5aZ_WM5K44rP22LYWoStXRWtkU1gCLcBGAs/s1600/FNDavid.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="326" data-original-width="282" height="320" src="https://1.bp.blogspot.com/-bUS8g4_4Ah8/WZ3a2DpERpI/AAAAAAAAI7I/7Nmve5aZ_WM5K44rP22LYWoStXRWtkU1gCLcBGAs/s320/FNDavid.jpeg" width="276" /></a></div><br />She was named after family friend - and self-taught statistician - Florence Nightingale. She took to math at a very early age and wanted to become an actuary. After completing her degree in mathematics in 1931, she applied for a career fellowship at an insurance firm, but was turned down. When she inquired why, she was told that, despite being the most qualified candidate, she was a woman and they only hired men. 
Many of the people who turned her down for jobs told her that they didn't have any bathroom facilities for women, and used that as a reason they couldn't hire her.<br /><br />But in 1933, she was offered a job and a scholarship at University College in London, where she would study with Karl Pearson. In fact, <a href="http://www.dcscience.net/conversation-with-Florence-Nightingale-David.pdf">the way she got this opportunity</a> is pretty awesome:<br /><blockquote>I was passing University College and I crashed my way in to see Karl Pearson. Somebody had told me about him, that he had done some actuarial work. I suppose it was just luck I happened to be there. Curious how fate takes one, you know. We hit it off rather well, and he was kind to me. Incidentally, he's the only person I've ever been afraid of all my life. He was terrifying, but he was very kind. He asked me what I'd done and I told him. And he asked me if I had any scholarship and I said yes, I had. He said, "You'd better come here and I'll get your scholarship renewed," which he did.</blockquote>She worked for Pearson as a computer - literally. 
Her job was to generate the tables to go along with his correlation coefficient, a job that involved conducting complicated (and repetitive) analyses using a Brunsviga hand-crank mechanical calculator:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-Rl-z10CZvhQ/WZ3cg3tABSI/AAAAAAAAI7c/64trfidD-6gxLJB4KBNv832w4o3o5QoWACLcBGAs/s1600/BLM_Braunschweig_WMDE_%252864%2529.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1067" data-original-width="1600" height="213" src="https://3.bp.blogspot.com/-Rl-z10CZvhQ/WZ3cg3tABSI/AAAAAAAAI7c/64trfidD-6gxLJB4KBNv832w4o3o5QoWACLcBGAs/s320/BLM_Braunschweig_WMDE_%252864%2529.jpg" width="320" /></a></div><br />In her interview, linked above, she estimates she pulled that crank 2 million times.<br /><br />Because she found Pearson - or the "old man" as she referred to him - terrifying, she was incapable of telling him no:<br /><blockquote>On one occasion he was going home and I was going home, and he said to me, "Oh you might have a look at the elliptic integral tonight, we shall want it tomorrow." And I hadn't the nerve to tell him that I was going off with a boyfriend to the Chelsea Arts Ball. So I went to the Arts Ball and came home at 4-5 in the morning, had a bath, went to University and then had it ready when he came in at 9. One's silly when one's young. </blockquote>He also would apparently become very upset if she jammed the Brunsviga, so she often wouldn't tell him it was jammed, instead unjamming it herself with a long pair of knitting needles.<br /><br />After K. Pearson retired, she worked with Jerzy Neyman (who you can find out more about in my post on <a href="http://www.deeplytrivial.com/2017/08/great-minds-in-statistics-happy.html">Egon Pearson</a>, but look for a post on Neyman in the future!), who encouraged her to submit her 4 most recent publications as her PhD dissertation. 
She was awarded her doctorate in 1938.<br /><br />During World War II, she assisted with the war effort as experimental officer and senior statistician for the Research and Experiments Department. She served on multiple advisory councils and committees, and was scientific adviser on mines for the Military Experimental Establishment. She created statistical models to predict the consequences of bombings, which provided valuable information for directing resources and helped keep vital services going even as London was being bombed. She later said that the war gave women an opportunity to contribute, and believed that conditions for women improved because of it.<br /><br />She returned to University College in London after the war, first as a lecturer, then as a professor. Of course, that didn't change the fact that she was not allowed to join the school's scientific society because it only accepted men. So, she founded a scientific society of her own that would accept both men and women. They invited many young scientists and apparently irked the "old rednecks" as a result.<br /><br />In the 1960s, she wrote a book on the history of probability, <i>Games, Gods, and Gambling </i>- I just ordered a copy this morning, so stay tuned for a review! In the late 1960s, she moved to California, where she became a Professor and - shortly thereafter - Chair in the Department of Statistics at the University of California, Riverside.<br /><br />She passed away in 1993.<br /><br />In 2001, the Committee of Presidents of Statistical Societies and Caucus for Women in Statistics created <a href="http://community.amstat.org/copss/awards/fn-david">an award in F.N. David's name</a>, awarded every two years to a woman who exemplifies David's contributions to research, leadership, education, and service. <br /><br />There's certainly a lot more to Florence Nightingale David than what I included in this post. I highly recommend reading the conversation with her linked above. 
She also receives some attention in <i><a href="http://www.deeplytrivial.com/2017/08/statistics-reading-book-review-lady.html">The Lady Tasting Tea</a></i>. For now, I'll close with a great quote from the linked interview. She commented that being influential is not her job in life. When asked what <i>is</i> her job in life, she said, "To ask questions and try to find the answers, I think."<br /><br />http://www.deeplytrivial.com/2017/08/great-minds-in-statistics-fn-david.htmlnoreply@blogger.com (Sara)0tag:blogger.com,1999:blog-4594832939334410220.post-3661403984984055147Mon, 21 Aug 2017 22:28:00 +00002017-08-21T17:28:37.088-05:00scienceA Little Cloudy But...Here's a picture we took of the totality:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-46U9gVuBnMU/WZteN4if5jI/AAAAAAAAI6M/Tn5WGsUx-20661q5l4xkzDpTt8AU4A-nQCLcBGAs/s1600/21014019_10154927132576461_8471141668491016295_o.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="864" data-original-width="1296" height="213" src="https://3.bp.blogspot.com/-46U9gVuBnMU/WZteN4if5jI/AAAAAAAAI6M/Tn5WGsUx-20661q5l4xkzDpTt8AU4A-nQCLcBGAs/s320/21014019_10154927132576461_8471141668491016295_o.jpg" width="320" /></a></div><br />We were viewing from North Kansas City, MO, at a relative's home. We got about 1 minute 40 seconds of totality. It was pretty cool how quickly it became dark and then light again.http://www.deeplytrivial.com/2017/08/a-little-cloudy-but.htmlnoreply@blogger.com (Sara)0