Monday, July 31, 2017

How You Can Science Along During the Eclipse

As you may already know, a total solar eclipse will occur on Monday, August 21, following a stripe across the US from Oregon to South Carolina:


And according to Rebecca Boyle at FiveThirtyEight, some cool science is going to happen during the eclipse:
[S]cientists who study eclipses will be buzzing around their equipment to take the measure of the sun, its atmosphere and its interaction with our own atmosphere. An event that could inspire a unique sense of cosmic communion will also answer burning questions about how our star works and how it affects us.

By blocking the sun’s blazing light, the moon unveils the solar corona, a region that scientists still struggle to understand. The sun streams radiation and charged particles through a wind that it constantly blows in all directions, and while we know the solar wind originates in the corona, scientists are not sure exactly how, or why. An eclipse is one of the only times scientists can see the corona itself — and try to understand what it is throwing at us.

With new corona observations, scientists can feed new bits into computer simulations hungry for data on coronal action. This will improve models used for predicting space weather, said Ramon Lopez, a physicist at the University of Texas at Arlington.

Scientists will scramble to study all this as the shadow of the moon races across the country at an average speed of 1,651 mph. Scientists and students at 25 locations across the country will launch more than 50 weather balloons, which will take the temperature of Earth’s atmosphere. From orbit, a slew of spacecraft and the crew aboard the International Space Station will be watching and taking pictures. And scientists will fly in at least three airplanes, including a National Science Foundation jet that will measure the sun in infrared.
And if you're planning to watch the eclipse, whether in the path of totality or not, you can get involved with some of the science that will be going on during and because of the eclipse:
With an app called GLOBE Observer and a thermometer, you can collect data during the eclipse and submit it to NASA. And Google and the University of California, Berkeley, are asking for video and images, which they’ll stitch together into an “Eclipse Megamovie.”
If you want to watch the eclipse, you'll need a special pair of glasses to protect your eyes. And here's some additional guidance if you plan on photographing or filming the eclipse.

Sunday, July 30, 2017

Statistics Sunday: Fixed versus Random Effects

As I've said many times, statistics is about explaining variance. You'll never be able to explain every ounce of variance (unless you, say, create a regression model with the same number of predictors as there are cases; then you explain all variance, but fail to make any generalizable inferences). Some variance just can't be explained except as measurement or sampling error. But...


That is, it's possible to split that leftover variance into two components - variance that appears to be purely random, and variance that appears to be systematic, meaning it has some cause that is simply an unmeasured variable (or set of variables). This is where random effects models come into play.

You may not have heard these terms in statistics classes, but you've likely done fixed effects analysis without even realizing it. Fixed effects models deal specifically with testing the effect of one or more independent variables - variables you've operationally defined and manipulated or measured. When you conduct a simple linear regression, you have your constant (the predicted value of Y when X = 0), your (fixed effect) slope, and your error term. The effect you're testing is known.
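As a concrete sketch, here's what that looks like in R with simulated data (the variables and numbers are invented purely for illustration):

# A fixed effects model: simple linear regression on simulated data
set.seed(42)
x <- rnorm(100)                   # a measured predictor
y <- 2 + 0.5 * x + rnorm(100)     # outcome = constant + slope * x + error

fit <- lm(y ~ x)
coef(fit)            # the constant (intercept) and the fixed effect slope
summary(fit)$sigma   # residual standard error - the leftover, unexplained variance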

But there are many other variables out there that you may not have measured. A random effects model attempts to partition that residual variance, by seeing what part of it appears to be meaningful (that is, there are common patterns across cases) and what part appears to be just noise.

Often, we use a combination of the two, called a mixed effects model. This means we include predictors to explain as much variance as we can, then add in the random effects component, which generates an additional variance term. It has the added bonus of making your results more generalizable, including to cases unlike the ones you included in your study. In fact, I mostly work with mixed and random effects models in meta-analysis, where they add an additional variance component when generating the average effect size. In meta-analysis, a mixed effects model is used when you have strong justification that there isn't actually one true effect size, but a family or range of effect sizes that depend on characteristics of the studies. The results then include not just an average effect size and confidence interval for that point estimate, but also a prediction interval, which gives the range of possible true effect sizes. And this is actually a pretty easy justification to make.
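To make the fixed-versus-random distinction concrete, here's a minimal sketch of a mixed effects model in R, assuming the lme4 package is installed (the variables and effect sizes are invented for illustration):

# Students nested in schools: hours studied is the fixed effect we care about;
# school is an unmeasured-but-systematic source of variance, so we give it a
# random intercept rather than treating it as a predictor.
library(lme4)
set.seed(123)

school_id <- sample(1:20, 200, replace = TRUE)
hours     <- runif(200, 0, 10)
school_fx <- rnorm(20, 0, 5)[school_id]                    # systematic school differences
score     <- 60 + 2 * hours + school_fx + rnorm(200, 0, 8) # plus pure noise

dat   <- data.frame(score, hours, school = factor(school_id))
mixed <- lmer(score ~ hours + (1 | school), data = dat)
summary(mixed)   # a fixed effect for hours, plus a variance component for school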

Why wouldn't you use random effects all the time? Because it isn't always indicated, and it comes with some big drawbacks. First, this residual, random effects variance can't be correlated with any predictors you may have in the model. If that happens, you don't really have a good case for including the random effects component. The variance is related to the known predictors, not the unknown random effects variance. You're better off using a fixed effects model. And while random effects models can be easily justified, fixed effects models are easier to explain and interpret.

Additionally, the random (and mixed) effects models are more generalizable in part because they generate much wider confidence intervals. And of course, the wider the confidence interval, the more likely it is to include the actual population value you're estimating. But the wider the confidence interval, the less useful it is. There's a balance between being exhaustive and being informative. A confidence interval that includes the entire range of possible values will certainly include the actual population value. But it tells you very little.

Finally, a random effects model can reduce your power (and by the way, you need lots of cases to make this analysis work), and adding more cases - which increases power in fixed effects models - may actually decrease power (or even have no effect) because it adds more variance and also increases the size of confidence intervals. This may make it more difficult to show a value is significantly different from 0, even if the actual population value is. But as is always the case in statistics, you're estimating a value that is unknown with data that (you have little way of knowing) may be seriously flawed.

Hmm, that may need to be the subtitle of the statistics book I'm writing: Statistics: Estimating the Unknown with the Seriously Flawed.

Friday, July 28, 2017

Cool Chart, Hot Trend

Back in June, it was so hot in Arizona that mailboxes were melting and flights were unable to take off. Though people may have brushed this heatwave off as a fluke, research suggests summers are in fact getting hotter:
Extraordinarily hot summers — the kind that were virtually unheard-of in the 1950s — have become commonplace.

This year’s scorching summer events, like heat waves rolling through southern Europe and temperatures nearing 130 degrees Fahrenheit in Pakistan, are part of this broader trend.

During the base period, 1951 to 1980, about a third of summers across the Northern Hemisphere were in what they called a “near average” or normal range. A third were considered cold; a third were hot.

Since then, summer temperatures have shifted drastically, the researchers found. Between 2005 and 2015, two-thirds of summers were in the hot category, while nearly 15 percent were in a new category: extremely hot.

Practically, that means most summers today are either hot or extremely hot compared to the mid-20th century.
At the top of the article is an animation, showing the normal curve shifting to the right (toward warmer temperatures) over time. It's a great demonstration of this trend:


Thanks to my friend David over at The Daily Parker for sharing this story with me.

Thursday, July 27, 2017

All the Books!

If you live in the Chicago(land) area, you should definitely check out the Newberry Library Book Sale, in its 33rd year! Six rooms on the lower level of the library are filled with books and CDs and books and records and books and DVDs and books and VHSs and did I mention books? I decided to walk over there after work, and I had great success in finding some awesome additions to my statistical library (plus a book about my favorite philosopher, Simone de Beauvoir):


And because I love nostalgia (Reality Bites) and was discussing one of these movies (Cry Baby) with my friend over at Is It Any Good?, I had to pick up these soundtracks:


The sale runs through Sunday! You should definitely check it out!

The System is Down

Recently, AI created by Facebook had to be shut down because it developed its own language:
An artificial intelligence system being developed at Facebook has created its own language. It developed a system of code words to make communication more efficient. The researchers shut the system down as it prompted concerns we could lose control of AI.

The observations made at Facebook are the latest in a long line of similar cases. In each instance, an AI being monitored by humans has diverged from its training in English to develop its own language. The resulting phrases appear to be nonsensical gibberish to humans but contain semantic meaning when interpreted by AI "agents."

In one exchange illustrated by the company, the two negotiating bots, named Bob and Alice, used their own language to complete their exchange. Bob started by saying "I can i i everything else," to which Alice responded "balls have zero to me to me to me…" The rest of the conversation was formed from variations of these sentences.

While it appears to be nonsense, the repetition of phrases like "i" and "to me" reflect how the AI operates. The researchers believe it shows the two bots working out how many of each item they should take. Bob's later statements, such as "i i can i i i everything else," indicate how it was using language to offer more items to Alice. When interpreted like this, the phrases appear more logical than comparable English phrases like "I'll have three and you have everything else."
The reasoning behind the "fear of losing control" is wanting to make sure the process undertaken by the AI can be understood by humans, if necessary. But the reason we develop and use AI is specifically to do things that humans can't, or that it would take humans a long time to do manually. While I can understand the need for monitoring, even if the AI bots were speaking English, it would probably still take a while to dig in and find out what they're doing. The concept of AI really seems to be about the need to put difficult processes into a black box.

So, alas, Bob and Alice (and all of their friends) are now offline.

Wednesday, July 26, 2017

Statistical Sins: Errors of Omission

For today's Statistical Sins post, I'm doing something a bit differently. The source I'm discussing doesn't necessarily commit any statistical sins, but commits sins in how he reports research and the bias he shows in what research he presents (and what he doesn't).

The source is a blog post for Psychology Today, written by Dr. Lee Jussim, a social psychologist. His post, Why Brilliant Girls Tend to Favor Non-STEM Careers, discusses the gender disparity in the STEM fields and argues that there is no bias against women in these fields.

This is a subject I feel very strongly about, so I recognize I'm probably predisposed to take issue with this post. I'm going to try to be objective, even if the article is written in a "Cash me outside, howbow da?" kind of way. Which is why it's a bit hilarious that his guidelines for commenting include not "painting groups with a broad brush" (which he does when he quips that 95% of people are a vulnerable group by social psychology definitions) and "keep[ing] your tone civil."

He begins by accusing researchers who are selling the discrimination narrative of cherry-picking studies that support their argument. So, to argue that differences in enrollment in STEM programs and employment in STEM fields have non-discriminatory causes... he cherry-picks studies that support his argument.

He gives a little attention to one study that refutes his argument, but sets it up as a straw man due to its smaller sample size compared to the studies he cites that support his argument. In fact, he gives a one-sentence summary of that study's findings and little to no detail on its methods.

I could probably do a point-by-point analysis and rebuttal of his post. But for the sake of brevity, I will confine my response to three points.

First, he writes an entire post about gender disparities in STEM without once citing stereotype threat research, which not only could explain the gap, it could explain his key argument against bias - differences in interest. He doesn't even try setting stereotype threat research up as a straw man, though he likely could - new efforts in replicating past research (and in many cases, being unable to replicate said research) mean that most key social psychology findings could be fair game for criticism.

As a reminder, stereotype threat occurs when a member of a stereotyped group (e.g., women) encounters a task about which their group is stereotyped (e.g., a math test) and underperforms as a result. Stereotype threat leads members of the group to disengage and deidentify from the task area. Hence, women experiencing stereotype threat when encountering math will decide that they're just not a math person and they're better suited for other fields. I did my master's thesis on stereotype threat, which is probably the strongest version of mesearch I've conducted.

He doesn't even say the words stereotype threat once in the article. And he's a stereotype researcher! I highly doubt he is unaware of this area of research. But because it doesn't support his argument, it essentially doesn't exist in the narrative he's selling. As he says himself in the post:
What kind of "science" are we, that so many "scientists" can get away with so systematically ignoring relevant data in our scientific journals?
Right? Seriously.

Second, his sole argument for the disparity in STEM fields is that men and women have different interests. That is, women are more drawn to non-STEM fields. But, as mentioned above, stereotype threat research could explain this observation. Women deidentify from math (and also the remaining STE fields) when encountering the frustration of stereotype threat. So, by the time we get to them in studies about subject matter interests, the stereotypes about gender have already been learned and in many cases, ingrained. Research on implicit bias shows that nearly everyone is aware of racial stereotypes, even if they don't believe in their accuracy themselves; the same is likely true for gender stereotypes.

So, okay, if women just aren't interested in the STEM fields, the question we should be asking is "Why?" What is causing this difference? Is it something unimportant, as Jussim believes? Or is it something as insidious as stereotypes? And would different practices with regard to teaching and career counseling help open women's options up to include careers in STEM?

Believe me, I'm not second wave feminist enough to believe that we should ignore choice and argue that women are brainwashed anytime they do something stereotype-compliant (yeah, I have a women's studies concentration, so knowledge of the different waves of feminism can creep out every once in a while), but I'm also concerned when women aren't given all the facts they need to make an informed choice. Now, to be fair, Jussim does offer one reason for the differences in interest, though there are issues there as well. Which brings me to...

Third, he conflates STEM with "working with things" and non-STEM with "working with people," and says that women are actually superior in performance in both verbal and quantitative skills, leading them to prefer jobs that use verbal skills.

There's a lot of wrong in that whole framework, so let me try to untangle it. Let's start with the verbal + quantitative = verbal career narrative. By his logic, being strong in both means women could go into a verbal-area career or a quantitative-area career. But why doesn't he discuss verbal + quantitative careers? There are many, including the career he is in as a social psychology researcher. In fact, there are few quantitative careers I can think of that don't require you to at least try to string two or more words together. Being good at both allows you to succeed even more in STEM jobs. In fact, as I think of some of the best-known scientists of today, they are all people who communicate well, and have published many books and/or articles. If I asked you to name STEM folks known by the public at large, you'd probably list some of the following people: Neil DeGrasse Tyson, Brian Greene, Stephen Hawking, Carl Sagan... All great writers. All fluent in STEM. All men.

True, I could be forgetting some of the big names of women in STEM, to sell my narrative. But these are the people I thought of off the top of my head. I'm struggling to think of women who are well known at large, though I can think of a fair number in my own field. Obviously, I remember my childhood hero, Marie Curie, the first woman to win a Nobel prize and the first person to win two in different fields. But if we could get more girls to be interested in STEM, show them that they have options to use their verbal skills as well, and also show them that they have role models of women in STEM (who preferably don't have to poison themselves to attain greatness - sorry, Marie), maybe - maybe - more will show an interest.

We gain nothing by misconstruing what these jobs involve, such as the nonsense "working with things" versus "working with people" dichotomy. True, some jobs may be more thing-heavy or people-heavy, but there's never total isolation of the other side. And people who work in the STEM fields - especially people who want to be successful in STEM - absolutely work with people. And talk to people. And share findings with people. And even study people, so that people in a sense become their "things." As a social psychologist, Jussim should know better.

I'll be honest, I'm having a hard time understanding Jussim's need to so vehemently say there's nothing there. Either he is wrong, and if we stop believing there's something there, we stop doing anything to help it. Or I am wrong, and we've maybe wasted some research dollars and people's time. So what? And I swear to God, people, if I hear the "but reverse discrimination" argument against my previous statement, I'm going to scream.

In conclusion, Jussim's sins are omission and some logical fallacies. My sins are probably the same. Hopefully between these two posts, there's some balance.

Tuesday, July 25, 2017

New Gig

Big changes in my little part of the world! I didn't really discuss it here on the blog, but I stopped working as a Psychometrician for Houghton Mifflin Harcourt on June 1. New leadership and changing market directions led to many positions being eliminated, including my own, and I've taken the last 1.5+ months to do some professional development and writing, along with looking for a new job.

Today I started a new job as Manager of Exam Development for the Dental Assisting National Board. I'll be doing a lot of the same things I was doing at HMH: psychometric and statistical analysis. But here, I'll have the opportunity to be involved in developing exams for dental assistants from start to finish - beginning with the literature reviews and expert panels all the way up to creating and managing the item pool. It's more similar to the work I did as a Researcher with the Department of Veterans Affairs, and I'm excited to learn some new things! For instance, I'll get to learn Facets, a Rasch approach that allows you to examine two rating scales used simultaneously - such as when you have items for which you rate both frequency (Daily, Weekly, etc.) and importance (Very Important, Somewhat Important, etc.). It's very similar to the Rasch work I know and love.

I'm sure I'll have more to say about the new job and my ongoing professional development as time goes on. And be sure to check in tomorrow for my regularly scheduled Statistical Sins post.

Monday, July 24, 2017

Dog is Love

It's no secret that dogs are man's best friend. And it's really no secret that they were bred for this purpose. The emergence of what we know as dogs happened through a process of wolf domestication, influenced by humans. What it comes down to is selective breeding - if the friendliest wolves (the ones most friendly to humans) start hanging around humans, and away from the rest of the pack, they're likely to breed only with each other and not the larger pack; keep repeating this process with the friendliest wolves, and you end up with animals bred to be friendly with humans. In fact, dogs and humans evolved alongside each other, further strengthening the behavioral bond.

Back in 2015, there was a breakthrough in understanding how dogs emerged from wolves, when researchers from the Chinese Academy of Sciences (led by Ya-Ping Zhang) and Peter Savolainen of the KTH-Royal Institute of Technology in Sweden sequenced the genomes of 12 gray wolves, 27 dogs from breeds indigenous to Asia and Africa (where dogs are suspected to have first emerged from wolves), and 19 other breeds from around the world. The dogs of Asia were the closest genetic match to wolves, providing evidence that this is where dogs as a species first emerged. They also found that, though humans began breeding dogs about 33,000 years ago, dogs didn't really spread outside of Asia until 15,000 years ago.

Just a few days ago, the world learned of another breakthrough in understanding the dog genome, this time explaining why it is that dogs are so sociable. It apparently relates to a gene that has been linked to Williams-Beuren syndrome in humans - a disorder characterized by hyper-sociability:
A new study shows that some of the same genes linked to the behavior of extremely social people can also make dogs friendlier. The result, published July 19 in Science Advances, suggests that dogs’ domestication may be the result of just a few genetic changes rather than hundreds or thousands of them.

“It is great to see initial genetic evidence supporting the self-domestication hypothesis or ‘survival of the friendliest,’” says evolutionary anthropologist Brian Hare of Duke University, who studies how dogs think and learn. “This is another piece of the puzzle suggesting that humans did not create dogs intentionally, but instead wolves that were friendliest toward humans were at an evolutionary advantage as our two species began to interact.”

In the new study, [Bridgett] vonHoldt and colleagues compared the sociability of domestic dogs with that of wolves raised by humans. Dogs typically spent more time than wolves staring at and interacting with a human stranger nearby, showing the dogs were more social than the wolves. Analyzing the genetic blueprint of those dogs and wolves, along with DNA data of other wolves and dogs, showed variations in three genes associated with the social behaviors directed at humans: WBSCR17, GTF2I and GTF2IRD1. All three are tied to Williams-Beuren syndrome in humans.
The study is open access, so you can read the full-text here.

Sunday, July 23, 2017

Statistics Sunday: Statistics Reading Round-Up

I'm working on some future Statistics Sunday posts, so in the meantime, I thought I'd offer you some of the statistics books on my reading list these days.

Recently, I finished reading:

  • How to Lie with Statistics by Darrell Huff - a quick read at 144 pages. Lots of good information, especially for people with little statistics knowledge, because it will help you be a better consumer of research information you encounter in the media. He doesn't really go into how probability can influence sampling, and focuses on bias from the original researchers rather than secondary sources sharing the research. But you'll still learn a lot and you can knock this book out in a couple sittings.
  • Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart - this book grew out of a project Reinhart did as an undergraduate, which he started before he had any statistics training whatsoever. He's now a PhD candidate. His statistics knowledge is a bit thin in some places, and he switches back and forth on a few issues, but his understanding of probability is excellent. I learned a lot from him.
  • Fooled by Randomness by Nassim Nicholas Taleb - this is one of the books from Taleb's Incerto series. I had two of the books on my reading list - this one and The Black Swan - so I went ahead and picked up the full set, since it was only a little more than $40. Taleb is a former trader, and talks about some of the cognitive biases that cause us to see systematic explanations for random events, using the financial industry to demonstrate key probability concepts. His writing is very readable - you feel like he's sitting across from you chatting.
I'm currently reading The Seven Pillars of Statistical Wisdom by Stephen M. Stigler. In it, Stigler breaks down the seven core concepts that influenced statistical thinking. Though they're basic concepts now, they were radical propositions in their time. For instance, the first pillar is aggregation - summarizing the data with a single number. We do this all the time now with descriptive statistics like the mean, but when it was first proposed, mathematicians saw it as throwing away data or worse, trusting bad data along with good data (good and bad of course being subjective concepts).

Once I'm finished with that, here's what's on deck:
Happy reading, everyone! Let me know if you decide to read one of these books - I'd love to hear your thoughts!

Saturday, July 22, 2017

A Long Time Ago in a Journal Far Away

Predatory journals have been around for a while, but thanks to the new availability of open access options online, they're becoming a lot harder to spot. They were once known as vanity journals - you basically pay to have your article published. With new open access options, many journals - predatory and non-predatory - routinely charge fees to offset the cost of publishing. But with predatory journals, you'll start to notice other added costs, such as fees for the review process and even sometimes a slight hint that if you pay more at this stage, your paper is more likely to get a favorable review.

Needless to say, predatory journals are a huge problem. While publication bias - the tendency to only publish studies with significant results - hurts the field, a journal that doesn't even bother going through peer review can also hurt the field, by allowing garbage research to proliferate.

So one researcher decided to brilliantly strike back. I mean, we all know the odds of successfully navigating the research field are... you know what, never tell me the odds. There's no need to fear - fear is the path to the Dark Side. Do or do not, there is no try. And the force is strong with this one.

That's right - this researcher wrote a Star Wars-themed research paper about midichloria, filled with plagiarized material from Wikipedia and copied and pasted movie quotes, and it's bloody brilliant:


The paper references Force sensitivity and name drops Star Wars characters, including the "Kyloren cycle" and "midichloria DNA (mtDNRey)" and "ReyTP." At one point in the article, it switches rather abruptly to the monologue about the Tragedy of Darth Plagueis the Wise:


And here's how the article fared:
Four journals fell for the sting. The American Journal of Medical and Biological Research (SciEP) accepted the paper, but asked for a $360 fee, which I didn’t pay. Amazingly, three other journals not only accepted but actually published the spoof. Here’s the paper from the International Journal of Molecular Biology: Open Access (MedCrave), Austin Journal of Pharmacology and Therapeutics (Austin) and American Research Journal of Biosciences (ARJ). I hadn’t expected this, as all those journals charge publication fees, but I never paid them a penny.

Credit where credit’s due, a number of journals rejected the paper: Journal of Translational Science (OAText); Advances in Medicine (Hindawi); Biochemistry & Physiology: Open Access (OMICS).

Two journals requested me to revise and resubmit the manuscript. At JSM Biochemistry and Molecular Biology (JSciMedCentral) both of the two peer reviewers spotted and seemingly enjoyed the Star Wars spoof, with one commenting that “The authors have neglected to add the following references: Lucas et al., 1977, Palpatine et al., 1980, and Calrissian et al., 1983”. Despite this, the journal asked me to revise and resubmit.

At the Journal of Molecular Biology and Techniques (Elyns Group), the two peer reviewers didn’t seem to get the joke, but recommended some changes such as reverting “midichlorians” back to “mitochondria.”

Finally, I should note that as a bonus, “Dr Lucas McGeorge” was sent an unsolicited invitation to serve on the editorial board of this journal.

All of the nine publishers I stung are known to send spam to academics, urging them to submit papers to their journals. I’ve personally been spammed by almost all of them. All I did, as Lucas McGeorge, was test the quality of the products being advertised.

Friday, July 21, 2017

Heart-Breaking Stories from Mosul

This morning, I listened to The Daily, a podcast by the New York Times, which featured a story from Rukmini Callimachi out of Mosul, Iraq. After Mosul was liberated, Callimachi and Andrew Mills, a producer, interviewed a young woman named Souhayla about the three years she spent as a captive of the Islamic State. She shares what happened to her as well as to other women also held captive. The story will absolutely break your heart. I encourage you to listen.

Souhayla, photographed by Alex Potter for the New York Times

Thursday, July 20, 2017

What I'm Reading Today

Today is being spent writing, but I have a few tabs open to read at some point today:
  • Nate Silver writes about what the world would be like if Hillary Clinton had won
  • The Atlantic reports that only about 9% of plastic is recycled; 12% has been incinerated and the remaining 79% is either in landfills or the natural environment (such as in an interesting mix with sand and rock some call "plastiglomerate")
  • Via Science Daily, a study finds that people can identify a manipulated image slightly better than chance - about 60% of the time - and can only say what's wrong with the image about 45% of the time
  • Also via Science Daily, a study on the effect of sleep deprivation on personality and mental health
  • And finally, an interesting book I learned about while reading A Concise History of Mathematics by Dirk J. Struik: mathematician Leonhard Euler wrote a theory of music considered too mathy for musicians and too musical for mathematicians; unfortunately for me, it's written in Latin

Wednesday, July 19, 2017

The Voices Were Telling Me to Be Skeptical

Today, the Guardian released an interesting podcast on the experience of hearing voices - specifically, experiencing auditory verbal hallucinations (AVH), a voice speaking to or about the person that others cannot hear. While AVH is known to be a symptom of schizophrenia, it can occur in many other disorders and situations.

In the writeup about the podcast, they cite survey results showing that up to 10% of the population reports hearing voices no one else can hear. My initial inclination was that this couldn't possibly be correct - auditory hallucinations have always been framed in my (relatively minimal) clinical coursework as a sign of serious psychosis, the term psychologists use to describe disorders in which the sufferer loses touch with reality.

But as I thought about it, it did make sense. The human brain is built to pay special attention to the human voice; after all, humans are a social species and being able to focus attention when others are speaking probably has a special evolutionary significance. In fact, this experience is the reason why Deke Sharon and Dylan Bell, authors of A Cappella Arranging, recommend that a cappella pieces should be shorter than other music (more like 2 to 4 minutes): because we tune in more strongly to the voice, even when it's trying to imitate instruments, a cappella music is more cognitively taxing. So it makes sense that we would tune in to anything that sounds like a voice, even if that voice is inside our head.

And as they share in the podcast, many of us (about 33%) have experienced the sensation of hearing voices when in very light stages of sleep, during which time we can experience dream-like images. I've certainly had the experience of falling asleep and being woken up after being startled by a dream-like sound or image.

The podcast discusses a lot of the new research on this topic, including insights from neuroscience and a new treatment known as Avatar Therapy.

Statistical Sins: Misleading Graphs

As you're probably aware, the Senate healthcare bill does not have the support it needs to pass. Reading stories about the bill, and the history of the Republican effort to repeal the Affordable Care Act, I'm reminded of this great (terrible) graph that was actually presented in Congressional testimony back in 2015 - a graph showing that Planned Parenthood has increased abortion services and decreased preventive services. But because preventive services still far outpace abortions, the chart's creators distorted the scales:


In fact, when I decided to start the Statistical Sins series, this graph was on my mind. The chart should look more like this:


So what initially looked like a dramatic increase in abortion services is actually a nearly flat line.

The thing is, misleading graph-making happens a lot. It's not unusual for people to change the minimum value on the y-axis to zoom in on a very small trend to make it look more dramatic. In a recent book I read, Fooled by Randomness, author Nassim Nicholas Taleb shows graphs of market trends that have been cropped to make small blips look like major leaps and falls in stock prices.

The problem is that certain people will purposefully edit graphs to prove some point, so reminding people about proper graph-making isn't going to help with people who are intentionally misleading us. But there are certainly things people can do to help with unintentionally misleading graphs.

First, you should know something about the type of data you're working with. When you're working with ratio values (variables that have a meaningful 0 - that is, 0 means the absence of something), you should use 0 as the minimum value on the y-axis (the vertical axis). So things like money, healthcare services, and so on should have 0 as the minimum value.

If you're working with interval data (variables that are continuous but do not have a meaningful 0), like temperature, you should choose minimum and maximum values that make sense with the scale. Each scale will be different, so there will be some judgment calls. When in doubt, talk to a stats person.
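Here's a small sketch of what the ratio-data rule looks like in practice, assuming the ggplot2 package; the counts are made up for illustration and are not the actual Planned Parenthood numbers:

# Counts of services are ratio data - zero means no services - so anchor the
# y-axis at 0 instead of letting the plot zoom in on a small range.
library(ggplot2)

services <- data.frame(
  year  = c(2006, 2013),        # only the years actually measured
  count = c(290000, 327000)     # made-up counts, purely for illustration
)

ggplot(services, aes(x = year, y = count)) +
  geom_line() +
  geom_point() +
  expand_limits(y = 0)   # force the y-axis to include 0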

It should go without saying that placement of points on the chart should reflect the actual value of that data point, and you should always have standard scales on the axes. I've seen unbelievably wrong graphs on news programs, and chatting with people in the industry, I've learned that often, they use stock graph images and just change the values. They're not purposefully being misleading; they just don't even think about the fact that graphs are supposed to reflect real numbers. I can't even begin to say how wrong that is. You can create simple graphs in Excel. Don't reuse graphs. Ever.

And don't label points if you didn't actually measure or include them. If you look at the first chart above, you'll see each year from 2006 to 2013 marked, with a very clear linear trend on both variables (abortion services and preventive services). But I highly doubt the data were that clean. More likely, the chart only includes 2006 and 2013 data, with a line connecting them. This too is misleading, not just because the intermediate values are false, but because it removes the variance. You can't see how much these values bounce around from year to year. So an increase of 37,250 instances of abortion services might be meaningful (if these values don't usually vary) or might be just normal variation. The same goes with the drop in preventive services. It appears Vox came to the same conclusion, because they only label 2006 and 2013 on the x-axis.

I'm thinking it might not be a bad idea to write a statistics post on visualizing data. Look for that some Sunday!

Monday, July 17, 2017

A Morning of Strong Women

I'd heard about Kesha dropping a new album, and there have been a few stories floating around the internet telling women that we need to support Kesha (and others like her). As you might remember, Kesha was in the news not long ago because of ongoing legal battles with Sony Records and allegations against her producer, Dr. Luke. In fact, there was an outpouring of support from other artists to simply let Kesha out of her contract or allow her to work with a different producer - something many other artists have been allowed to do without having to take it to court.

I don't pretend to know what actually happened. The fact that other artists who have worked with Dr. Luke have had negative things to say about him makes me think there is probably truth to her allegations. In any case, I finally listened to the first single she dropped, "Praying." And OHMYGOD is it good:


Earlier this morning, I listened to her second single, "Woman," and it's already stuck in my head:


Speaking of "I'm a motherf***in' woman," you hopefully heard about the 13th Doctor, who will be played by Jodie Whittaker. I'm only a little familiar with her work - she was on a very memorable episode of a rather disturbing (but amazing) show called Black Mirror. But I can't wait to see what she brings to the role. And for all the trolls you encounter, complaining about casting, I've created this response:

Sunday, July 16, 2017

Statistics Sunday: No Really, What's Bayes' Got to Do With It?

Note: This is the second of a two-part post. Check out part 1 here.

Yesterday, I discussed what Bayes' theorem can tell us about our actual Type I error rate. Today, I'll demonstrate how to apply Bayes' theorem to research findings.

As a reminder, Bayes' theorem looks like this:


But since we want to use it to generate false positive rates, we need to rethink the equation just a bit. We want to test the probability a result is not true given it is significant - P(Tc|S). So we're essentially testing for Not A instead of A. This means that for any A in the original equation, we need to sub in Ac, and any Ac becomes A.
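Written out with those substitutions, the false positive rate we're after is:

P(Tc|S) = [P(S|Tc) × P(Tc)] / [P(S|Tc) × P(Tc) + P(S|T) × P(T)]

which is exactly the calculation the R code below performs.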

After that, we can pretty easily determine most of the values. P(S|T) is equal to power, for which the convention is 0.8. Occasionally, people may power their study to 0.9. We'll try both but focus mainly on the 0.8 results.

P(S|Tc) is equal to alpha. Convention is 0.05, but you might also see 0.1 or 0.01. We'll test all three but focus mainly on the 0.05 results.

The unknowns are P(T) and P(Tc). These refer to the probability of whether a hypothesis is true or not, separate from research findings. We might be inclined to use 0.5, since there are two options, True or Not True, but it's likely smaller than that. It's more likely that a broad hypothesis is true in a small set of circumstances, and of course, there could be a better explanation (a better hypothesis) out there somewhere we just haven't found yet. When a value you need isn't known, the best thing to do is test a range of values to see how that affects the results - let's do increments of 0.05, from 0.05 to 0.95.

ptrue<-c(.05,.10,.15,.20,.25,.30,.35,.40,.45,.50,.55,.60,.65,.70,
.75,.80,.85,.90,.95)
pnottrue<-1-ptrue

Cool. Now let's set up our P(S|T) and P(S|Tc) values:

psig_given_true1<-0.80
psig_given_true2<-0.90
psig_given_nottrue1<-0.05
psig_given_nottrue2<-0.01
psig_given_nottrue3<-0.10

So let's run Bayes' theorem and find out what our false positive rates are at the conventional alpha value of 0.05, and powers of 0.80 and 0.90.

bayes11<-(psig_given_nottrue1*pnottrue)/((psig_given_nottrue1*
pnottrue)+(psig_given_true1*ptrue))
bayes12<-(psig_given_nottrue1*pnottrue)/((psig_given_nottrue1*
pnottrue)+(psig_given_true2*ptrue))

Let's plot those results:

plot(bayes11, type="o", col="red", xlab="P(T)", 
ylab="False Positive Rate", main="False Positive 
Rates for Alpha of 0.05", labels=FALSE, ylim=c(0,1))
lines(bayes12, type="o", col="blue")
axis(1, at=c(1:19), labels=ptrue)
axis(2, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0))
legend(12,.8, c("Power=0.9","Power=0.8"),lty=c(1,1), 
lwd=c(2.5,2.5), col=c("blue","red"))


For conventional power of 0.80 and conventional alpha of 0.05, you can see false positive rates higher than 50% when the probability of a true result is low. Results are only slightly better for power of 0.90. And if I started testing different values of power - using what people often have as their actual power, rather than the convention - the false positive rates would likely skyrocket. As I mentioned in Wednesday's Statistical Sins post, low power can lead to erroneous significant results just as easily as erroneous non-significant results. When you play the game of probability, your study either wins or it dies.

I sincerely apologize for that bastardized Game of Thrones quote.

So just for fun, let's also compute our false positive rates for two different alphas - 0.01, which is often used in studies where false positive rates can be very very bad, such as in drug trials; and 0.10, which is not usually used as an alpha, but is often used as an indicator of "marginally" significant results. First, alpha = 0.01:

bayes21<-(psig_given_nottrue2*pnottrue)/((psig_given_nottrue2*
pnottrue)+(psig_given_true1*ptrue))
bayes22<-(psig_given_nottrue2*pnottrue)/((psig_given_nottrue2*
pnottrue)+(psig_given_true2*ptrue))

Let's plot those results:

plot(bayes21, type="o", col="red", xlab="P(T)", 
ylab="False Positive Rate", main="False Positive 
Rates for Alpha of 0.01", labels=FALSE, ylim=c(0,1))
lines(bayes22, type="o", col="blue")
axis(1, at=c(1:19), labels=ptrue)
axis(2, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0))
legend(12,.8, c("Power=0.9","Power=0.8"),lty=c(1,1),
lwd=c(2.5,2.5), col=c("blue","red"))


And finally, alpha of 0.10:

bayes31<-(psig_given_nottrue3*pnottrue)/((psig_given_nottrue3*
pnottrue)+(psig_given_true1*ptrue))
bayes32<-(psig_given_nottrue3*pnottrue)/((psig_given_nottrue3*
pnottrue)+(psig_given_true2*ptrue))

Let's plot those results:

plot(bayes31, type="o", col="red", xlab="P(T)", 
ylab="False Positive Rate", main="False Positive 
Rates for Alpha of 0.10", labels=FALSE, ylim=c(0,1))
lines(bayes32, type="o", col="blue")
axis(1, at=c(1:19), labels=ptrue)
axis(2, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0))
legend(12,.8, c("Power=0.9","Power=0.8"),lty=c(1,1), 
lwd=c(2.5,2.5), col=c("blue","red"))


There you have it. Thanks to Bayes' theorem, we know that our actual Type I error rate can be much higher than 0.05, or whatever value we use for alpha.

Saturday, July 15, 2017

Statistics Sunday Prelude: What's Bayes' Got to Do With It?

Note: This is the first of a two-part post. Check out part 2 here.

As I mentioned in Wednesday's Statistical Sins post, I'm working on a post about how Bayes' theorem can demonstrate that Type I error is actually much higher than alpha. We know that increasing the number of tests or comparisons we make in the data increases our chances of committing a Type I error. But Bayes' theorem can show us that even if we're conducting only one test, the probability of a Type I error can be high - really high.

In fact, the post is already written in the form of copious notes in my favorite notebook (which I'm using for writing toward my planned statistics book):


and an R Markdown file. That's right - I have code and figures. But as I started looking over everything I've written, I realized the post would be pretty long. So today, I'm writing an explanation to set up for tomorrow's demonstration.

As a reminder, the concept of alpha (and beta for that matter) is key to a statistical approach called Null Hypothesis Significance Testing (NHST). When we set our alpha at 0.05, we're accepting that there is a 5% chance that we will conclude there is a real effect in our data when actually the null hypothesis, which says there is no effect, is true. This is Type I error. We conduct power analyses to maximize the chance that we will find a significant effect - usually we set power to 0.80, so 80% chance that, if there's an effect to find, we'll find it. But that means there's a 20% chance that we'll fail to reject the null hypothesis when it is actually false. This is Type II error.
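To see what that 5% means in practice, here's a quick simulation sketch of my own (not part of the formal argument): generate data where the null hypothesis is true by construction, test it many times, and watch roughly 5% of the tests come out significant anyway.

# When the null is true, alpha = 0.05 means about 5% of tests are "significant" by chance
set.seed(123)
pvals <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(pvals < 0.05)   # proportion of significant results - hovers around 0.05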

We don't always know if we've committed a Type I or Type II error. We don't have a gameshow host to buzz us if we got the wrong answer. We just have to keep studying something in different ways and over time, we can build up results to determine once and for all which is true: the null hypothesis or the alternative hypothesis. After all, if we conduct all of our studies with an alpha of 0.05, we'll know that a body of literature is wrong if only 5% of the studies find significant results, right?

Now's the time I pull the rug out from under you. Because the Type I error rate can be much higher than 0.05.

Why? Because Bayes' theorem.

Type I error rate is the probability that something is not true given it is significant (in probability terms: P(Tc|S), where T = true, Tc = not true, and S = significant) - a false positive. This is different from alpha, which is the probability that something is significant given it is not true, or P(S|Tc).

(Yes, I know I'm contradicting a lot of statistical teaching, because everyone always says Type I error and alpha are the same thing. I said it too. I've since changed my mind, and I'm arguing that they're related but not quite the same thing. After all, NHST doesn't care about conditional probabilities in the same ways Bayesian approaches do. I accept that I could be completely off-base with using the terms in these ways, but I think my understanding of the conditional probabilities involved is sound.)

Beta, on the other hand, is the probability that something is not significant given it is true, or P(Sc|T), where Sc = not significant.

When you plug these values into Bayes' theorem, you'll find Type I error can skyrocket in the right conditions.

What are those conditions? Check back tomorrow to find out!

Thursday, July 13, 2017

The Continuing Saga of the Independent Bookstore

This morning, I listened to a great podcast by Annotated that discusses independent bookstores. Despite the closing of many small and large bookstores after the introduction of Amazon and e-books, independent bookstores are apparently making a resurgence. Not only that, they're working together and sharing best practices. Since the nature of an independent bookstore is to sell books within a neighborhood, shops in different areas have no need to feel competitive with each other, making for a more collaborative dynamic.

Guest Oren Teicher of the American Booksellers Association shares that it's a great time to get into the independent booksellers market, but cautions against one of my favorite fallacies (which I first encountered in the literature on online resources for patients): the Field of Dreams fallacy - essentially, "If you build it, they will come."

Instead, the hosts of the podcast, Jeff and Rebecca, discuss the need for community engagement and becoming a fixture of the community. People won't simply show up because a store is open for business. Especially for small shops - which people may fear will go out of business and so hesitate to get attached to - you need to establish the business as part of people's routines and ensure that yours is the place people think of when they realize they want to purchase a book. Even though as a nation, we are moving back to independent bookstores for some of our book needs, there will still be a lot of variation across individual communities. Simply because independent bookstores as a whole are able to compete in the current market doesn't mean that individual bookstores will be able to stay in business.

Wednesday, July 12, 2017

Statistical Sins: Too Many Tests

Today, I read the eye-catching headline: Just one night of poor sleep can boost Alzheimer’s proteins. Needless to say, I clicked on the article. And then I clicked on the original study. Unfortunately, as interesting as the topic and findings are, there are some serious statistical issues with the study. I'll get to them shortly.

First, to summarize the actual study. Apparently, a protein called amyloid-beta can build up into plaques that lead to brain cell death. The build-up of this plaque has been shown to be an early and necessary step for the development of Alzheimer's disease. So if we can prevent this plaque buildup, we can potentially stave off Alzheimer's. Previous research has shown a relationship between poor sleep and buildup of this protein. This study found that experimentally-induced poor sleep - specifically, poor sleep during the deep sleep stages - increases amyloid-beta levels after just one night.


An interesting study, but some key issues. First, some kudos - though the study was very small (22 participants), they did do a power analysis. This is unusual, or at least, reporting the results of a power analysis in a journal article is unusual. But they conducted the power analysis to detect what they call a moderate correlation, specifically 0.7. That's actually a huge correlation, translating to almost 50% shared variance between the two variables. Most conventions call a moderate correlation something between 0.3 and 0.5. The result of this power analysis was that only 20 participants were needed to detect this "moderate" (read: huge) effect. So their sample size is good then? It would be, if they hadn't had to drop data from 5 participants. In fact, they planned to stop data collection after 20 but "accidentally" collected a couple more. They built in no padding for dropouts or data problems, so that lovely power analysis they conducted didn't actually result in them getting the sample they needed.
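For a sense of scale, here's roughly what that power analysis looks like in R, assuming the pwr package - a sketch of mine, not necessarily the authors' exact calculation:

library(pwr)

# n needed to detect r = 0.7 (what they powered for) with 80% power, alpha = .05
pwr.r.test(r = 0.7, sig.level = 0.05, power = 0.80)

# n needed for a conventionally moderate r = 0.3 - closer to 85 participants
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80)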

They collected tons of data from these 22 participants, which is typical for these types of studies. Rather than having many participants from which you collect a small amount of data, neuroscience studies collect large amounts of data from a small number of participants. This results in 25(!) significance tests, each with an alpha of 0.05. It's not terribly surprising they found significant results. With that much Type I error rate inflation, I'd be surprised if they didn't find something.
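Just to put a ballpark number on that inflation (my arithmetic, not theirs - and it treats the tests as independent, which they probably aren't):

# Family-wise Type I error rate for 25 independent tests, each at alpha = 0.05
1 - (1 - 0.05)^25   # roughly 0.72 - about a 72% chance of at least one false positive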

They fortunately realize some of the problems with sample size, as can be seen in their rather long weaknesses section. But they use the old "but we found significant results so it must not have been a problem" argument. The thing is, while small sample sizes can result in low power, they can also lead to erroneous significant results because of the weird things probability can do - variance stabilizes as sample size increases, but in small samples, it can be quite volatile. If that high variance shows up in the same pattern you hypothesize, you'll get significant results. But that doesn't make them real results.

Be sure to tune in for Statistics Sunday, where I'll dig into Type I error more fully - and show what Bayes' theorem can teach us about Type I error rate!

Monday, July 10, 2017

Before He Tweets

Thank the whatever from high atop the thing for YouTuber Randy Rainbow, who helps us laugh during these difficult times. The video needs no introduction - just watch:

Sunday, July 9, 2017

Statistics Sunday: Null and Alternative Hypotheses

In my writing about statistics, there is one topic - considered basic - that I haven't covered. This is the issue of null and alternative hypotheses, which are key components in any inferential statistical analysis you conduct. The thing is, I've seen it cause nothing but confusion in both new and experienced researchers, who don't seem to understand the difference between these statistical hypotheses and the research hypotheses you are testing in your study. I've rolled my eyes through doctoral research presentations and wielded my red pen in drafts of grant applications as researchers have informed me of the null and alternative hypotheses (which are implied when you state which statistical analysis you're using) alongside their research hypotheses (which require stating).

Frankly, I've been so frustrated by the lack of understanding that I questioned whether to even address these topics. When I teach, I downplay the significance (pun fully intended) of null and alternative hypotheses. (And in fact, many in the field are trying to move us away from the so-called Null Hypothesis Significance Testing, or NHST, approach, but that's another post for another day.) In any case, I treat this topic as an annoying subject to get through before getting to the fun stuff: analyzing data. Not that I think this is a boring topic, or that I have a problem with boring topics - when you love a field, you have to love the boring stuff too.


Rather, I questioned whether the topic was even necessary. You can conduct statistical analysis without thinking about null and alternative hypotheses - I often do.

I realize now that the topic is important, but the reason why is rarely explained. So statistics professors and textbook authors continue to address the topic without addressing the purpose. Today, I'd like to do both.

First, we need to think about what it means to be a science, or for a line of inquiry to be scientific. Science is about generating knowledge in a specific way - through systematic empirical methods. We don't want just any knowledge. It has to meet very high and specific standards. It has to be testable, and, more importantly, falsifiable. If a hypothesis is wrong, we need to be able to show that. In fact, we set up our studies with specific controls and methods so that if a hypothesis is wrong, it can show us it's wrong.

If, after all that, we find support for a hypothesis, we accept that... for now. But we keep watching that little supported hypothesis out of the corner of our eyes, just in case it shows us its true (or rather, false) colors. See, if we conduct a study to test our research hypothesis, we will use the results of the study to reject it (if it appears to be false) or support it (if it doesn't appear to be false). We don't prove anything, nor do we call hypotheses true. We're still looking for evidence to falsify it. That is the purpose of science. To study something again and again, not to see if we can prove it true, but to see if we can falsify it. It's as though every time we do a study of a hypothesis that's been supported, we're saying, "Oh yeah? Well, what about if I do this?"

This is the nature of scientific skepticism. There could be evidence out there that shows a hypothesis is false; we just haven't found it yet. Karl Popper addressed this facet of science directly in the black swan problem. You can do study after study to support the hypothesis that all swans are white, but it takes only one case - one black swan - to refute that hypothesis.

Boom
So essential is this concept to science that we build it into our statistical analysis. The specifics of the null and alternative hypotheses vary depending on which statistic you're using, but the basic premise is this:

Null hypothesis: There's nothing going on here. There is no difference or relationship in your data. These are not the droids you're looking for.

Alternative hypothesis: There's something here. Whatever difference or relationship you're looking for exists in your data. These are, in fact, the droids you're looking for. Go you.

(Come to think of it, that Jedi mind trick is the perfect demonstration of a Type II error. But pretend for a moment that they really weren't the droids they were looking for.)

This is your basic reminder that we first look for evidence to falsify before we look for evidence to support your research hypothesis. We then run our statistical analysis and look at the results. If we find something - the difference or relationship we expect - we reject the null, because it doesn't apply in this situation (although because of the possibility of Type I error, we never lose the null completely). And we have support for our alternative hypothesis. If, on the other hand, we don't find a significant difference or relationship, we fail to reject the null. (Yes, that is the exact language you would use. You don't "accept" or "support" the null, because nonsignificant results could simply mean low power.)

You also use the null and alternative hypotheses to state if there is an expected direction of the effect. For example, to go back to the caffeine study example, we expect caffeine will improve test performance (this is our research hypothesis). So we would write our null and alternative hypotheses to demonstrate that direction:

Null: The mean test score of the caffeine group will be less than or equal to the mean test score of the non-caffeine group or M_Caffeine ≤ M_Decaf

Alternative: The mean test score of the caffeine group will be greater than the mean test score of the non-caffeine group or M_Caffeine > M_Decaf

Notice how the null and alternative hypotheses are both mutually exclusive and exhaustive (together they cover all possible directions). If we conducted our statistical analysis in this way, we would only support our research hypothesis if the caffeine group had a significantly higher test score. If their test score was lower - even significantly lower - we would still fail to reject the null. (In fact, if we follow this strict, directional hypothesis, finding a significantly lower score when we expected a significantly higher score would simply be considered Type I error.)

If we didn't specify the direction, we would simply state the scores will be equal or unequal:

Null: The mean test score of the caffeine group will be equal to the mean test score of the non-caffeine group or M_Caffeine = M_Decaf

Alternative: The mean test score of the caffeine group will be different from the mean test score of the non-caffeine group or M_Caffeine ≠ M_Decaf
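
Here's a quick sketch of how those hypotheses translate into an analysis in R - the data are simulated and the effect is invented, purely for illustration:

# Directional (one-tailed) and non-directional (two-tailed) t tests for the
# hypothetical caffeine example.
set.seed(2017)
caffeine <- rnorm(30, mean = 82, sd = 8)   # invented test scores
decaf    <- rnorm(30, mean = 78, sd = 8)

# Alternative: M_Caffeine > M_Decaf (directional)
t.test(caffeine, decaf, alternative = "greater")

# Alternative: M_Caffeine is not equal to M_Decaf (non-directional)
t.test(caffeine, decaf)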

These hypotheses are implicit when doing statistical analysis - they're for your benefit, but you wouldn't spend time in your dissertation defense, journal article, or grant application stating the null and alternative. (Maybe if you were writing an article on a new statistical analysis.) Readers who know about statistics will understand they're implied. And readers who don't know about statistics will prefer concrete differences - what you hypothesize will happen in your study, and what specific differences you found and what they mean.

As you continue learning and practicing statistics skills, you may find that you don't really think about the null and alternative hypothesis. And that's okay. In fact, I wrote two posts that tie directly into null and alternative hypotheses without once referencing these concepts. Remember alpha? And p-values? I said in these posts that these refer to probabilities of finding an effect of a certain size by chance alone. Specifically, they refer to probabilities of finding an effect of a certain size if the null hypothesis is actually true - if we could somehow pull back the curtain of the universe and discover once and for all the truth. We can't do that, of course, but we can build that uncertainty into our analysis. That is what is meant by Null Hypothesis Significance Testing.

But, as you saw, I could still describe these topics without even using the phrase "null or alternative hypotheses." As long as you stay a skeptical scientist, who remembers all findings are tentative, pending new evidence, you're doing it right.