Saturday, August 19, 2017

Countdown to the Eclipse

We're just days away from the 2017 total solar eclipse, and I'm writing this from my parents' house in Kansas City. We'll be heading north on Monday to watch the eclipse, since we won't be able to see the totality from here, and we're already equipped with our ISO-compliant eclipse glasses.

Hopefully you, dear reader, have identified where you'll be able to watch the eclipse. And if you're curious about what the eclipse will look like in different locations, Time Magazine has put together this awesome animation: enter a zip code and you'll see an animation of what the eclipse will look like there. As an example, here's a GIF of what the eclipse will look like from Goreville, Illinois, which will see a full two and a half minutes of totality:

We've also purchased a solar filter for our camera, so we'll be able to get some pictures of the eclipse. Check in Monday for an update!

Thursday, August 17, 2017

Women in the Work Force

Via Bloomberg, the Bureau of Labor Statistics released data showing that the workforce participation rate among women has increased by 0.3 percentage points since January, bringing the gap in participation rate between men and women to 13.2 percentage points.

This is the lowest that gap has been since 1948. However, overall participation in the U.S. is low at 62.9 percent. This is due in part to decreased participation rates among prime-age men:
The declining participation among prime-age male workers has become an area of focus for President Donald Trump’s administration. Trump campaigned on reviving traditionally male-dominated industries such as coal mining and manufacturing that have struggled against greater globalization. Amid record-high job openings, the president has emphasized that Americans need to be open about relocating for work.
You know, like how Trump has relocated for his job, and stopped spending so much time at his penthouse in New York or his resort at Mar-a-Lago.

The lower participation rate overall, and especially among men, has many potential causes:
Prohibitive childcare costs make parents’ decision to return to work more difficult, and prime-age Americans are feeling the increased burden of caring for an aging population. The opioid epidemic also helps explain why a portion of the workforce is deemed unemployable. And immigration limits imposed by the Trump administration could curb workforce growth in industries such as farming and construction that are dominated by the foreign-born.
The Bloomberg article also highlights some recent work by Thumbtack Inc., which has found increases in women-owned business in traditionally male-dominated professions:
Lucas Puente, chief economist at Thumbtack Inc., sees advances across the industries in which his company matches consumers and professional service workers. While men still make up about 60 percent of the 250,000 active small businesses listing their services on Thumbtack, women are gaining ground more quickly, even among traditionally male-dominated professions. Among the top 10 fastest-growing women-owned businesses on Thumbtack in the past year are plumbers, electricians, and carpenters, according to the company’s survey data.

Wednesday, August 16, 2017

Stats Note: The Third Variable Problem

Correlation does not imply causation. You've probably heard that many times - including from me. When we have a correlation between variable A and variable B, it could be that A caused B, B caused A, or another variable C causes both. A famous example is the correlation between ice cream sales and murder rates. Does ice cream make people commit murder? Does committing murder make people crave ice cream? Or could it be that warm weather causes both? (Hint: It's that last one.)
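To make the third-variable problem concrete, here's a minimal simulation sketch in Python. All the numbers are made up for illustration: temperature drives both "ice cream sales" and "crime," and the two end up clearly correlated even though neither causes the other:

```python
import random
import statistics

def pearson_r(x, y):
    # Pearson correlation from the definition: cov(x, y) / (sd_x * sd_y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

random.seed(42)
# Hypothetical daily temperatures: the lurking "variable C"
temps = [random.gauss(70, 15) for _ in range(1000)]
# Ice cream sales and crime are each driven by temperature plus independent noise;
# neither variable appears in the other's formula
ice_cream = [2.0 * t + random.gauss(0, 20) for t in temps]
crime = [0.5 * t + random.gauss(0, 10) for t in temps]

r = pearson_r(ice_cream, crime)
print(round(r, 2))  # clearly positive, despite zero causal connection
```

Banning ice cream in this simulated world would do nothing to crime; only changing the weather would.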

The problem is that when people see a correlation between two things, and get confused about causality, they may intervene to change one thing in the hopes of changing the other. But that's not how it works. For a comedic example, see this Saturday Morning Breakfast Cereal comic:

The cartoon references the famous Stanford "Marshmallow Study," which examined whether children could delay gratification. If you'd like to learn even more, the principal investigator, Walter Mischel, wrote a book about it.

Statistical Sins: Reinventing the Wheel - Some Open Data Resources

For today's Statistical Sins post, I'm doing things a little differently. Rather than discussing a specific study or piece of media about a study, I'm going to talk about a general trend. There's all this great data out there that could be used to answer questions, but I still see study after study collecting primary data.

Secondary data is a great way to save resources, answer questions and test hypotheses with large samples (sometimes even random samples), and practice statistical analysis.

To quickly define terms, primary data is the term used to describe data you collect yourself, then analyze and write about. Secondary data is a general term for data collected by someone else (that is, you weren't involved in that data collection) that you can use for your own purposes. Secondary data could be anything from a correlation matrix in a published journal article to a huge dataset containing responses from a government survey. And just as primary data can be qualitative, quantitative, or a little of both, so can secondary data.

We really don't have a good idea of how much data is floating around out there that researchers could use. But here are some good resources that can get you started on exploring what open data (data that are readily accessible online or that can be obtained through an application form) are available:
  • Open Science Framework - I've blogged about this site before; it lets you store your own data (and control how open it is) and access other open data
  • Data.gov - The federal government's open data site, which not only has federal data, but also links to state, city, and county sites that offer open data as well
  • Global Open Data Index - To find open data from governments around the world
  • Open Data Handbook - This site helps you understand the nature of open data and helps you to make any data you've collected open, but there's also a resources tab that offers some open data sources
  • Project Open Data - Filled with great resources to help you on your open data journey, including some tools to convert data from one form (e.g., JSON files) to an easier-to-use form (e.g., CSV)
  • Open Access Button - Enter in a journal article you're reading, and this site will help you find or request the data
  • GitHub Open Data - Another open data option for some fun datasets, such as this dataset of Scrabble tournament games
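On the format-conversion point from the Project Open Data entry above, here's a minimal Python sketch of turning a JSON array of flat records into CSV with nothing but the standard library. The records are made up for illustration, not a real dataset:

```python
import csv
import io
import json

# A tiny stand-in for an open-data file: a JSON array of flat records
raw = json.loads("""[
  {"city": "Chicago", "year": 2016, "population": 2704958},
  {"city": "St. Louis", "year": 2016, "population": 311404}
]""")

# Write the records out as CSV, one column per key (sorted for a stable order)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(raw[0]))
writer.writeheader()
writer.writerows(raw)
print(buffer.getvalue())
```

Real open datasets often have nested JSON, so a dedicated conversion tool earns its keep; this sketch only handles the flat case.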
And there's also lots of great data out there on social media. Accessing that data often involves interacting with the social media platform's API (application programming interface). Here's more information about Twitter's API; Twitter, in general, is a great social media data resource, because most tweets are public. I highly recommend this book if you want to learn more about mining social media data:

Tuesday, August 15, 2017

Every Now and Then: Total Eclipse of the Sun

We're less than a week away from the total solar eclipse that will make its way across the United States from Oregon to South Carolina. It seems that everyone is getting in on the fun. For instance, the most recent XKCD:

Sky and Telescope provides this list of apps to use the day of the eclipse.

Unfortunately, some companies are taking advantage of the eclipse frenzy by selling counterfeit glasses - glasses that fail to comply with the proper standards. Amazon has been issuing refunds to people who purchased glasses that may not meet the proper standards. The American Astronomical Society published this list of reputable vendors.

I plan to watch the eclipse from St. Joseph, MO, which is close to where I grew up in Kansas City, KS. (I even applied for and almost accepted a job in St. Jo back in 2010, but opted to work for the VA instead.)

Monday, August 14, 2017

On Charlottesville and Trump

As you probably already know, a rally calling itself "Unite the Right" convened this weekend in Charlottesville, VA, to protest the removal of a monument to Robert E. Lee. The rally quickly turned violent when a car was driven into an anti-racism protest organized as a response to the Unite the Right rally; 19 were injured and 1 was killed. Two state police officers called to assist with maintaining order also died in a helicopter crash.

Many were calling for the President to respond to the rally.

When the President eventually did respond, he failed to distance himself from these individuals and the organizations they represent, and emphasized that there was violence and hatred on many sides:
We condemn in the strongest possible terms this egregious display of hatred, bigotry and violence, on many sides. On many sides. It's been going on for a long time in our country. Not Donald Trump, not Barack Obama. This has been going on for a long, long time.
As Julia Azari of FiveThirtyEight points out, though Presidential responses to racial violence have always been rather weak, Trump's are even weaker.

I walk by Trump Tower in Chicago every day on my way to work. Here's what I saw in front of the building today:

Sunday, August 13, 2017

Statistics Sunday: How Does the Consumer Price Index Work?

You may have heard news stories about how much consumer prices have risen (or fallen) in the last month, like this recent one. And maybe, like me, you've wondered, "But how do they know?" It's all thanks to the Consumer Price Index, released each month by the Bureau of Labor Statistics. The most recent CPI came out Friday.

The CPI is a great demonstration of sampling and statistical analysis, so for today's Statistics Sunday, we'll delve into the history and process of the CPI.

What is the Consumer Price Index?

The CPI is based on prices of a representative sample (or what the Bureau of Labor Statistics calls a "basket") of goods and services - the things that the typical American will buy. These prices are collected each month in 87 urban areas, from about 23,000 retail and service establishments and 50,000 landlords and tenants, then weighted by total expenditures (how much people typically spend on each) from the Consumer Expenditure Survey. What they get as a result is a measure of inflation: how much the price of this sample of goods and services has changed over time. The CPI can also be used to correct for inflation (when making historical comparisons) and to adjust income (for industries that have wages tied to the CPI through a collective bargaining agreement).
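To make the weighting idea concrete, here's a minimal sketch of a fixed-weight (Laspeyres-style) index in Python. The categories, weights, and prices are invented for illustration and are nothing like the actual CPI figures:

```python
# Hypothetical mini-basket: expenditure weights (summing to 1) and average
# prices in a base period vs. the current period
basket = {
    "food":      {"weight": 0.15, "base": 100.0, "current": 103.0},
    "housing":   {"weight": 0.40, "base": 100.0, "current": 102.0},
    "transport": {"weight": 0.15, "base": 100.0, "current": 98.0},
    "medical":   {"weight": 0.10, "base": 100.0, "current": 105.0},
    "other":     {"weight": 0.20, "base": 100.0, "current": 101.0},
}

# Weighted average of price relatives (current/base), scaled so base = 100
index = 100 * sum(
    item["weight"] * item["current"] / item["base"] for item in basket.values()
)
inflation = index - 100  # percent change in the basket since the base period
print(round(index, 2), round(inflation, 2))  # 101.65 1.65
```

The weights are what make housing's 2% increase matter more to the final number than medical care's 5% increase.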

What's In the Basket?

The basket is determined from the results of the Consumer Expenditure Survey - the most recent one was in 2013 and 2014. These data are collected through a combination of interviews (often computer-guided, where an interviewer contacts the interviewee in person or over the phone, and asks a series of questions) and diary studies (in which families track their exact expenditures over a two-week period). The interviews and diaries assess over 200 categories of goods and services, which are organized into 8 broad categories:
  • Food and beverages - things like cereal, meat, coffee, milk, and wine
  • Housing - rent, furniture, and water or sewage charges
  • Apparel - clothing and certain accessories, like jewelry
  • Transportation - cost of a new car, gasoline, tolls, and car insurance
  • Medical care - prescriptions, cost of seeing a physician, or glasses
  • Recreation - television, tickets to movies or concerts, and sports equipment
  • Education and communication - college tuition, phone plans, and postage
  • Other goods and services - a catch-all for things that don't fit elsewhere, like tobacco products or hair cuts

How is this Information Collected?

Believe it or not, the people who collect data for the CPI either call or visit establishments to get the prices. The data are sent to commodity experts at the Bureau, who review the data for accuracy and may adjust items in the index directly or through statistical analysis. For instance, if an item on the list, like a dozen eggs, changes in some way - such as stores selling eggs in packs of 10 instead - the commodity experts have to determine whether to change the index or conduct analysis to correct for the changing quantity. This is a pretty easy comparison to make (10 eggs versus 12 eggs), of course, but when the analysts start dealing with two products that may be very different in features (such as comparing two different computers, or tuition from different colleges), the analysis to equalize them for the index can become very complex. So not only are items weighted to generate the full index, but statistical analysis can occur throughout data preparation for generating the index.
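The egg case can be sketched in a few lines. The quantities and prices here are hypothetical, but they show why the quantity adjustment matters: the sticker price can fall while the per-unit price rises:

```python
# Adjusting a price quote when the package size changes, so the index
# compares like with like (price per egg, not price per carton)
old_quote = {"quantity": 12, "price": 3.00}  # a dozen eggs
new_quote = {"quantity": 10, "price": 2.70}  # store switches to 10-packs

# Convert both quotes to a common per-unit basis before comparing
per_unit_old = old_quote["price"] / old_quote["quantity"]
per_unit_new = new_quote["price"] / new_quote["quantity"]

# The apparent price fell (3.00 -> 2.70), but per egg the price rose
relative = per_unit_new / per_unit_old
print(round(relative, 3))  # 1.08: an 8% per-unit increase
```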

Data for the three largest metropolitan areas - LA, New York, and Chicago - are collected monthly. Data for other urban areas are collected every other month, or twice a year.

History of the CPI

The history of the CPI can be traced back to the late 1800s. The Bureau of Labor, which later became the Bureau of Labor Statistics, did its first major study from 1888 to 1891. This study was ordered by Congress to assess tariffs they had introduced to help pay off the debt from the Civil War. They were interested in key industrial sectors: iron and steel, coal, textiles, and glass. This is one of the first examples of applying indexing techniques to economic data.

From then on, the Bureau often did small statistical studies to answer questions for Congress and the President. In 1901 to 1903, they broadened their scope by doing a study of family expenditures, as well as analysis of costs from retailers, and applied the indexing techniques they had developed for industry to retail and living expenses. They published the results in a report called Relative Retail Price of Food, Weighted According to the Average Family Consumption, 1890 to 1902 (base of 1890–1899). Despite seeming quite dull from the title and subject matter, this report was actually quite controversial, because it highlighted a gap in growth in wages versus increase in cost of living - that is, wages had grown more than costs, resulting in increased purchasing power. But it was released during a banking crisis, where many people were laid off and wages were cut, so the Bureau was accused of being politically motivated in their research and conclusions.

As a result of the outcry, and budget concerns, research by the Bureau was halted in 1907, and was very limited in scope when it returned in 1911, assessing fewer items and using mail surveys from retailers rather than visits by Bureau staff.

New leadership in the Bureau and the beginning of World War I rekindled research efforts. They began publishing a retail price index twice a year in 1919. But the Bureau got a major overhaul thanks to the efforts of FDR's Secretary of Labor, Frances Perkins. She made efforts to modernize the organization and recruit experts in methodology and statistical analysis. Two major contributors were American economist and statistician Helen Wright and British statistician Margaret Hogg. In fact, Hogg conducted analysis demonstrating that the weights then used for the index were biased - overstating the importance of food and understating the importance of other goods and services. When they also made changes to the sample of prices to include, they had to hire more staff to go out and collect price data.

Other major changes in the history of the CPI included introducing an index specific to "lower-salaried workers in large cities" in the early 1940s, a gradual shift from a constant-goods (where the same basket is always used) to a constant-utility (where goods for the basket are determined by level of utility or satisfaction - that is, new useful goods can be added) framework from the 1940s to 1970s, and a partnership with the U.S. Census Bureau in the late 1970s. The first collective bargaining agreements - in which companies agreed to link workers' wages to the CPI to prevent strikes - occurred in the late 1940s and early 1950s.

Summing It All Up

Not only is the CPI an index of inflation - it represents cultural shifts in how we think about and consume goods and services. The shifting basket over time reflects changes in our day-to-day lives, the birth and/or death of different industries, and the changes in technology.

I'll admit, I wasn't really that interested in the CPI until I learned about the contributions of statisticians over the years. And it's an example of women making strong contributions to economic and statistical thought, so it's a shame that we don't hear more about it. In fact, statistician Dr. Janet Norwood, who joined the Bureau in 1963 and served as commissioner from 1979 to 1991, made some very important changes in her time there. For instance, a representative of the policy arm of the Department of Labor used to sit in on meetings about research results and press releases from the Bureau - until Dr. Norwood stopped this practice to make sure economic information was seen as accurate and nonpartisan.

If you're now as fascinated as me, you can learn more about the CPI and its data here.

Saturday, August 12, 2017

Another Response to the Google Memo

On Wednesday, I wrote my own response to the "Google memo" in which I focused on the (pseudo)science used in the memo. I had such a great time writing that post and chatting with people afterward that I'm working on another writing project along those lines. Stay tuned.

But I'm thankful to Holly Brockwell for focusing on the history of women in tech in her response. Because as she points out, women were there all along:
The viewpoint Damore is espousing is known as biological essentialism. It’s used by people who have been told all their lives that they’re special and brilliant, and in moments of insecurity or arrogance, seek to prove this with junk science. Junk science like “women are biologically unsuited to technical work”, which – despite all his thesaurus-bothering, pseudoscientific linguistic cladding (see, I can do it too) – is the reductive crux of his argument.

Damore clearly thinks he’s schooling the world on biology, but it’s actually history he should have been paying attention to. Because he either doesn’t know or has chosen to forget that women were the originators of programming, and dominated the software field until men rode in and claimed all the glory.
Ada Lovelace, author of the first computer algorithm
The fact is, programming was considered repetitive, unglamorous “women’s work”, like typing and punching cards, until it turned out to be a lucrative and prestigious field. Then, predictably, the achievements of women were wiped from the scoreboard and men like James Damore pretended they were never there.

Marie Hicks, author of Programmed Inequality – How Britain Discarded Women Technologists and Lost Its Edge in Computing, believes the subordination of women in computer science has limited progress for everyone.

“The history of computing shows that again and again women’s achievements were submerged and their potential squandered – at the expense of the industry as a whole,” she explains. “The many technical women who were good at their jobs had the opportunity to train their male replacements once computing began to rise in prestige – and were subsequently pushed out of the field.

“These women and men did the same work, yet the less experienced newcomers to the field were considered computer experts, while the women who trained them were merely expendable workers. This has everything to do with power and cultural expectation, and nothing to do with biological difference.”

It might be comforting for mediocre men to believe that they’re simply born superior. That’s what society’s been telling them all their lives, and no one questions a compliment. But when they try to dress up their insecurities as science, they’d better be ready for women to challenge them on the facts. Because really, sexism is just bad programming, and we’d be happy to teach you how to fix it.
In fact, some of the first women to contribute to statistics did so as human computers, who worked for many hours repeating calculations on mechanical calculators to fill in the tables of critical values and probabilities to accompany statistical tests.

Friday, August 11, 2017

Made for Math

Via NPR, research suggests that we're all born with math abilities, which we can hone as we grow:
As an undergraduate at the University of Arizona, Kristy vanMarle knew she wanted to go to grad school for psychology, but wasn't sure what lab to join. Then, she saw a flyer: Did you know that babies can count?

"I thought, No way. Babies probably can't count, and they certainly don't count the way that we do," she says. But the seed was planted, and vanMarle started down her path of study.

What's been the focus of your most recent research?

Being literate with numbers and math is becoming increasingly important in modern society — perhaps even more important than literacy, which was the focus of a lot of educational initiatives for so many years.

We know now that numeracy at the end of high school is a really strong and important predictor of an individual's economic and occupational success. We also know from many, many different studies — including those conducted by my MU colleague, David Geary — that kids who start school behind their peers in math tend to stay behind. And the gap widens over the course of their schooling.

Our project is trying to get at what early predictors we can uncover that will tell us who might be at risk for being behind their peers when they enter kindergarten. We're taking what we know and going back a couple steps to see if we can identify kids at risk in the hopes of creating some interventions that can catch them up before school entry and put them on a much more positive path.

Your research points out that parents aren't engaging their kids in number-learning nearly enough at home. What should parents be doing?

There are any number of opportunities (no pun intended) to point out numbers to your toddler. When you hand them two crackers, you can place them on the table, count them ("one, two!" "two cookies!") as they watch. That simple interaction reinforces two of the most important rules of counting — one-to-one correspondence (labeling each item exactly once, maybe pointing as you do) and cardinality (in this case, repeating the last number to signify it stands for the total number in the set). Parents can also engage children by asking them to judge the ordinality of numbers: "I have two crackers and you have three! Who has more, you or me?"

Cooking is another common activity where children can get exposed to amounts and the relationships between amounts.

I think everyday situations present parents with lots of opportunities to help children learn the meanings of numbers and the relationships between the numbers.

Great Minds in Statistics: Happy Birthday, Egon Pearson!

Today would have been Egon Pearson's 122nd birthday. So happy birthday, Egon Pearson, and welcome to the first Great Mind in Statistics post!

So just to be clear:

Not that Egon
Not that Pearson - this is Karl Pearson
Egon Pearson
Egon Pearson was born August 11, 1895, the middle child of Karl Pearson and Maria (Sharpe) Pearson. His father, K. Pearson, was a brilliant statistician who also brought pettiness to a new level; look for a profile of him later. But young Pearson contributed to classical statistics - and helped originate an approach called null hypothesis significance testing - while avoiding the pettiness of his father, and the ongoing feud between Jerzy Neyman (Egon's frequent collaborator) and Ronald Fisher (who was also a frequent thorn in Karl Pearson's side). Egon tried to stay out of these feuds, though he sometimes got caught up in them - after all, Fisher could be petty too.

Unfortunately, Egon is often forgotten in the annals of statistics history. In fact, his name is either inextricably tied to Neyman's - as in their collaboration together - or left out, and Neyman is discussed alone. Unlike his father, Egon was shy and meticulous, and avoidant of conflict. His collaboration with Neyman began when they met in 1928. And shortly after meeting, Egon proposed a problem to Neyman.

Karl developed the goodness of fit test, which examines whether observed data fit a theoretical distribution, such as the normal distribution (also see here). But up to that point, there were many different approaches to this test and no best practice or standard procedure. Egon posed the question to Neyman: how should one proceed if one test indicates good fit and another poor fit? Which one should be trusted?
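As a quick illustration of the goodness of fit test itself, here's Pearson's chi-square statistic computed by hand in Python for a made-up die-rolling example (the counts are invented; the critical value is the standard chi-square table value):

```python
# Do 120 observed die rolls fit the "fair die" model?
observed = [25, 17, 15, 23, 24, 16]        # counts for faces 1 through 6
expected = [sum(observed) / 6] * 6         # 20 per face under the fair-die model

# Pearson's chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for alpha = .05 with 6 - 1 = 5 degrees of freedom
critical = 11.07
print(round(chi_square, 2), chi_square > critical)
# 5.0 False: well under the critical value, consistent with a fair die
```

Egon's question to Neyman was about exactly this kind of situation: with several competing recipes for assessing fit, two of them could disagree about the same data.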

Together, they tackled the problem, and even incorporated Fisher's likelihood function, then published the first of their joint papers in which they examined the likelihood associated with goodness of fit.

You'd think building on his dad's work would have made papa proud, but apparently, Egon was so concerned about angering his father by incorporating Fisher's work (like I said, K. Pearson was petty) that he and Neyman actually started a new journal, Statistical Research Memoirs, rather than publish in K. Pearson's journal Biometrika. But don't worry; Egon took over Biometrika as editor when his father retired. He also inherited his father's role of Department Head of Applied Statistics at University College London.

He didn't always live in K. Pearson's or Neyman's shadows. He contributed a great deal to the statistical concept of robustness - a statistical analysis is robust if you can still use it despite departures from assumptions of normality - and even proposed a test for normality based on skewness and kurtosis. His work on the statistics of shell fragmentation was an important contribution to efforts during World War II, and he received a CBE (Commander of the Most Excellent Order of the British Empire) for his service. He presided as President of the Royal Statistical Society from 1955 to 1956, and was elected a Fellow of the society (a high honor) in 1966.
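To give a flavor of the skewness-and-kurtosis idea, here's a rough Python sketch using simple moment-based estimators on simulated data. This is not Pearson's exact procedure, just the intuition behind it: a normal sample should have skewness near 0 and excess kurtosis near 0, and a clearly non-normal sample won't:

```python
import random
import statistics

def skewness(xs):
    # Sample skewness: the average cubed z-score
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def excess_kurtosis(xs):
    # Average fourth-power z-score, minus 3 (the normal distribution's value)
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(5000)]
skewed_sample = [random.expovariate(1) for _ in range(5000)]  # exponential

# Near zero for the normal sample; clearly positive for the skewed one
print(round(skewness(normal_sample), 2), round(skewness(skewed_sample), 2))
```

A formal normality test then asks whether the sample's skewness and kurtosis are farther from 0 than sampling error alone would explain.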

Egon Pearson died on June 12, 1980 in Midhurst, Sussex, England.

Thursday, August 10, 2017

The Great Minds in Statistical Thinking

Starting tomorrow, I'll be writing up profiles of some of the great minds in statistics, who have contributed to today's understanding of statistics and probability. Though I considered making this a weekly post, a) I'm already doing 2 of those and b) this project is going to take a while. So I've decided to post on key dates in statistics history - birthdays of great statistics minds, dates of famous publications, and so on.

And I have a long-term goal for all of this. I came up with this idea while reading The Seven Pillars of Statistical Wisdom, which deals with statistics history, and listening to a podcast about building a hypothetical new Mount Rushmore. I started wondering who I would put on a statistics-themed Mount Rushmore. Who are the top 4 minds? Who shaped statistics as we know it today? For that part, I'll need your help, but not just yet.

First, I need to give you the people. Later, I'll have a survey to pick the top 4. Stay tuned and help out by reading along with the profiles!

First up, Egon Pearson. Check back tomorrow to find out more about him!

Wednesday, August 9, 2017

Statistical Sins: Women in Tech (Here It Goes Again)

Two days ago, a Google employee was fired for writing a memo explaining why the gender disparity in tech was nothing to worry about and we should all just go about our business where the men fix the stuff and women fix the people.

Man, this sounds familiar. Could it... oh, hell, this nonsense again. And in fact, Dr. Lee Jussim, social psychology's stereotype accepter, makes an appearance in this memo, for his work stating that stereotypes are created because they're true.

The memo reads like a college student persuasive paper that was written in an energy drink-fueled binge, in the middle of the night when the library was closed and that's his excuse for the fact that his only search strategy was Google and Wikipedia, even though he would have done that anyway. But, hey, he gave us a TL;DR section. Wasn't that nice of him?:
  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
    • Extreme: all disparities in representation are due to oppression
    • Authoritarian: we should discriminate to correct for this oppression
  • Differences in distributions of traits between men and women may in part explain why we don't have 50% representation of women in tech and leadership.
  • Discrimination to reach equal representation is unfair, divisive, and bad for business.
Let me explain. No, there is too much. Let me sum up. Dude bro is upset because his dude bro friends in tech can't take all the jobs at Google because of these diversity programs that try to get more women and minorities into tech. Dude bro is mad because these diversity programs include mentorship programs and club meetings that are for women and minorities trying to get into tech and he doesn't like being excluded from things. He thinks sex differences are universal across cultures (they're not), often have clear biological causes (rarely do they have clear anything causes), and are highly heritable (again, they're not). Oh, and he repeatedly conflates personality differences with differences in occupational interests. Yes, personality relates to these things, but moderately at best.

His arguments are convoluted and at times, contradictory. For instance, he claims diversity programs over-emphasize empathy, and that over-dependence on empathy causes us to favor individuals similar to ourselves. Wait, so doesn't that mean empathy-driven programs would lead the male-dominated tech world to favor other men? Hmm. He also says that while we're working to fix the gender disparity in tech, we would never feel the need to fix the gender disparity in prisons, homelessness, and school dropouts. (Oh, come on.)

And he uses overlaid normal curves to make some point, but I'm at a loss to figure out what that point is.

The main scientific (i.e., not Wikipedia) source he cites is an article by Schmitt, Realo, Voracek, and Allik (2008) (full text here), which is a rather impressive large-scale survey of the Big Five personality traits in 55 countries. Okay, I'll bite, but as I said before, the relationship between personality and occupational interests is weak to moderate. They expressed their results - the difference between women and men - in Cohen's d, which, as you may recall, is a standardized mean difference: the mean of women minus the mean of men, divided by the standard deviation. In this case, the result is positive if women have a higher mean and negative if men have a higher mean.
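Here's a quick sketch of the Cohen's d calculation in Python. The scores are made up purely for illustration (not the study's data), and the last line shows the kind of d-to-raw-points conversion you can do once you know the scale's SD:

```python
import statistics

def cohens_d(group_a, group_b):
    # Standardized mean difference: (mean_a - mean_b) / pooled SD
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_var ** 0.5

# Hypothetical scale scores for two groups - illustrative only
women = [52, 55, 49, 58, 54, 51, 56, 53]
men = [50, 48, 53, 47, 52, 49, 51, 46]

d = cohens_d(women, men)  # positive: the first group scored higher on average
# Converting a d back to raw scale points: multiply by the SD
# (e.g., d = 0.40 on a scale with SD = 9 is a difference of 0.40 * 9 = 3.6 points)
print(round(d, 2))
```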

Overall, they found the following Cohen's ds:
  • Neuroticism: 0.40
  • Extraversion: 0.10
  • Openness to Experience: -0.05
  • Agreeableness: 0.15
  • Conscientiousness: 0.12
They don't give standard deviations for each measure, but give an overall mean SD of 8.99. Let's just round that up to 9 because it doesn't make much difference. They essentially found that average scores for these 5 scales differ by, respectively, approximately 4 points, a little less than 1 point, less than 1 point, slightly more than 1 point, and slightly more than 1 point. Sure, these differences are statistically significant, but are they practically significant? These differences could have been created by different answers to a single question, a question that may have had biased wording. The study is impressive in that every participant was given the exact same measure (though, of course, it was translated to the native language), but that means that a biased question existed in all 55 samples. Had this been a meta-analysis across different cultures, we wouldn't have that methodological concern.

If we only look at the United States data, the results aren't much different (and before you say, "Hey, universal across cultures," keep in mind that participants from the US made up 16% of the sample and the US sample was over twice as large as the next largest sample):
  • Neuroticism: 0.53
  • Extraversion: 0.15
  • Openness to Experience: -0.22
  • Agreeableness: 0.19
  • Conscientiousness: 0.20
The average SD is 8.49, so to make this painfully clear, these 5 scales differ by, respectively, 4.5 points, 1.3 points, 1.9 points, 1.6 points, and 1.7 points. Even the authors themselves state that these differences are weak except for neuroticism, which they call moderate. 

These are probably the most interesting results, mainly because the authors go on to do an analysis in which they average together 4 of the Big 5 traits and run correlations with a measure of general sex differences. Can you tell me what is measured by an average of scores on Neuroticism, Extraversion, Agreeableness, and Conscientiousness? Because I sure can't. This is a major violation of scale unidimensionality (the principle that a scale should assess only one clearly defined construct). Not to mention, the resulting score is meaningless. So the rest of the paper rests on a statistical analysis that violates key assumptions of how scales should be constructed.

So the strongest piece of evidence this Google employee has is a study that found, at best, moderate sex differences in Neuroticism. I'm not sure he should be hanging his women-can't-handle-tech hat on this.

Tuesday, August 8, 2017

Statistics Reading Book Review: The Lady Tasting Tea

A few days ago, I finished reading The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century by David Salsburg.

Since I shared in my statistics reading roundup that I would be reading this book, I thought I'd take a moment to write a quick review.

First, the positives. I learned a lot about the history of statistics, much of which I didn't previously know, and I learned some new names and approaches to look into. I liked that Dr. Salsburg tried to include statistical thinkers from multiple nations, including India and other countries that are often ignored in the more Western-focused discussion of the history of statistics, as well as women who have contributed in different ways. I was thrilled to read more about a person I had no idea was a self-trained statistician until I read about her in The Seven Pillars of Statistical Wisdom: Florence Nightingale. Not only did she use data to try to improve conditions for soldiers, her close family friends, the Davids, named their daughter (Florence Nightingale David) after her. F.N. David went on to make major contributions to statistics as a field.

Now, the not-so-positives. While the book was interesting, it lacked a coherent narrative and had a tendency to jump around in time and between people. By the end, it felt like he was cramming in names, statistical approaches, and events he couldn't find a place for, and I felt bombarded with stories that had little in common. Sometimes he covered concurrent developments in the same chapter, and sometimes in different chapters, showing how other people worked their way into a situation previously discussed. This worked at times, but not always. I'm not saying the book would be better if it were strictly linear, but a little more attention to timelines and organization would make the story easier to follow, especially since the point of the book isn't really to divide the timeline up or advance some alternative interpretation of statistics as a field.

Also, I admit Dr. Salsburg said from the beginning that he wanted to write a book about the history of statistics for people who weren't statisticians (or even mathematicians), but in many cases he identified a revolutionary concept (e.g., degrees of freedom, maximum likelihood estimation) without describing even the basics of what it is - which you'd need in order to appreciate why it was so revolutionary. Even though I think this book will mostly be picked up by people who are at least a little statistically savvy, some description of what these different terms mean seems warranted.

Still, if you'd like to learn more about the history of statistics, this book is a good place to start, but you'll need to look elsewhere to get a full sense of it.

Monday, August 7, 2017

The Politicization of Worry

As I've commented before, there is a pattern in the US today for issues that were once non-partisan to become, well, partisan, and new research from Gallup suggests Republicans and Democrats are becoming more divided on issues like immigration, gun control, global warming, and satisfaction with K-12 education.

The gap has narrowed on issues of satisfaction with healthcare, though the reason for the shift is purely partisan: satisfaction is up for Democrats and down for Republicans, making them about equal. And currently Democrats and Republicans feel about the same way about the need for a third political party in our country, and whether smoking in public should be illegal. Further, the gap has not changed for several issues, including having children outside of wedlock (Democrats are more accepting, but now the majority of both parties are accepting of it), whether divorce is acceptable, and whether marijuana should be legalized.

But that's not all that's become political. According to a separate Gallup poll, Americans are worried, and though worry has decreased since the inauguration, it is still higher than pre-election levels for Democrats:

Cats Are A**holes (And Have Been for a While)

Via a Facebook group I belong to, Reviewer 2 Must Be Stopped, comes this article about the history of cats in academia:
A couple of years ago, Emir Filipović, from the University of Sarajevo, was trawling through the Dubrovnik State Archives when he stumbled upon a medieval Italian manuscript (dated 1445) marked with four very clear paw prints.
It could have been worse: around 1420, one scribe found a page of his hard work ruined by a cat that had urinated on his book. Leaving the rest of the page empty, and adding a picture of a cat (that looks more like a donkey), he wrote the following:

Here is nothing missing, but a cat urinated on this during a certain night. Cursed be the pesty cat that urinated over this book during the night in Deventer and because of it many other cats too. And beware well not to leave open books at night where cats can come.

For us humans though, the most interesting, and pressing, cat research investigates their propensity for spreading mind-controlling parasites. You see, cats carry a parasite called Toxoplasma gondii, which alters the behaviour of animals to make them less afraid of predators (and therefore more likely to be killed, eaten and used as a conduit for further propagation of the parasite).

The slightly eccentric Czech scientist Jaroslav Flegr has made such research his life’s work, and since a “light bulb” moment in the early 1990s he has been investigating the link.

We’ve known for years that infection is a danger during pregnancy and a major threat to people with weakened immunity, however the research of Flegr and others suggests that infected humans are more likely to be involved in car crashes caused by dangerous driving and are more susceptible to schizophrenia and depression. And all that is to say nothing of the as-yet-unexplained link between cat bites and depression.
It's not all anti-cat. There's a great story about a cat who was named coauthor on a paper in physics, and as a result, got many invitations to speak at conferences and even an academic appointment.

Sunday, August 6, 2017

Statistics Sunday: Introduction to Effect Sizes

I've blogged previously about standard deviation and also why the denominator for sample standard deviation is N-1. Standard deviation tells us the typical spread of individual scores, and we can use the central limit theorem and various tables to determine percentiles associated with different scores. If we want to convert an individual score to a Z-score, we subtract the population mean and divide by the population standard deviation.
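As a quick sketch (with made-up population values), the Z-score conversion looks like this:

```python
# Convert a raw score to a Z-score: subtract the population mean,
# then divide by the population standard deviation.
mu, sigma = 100, 15  # hypothetical population parameters
x = 130              # an individual's raw score

z = (x - mu) / sigma
print(z)  # 2.0 - the score sits two SDs above the mean
```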

But when we want to engage in hypothesis testing of means, say two sample means, we need to use a different metric. Standard deviation tells you how much scores typically vary. But when we are engaging in hypothesis testing of two means, we are interested in how much means typically vary. So if I collected a bunch of samples of a specific size and measured them on the same thing, I want a metric that tells me how much I can expect the means of those different samples to vary. We use a metric based on standard deviation that gives you credit for the size of the sample, because means are going to be more stable, as in closer to the population mean, when sample sizes are bigger.

That metric is called the standard error of the mean, which I've sometimes seen abbreviated as SEM, but that can get really confusing when you start getting into the other thing we use SEM for: structural equation modeling. I usually just see it called SE for standard error.

When we run a t-test, we subtract one mean from the other to get our mean difference, and then we divide by standard error: how much we expect means of groups to typically vary. The bigger the sample size, the smaller the standard error.
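Here's a minimal sketch of that relationship, using the usual formula SE = SD / √n with a hypothetical SD:

```python
# Standard error of the mean: SE = SD / sqrt(n).
# Larger samples -> smaller SE -> sample means vary less around the
# population mean, even though individual scores vary just as much.
import math

sd = 15  # hypothetical population SD
for n in (10, 100, 1000):
    se = sd / math.sqrt(n)
    print(f"n = {n:4d}: SE = {se:.2f}")
```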

But don't worry: standard deviation is still very important. Because when we run a hypothesis test using standard error, we're testing the hypothesis that these means are more different from each other than they should be by chance alone. But "more different than we would expect by chance alone" is not the same thing as saying a big difference. There is a difference between statistical significance and the size of the effects.

As you can see, the standard error can get very small if you have a very large sample size. Even trivial differences between your means can become statistically significant simply because the sample size was large. But does that mean difference actually mean something?

That's why we use a measure called effect size, which tells us about the magnitude of the difference. There are many different effect sizes, depending on what statistical analysis you're using, and I'm going to start writing about some of the different effect sizes in these posts. In fact, we've already discussed one: correlation is an effect size, because it tells us the strength of the relationship between two variables.

But I wanted to start by introducing the concept, and also introducing one effect size that is really related to this concept of standard error versus standard deviation. And that would be Cohen's d.

Cohen's d tells us by how many standard deviation units two sample means differ - standard deviation, not standard error. Standard error is directly impacted by sample size; standard deviation is not (not directly anyway). Getting two sample means to differ by a certain number of standard errors isn't a difficult task when sample sizes are large - but getting them to differ by a certain number of standard deviation units is quite a feat.

The formula for Cohen's d is simply the mean difference divided by the pooled standard deviation. And to show how much of a feat it is to differ by a measure of standard deviation units, Cohen considered a difference of 0.8 of a standard deviation to be a large effect (also, he called 0.2 small and 0.5 medium). If you found two sample means differed by a full standard deviation, you've found an extremely large effect.
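Here's a small sketch of that formula with two made-up samples (the data are invented purely for illustration):

```python
# Cohen's d: mean difference divided by the pooled standard deviation.
import math
import statistics

group1 = [102, 98, 110, 105, 95, 108, 101, 99]  # made-up scores
group2 = [95, 92, 100, 97, 90, 96, 94, 98]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# The pooled SD weights each group's variance by its degrees of freedom.
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # compare against Cohen's 0.2 / 0.5 / 0.8 benchmarks
```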

I'll be doing some posts on different effect sizes soon, in part because I decided to spoil myself and bought the second edition of a fantastic book, Effect Sizes for Research. I have copious notes and photocopies from a borrowed copy of the first edition, and this one has expanded to include more multivariate effect sizes. It isn't an expensive book - I only say "spoil" (and mention the notes and photocopies) because the first edition came out when I was a broke-ass grad student. But I highly recommend it for anyone who uses statistics regularly; effect sizes are being demanded more and more in research - for good reason, which I'll discuss more later!

Friday, August 4, 2017

The Rise of Craft Beer

Yesterday, CityLab posted a story by Joe Eaton on one of my favorite topics: beer. Part of the story he writes about Missoula, Montana, which has seven local breweries:
Good beer has joined fly-fishing and other forms of outdoor adventure as a cornerstone of the area’s tourism promotions and is increasingly central to the town’s branding as a hip Western recreational outpost. Accordingly, encouraging the growth of the local brewing industry has become a focus of the region’s economic development efforts.

Across the country, it’s the same story: Craft beer is on a tear, and cities as diverse as Bend, Oregon and Grand Rapids, Michigan have become destinations for connoisseurs of local suds. As Curbed’s Patrick Sisson recently detailed, small breweries have been heralded as economic drivers than can breathe life into the boarded-up downtowns of rural America and inject a hipster vibe—and money—into struggling neighborhoods of larger cities.

But despite the fizzy rise of craft beer in America, there are limits on the power of the industry to drive local economies. Just where that line lies is something that Missoula may soon be discovering.
Breweries have the power to bring jobs back to small towns - especially for people who previously worked in manufacturing, and were forced to take lower paying jobs when manufacturing dried up. As the article states, local taprooms also have the power to build a community, a place where people can meet and interact.

But, as the article also points out, there's a lot of competition for business, and other than tourists coming to town to visit the taprooms, there's a limit to how much beer a town can consume. If an operation is very small and doesn't go into bottling and distribution, it depends on people coming to its taproom and not somewhere else. Although there's argument about whether this market is saturated, in some towns, it may be. Chicago can sustain multiple breweries, but maybe not Missoula, a town of 70,000 that, in addition to the seven breweries noted above, is getting two more.

Taprooms are also attacked for taking business away from bars, and strong voices in the community against bringing a taproom to their neighborhood have resulted in some silly compromise policies (including in Missoula): a 3-pint maximum and an 8 PM last call.

Far from discouraging readers who are thinking of setting up their own small brewery, the author notes that while some markets may have reached (or are in danger of reaching) saturation, other parts of the country are still lacking in small breweries.

BTW, in reference to my previous post about Certified Independent Craft Beer, I was at Sketchbook Brewing in Evanston, IL earlier this week and saw this hanging on the wall:

Wednesday, August 2, 2017

Statistical Sins: Regression to the Mean

As I was looking for something to blog about for today's Statistical Sins post, I happened upon an article about a new study examining bee behaviors and potential genetics links to autism-like disorders. The researchers exposed bees to two social situations, one that should elicit an aggressive response (an outsider bee in the hive) and one that should elicit a parental response (appearance of a queen larva). Though most bees reacted to at least one of the situations, 14% had no reaction at all.

They then went on to examine the genetic profiles of the bees that responded to one of the situations (either "guards" or "nurses") and the bees that didn't respond at all. Specifically, they examined the "mushroom bodies" - a part of an insect's brain involved in integrating sensory information as well as social behavior.

They compiled a list of all genes that expressed differently (which they call DEGs, differentially expressed genes) between the responders and the non-responders, and used an analysis called principal components analysis, which looks for common patterns among variables to identify groups that "hang together" (that is, appear to have a similar underlying cause). Then they further examined the 50 DEGs that loaded most highly on the analysis (hang most strongly together). Finally, they compared these genes to a list of genes implicated in autism-spectrum disorder in humans and found significant overlap.

I should emphasize that I'm not an entomologist, geneticist, or clinical psychologist with expertise in autism, so this research is outside of my area. But it seemed like a great way to introduce a concept that could have impacted their results: regression to the mean. In fact, interestingly, this statistical concept was first introduced by Francis Galton, who studied heredity mainly to try to explain a contradiction in a theory from his cousin, whom you might have heard of: Charles Darwin.

Darwin's theory of natural selection explains why evolution occurs. Within a species, we have genetic mutations. Some mutations are good and are "selected," meaning they increase probability of survival. These genes are then passed on to their children. Over time, new species emerge when a good trait is selected for again and again, to the point that organisms with this trait are very different from organisms without it. So genetic variation is good. But, Galton observed, if organisms keep changing through genetic variation, how do stable species emerge? Why don't they just keep changing?

So Galton looked at data from a large sample of parents and their children. He averaged the two parents' heights together, then compared those meta-parent heights to the heights of their children. Height, like many variables, is normally distributed, so you can probably imagine what the distribution looked like. He found that short meta-parents tended to have taller children (closer to the mean) and tall meta-parents tended to have shorter children (again, closer to the mean). So there was a trend for parents who were in the extremes to have children closer to the middle of the distribution.

Remember that the mean is the expected value. In a normally distributed variable, it is also the most frequent value (mode) and the value that divides the distribution in half (median). 68% of people will fall within 1 SD, so the majority of people will fall in or around the middle. The probability that a child, regardless of how tall his or her parents are, will fall around the average is high. And people who fall in the extremes are unusual, potentially even a fluke - a fluke that is unlikely to be repeated. There's just not that many of them, so even if they do pass on their extreme height (very tall or very short), that's going to result in a small group of extreme children. There's a lot more average people, who will generally produce average children, but may produce the occasional extreme case. So over time, we see a distribution of height that remains pretty stable.
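A quick simulation makes Galton's observation easy to see. This sketch assumes a made-up height distribution and a parent-child correlation of 0.5 (both numbers are hypothetical, chosen only for illustration):

```python
# Simulate regression to the mean in parent/child heights.
import random

random.seed(42)
mean, sd, r = 168, 7, 0.5  # hypothetical mean, SD, and parent-child correlation

pairs = []
for _ in range(100_000):
    parent = random.gauss(mean, sd)
    # The child's expected height is pulled toward the mean in proportion to r.
    child = mean + r * (parent - mean) + random.gauss(0, sd * (1 - r**2) ** 0.5)
    pairs.append((parent, child))

# Select parents more than 1 SD above average - an extreme group.
tall = [(p, c) for p, c in pairs if p > mean + sd]
parent_avg = sum(p for p, _ in tall) / len(tall)
child_avg = sum(c for _, c in tall) / len(tall)
print(f"Tall parents average:   {parent_avg:.1f}")
print(f"Their children average: {child_avg:.1f}")  # closer to the mean of 168
```

The selection step is the key: the parents were chosen precisely because they were extreme, but their children weren't selected for anything, so the children's average drifts back toward the middle.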

This principle, which Galton referred to as regression, has been observed in many situations. Basically, this phenomenon occurs when a case is selected because of its extreme value on some variable. But because the mean is much more likely, the next time you measure that variable, the value will be closer to the mean. Extreme values are unlikely, and thus, less likely to occur again. This concept has been used to explain the "Sports Illustrated Effect" - when athletes at the top of their game see a decline in performance after being featured on the cover of Sports Illustrated. Since sports performance is very much driven by probability, it makes sense that people selected because they fall in the extreme of the sports performance distribution will move to a less extreme score.

How does this concept relate here? The researchers showed a tendency to choose cases or variables because they fell in the extremes. They picked the bees as nonresponders for falling in the extremes, then genes to further examine because they showed stronger relationships, and so on. To be fair, they also included less extreme bee cases, and later on, did some analysis of the genes that did not emerge as strong contenders. Still, you have to be careful when selecting a case simply because it is extreme, especially if you plan on doing something to the case (e.g., an intervention) to make it less extreme - like selecting low performers on an achievement test to receive extra training. Chances are, their score would have increased either way.

Tuesday, August 1, 2017

A Reading Guide for Unnamed Sources

Trump and other members of his administration have repeatedly complained about leaks - though past experience suggests Trump might be one of the sources, and others have speculated that the leaks are coming from Jared Kushner and Ivanka Trump. Why would they be responsible for leaks? Because leaks are not just gossip about people in the administration turning on each other; leaks can be a way to test the waters regarding some policy or as a way to kill something before it can even get started. To help you explore these possibilities, Perry Bacon Jr. provided a two-part article about reading articles with unnamed sources:

Trump's Opioid Commission Finally Delivers

In a move that apparently shocked Kathryn Casteel at FiveThirtyEight, Trump's Opioid Commission did what public health experts recommended:
After missing its deadline twice, Trump’s Commission on Combating Drug Addiction and the Opioid Crisis on Monday presented an interim report of policy recommendations for handling the nation’s opioid epidemic. The commission’s preliminary recommendations are largely in line with those of many public health advocates: The report emphasizes treatment over law enforcement; backs the use of medical alternatives to heroin such as methadone; and makes no mention of Trump’s border wall, which the president has often touted as a way to stop the flow of drugs into the country. Perhaps most significantly, the commission called on the president to declare a national emergency under either the Public Service Health Act or Stafford Act. Doing so would give the government the power to respond more aggressively to the crisis, including by modifying requirements for health care programs like Medicaid and Medicare to make it easier for patients to seek treatment for addiction.
The report also emphasizes the need for better data, specifically better sharing of data, to combat "doctor shopping" and over-prescribing.

It's refreshing to read something thoughtful and relevant coming from an administration whose prior comments on the opioid addiction crisis were about border control with Mexico. It will be interesting to see how much they act on - such as whether Trump will agree to the recommendation of the federal government intervening on the price of treatments or whether Jeff Sessions will accept the recommendation of treatment over legal action (considering that he wants to go after anyone using marijuana, even for medical purposes).

Monday, July 31, 2017

How You Can Science Along During the Eclipse

As you may already know, a total solar eclipse will occur on Monday, August 21, following a stripe across the US from Oregon to South Carolina:

And according to Rebecca Boyle at FiveThirtyEight, some cool science is going to happen during the eclipse:
[S]cientists who study eclipses will be buzzing around their equipment to take the measure of the sun, its atmosphere and its interaction with our own atmosphere. An event that could inspire a unique sense of cosmic communion will also answer burning questions about how our star works and how it affects us.

By blocking the sun’s blazing light, the moon unveils the solar corona, a region that scientists still struggle to understand. The sun streams radiation and charged particles through a wind that it constantly blows in all directions, and while we know the solar wind originates in the corona, scientists are not sure exactly how, or why. An eclipse is one of the only times scientists can see the corona itself — and try to understand what it is throwing at us.

With new corona observations, scientists can feed new bits into computer simulations hungry for data on coronal action. This will improve models used for predicting space weather, said Ramon Lopez, a physicist at the University of Texas at Arlington.

Scientists will scramble to study all this as the shadow of the moon races across the country at an average speed of 1,651 mph. Scientists and students at 25 locations across the country will launch more than 50 weather balloons, which will take the temperature of Earth’s atmosphere. From orbit, a slew of spacecraft and the crew aboard the International Space Station will be watching and taking pictures. And scientists will fly in at least three airplanes, including a National Science Foundation jet that will measure the sun in infrared.
And if you're planning to watch the eclipse, whether in the path of totality or not, you can get involved with some of the science that will be going on during and because of the eclipse:
With an app called GLOBE Observer and a thermometer, you can collect data during the eclipse and submit it to NASA. And Google and the University of California, Berkeley, are asking for video and images, which they’ll stitch together into an “Eclipse Megamovie.”
If you want to watch the eclipse, you'll need a special pair of glasses to protect your eyes. And here's some additional guidance if you plan on photographing or filming the eclipse.

Sunday, July 30, 2017

Statistics Sunday: Fixed versus Random Effects

As I've said many times, statistics is about explaining variance. You'll never be able to explain every ounce of variance (unless you, say, create a regression model with the same number of predictors as there are cases; then you explain all variance, but fail to make any generalizable inferences). Some variance just can't be explained except as measurement or sampling error. But...

That is, it's possible to have two variance components, partitioning out variance that appears random from variance that appears systematic - variance with some cause that is simply an unmeasured variable (or set of variables). This is where random effects models come into play.

You may not have heard these terms in statistics classes, but you've likely done fixed effects analysis without even realizing it. Fixed effects models deal specifically with testing the effect of one or more independent variables - variables you've operationally defined and manipulated or measured. When you conduct a simple linear regression, you have your constant (the predicted value on the Y variable when X = 0), your (fixed effect) slope, and your error term. The effect you're testing is known.

But there are many other variables out there that you may not have measured. A random effects model attempts to partition that leftover variance by seeing what residual variance across cases appears to be meaningful (that is, shows common patterns) and what appears to be just noise.

Often, we use a combination of the two, called a mixed effects model. This means we include predictors to explain as much variance as we can, then add in the random effects component, which will generate an additional variance term. It has the added bonus of making your results more generalizable, including to cases unlike the ones you included in your study. In fact, I mostly work with mixed and random effects models in meta-analysis, which add an additional variance component when generating the average effect size. In meta-analysis, a mixed effects model is used when you have strong justification that there isn't actually one true effect size, but a family or range of effect sizes, that depend on characteristics of the study. The results then include, not just an average effect size and confidence interval for that point estimate, but a prediction interval, which gives the range of possible true effect sizes. And this is actually a pretty easy justification to make.
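To make the variance-partitioning idea concrete, here's a toy simulation (all values hypothetical) that generates data with an unmeasured group-level influence and then estimates the two variance components with simple method-of-moments arithmetic:

```python
# Partition variance into a group-level (random effect) component and
# residual noise, using simulated data with known true values.
import random
import statistics

random.seed(1)
n_groups, n_per = 50, 20
group_sd, resid_sd = 2.0, 1.0  # true random-effect SD and residual SD

data = []
for g in range(n_groups):
    intercept = random.gauss(0, group_sd)  # unmeasured group-level effect
    for _ in range(n_per):
        data.append((g, intercept + random.gauss(0, resid_sd)))

# Residual variance: average the within-group variances.
within = statistics.mean(
    statistics.variance([y for gg, y in data if gg == g]) for g in range(n_groups)
)
# Random-effect variance: variance of the group means, minus the part
# attributable to residual noise in each mean (within / n_per).
group_means = [
    statistics.mean([y for gg, y in data if gg == g]) for g in range(n_groups)
]
between = statistics.variance(group_means) - within / n_per

print(f"Estimated residual variance:      {within:.2f} (true {resid_sd**2:.2f})")
print(f"Estimated random-effect variance: {between:.2f} (true {group_sd**2:.2f})")
```

Dedicated mixed-model routines estimate these components by maximum likelihood rather than this back-of-the-envelope arithmetic, but the logic is the same: systematic group-level variance gets its own term, separate from noise.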

Why wouldn't you use random effects all the time? Because it isn't always indicated, and it comes with some big drawbacks. First, this residual, random effects variance can't be correlated with any predictors you may have in the model. If that happens, you don't really have a good case for including the random effects component. The variance is related to the known predictors, not the unknown random effects variance. You're better off using a fixed effects model. And while random effects models can be easily justified, fixed effects models are easier to explain and interpret.

Additionally, the random (and mixed) effects models are more generalizable in part because they generate much wider confidence intervals. And of course, the wider the confidence interval, the more likely it is to include the actual population value you're estimating. But the wider the confidence interval, the less useful it is. There's a balance between being exhaustive and being informative. A confidence interval that includes the entire range of possible values will certainly include the actual population value. But it tells you very little.

Finally, a random effects model can reduce your power (and by the way, you need lots of cases to make this analysis work), and adding more cases - which increases power in fixed effects models - may actually decrease power (or even have no effect) because it adds more variance and also increases the size of confidence intervals. This may make it more difficult to show a value is significantly different from 0, even if the actual population value is. But as is always the case in statistics, you're estimating a value that is unknown with data that (you have little way of knowing) may be seriously flawed.

Hmm, that may need to be the subtitle of the statistics book I'm writing: Statistics: Estimating the Unknown with the Seriously Flawed.

Friday, July 28, 2017

Cool Chart, Hot Trend

Back in June, it was so hot in Arizona, mailboxes were melting and flights were unable to take off. Though people may have brushed this heatwave off as a fluke, research suggests summers are in fact getting hotter:
Extraordinarily hot summers — the kind that were virtually unheard-of in the 1950s — have become commonplace.

This year’s scorching summer events, like heat waves rolling through southern Europe and temperatures nearing 130 degrees Fahrenheit in Pakistan, are part of this broader trend.

During the base period, 1951 to 1980, about a third of summers across the Northern Hemisphere were in what they called a “near average” or normal range. A third were considered cold; a third were hot.

Since then, summer temperatures have shifted drastically, the researchers found. Between 2005 and 2015, two-thirds of summers were in the hot category, while nearly 15 percent were in a new category: extremely hot.

Practically, that means most summers today are either hot or extremely hot compared to the mid-20th century.
At the top of the article is an animation, showing the normal curve shifting to the right (toward warmer temperatures) over time. It's a great demonstration of this trend:

Thanks to my friend David over at The Daily Parker for sharing this story with me.

Thursday, July 27, 2017

All the Books!

If you live in the Chicago(land) area, you should definitely check out the Newberry Library Book Sale, in its 33rd year! Six rooms on the lower level of the library are filled with books and CDs and books and records and books and DVDs and books and VHSs and did I mention books? I decided to walk over there after work, and I had great success in finding some awesome additions to my statistical library (plus a book about my favorite philosopher, Simone de Beauvoir):

And because I love nostalgia (Reality Bites) and was discussing one of these movies (Cry Baby) with my friend over at Is It Any Good?, I had to pick up these soundtracks:

The sale runs through Sunday! You should definitely check it out!

The System is Down

Recently, AI created by Facebook had to be shut down because it developed its own language:
An artificial intelligence system being developed at Facebook has created its own language. It developed a system of code words to make communication more efficient. The researchers shut the system down as it prompted concerns we could lose control of AI.

The observations made at Facebook are the latest in a long line of similar cases. In each instance, an AI being monitored by humans has diverged from its training in English to develop its own language. The resulting phrases appear to be nonsensical gibberish to humans but contain semantic meaning when interpreted by AI "agents."

In one exchange illustrated by the company, the two negotiating bots, named Bob and Alice, used their own language to complete their exchange. Bob started by saying "I can i i everything else," to which Alice responded "balls have zero to me to me to me…" The rest of the conversation was formed from variations of these sentences.

While it appears to be nonsense, the repetition of phrases like "i" and "to me" reflect how the AI operates. The researchers believe it shows the two bots working out how many of each item they should take. Bob's later statements, such as "i i can i i i everything else," indicate how it was using language to offer more items to Alice. When interpreted like this, the phrases appear more logical than comparable English phrases like "I'll have three and you have everything else."
The reasoning behind the "fear of losing control" is wanting to make sure the process undertaken by the AI can be understood by humans, if necessary. But the reason we develop and use AI is specifically to do things that humans can't, or that it would take humans a long time to do manually. While I can understand the need for monitoring, even if the AI bots were speaking English, it would probably still take a while to dig in and find out what they're doing. The concept of AI really seems to be about the need to put difficult processes into a black box.

So, alas, Bob and Alice (and all of their friends) are now offline.

Wednesday, July 26, 2017

Statistical Sins: Errors of Omission

For today's Statistical Sins post, I'm doing something a bit differently. The source I'm discussing doesn't necessarily commit any statistical sins, but commits sins in how he reports research and the bias he shows in what research he presents (and what he doesn't).

The source is a blog post for Psychology Today, written by Dr. Lee Jussim, a social psychologist. His post, Why Brilliant Girls Tend to Favor Non-STEM Careers, discusses the gender disparity in the STEM fields and argues that there is no bias against women in these fields.

This is a subject I feel very strongly about, so I recognize I'm probably predisposed to take issue with this post. I'm going to try to be objective, even if the article is written in a "Cash me outside, howbow da?" kind of way. Which is why it's a bit hilarious that his guidelines for commenting include not "painting groups with a broad brush" (which he does when he quips that, by social psychology definitions, 95% of people are a vulnerable group) and "keep[ing] your tone civil."

He begins by accusing researchers who are selling the discrimination narrative of cherry-picking studies that support their argument. So, to argue that differences in enrollment in STEM programs and employment in STEM fields have non-discriminatory causes... he cherry-picks studies that support his argument.

He gives a little attention to one study that refutes his argument, but sets it up as a straw man due to its smaller sample size compared to the studies he cites in support. In fact, he gives a one-sentence summary of that study's findings and little to no detail on its methods.

I could probably do a point-by-point analysis and rebuttal of his post. But for the sake of brevity, I will confine my response to three points.

First, he writes an entire post about gender disparities in STEM without once citing stereotype threat research, which could explain not only the gap but also his key argument against bias - differences in interest. He doesn't even try setting stereotype threat research up as a straw man, though he likely could - new efforts in replicating past research (and in many cases, being unable to replicate said research) mean that most key social psychology findings could be fair game for criticism.

As a reminder, stereotype threat occurs when a member of a stereotyped group (e.g., women) encounters a task about which their group is stereotyped (e.g., a math test) and underperforms as a result. Stereotype threat leads members of the group to disengage and deidentify from the task area. Hence, women experiencing stereotype threat when encountering math will decide that they're just not a math person and they're better suited for other fields. I did my master's thesis on stereotype threat, which is probably the strongest version of mesearch I've conducted.

He doesn't even say the words stereotype threat once in the article. And he's a stereotype researcher! I highly doubt he is unaware of this area of research. But because it doesn't support his argument, it essentially doesn't exist in the narrative he's selling. As he says himself in the post:
What kind of "science" are we, that so many "scientists" can get away with so systematically ignoring relevant data in our scientific journals?
Right? Seriously.

Second, his sole argument for the disparity in STEM fields is that men and women have different interests. That is, women are more drawn to non-STEM fields. But, as mentioned above, stereotype threat research could explain this observation. Women deidentify from math (and also the remaining STE fields) when encountering the frustration of stereotype threat. So, by the time we get to them in studies about subject matter interests, the stereotypes about gender have already been learned and, in many cases, ingrained. Research on implicit bias shows that nearly everyone is aware of racial stereotypes, even if they don't personally believe in their accuracy; the same is likely true for gender stereotypes.

So, okay, if women just aren't interested in the STEM fields, the question we should be asking is "Why?" What is causing this difference? Is it something unimportant, as Jussim believes? Or is it something as insidious as stereotypes? And would different practices with regard to teaching and career counseling help open women's options up to include careers in STEM?

Believe me, I'm not second wave feminist enough to believe that we should ignore choice and argue that women are brainwashed anytime they do something stereotype-compliant (yeah, I have a women's studies concentration, so knowledge of the different waves of feminism can creep out every once in a while), but I'm also concerned when women aren't given all the facts they need to make an informed choice. Now, to be fair, Jussim does offer one reason for the differences in interest, though there are issues there as well. Which brings me to...

Third, he conflates STEM with "working with things" and non-STEM with "working with people," and says that women are actually superior in performance in both verbal and quantitative skills, leading them to prefer jobs that use verbal skills.

There's a lot of wrong in that whole framework, so let me try to untangle it. Let's start with the verbal + quantitative = verbal career narrative. By his logic, being strong in both means women could go into a verbal-area career or a quantitative-area career. But why doesn't he discuss verbal + quantitative careers? There are many, including the career he is in as a social psychology researcher. In fact, there are few quantitative careers I can think of that don't require you to at least try to string two or more words together. Being good at both allows you to succeed even more in STEM jobs. In fact, as I think of some of the best-known scientists of today, they are all people who communicate well, and have published many books and/or articles. If I asked you to name STEM folks known by the public at large, you'd probably list some of the following people: Neil DeGrasse Tyson, Brian Greene, Stephen Hawking, Carl Sagan... All great writers. All fluent in STEM. All men.

True, I could be forgetting some of the big names of women in STEM, to sell my narrative. But these are the people I thought of off the top of my head. I'm struggling to think of women who are well known at large, though I can think of a fair number in my own field. Obviously, I remember my childhood hero, Marie Curie, the first woman to win a Nobel prize and the first person to win two in different fields. But if we could get more girls to be interested in STEM, show them that they have options to use their verbal skills as well, and also show them that they have role models of women in STEM (who preferably don't have to poison themselves to attain greatness - sorry, Marie), maybe - maybe - more will show an interest.

We accomplish nothing by misconstruing what these jobs involve, such as the nonsense "working with things" versus "working with people" dichotomy. True, some jobs may be more thing-heavy or people-heavy, but there's never total isolation of the other side. And people who work in the STEM fields - especially people who want to be successful in STEM - absolutely work with people. And talk to people. And share findings with people. And even study people, so that people in a sense become their "things." As a social psychologist, Jussim should know better.

I'll be honest, I'm having a hard time understanding Jussim's need to so vehemently insist there's nothing there. Either he is wrong, in which case believing him means we stop doing anything to help. Or I am wrong, and we've merely wasted some research dollars and people's time. So what? And I swear to God, people, if I hear the "but reverse discrimination" argument against my previous statement, I'm going to scream.

In conclusion, Jussim's sins are omission and some logical fallacies. My sins are probably the same. Hopefully between these two posts, there's some balance.