Sunday, March 24, 2019

Statistics Sunday: Blogging A to Z Theme Reveal

I'm excited to announce this year's theme for the Blogging A to Z challenge:

I'll be writing through the alphabet of psychometrics with the Rasch Measurement Model approach. I've written a bit about Rasch previously. You can find those posts here:

Looking forward to sharing these posts! First up is A for Ability on Monday, April 1!

Sunday, March 17, 2019

Statistics Sunday: Standardized Tests in Light of Public Scandal

No doubt, by now, you've heard about the large-scale investigation into college admissions scandals among the wealthy - a scandal that suggests SAT scores, among other things, can in essence be bought. Eliza Shapiro and Dana Goldstein of the NY Times ask if this scandal is "the last straw" for tests like the SAT.

To clarify in advance, I do not nor have I ever worked for the Educational Testing Service or for any organization involved in admissions testing. But as a psychometrician, I have a vested interest in this industry. And I became a psychometrician because of my philosophy: that many things, including ability, achievement, and college preparedness, can be objectively measured if certain procedures and methods are followed. If the methods and procedures are not followed properly in a particular case, the measurement in that case is invalid. That is what happens when a student (or more likely, their parent) pays someone else to take the SAT for them, or bribes a proctor, or finds an "expert" willing to sign off on a disability the student does not have to get extra accommodations.

But because that particular instance of measurement is invalid doesn't damn the entire field to invalidity. It just means we have to work harder. Better vetting of proctors, advances in testing like computerized adaptive testing and new item types... all of this is to help counteract outside variables that threaten the validity of measurement. And expansions in the field of data forensics now include examining anomalous patterns in testing, to identify if some form of dishonesty has taken place - allowing scores to be rescinded or otherwise declared invalid after the fact.

This is a field I feel strongly about, and as I said, really sums up my philosophy in life for the value of measurement. Today, I'm on my way to the Association of Test Publishers Innovations in Testing 2019 meeting in Orlando. I'm certain this recent scandal will be a frequent topic at the conference, and a rallying cry for better protection of exam material and better methods for identifying suspicious testing behavior. Public trust in our field is on the line. It is our job to regain that trust.

Tuesday, March 12, 2019

Are Likert Scales Superior to Yes/No? Maybe

I stumbled upon this great post from the Personality Interest Group and Espresso (PIG-E) blog about which is better - Likert scales (such as those 5-point Agree to Disagree scales you often see) or Yes/No (see also True/False)? First, they polled people on Twitter. 66% of respondents thought that going from a 7-point to 2-point scale would decrease reliability on a Big Five personality measure; 71% thought that move would decrease validity. But then things got interesting:
Before I could dig into my data vault, M. Brent Donnellan (MBD) popped up on the twitter thread and forwarded amazingly ideal data for putting the scale option question to the test. He’d collected a lot of data varying the number of scale options from 7 points all the way down to 2 points using the BFI2. He also asked a few questions that could be used as interesting criterion-related validity tests including gender, self-esteem, life satisfaction and age. The sample consisted of folks from a Qualtrics panel with approximately 215 people per group.

Here are the average internal consistencies (i.e., coefficient alphas) for 2-point (Agree/Disagree), 3-point, 5-point, and 7-point scales:

And here's what they found in terms of validity evidence - the correlation between the BFI2 and another Big Five measure, the Mini-IPIP:

FYI, when I'm examining item independence in scales I'm creating or supporting, I often use 0.7 as a cut-off - that is, items that correlate at 0.7 or higher (meaning 49% shared variance) are essentially measuring the same thing and violate the assumption of independence. The fact that all but Agreeableness correlates at or above 0.7 is pretty strong evidence that the scales, regardless of number of response options, are measuring the same thing.

The post includes a discussion of these issues by personality researchers, and includes some interesting information not just on number of response options, but also on the Big Five personality traits.

Monday, March 11, 2019

Statistics Sunday: Scatterplots and Correlations with ggpairs

As I conduct some analysis for a content validation study, I wanted to quickly blog about a fun plot I discovered today: ggpairs, which displays scatterplots and correlations in a grid for a set of variables.

To demonstrate, I'll return to my Facebook dataset, which I used for some of last year's R analysis demonstrations. You can find the dataset, a minicodebook, and code on importing into R here. Then use the code from this post to compute the following variables: RRS, CESD, Extraversion, Agree, Consc, EmoSt, Openness. These correspond to measures of rumination, depression, and the Big Five personality traits. We could easily request correlations for these 7 variables. But if I wanted scatterplots plus correlations for all 7, I can easily request it with ggpairs then listing out the columns from my dataset I want included on the plot:


(Note: I also computed the 3 RRS subscales, which is why the column numbers above skip from 112 (RRS) to 116 (CESD). You might need to adjust the column numbers when you run the analysis yourself.)

The results look like this:

Since the grid is the number of variables squared, I wouldn't recommend this type of plot for a large number of variables.

Thursday, March 7, 2019

Time to Blog More

My blogging has been pretty much non-existent this year. Without getting too personal, I've been going through some pretty major life changes, and it's been difficult to focus on a variety of things, especially writing. As I work through this big transition, I'm thinking about what things I want to make time for and what things I should step away from.

Writing - especially about science, statistics, and psychometrics - remains very important to me. So I'm going to keep working to get back into some good blogging habits. Statistics Sunday posts may remain sporadic for a bit longer, but look for more statistics-themed posts very soon because...

That's right, it's time to sign up for the April A to Z blogging challenge! I'll officially announce my theme later this month, but for now I promise it will be stats-related.

Thursday, February 28, 2019

A New Trauma Population for the Social Media Age

Even if you aren't a Facebook use, you're probably aware that there are rules about what you can and cannot post. Images or videos that depict violence or illegal behavior would of course be taken down. But who decides that? You as a user can always report an image or video (or person or group) if you think it violates community standards. But obviously, Facebook doesn't want to traumatize its users if it can be avoided.

That's where the employees of companies like Cognizant come in. It's their job to watch some of the most disturbing content on the internet - and it's even worse than it sounds. In this fascinating article for The Verge, Casey Newton describes just how traumatic doing such a job can be. (Content warning - this post has lots of references to violence, suicide, and mental illness.)

The problem with the way these companies do business is that, not only do employees see violent and disturbing content; they also don't have the opportunity to talk about what they see with their support networks:
Over the past three months, I interviewed a dozen current and former employees of Cognizant in Phoenix. All had signed non-disclosure agreements with Cognizant in which they pledged not to discuss their work for Facebook — or even acknowledge that Facebook is Cognizant’s client. The shroud of secrecy is meant to protect employees from users who may be angry about a content moderation decision and seek to resolve it with a known Facebook contractor. The NDAs are also meant to prevent contractors from sharing Facebook users’ personal information with the outside world, at a time of intense scrutiny over data privacy issues.

But the secrecy also insulates Cognizant and Facebook from criticism about their working conditions, moderators told me. They are pressured not to discuss the emotional toll that their job takes on them, even with loved ones, leading to increased feelings of isolation and anxiety.

The moderators told me it’s a place where the conspiracy videos and memes that they see each day gradually lead them to embrace fringe views. One auditor walks the floor promoting the idea that the Earth is flat. A former employee told me he has begun to question certain aspects of the Holocaust. Another former employee, who told me he has mapped every escape route out of his house and sleeps with a gun at his side, said: “I no longer believe 9/11 was a terrorist attack.”
It's a fascinating read on an industry I really wasn't aware existed, and a population that could be diagnosed with PTSD and other responses to trauma.

Thursday, February 21, 2019

Replicating Research and "Peeking" at Data

Today on one of my new favorite blogs, EJ Wagenmakers dissects a recent interview with Elizabeth Loftus on when it is okay to peek at data being collected:
Claim 4: I should not feel guilty when I peek at data as it is being collected

This is the most interesting claim, and one with the largest practical repercussions. I agree with Loftus here. It is perfectly sound methodological practice to peek at data as it is being collected. Specifically, guilt-free peeking is possible if the research is exploratory (and this is made unambiguously clear in the published report). If the research is confirmatory, then peeking is still perfectly acceptable, just as long as the peeking does not influence the sampling plan. But even that is allowed as long as one employs either a frequentist sequential analysis or a Bayesian analysis (e.g., Rouder, 2014; we have a manuscript in preparation that provides five intuitions for this general rule). The only kind of peeking that should cause sleepless nights is when the experiment is designed as a confirmatory test, the peeking affects the sampling plan, the analysis is frequentist, and the sampling plan is disregarded in the analysis and misrepresented in the published report. This unfortunate combination invokes what is known as “sampling to a foregone conclusion”, and it invalidates the reported statistical inference.
Loftus also has many opinions on replicating research, which may in part be driven by the fact that recent replications have not been able to recreate some of the major findings in social psychology. Wagenmakers shares his thoughts on that as well:
I believe that we have a duty towards our students to confirm that the work presented in our textbooks is in fact reliable (see also Bakker et al., 2013). Sometimes, even when hundreds of studies have been conducted on a particular phenomenon, the effect turns out to be surprisingly elusive — but only after the methodological screws have been turned. That said, it can be more productive to replicate a later study instead of the original, particularly when that later study removes a confound, is better designed, and is generally accepted as prototypical.
The whole post is worth a read and also has a response from Loftus at the end.