Monday, September 18, 2017

Words, Words: From the Desk of a Psychometrician

I've decided to start writing more about my job and the types of things I'm doing as a psychometrician. Obviously, I can't share enough detail for you to know exactly what I'm working on, but I can at least discuss the types of tasks I encounter and the types of problems I'm called to solve. (And if you're curious what it takes to be a psychometrician, I'm working on a few posts on that topic as well.)

This week's puzzle: examining readability of exam items. Knowing as much as we do about the education level of our typical test-taker - and also keeping in mind that our exams are supposed to measure knowledge of a specific subject matter, as opposed to reading ability - it's important to know how readable the exam questions are. This information can be used when we revise the exam, and could also be used to update our exam item writing guides (creating a new guide is one of my ongoing projects).

Anyone who has looked at the readability statistics in Word knows how to get the Flesch-Kincaid statistics: reading ease and grade level. Reading ease, which was developed by Rudolf Flesch, is a continuous value based on the average number of words per sentence and the average number of syllables per word; higher scores mean the text is easier to read. Later, researchers led by J. Peter Kincaid, working for the US Navy, turned those same two inputs into a grade-level formula. So the grade level you receive from your analysis reflects the US school grade level needed to comprehend that text.
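Both Flesch formulas are simple enough to compute by hand. Here's a minimal sketch in Python (used only for illustration, not the R tools I mention below) with the published coefficients: reading ease = 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), and grade level = 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59. The syllable counter and the sentence/word splitting are my own rough assumptions, so the numbers won't exactly match what Word or koRpus report.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count runs of vowels, treating y as a vowel,
    and drop a silent final 'e'. Real tools use dictionaries or hyphenation rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("make"), but "-le" endings ("table") keep it.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using naive splitting on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(text: str) -> float:
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text: str) -> float:
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59

if __name__ == "__main__":
    plain = "The cat sat on the mat."
    jargon = "The dentist placed the amalgam restoration on the mandibular molar."
    for text in (plain, jargon):
        print(round(flesch_reading_ease(text), 1), round(flesch_kincaid_grade(text), 1))
```

The second sentence is the kind of item stem I worry about: a handful of multisyllabic dental terms pushes the grade level far above the first sentence, even though a practicing dental assistant would find it easy.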

And to help put things in context, the average American reads at about a 7th grade level.

The thing about Flesch-Kincaid is that it isn't always well-suited for texts on specific subject matters, especially those that have to use some level of jargon. In dental assisting, people will encounter words that refer to anatomy or devices used in dentistry. These multisyllabic words may not be familiar to the average person, and may result in higher Flesch-Kincaid grade levels (and lower reading ease), but in the context of practicing dental assistants - who would learn these terms in training or on the job - they're not as difficult. And as others have pointed out, there are common multisyllabic words that aren't difficult. Many people - even people with low reading ability - probably know words like "interesting" (a 4-syllable word).

So my puzzle is to select readability statistics that are unlikely to be "tricked" by jargon, or at least find some way to take that inflation into account. I've been reading about some of the other readability statistics - such as the Gunning FOG index, where FOG stands for (I'm not kidding) "Frequency of Gobbledygook." Gunning FOG is very similar to Flesch-Kincaid: it also takes into account average words per sentence, but instead of average syllables per word, it looks at the percentage of complex (3+ syllable) words. But there are other potential readability statistics to explore. One thing I'd love to do is to generate a readability index for each item in our exam pools. That information, along with the difficulty of the item and how it maps onto exam blueprints, could become part of the item metadata. But that's a long-term goal.
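To make the FOG idea concrete, here's a minimal Python sketch of the standard formula, 0.4 × (words per sentence + 100 × complex words / words), computed per item. The `glossary` parameter is a hypothetical illustration of one way to take jargon inflation into account: terms on a domain word list simply aren't counted as complex. The item IDs, the glossary contents, and the vowel-group syllable counter are all assumptions for illustration, not an established method.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (an assumption); real tools count differently."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in "-le" endings like "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def gunning_fog(text: str, glossary: frozenset = frozenset()) -> float:
    """Gunning FOG: 0.4 * (words per sentence + percentage of complex words).
    Complex = 3+ syllables. Words in `glossary` (hypothetical domain terms,
    lowercase) are not counted as complex."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [
        w for w in words
        if count_syllables(w) >= 3 and w.lower() not in glossary
    ]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

if __name__ == "__main__":
    # Hypothetical glossary and item pool, for illustration only.
    DENTAL_TERMS = frozenset({"amalgam", "restoration", "mandibular"})
    items = {"ITEM-001": "The dentist placed the amalgam restoration on the mandibular molar."}
    for item_id, stem in items.items():
        # prints: ITEM-001 16.0 4.0
        print(item_id, round(gunning_fog(stem), 1), round(gunning_fog(stem, DENTAL_TERMS), 1))
```

Storing both numbers (raw and glossary-adjusted) per item would show how much of an item's apparent difficulty comes from jargon alone, which is the kind of thing that could sit alongside item difficulty in the metadata.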

To analyze the data, I've decided to use R (though Python and its Natural Language Processing tools are another possibility). Today I discovered the koRpus package (R package developers love capitalizing the r's in package names). I also found the readtext package, which can be used to pull in and clean text from a variety of formats (not just .txt, but JSON, XML, PDF, and so on). I may have to use these tools for a text analysis side project I've been thinking of doing.

Completely by coincidence, I also just started reading Nabokov's Favorite Word is Mauve, in which author Ben Blatt uses different text analysis approaches on classic and contemporary literature and popular bestsellers. In the first chapter, he explores whether avoiding adverbs (specifically the -ly adverbs, which are despised by authors from Ernest Hemingway to Stephen King) actually translates to better writing. In subsequent chapters, he explores differences in voice by author gender, whether great writers follow their own advice, and how patterns of word use can be used to identify authors. I'm really enjoying it.

Edit: I realized I didn't say more about getting Flesch-Kincaid information from Word. Go to File > Options > Proofing and select "Show readability statistics." Word will then show a dialog box with this information after you run Spelling and Grammar Check on a document.


  1. Hello, just saw your blog and can I just say I love and appreciate it so much! Your passion really comes through and you make learning statistics seem very approachable (I mean, we know otherwise, but ...). It's so commendable how you can distill complex topics into their essence!

    One question tho, is this blog somehow linked to twitter? Many great psychologists, psychometricians and methodologists are there like a playground of adventurous statistics!

    1. Wow, thank you! Happy to know you're enjoying the posts - and it's always wonderful to hear I can make these topics approachable. I'm hoping to have more of a social media presence soon - just have a few things in the works first. I'll be sure to share that information on the blog!

    2. Looking forward! For now binging on your previous blog posts (why did I just see this now?!) all the way down. I'm sure you must get this a lot, but I'll be the first to order your book when you decide to write and publish one! You'd be a perfect author for a high-level but approachable stats tome <3