Sunday, December 31, 2017

Year in Review: 2017

I'm looking back over 2017, and pulling together some metrics to answer the question posed by Rent: How do you measure a year? Here are a few ways:

Books read: 52 (for a total of 16,906 pages)
Blog posts written: 365 (counting this one)
Jobs: 2, and fortunately, fewer than 2 months of unemployment
Concerts Performed: 7, plus a very successful benefit for my choir
Movies Seen in Theatre: 14
Plus another NaNoWriMo win!

I also decided to check out my 5 most popular blog posts from the year, based on page views, and all of them are about statistics (go figure):
  1. Statistical Sins: Stepwise Regression
  2. Statistics Sunday: What Are Degrees of Freedom? (Part 1)
  3. Statistics Sunday: Free Data Science and Statistics Resources
  4. Statistics Sunday: What is Bootstrapping?
  5. Statistical Sins: Know Your Variables (A Confession)
I've already shared some of my writing goals for 2018. I'm putting together some additional goals and resolutions for 2018, and I'll share those soon!

Sound off, readers - how do you measure your year? And what are your goals for 2018? Feel free to describe in the comments or share a link to your own blog posts on the subject.

By the way, you might also enjoy Google's Year in Search 2017, which gives some of the highlights for the year:

Statistics Sunday: Different Means

While reading a history of mathematics book earlier this year, I was surprised to learn about the number of means one can compute to summarize values. I was familiar with the common descriptive statistics for central tendency - mean, median, and mode - and knew that this mean is also called the arithmetic mean, a name that suggests there are other kinds of means out there. But I wasn't exposed to anything beyond the arithmetic mean in my statistics classes.

I've started to do some research into the different kinds of means to share here. Today, I'll start with the geometric mean.

The arithmetic mean, of course, is calculated by adding together all values then dividing by the number of values. But there are many cases where the arithmetic mean isn't really an appropriate measure. If you're dealing with values that are serially correlated - there is shared variance between values of a variable over time - the arithmetic mean may not be the best descriptive statistic.

For instance, say you're tracking return on an investment over time. Those values will be correlated across time and you'll have compounding that must be taken into account. The geometric mean is well-suited for this situation - in fact, it's frequently used among investment professionals.

The geometric mean is calculated by multiplying n values together, then taking the nth root. As you can imagine, for a few values, this could easily be calculated by hand. For instance, to demonstrate, let's say I have 5 values - return over the last 5 years:

Year 1 - 1%
Year 2 - 7%
Year 3 - -2%
Year 4 - 6%
Year 5 - 3%

For a $100 investment, the value over the 5 years would be:

Year 1 - $100 * 1.01 = $101.00
Year 2 - $101 * 1.07 = $108.07
Year 3 - $108.07 * 0.98 = $105.91
Year 4 - $105.91 * 1.06 = $112.26
Year 5 - $112.26 * 1.03 = $115.63

The arithmetic mean of these 5 return rates (1.01, 1.07, 0.98, 1.06, and 1.03) would be 1.03, or a 3.0% rate of return. The geometric mean would be the 5th root of the product of these 5 values (approximately 1.156), which works out to 1.029, or 2.9%. Pretty close. If we had more values and/or more volatility in those values, we might see more of a discrepancy between the two. One thing to note, though, is that the geometric mean will always be less than or equal to the arithmetic mean; it can never be greater.
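
To make this concrete, here's a short R sketch (R being the language I use elsewhere on this blog) computing both means for the five return factors above. The `exp(mean(log(x)))` form at the end is a common base-R idiom for the geometric mean:

```r
# Annual return factors from the example above (1%, 7%, -2%, 6%, 3%)
returns <- c(1.01, 1.07, 0.98, 1.06, 1.03)

# Arithmetic mean: sum the values, divide by n
mean(returns)                       # 1.03, i.e., a 3.0% average return

# Geometric mean: multiply the n values together, take the nth root
prod(returns)^(1/length(returns))   # approximately 1.0295, i.e., 2.9%

# Equivalent base-R idiom (positive values only)
exp(mean(log(returns)))
```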

For large ns, you'll want to have a computer do this calculation for you. (The same could be said for the arithmetic mean, I suppose.)

Fortunately, many data analysis programs offer a geometric mean calculation. You can compute this in Excel for up to 255 values using the GEOMEAN function. SPSS offers geometric mean in the Analyze->Compare Means->Means option. And the psych package for R offers a geometric.mean function.

Today, I'm looking back over my own data for the year - books read, blog posts written, writing accomplished, and so on - and generating some metrics to describe 2017 for me. I may not need to take the geometric mean of anything, but it's always good to have different descriptive statistics in your back pocket. You never know when they might be useful. Look for a "measurement" post sometime today or tomorrow.

Happy new year, everyone! Have a fun celebration tonight, stay safe, and I'll see you in 2018!

A few edits: 1) As Jay points out below, geometric mean isn't so great if a value is 0 or negative. Any value times 0 is of course 0, and any positive value times a negative value is negative. So your result will be meaningless if you have 0s or negatives.
2) My friend, David, over at The Daily Parker let me know that this isn't how he learned to compute rate of return in business school. This is probably a good demonstration that there are many ways to summarize a set of values, and also a demonstration that I don't really know a lot about investment or economics. I love numbers, but not so much numbers with $ signs in front of them. Mostly I wanted to share how to compute the geometric mean, and I based my example on a few different examples I saw on the internet. (Yes, I know, just because it's on the internet doesn't mean it's correct.) So if the investment example is incorrect or meaningless, I'm okay with that. But the geometric mean can be well-suited for other applications, as long as you watch out for #1 above.
Thank you both for your feedback!
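
To see the zero/negative problem from point 1 in action, here are two quick R one-liners:

```r
# One zero wipes out the whole product...
prod(c(2, 0, 8))^(1/3)    # 0

# ...and a negative value is worse: R can't take a fractional
# root of a negative number, so the result is NaN
prod(c(2, -4, 8))^(1/3)   # NaN
```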

Saturday, December 30, 2017

Movie Review: Pitch Perfect 3

Tonight I went to see the movie that made Pitch Perfect a trilogy. I'll be honest: I loved the first one, and I was very disappointed with the second. So I didn't have high expectations for the third - and that's probably why I liked it. With some caveats, of course.

If you're not familiar, Pitch Perfect was the story of a scrappy group of college a cappella singers who prove to the world that all-woman groups can do good a cappella music. Pitch Perfect 2 was the story of a disgraced group of college a cappella singers who prove that a cappella groups can perform original music. Pitch Perfect 3 is the story of down-on-their-luck college graduates, who prove that great a cappella singers sometimes perform with instruments.

The movie opens with an awesome arrangement of Britney Spears's "Toxic." I'll be honest, this just made me happy, because I love that song.

It's interrupted by Fat Amy busting through the ceiling with a fire extinguisher, dousing their captors as the others escape. Then Fat Amy and Beca jump off the boat as it explodes.

I literally said, "What?" at this point in the movie.

A subtitle tells us we're now going back 3 weeks to show what led to these insane events. Beca hates her job as a music producer, trying to make crappy music sound good only to be thrown under the bus, so she quits. Fat Amy and Chloe are Beca's roommates, and after Beca shares her news of being unemployed, the three meet up with their fellow singers for a Barden Bella reunion... only to discover, to their disappointment, that they won't be singing, just watching the current Barden Bellas.

It's here that we learn the girls are going through what these days is called (sadly, without a hint of irony) a quarter-life crisis. So they jump at a chance to perform together as a group again, on a USO tour, while competing for a spot as the opening act for DJ Khaled.

This story borrows pretty heavily from the previous two movies, with other groups throwing all kinds of shade at the Bellas, including a chick group called (I kid you not) Ever Moist - who are just Ever Salty. Fat Amy has moved on from her constant ginger jokes of the first movie to constant dumb Emily jokes. They're not as funny, but I'm thrilled they moved away from the Fat Amy schtick of Pitch Perfect 2, which consisted of, "Hey, Fat Amy is so fat. Just look at how fat she is. Isn't that funny? Oh by the way, she's really large." Even better, they dropped the unbelievably stereotypical jokes from (about) Florencia, their Guatemalan classmate.

Plus, Fat Amy redeems herself by showing she's pretty bad ass, as the B story deals with her reunion/conflict with her father (played by John Lithgow), a smuggler. Elizabeth Banks and John Michael Higgins return in their roles as a cappella commentators. Ruby Rose is quite good as the lead singer of Ever Moist, and I was thrilled to see Andy Allo as a member of Ever Moist - I saw her perform at City Winery recently, and was really impressed with her. And I was almost super-excited when I saw music producer Theo, mainly because when he first appeared on screen, I thought it was Adam Scott. I was sad when I got another look and realized, alas, it was not. But the actual actor had a cute accent, so there's that.

So the movie was over-the-top and much-recycled. But it was cute, it had good vocal arrangements, it had some really funny moments but didn't seem to take itself too seriously. Overall, if you loved Pitch Perfect, you'll probably want to see this movie. It's not profound, it's not even terribly clever, but it's fun.

Thursday, December 28, 2017

Bad Lip Reading of the Trumps

I know I still owe my readers a Statistical Sins post. I'm busy at work doing - what else - statistical analysis. But in the meantime, enjoy this new Bad Lip Reading of the Trumps:

Many snort-worthy moments. And I laughed out loud at Melania's "Please help me."

Wednesday, December 27, 2017

Trivial Only Post: Reasons I Don't Like Knitting in Public

I'm working on a knitted gift for a handmade gift exchange I'm going to tomorrow night. As I scramble to finish, I find myself needing alone time to knit. Why? Because I've realized I hate knitting in public, and here's why:
  1. Random strangers trying to start conversations with me while I knit:
    • "What are you making?" The least obnoxious of random comments, but still frustrating if I'm trying to concentrate
    • Comments about my age, e.g., I'm too young to be a knitter
    • "Have you ever made a ________?" Which results in a request for me to list out all the things I've made
    • "Are you on Etsy?" Aw, you're sweet, but I'm a slow knitter, and have to have a pattern to make anything. I'm fast and able to improvise with crochet, but not knitting
    • "That seems really hard. I bet you couldn't teach me how to do it." You're right, I couldn't teach you, because I don't know you.
  2. There's a surprising amount of swearing coming out of my mouth when I knit - basically every time I struggle to add a stitch, reduce a stitch, drop a stitch, or otherwise screw up from poor counting

Monday, December 25, 2017

Twelve Days of Christmas

Merry Christmas! I've been spending time in the Upper Peninsula of Michigan. Back to Chicago tomorrow.

Tonight, I'm at a Christmas carol sing-a-long. And as today is the first day of Christmas, we had to sing the Twelve Days of Christmas.

After we sang, I was reminded of this post from my friend David over at The Daily Parker about how someone might react to receiving those gifts.

Sunday, December 24, 2017

Statistics Sunday: Introduction to Quantiles

During April A to Z, I devoted one of my posts to descriptive statistics - ways of summarizing your data, with statistics like the mean (a measure of central tendency) and standard deviation (a measure of variability). And I devoted another post to the histogram, which displays the distribution of a variable.

The histogram is built with frequencies: all of the values for a given variable in your dataset, and counts of the number of cases with that value. For instance, using the Caffeine study file (a randomly generated dataset to go with the fictional study I first discussed here), I could generate frequencies like this:

caffeine<-read.delim("caffeine_study.txt", header=TRUE, sep="\t")
table(caffeine$score)

64 69 71 72 73 75 76 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 

 1  2  2  2  1  4  4  7  1  3  2  2  3  6  1  4  5  2  2  3  1  1  1 

Not a very fancy-looking table, but it lets me see the frequency of each score in my dataset. (Note: There are ways of creating a much prettier table in R, but that's not the point of today's post.) As you can see, some possible scores are missing, because the frequency of those values is 0. All of this information would be easier to see in a chart, of course. But frequencies are a good place to start to make sure you don't have any weird values.

But another way to describe data, which is based on frequencies, is in terms of percentiles - dividing the scores up into groups that reflect a certain percentage of the scores. We call these quantiles, and you can define those with whatever percentiles you want. But those values will be based on the scores that are actually in your dataset. Those scores with 0 frequency aren't counted toward the percentiles.

For example, let's say I want to divide my dataset up into 4, so each group encompasses 25% of the scores in my dataset. These are called quartiles. I can get these in R easily:

quantile(caffeine$score, c(.25,.50,.75,1)) 

 25%   50%   75%  100% 
76.00 82.00 86.25 94.00

What this tells me is that 25% of the scores are at or below 76, 50% are at or below 82, 75% are at or below 86.25, and 100% are at or below 94. Now, you might notice that the data are all whole numbers. So how can one of the results include a decimal? That's because the scores didn't split perfectly into quartiles: when a cut point falls between two observed scores, R interpolates between them. In fact, if you're really curious how this works, you can compute quartiles pretty easily by hand. All you'd need to do is write out each score in numerical order, then divide the list into 4 equal parts. The frequency table above would give you a start, but you'd need to write out each score as many times as its frequency. There are 60 scores, so each quartile would include 15 scores.
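
A toy example, with made-up scores, shows how a quantile can land between observed values: by default, R interpolates whenever a cut point falls between two scores.

```r
# Four made-up scores; the median position falls between the
# 2nd and 3rd ordered values, so R interpolates between 80 and 85
x <- c(70, 80, 85, 90)
quantile(x, .5)    # 82.5
```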

We deal with percentiles pretty regularly in our daily life, from a child's height and weight to performance on a test. All of these percentiles tell us the same thing - the percentage of scores at or below a certain (usually your) score.

Quantiles are a great way to summarize data, and they can be especially useful when summarizing data with a wide range. There's a great approach to linear regression that uses quantiles; look for a future blog post about quantile regression!

Saturday, December 23, 2017

What to Expect When You're Finishing a PhD

Via The Daily Parker, Carter Page, Trump's former foreign policy advisor apparently failed his dissertation defense twice:
Carter Page, Donald Trump’s former foreign policy adviser, accused his British examiners of “anti-Russian bias” after they took the highly unusual step of failing his “verbose” and “vague” PhD thesis, not once but twice.

[Professor Gregory] Andrusz said he had expected it would be “easy” to pass Page, a student at the School of Oriental and African Studies (Soas). He said it actually took “days and days” to wade through Page’s work. Page “knew next to nothing” about social science and seemed “unfamiliar with basic concepts like Marxism or state capitalism,” the professor said.

Their subsequent report was withering. It said Page’s thesis was “characterised by considerable repetition, verbosity and vagueness of expression”, failed to meet the criteria required for a PhD, and needed “substantial revision”. He was given 18 months to produce another draft.

Page resubmitted in November 2010. Although this essay was a “substantial improvement” it still didn’t merit a PhD.
But this, I think, is the best part:
After this second encounter, Andrusz and [Dr. Peter] Duncan both resigned as Page’s examiners. In a letter to Soas, they said it would be “inappropriate” for them to carry on following Page’s “accusation of bias” and his apparent attempts to browbeat them. Andrusz said he was stunned when he discovered Page had joined Trump’s team.

Soas refuses to identify the academics who eventually passed Page’s PhD thesis, citing data protection rules.
Getting a PhD isn't easy. There are multiple checks along the way to ensure anyone who makes it to the point of a final dissertation draft is ready. In my program, we had the following steps along the way: 30 course credit hours plus a thesis to earn the master's degree, then an additional 30 credit hours, 1000 hours at an internship (which you could also complete by teaching courses as an adjunct), 4 candidacy exams, and a dissertation proposal and proposal defense before we could even begin working on the dissertation itself.

At any point in the program, people might burn out and leave OR be asked to leave. The point is that, if you survive all of those milestones and make it through the dissertation proposal defense, you're probably ready.

BUT the dissertation is a lot of work, possibly as much work as the previous steps combined. And making it that far is no guarantee that you'll finish. That would be like giving the PhDs away after making it through the other, less demanding steps. Because the dissertation is not simply a rite of passage. If you're going into a research field, the dissertation is practice for every research project you'll undertake after your degree. So it's not terribly surprising to hear someone made it that far and failed. What's more surprising, to me, is that he didn't burn out or get asked to leave along the way. Based on his responses to his examiners, I'm amazed he had the drive and abilities to survive all of the other requirements of the degree.

For me, the dissertation proposal defense was much more grueling than the dissertation defense itself. That was the time my committee really expected me to justify my proposed methods, and if I couldn't justify them completely, to accept their suggestions for changes. That was the time they made sure I was tackling the problem in the right way.

At the dissertation defense, they wanted to make sure I had done what I said I would do, and if I did anything differently, that I had a good justification for it. Don't get me wrong, the defense wasn't easy, but there are checks along the way to make sure you only get to that point if you're ready. Obviously, as Page's experiences demonstrate, sometimes people slip through the cracks.

Friday, December 22, 2017

Travel Day Links

I finally saw The Last Jedi last night and loved it. I'll try to have more reactions soon. For now, I'll say I loved it and I'm so happy to no longer have to dodge spoilers.

I'm heading out of town for the holidays later on this morning/afternoon. I have a few articles up to read:

Happy holidays, everyone! I'm driving into cold temperatures and lots of snow, so I'm packing a ton of books and my laptop (and lots of sweaters and yoga pants), and planning to spend much of my time reading and writing. 

Thursday, December 21, 2017

Decoding Fowl Language

The University of Georgia and Georgia Institute of Technology recently collaborated on a fascinating project - using AI to decode chicken language:
In a series of studies published between 2014 and 2016, Georgia Tech research engineer Wayne Daley and his colleagues exposed groups of six to 12 broiler chickens to moderately stressful situations—such as high temperatures, increased ammonia levels in the air and mild viral infections—and recorded their vocalizations with standard USB microphones. They then fed the audio into a machine-learning program, training it to recognize the difference between the sounds of contented and distressed birds.

“It’s really interesting work, fairly ingenious and logical,” says Wallace Berry, a poultry scientist at Auburn University’s College of Agriculture who was not involved in the studies. “Chickens are a very vocal species, and as a poultry farmer you can always use more data to make better decisions. This is a great way to continuously filter all the information available in a chicken house and learn as soon as possible that something is wrong.”
Next step - learn how to speak to chickens, and ask them the question we've been asking for years:

The Evolution of Ghosts

From Hamlet to American Horror Story, ghosts have been part of our entertainment and even our darkest fears. The first (known) ghost stories date back to the Roman Empire, when Pliny the Younger wrote about an old man with chains who haunted his home. People visit grave sites, not just to pay respects to the person buried there, but sometimes to encounter the ghost of that person as well. And people report sightings of famous people who have died, like Anne Boleyn, Benjamin Franklin, and Abraham Lincoln.

But you may be surprised to realize that the way ghosts are portrayed in pop culture has changed over time. And these changes have likely also changed people's reported experiences of the paranormal. (Which, in my opinion, lends some credence to more mundane explanations for supernatural experiences.)

A new book, The Ghost: A Cultural History by Susan Owens, explores how portrayals of ghosts have changed over time:
Susan Owens begins not with the specters of Halloween or some drafty Victorian haunted house, but with this scene where Scrooge is visited by his former partner.

Charles Dickens described Jacob Marley as “transparent,” and laden with “cash-boxes, keys, padlocks, ledgers, deeds, and heavy purses wrought in steel”; otherwise he had “the same face” and garb. Ghosts in the late 18th century and into the 19th century became translucent in part due to new optical shows (like phantasmagoria) and lantern-slides that projected luminous images, as well as the increased use of watercolors in art. “When Dickens made Marley’s ghost see-through in A Christmas Carol, he was drawing on a convention that had only relatively recently been established,” she writes.

Owens is formerly a curator of paintings at the Victoria and Albert Museum in London, and she explains that as she began to research ghosts in art and literature, she found written records dating back to the eighth century.
Owens's book explores a variety of cultural and doctrinal forces that shaped our concept of ghosts, such as the English Reformation - once souls were believed to go directly to heaven or hell rather than lingering in purgatory, wandering spirits had to be explained as something else.

As a lover of horror movies and ghost stories, I will definitely check this book out.

Wednesday, December 20, 2017

Maybe Music is the Universal Language

Do you need cheering up today? Check out this absolutely adorable video of a small brass band that performed for a beluga whale:

Statistical Sins: Algorithmic Bias

Many aspects of modern life are determined by algorithms. When you apply for a job, chances are it's not a person who first sees your resume and cover letter; there's an algorithm for that. Algorithms can also dictate who gets additional security screening (or who gets assigned to have less scrutiny, as happened to me once when flying back to Chicago), or whether an email goes to your inbox or straight to your spam folder. They help make our lives more efficient.

But they can also cause harm. Since algorithms are, by nature, designed to work without human intervention (in fact, that's their entire purpose), that also means if there's a problem with the algorithm, it might not be spotted until multiple negative outcomes have already occurred.

Though there is evidence that algorithms - even when they show bias - are far superior to human decision-making, people often feel more comfortable knowing that a person, and not a computer, made a decision. For instance, I briefly worked on research for a new master's program. Since we had so many qualified candidates, many admission decisions were made by lottery. But because of past negative responses from the people who weren't chosen for other programs, this information was not widely known. In my current job, where scoring of exams is done by computer, we still do some quality control by hand, to make sure nothing went wrong - and this is viewed as essential by examinees and accreditors, especially in cases of high-stakes testing. So it seems very likely that people might perceive decisions made by algorithms as unfair, and decisions made by people as more fair, even when they're not.

At the same time, there may be bias in variables measured and selected for algorithms, because at some point, that decision was made by a person. And algorithms that perpetuate discrimination can result in an endless feedback loop or a sort of self-fulfilling prophecy.

This may be the reason that New York City recently passed a bill to examine algorithmic bias in city government agencies:
The bill, which was signed by Mayor Bill de Blasio last week, will assign a task force to examine the way that New York City government agencies use algorithms to aid the judicial process.

According to ProPublica, council member James Vacca sponsored the bill as a response to ProPublica's 2016 account of racially-biased algorithms in the American criminal justice system. The investigative story revealed systemic digital bias within judicial risk assessment programs that favored the release of white inmates on the grounds of future good behavior over the release of black defendants.

Algorithmic source code is typically private, but issues of bias have called for increased transparency. The ACLU has spoken out on behalf of the bill passing, and it described access to institutionalized algorithmic source code as a fundamental step in ensuring fairness within the criminal justice system.
What are your thoughts on this issue? Should we always follow the algorithm's data driven decisions, even when those decisions are biased against a certain group? Or should we allow human intervention, even when that risks introducing more bias?

Tuesday, December 19, 2017

True Colors

Two graphs are currently open in my browser today, both of which deal with color - one deals with political affiliations (red Republicans and blue Democrats) while the other deals with lightsabers.

First, the graph of political affiliation, shared via Andrew Gelman:

Specifically, this chart displays results from two datasets: one from the National Provider Index, which lists physicians in the United States, and state voter files, which provide voter registration data including political affiliation. The researchers were able to match these two files to create one dataset of political affiliation for 55,000 physicians residing in 29 states. As you can see in the graph above, there appear to be differences in political affiliation by physician specialty, hence the title of the original article: Your Surgeon Is Probably a Republican, Your Psychiatrist Probably a Democrat.

The second graph just brings me joy, because I'm super-excited to see The Last Jedi Thursday (also: tired of dodging spoilers). In honor of the new Star Wars movie, FiveThirtyEight gives us this graph of number of lightsabers by color in both canon and extended universe:

Movie Review: The Disaster Artist

It's been a while since I've done a movie review and I just saw The Disaster Artist last night - so here we are! A movie review, and not even a superfluous one at that.

The Disaster Artist tells the story of Tommy Wiseau (played by James Franco) and his best friend/fellow actor Greg Sestero (Dave Franco), as they try to make it in the crazy world of show business. Tommy has a mysterious past: no one knows how old he is, where he's from (he claims to be from New Orleans despite an unidentified/unidentifiable Eastern European accent), or where his vast wealth comes from.

Tommy and Greg meet in Jean Shelton's acting class, where Jean (in a great cameo from Melanie Griffith) offers both actors feedback after poorly executed scenes. Despite the fact that Tommy's rendition of Tennessee Williams's A Streetcar Named Desire was simply him shouting, "Stella!" over and over again while climbing the walls and writhing on the floor, Greg is impressed with Tommy's fearlessness. He asks Tommy to do a scene with him. Tommy encourages Greg to overcome his stage fright, and their friendship is born.

Together they move to L.A., staying in an apartment Tommy owns but rarely visits. They go through the various steps of becoming professional actors: taking headshots, going to auditions, and (at least in Greg's case) finding an agent - the great Iris Burton (another great cameo, this one by Sharon Stone), who is famous for representing child actors like River Phoenix, Kirsten Dunst, Drew Barrymore, and Fred Savage.

But despite all that, Tommy and Greg both struggle to get cast. Greg laments that maybe they should just make their own movie. Tommy, with his "bottomless" bank account, thinks that's a great idea, and writes a screenplay for The Room. When Tommy makes the unbelievable decision to purchase, rather than rent, cameras and equipment, he is given access to a studio and some of the best technicians in show business to help him make his movie. And, as history has shown us, that awful, incoherent movie went on to become a cult film, playing to sold out crowds.

The Disaster Artist does for The Room what Ed Wood did for Plan 9 From Outer Space - the "so bad it's good" movie that was already loved by many gets elevated and understood at a new level. The story behind the movie helps you to appreciate, even admire, the movie, regardless of its inept writing and poor execution. In fact, I'm a fan in general of "making of" movies, like Shadow of a Vampire (portraying the making of Nosferatu) and RKO 281 (portraying the making of Citizen Kane); regardless of whether the movie they portray is a masterpiece (as Nosferatu and Kane are considered to be) or garbage (like The Room and Plan 9) these making-of films are like love letters to the original movies and to cinema in general.

I'm about halfway through the book on which The Disaster Artist is based, written by Greg Sestero and author Tom Bissell, both of whom have cameos in this movie. (Also, be sure to hang around after the credits for a great scene with the real Tommy Wiseau.) So it's difficult for me not to compare the book and the movie, and - as is so often the case - feel bummed that they cut some great parts from the book. For the most part, I felt this was a good adaptation of the source material that worked a lot of Greg's and Tom's insights about The Room, and about Tommy himself, into the movie. The beginning, though, felt a bit rushed - I didn't feel like it established why Greg and Tommy became friends, or why Greg was willing to do and give up so much to make Tommy happy. But the actors in the film all did an excellent job of bringing the material to life. James Franco was incredible as Tommy, and Dave Franco perfectly captured Greg's sweet, naïve charm.

Probably the part I'm most bummed they cut out was Greg's story about the movie Home Alone. In the film version, Greg shares that he became interested in acting because Home Alone changed his life. It's a throwaway line that makes Greg look like a lovable idiot. But in the book, he explains what he means by that, and it's actually a really sweet story. After Greg saw Home Alone, he went home and wrote a screenplay for Home Alone 2. Greg, being a child at the time (he's about my age), probably didn't write a professional-quality screenplay, but nonetheless, he found the address for John Hughes's production company, and sent his screenplay in. Not long after, he got his screenplay back in the mail, along with a note from John Hughes, telling him: "Believe in yourself, have patience, and always follow your heart." He says after reading that note, he found his calling.

Having seen The Room helps you understand many of the funny moments in The Disaster Artist. I'm fairly certain the others in the theatre with me had never seen The Room, because I was often the only one laughing. But regardless of whether you've seen The Room, I highly recommend checking The Disaster Artist out. And if you're a fan of The Room or just bad movies in general, I also highly recommend reading the book.

Monday, December 18, 2017

Leaving His Mark

A British surgeon just pleaded guilty to two counts of assault for - I kid you not - writing his initials on patients' livers with an argon beam:
In a hearing at Birmingham crown court on Wednesday, Simon Bramhall admitted two counts of assault by beating relating to incidents on 9 February and 21 August 2013. He pleaded not guilty to the more serious charges of assault occasioning actual bodily harm.

The renowned liver, spleen and pancreas surgeon used an argon beam, used to stop livers bleeding during operations and to highlight an area due to be worked on, to sign his initials into the patients’ organs. The marks left by argon are not thought to impair the organ’s function and usually disappear by themselves.

The 53-year-old was first suspended from his post as a consultant surgeon at Birmingham’s Queen Elizabeth hospital in 2013 after a colleague spotted the initials “SB” on an organ during follow-up surgery on one of Bramhall’s patients.

Speaking after Bramhall’s suspension, Joyce Robins, of Patient Concern, said: “This is a patient we are talking about, not an autograph book.”
BTW, anyone else feel it's been a strange year?

Statistics Sunday: Mediation versus Moderation

I had a wonderful but very busy weekend, performing Händel's Messiah twice. Unfortunately, this means I didn't have a chance to sit down and write my Statistics Sunday post until, well, Monday. But hey, the holidays are coming soon, many of my university friends are wrapping up their semesters, and a lot of my coworkers are off this week because their kids are home. So it's kind of virtual Sunday, right?

Today, I wanted to write about two misunderstood concepts: mediation and moderation. Both deal with relationships among 3 (or more) variables, but they tell you very different things and are tested in different ways.

I've blogged before about mediation. Mediation can be thought of as another term for "caused by" or "explained by." You have mediation when the relationship between your independent and dependent variables is caused by or explained by their relationships with a third variable. Specifically, it means your independent variable causes the mediator, which in turn causes the dependent variable. It's like a chain reaction. (Note that you also need to have specific methods to get at this notion of cause, so I'm using these terms more loosely than I should be. But when introducing the concept of mediation, I find it easiest to frame it in terms of cause.)

There are two big ways to measure mediation. One is through 3 linear regressions: 1) effect of independent variable on dependent variable, 2) effect of independent variable on mediator, and 3) effect of both independent variable and mediator on dependent variable. If you observe the following:

  1. Independent variable has a significant effect on the dependent variable (equation 1)
  2. Independent variable has a significant effect on the mediator (equation 2)
  3. Independent variable no longer has a significant effect on the dependent variable, but the mediator has a significant effect on the dependent variable (equation 3)

you have evidence of mediation. Fortunately, you don't have to just eyeball your regression results. You would use the results of these regressions to conduct a Sobel test: check out this great website and online calculator to help with understanding and testing mediation.
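For readers who like to see the machinery, here's a small simulated example of the three-regression approach - a sketch in Python, since no code accompanies this post. All variable names, effect sizes, and the sample size are invented for illustration, and this stops short of the Sobel test itself:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulate full mediation: X causes M, and M (not X directly) causes Y
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)

def ols(outcome, *predictors):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

b_total = ols(y, x)[1]            # Equation 1: X -> Y (total effect)
b_a = ols(m, x)[1]                # Equation 2: X -> M
b_direct, b_b = ols(y, x, m)[1:]  # Equation 3: X and M -> Y together

print(f"Total effect of X on Y:  {b_total:.3f}")
print(f"Effect of X on M:        {b_a:.3f}")
print(f"Direct effect of X on Y: {b_direct:.3f}")  # shrinks toward zero
print(f"Effect of M on Y:        {b_b:.3f}")
```

With full mediation, the direct effect in equation 3 lands near zero while the X-to-M and M-to-Y paths stay strong; a Sobel test would then formally test whether that drop is significant.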

The other way to test mediation is structural equation modeling. This would work for simple mediations, like the one described above, but is probably more useful when testing complex mediation - for instance, when you have multiple mediators in your chain reaction.

Moderation, on the other hand, is another term for "depends on." That is, the precise impact your independent variable has on your dependent variable depends on where you fall on the moderator. When I used to teach research methods, I'd often have students discuss what effect they thought a certain independent variable would have on a dependent variable.

One example I used was divorce: what impact did they think divorce would have on a child's well-being? (I have to thank a past student for suggesting this topic, since they thought it was something most people have encountered, either directly because their parents are divorced, or indirectly because friends' parents might be divorced.) Partway through the discussion, I would ask them what they thought that impact depended on - what might change it? They always had lots of ideas. It might depend on age - divorce could have a stronger impact on younger children but less of an impact on high school or college-aged children. It might depend on whether the child has siblings - they thought it would be harder on an only child. As the list grew, I would explain to them that these are moderators. And we would phrase it as, for example: the effect of divorce on a child's well-being depends on their age.

Moderation is tested with interactions, which you can examine with a factorial ANOVA or with multiple regression, where you create interaction terms. I usually use the regression method, because it gives you the same results as an ANOVA when all of your variables are discrete, and it can also be used with continuous variables, while ANOVA cannot. If you go the regression route, I highly recommend this book by Aiken and West - kind of the bible on interactions in multiple regression.
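Here's a hedged sketch of the interaction-term approach in Python (the variables and effect sizes are made up for the example; with real data, Aiken and West also recommend centering X and Z before forming the product):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Simulated data where the effect of X on Y depends on the moderator Z
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.2 * x + 0.1 * z + 0.5 * (x * z) + rng.normal(size=n)

# Design matrix: intercept, X, Z, and the interaction term X*Z
design = np.column_stack([np.ones(n), x, z, x * z])
b0, b_x, b_z, b_xz = np.linalg.lstsq(design, y, rcond=None)[0]

# A nonzero interaction coefficient means the slope of X changes with Z:
# at a given value of z, the slope of X is b_x + b_xz * z
print(f"Interaction coefficient: {b_xz:.3f}")
print(f"Slope of X when Z = -1:  {b_x + b_xz * -1:.3f}")
print(f"Slope of X when Z = +1:  {b_x + b_xz * +1:.3f}")
```

The two slope printouts are the "depends on" in action: the same independent variable has a different effect at different levels of the moderator.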

So, as you can (hopefully) see, moderation and mediation reflect different kinds of relationships. (And if this explanation is unclear or you still have questions, please share them in the comments!) And because these are different kinds of relationships, there are situations where you could test both. Yes, crazy as it sounds, there are such things as moderated mediation and mediated moderation. A post for another day!

Friday, December 15, 2017

The Power of the Human Voice

Human beings are drawn to the sound of human voices. It's why overhearing half of a conversation can be so distracting. It's why DJs will talk over the intro of the song, but make sure they stop before the singer comes in. It's why Deke Sharon and Dylan Bell, two a cappella arrangers, recommend arrangements be kept short (less than 4 minutes).

And new research shows yet another way a human voice can have a powerful impact - it keeps us from dehumanizing someone we disagree with:
[F]ailing to infer that another person has mental capacities similar to one’s own is the essence of dehumanization—that is, representing others as having a diminished capacity to either think or feel, as being more like an animal or an object than like a fully developed human being. Instead of attributing disagreement to different ways of thinking about the same problem, people may attribute disagreement to the other person’s inability to think reasonably about the problem. [W]e suggest that a person’s voice, through speech, provides cues to the presence of thinking and feeling, such that hearing what a person has to say will make him or her appear more humanlike than reading what that person has to say.
They conducted four experiments to test their hypothesis: that dehumanization is less likely to occur when we hear a person speaking their thoughts rather than simply reading them. It wasn't even necessary to see the person doing the talking - that is, video-plus-audio versus audio only did not result in reliably different evaluations. The authors conclude:
On a practical level, our work suggests that giving the opposition a voice, not just figuratively in terms of language, but also literally in terms of an actual human voice, may enable partisans to recognize a difference in beliefs between two minds without denigrating the minds of the opposition. Modern technology is rapidly changing the media through which people interact, enabling interactions between people around the globe and across ideological divides who might otherwise never interact. These interactions, however, are increasingly taking place over text-based media that may not be optimally designed to achieve a user’s goals. Individuals should choose the context of their interactions wisely. If mutual appreciation and understanding of the mind of another person is the goal of social interaction, then it may be best for the person’s voice to be heard.

This research inspires some interesting questions. For instance, what about computer-generated voices? We know we're getting better at generating realistic voices, but what is the impact when you know the voice is generated by a machine and not another human being? Also, the researchers admit that they couldn't test the impact of visual and audio cues separately. But what if you had an additional condition where you see the person, but their words are displayed as captions instead?

What are your thoughts on this issue? And where would you like to see this research go in the future?


Concert Weekend is Almost Here

Tomorrow and Sunday, I'll be performing Händel's Messiah for the 25th and 26th time with my choir, the Apollo Chorus of Chicago. We're getting some great attention in anticipation of our concerts:
You can learn a lot more about Händel and Messiah at a pre-concert talk before Sunday's performance.

And you just might leave the performance happier than when you went in: in psychological research on the effect of mood, we usually play clips of music that reliably put people in either a good or bad mood. One frequently used song to put people in a good mood comes from Händel's Messiah - the Pastoral Symphony, also known as the Pifa, which sets the scene of the shepherds in the field who are about to be visited by angels:

Thursday, December 14, 2017

Statistical Sins: Not Double-Checking Results

In a previous Statistical Sins post, I talked about the importance of knowing one's variables. Knowing the range and source of your variables is necessary to make sure you're using the correct variables in your results. This is an important step in quality control, and really should be done first, prior to running analyses.

But good quality control shouldn't stop there. Results should be double-checked, and compared to each other, to make sure it all makes sense. This sounds painfully obvious, but unfortunately, this step is skipped too often. For instance, check out the results of the Verge technology survey, and specifically one of the glaring issues pointed out by Kaiser Fung on Junk Charts:
Last time, I discussed one of the stacked bar charts about how much users like or dislike specific brands such as Facebook and Twitter. Today, I look at the very first chart in the article.

This chart supposedly says users trust Amazon the most among those technology brands, just about the same level as customers trust their bank.

The problems of this chart jump out if we place it side by side with the chart I discussed last time.

Now, the two charts use different data - the first chart is a "trust" rating scale, while the second is a "like" rating scale. But notice that in the first chart, yellow is said to stand for "No opinion or don't use," while in the second chart, that category is reflected in gray. It seems highly unlikely that people have an opinion on liking something but not trusting that same institution. The two scales would likely be highly correlated with each other. Also, the chart on the left is missing the "somewhat" category, making the rating scale asymmetrical.

What probably happened is that the "no opinion" category was inadvertently dropped from the chart on the left, a mistake that should (could) have been noticed with a thorough review of the results.

I remember getting ready for a presentation once, and going over my slides when I noticed my standard deviations made no sense - they were too small. Cue a mini-panic attack, since I was presenting in 15 or so minutes at that point. I pulled out the printout of my results and noticed I'd accidentally used standard error instead of standard deviation. Fortunately, the room I was presenting in was not being used, and I was able to use the computer to pull up my file and change the values in tables.
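The standard error is just the standard deviation divided by the square root of the sample size, which is why mistakenly reported SEs always look "too small." A quick sanity check in Python, using made-up scores:

```python
import math
import statistics

scores = [12, 15, 14, 10, 18, 16, 13, 17, 11, 14]  # hypothetical data

sd = statistics.stdev(scores)       # sample standard deviation
se = sd / math.sqrt(len(scores))    # standard error of the mean

print(f"SD = {sd:.2f}, SE = {se:.2f}")  # → SD = 2.58, SE = 0.82
# If a table of "standard deviations" looks suspiciously small,
# checking whether the values equal SD / sqrt(n) will catch this mix-up.
```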

When I first started working as a psychometrician, I was introduced to a very involved process of quality control - including having two people start with the same raw data, and going through the whole process of cleaning, creating new variables, and analyzing results, preferably with different analysis programs. Since R was my program of choice, I would usually use that, while my counterpart in quality control would often use SAS or SPSS.

Mistakes happen. This is one reason we have errata published in journals. And online articles can be easily corrected. The Verge would probably do well to fix some of these mistakes.

Wednesday, December 13, 2017

Harry Potter and the Gloriously Unhinged Story

Via Mashable, Botnik Studios, a creative community, just gave us a new Harry Potter chapter, written using a predictive algorithm trained on the seven Harry Potter books:

And it's hilarious. Here are a few excerpts:
"What about Ron magic?" offered Ron. To Harry, Ron was a loud, slow, and soft bird. Harry did not like to think about birds.

The password was "BEEF WOMEN," Hermione cried.

"Voldemort, you're a very bad and mean wizard," Harry savagely said. Hermione nodded encouragingly. The tall Death Eater was wearing a shirt that said, 'Hermione Has Forgotten How To Dance,' so Hermione dipped his face in mud.

The pig of Hufflepuff pulsed like a large bullfrog. Dumbledore smiled at it, and placed his hand on its head: "You are Hagrid now."

Tuesday, December 12, 2017

Roy Moore's Interview

Each day, we're hearing of more men and women coming forward to talk about inappropriate behavior from some of the most powerful men in the country. And while in many cases, those accusations are being treated as serious, in one instance, the reaction is just getting more and more tone-deaf. (Or perhaps I should say "Moore and Moore tone-deaf.")

In a move that I was absolutely certain was satire when I first heard about it, Roy Moore sat down with 12-year-old Millie March for an interview. The interview was arranged by a Pro-Trump group created by former Breitbart staffers. The goal of the move is to show that Moore can be in the same room as a child and not be creepy or assault her, right?

Dear god, where to begin on this one? Sure, Moore is on his best behavior when the cameras are rolling. But the issue brought forward with all of these accusations is a penchant for these powerful men to treat women like objects, to use them as means to an end. Is that any different than what is happening in this interview? Millie isn't being treated as a person; she's a prop. A bargaining chip used to get what Moore and this Pro-Trump group want - for Moore to be elected. Sure, he didn't assault or harass her. But he and everyone else involved in setting up that interview still objectified her.

Thankfully, I'm not the only one who is disgusted by this stunt:
On Twitter and elsewhere, people were quick to point to the uncomfortable decision to use a 12-year-old girl for a campaign push.

Democratic strategist Paul Begala called it “appalling” and “shocking.”

“The fact that he’s accused of sexually assaulting a 14-year-old girl, would sit down and do an interview with a 12-year-old, when he’s not talking to any journalists—it’s like he’s rubbing Alabamians’ noses in it,” he said.
In summation, I leave you with this brilliant tweet by Franchesca Ramsey:

Monday, December 11, 2017

Follow-Up on "Cat Person"

On Saturday, I shared a story published in The New Yorker: Cat Person by Kristen Roupenian. It's an excellent read I highly recommend.

Today, I discovered someone set up a Twitter account that just retweets negative reactions to the story by men. It's glorious.

And yes, before you say it, I know #NotAllMen hated this story. And I would imagine, many of these men who are responding negatively to the story are self-professed nice guys - in my estimation, probably the ones who say idiotic expressions like YOLO and "nice guys finish last" completely in earnest. But if words like "whore" and "cunt" and "bitch" are right on the tip of your tongue when a woman doesn't respond in the way you'd like, sorry, but you're not a nice guy. And if you find yourself rooting for a guy who calls a woman a whore just because she isn't interested in seeing him, I suggest you take a good long look at yourself: you're part of the problem.

Mama Always Told Me Not to Look into the Eyes of the Sun

What happens to your eyes when you look directly at the sun? A woman in her 20s, the solar eclipse, and 6 seconds have helped us get the answer to that question:
By the time the 20-something woman in today’s case study — published in the journal JAMA Ophthalmology — looked at the sun, it was already 70 percent covered by the moon. Three days later, she headed to the Mount Sinai’s New York Eye and Ear Infirmary, where doctors informed her that she had damaged her retinas by looking into a giant ball of glowing gas that emits radiation that burns your eyes.

Images of her eyes are the first time we’ve been able to see such detailed pictures, thanks to advances in optics. These showed that both eyes were affected, with the left eye especially having damaged photoreceptors and a lesion. Unfortunately, no treatment for eye damage from staring at the sun — technically called solar retinopathy — currently exists. 
It only took 6 seconds for her to do permanent and serious damage. Here are the images of her retina published in the article:

Hopefully you got to see the eclipse. And hopefully, you used the proper eye protection so you'll be able to see the next one.

Writing Goals for the New Year

It's only December 11, but I'm already thinking about next year and what I'd like to accomplish. And just as I did last December, many of my goals have to do with writing.

First, I'd like to write more short stories. I used to write them constantly, but now I've been focused more on writing blog posts and, during the month of November at least, novels. But short fiction is a great way to practice and improve, and can lead to ideas for longer fiction. Though I'd love to follow Ray Bradbury's recommendation of writing a short story a week, that might be a lofty goal alongside writing regular blog posts and finishing NaNoWriMo novels. I plan on setting a number goal, but it will be less than 52.

Second, I want to participate in more writing contests. I just found this great curated list of writing contests for 2018, which includes many free contests. I had a lot of fun participating in the NYC Midnight short story contest earlier this year, and just registered to do that contest again. And if I'd like to participate in one before 2018, Writer's Digest has one running until this Friday, December 15. We'll see if I have time to sit down and write 1500 words between now and then.

My third goal is to get better about writing down ideas. I always say I'll remember them, and then later I can't. This may mean I'll have to stop in the middle of a conversation with someone to pull out my notebook or get out of bed in the middle of the night to jot something down. I'd love to see if I'm a more productive writer as a result.

Last year, I made the goal to average out to 1 blog post a day. Here I am, close to the end of the year, and I'm scrambling just a bit to make up a deficit. I'm going to try to make it, but I've decided not to set a number goal for blog posts in 2018. More doesn't necessarily equal better. I'll have some blogging goals, and one of those will definitely be to participate in April A to Z again (you can check out 2016 and 2017), though I'm not sure what my theme will be. More specific goals later.

Sunday, December 10, 2017

Statistics Sunday: Bayesian Inference in a Galaxy Far Far Away

I was rewatching Rogue One with a friend the other day. Since this is part of the Star Wars universe, it of course had to have some of the usual Star Wars elements: strange-looking aliens, someone uttering the line "I've got a bad feeling about this," and droids rattling off odds of different outcomes. Always bad outcomes - seriously, why don't the droids ever feel the need to say, "The odds are 50 to 1 that everything is going to turn out okay," or "There are puppies ahead; 200 to 1 odds of many puppy snuggles"?

But I digress. Because what I really want to talk about are those odds, and why they tell us something about the droids. True, they're sprinkled into the movies mainly as jokes. We don't really need to pay attention to the odds, other than to be impressed when the bad outcome the droid calculated doesn't end up happening. For instance, from The Empire Strikes Back:

Or this one, from Rogue One:

The information from the droid isn't actually that important. The point is that the line should make you laugh. But I was thinking about how this information is used in the Star Wars universe, and more importantly, where it could be derived from. And I came to an important realization:

These droids must be using Bayesian inference.

It's incredibly unlikely that these probabilities are empirically derived (BTW, this approach of using completely empirical data to derive probabilities is called Frequentism). C-3PO, for instance, says the odds of successfully navigating an asteroid field are 3,720 to 1. What that means is he has to have data on at least 3,721 attempts at navigating an asteroid field. And of course, you'd want more data than that. Just because 1 attempt out of the 3,721 was successful doesn't mean those are the true odds. It's possible the odds are actually 10,000 to 1. You need a lot of data to empirically derive the probability of something.

And what about K-2SO simply saying the probability that Jyn will use the weapon against Cassian is "very high"? It doesn't actually matter what the probability is, but where does that value come from? Sure, it's possible that K-2SO is simply using the probability that an escaped convict would use a weapon on another person, but still, it doesn't seem like there would be a lot of data just laying around. And if K-2SO prefers to use data specific to the situation, he'd need data on the outcome of a very specific situation, one that has likely never happened.

But it isn't unusual for people/droids/whatever to want to know the odds of something that might never have happened before - an event so rare it's impossible to observe it naturally but that you need to be prepared for in the unlikely event that it happens. Insurance companies need to know the potential risks of taking on a new account. Governments need to prepare for potential wars. And scientists need to be able to make causal inferences from their data, sometimes data not collected in such a way to infer cause. To a classical statistician, those puzzles would be difficult, maybe impossible. But to a Bayesian, it is completely possible to generate odds on a thing that has never happened before.

(If you need to refresh your memory on Bayes' Theorem, check out posts here, here, here, and here. And as soon as I learn how to invent more free time, I'm going to sit down and learn Bayesian statistics so I can stop Dunning-Kruger-ing my way through it.)

What K-2SO and C-3PO are generating are conditional probabilities - the probability of something happening given known probabilities about the present situation. These known probabilities are called "priors," and the droid could draw on whatever priors make sense. So C-3PO might be drawing on data about the maneuverability of the Millennium Falcon, the probability of crashes while being pursued, the size and motion of the asteroids, and even observations about Han Solo himself. Using those conditions, C-3PO can calculate the probability that they'll make it out of the asteroid field alive.
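To make the frequentist/Bayesian distinction concrete: taken as a frequentist claim, 3,720-to-1 odds against imply a probability of 1/3,721, but a Bayesian can assemble a probability from priors instead. Here's a toy Python illustration - every prior and likelihood below is invented for the example, since we obviously don't have droid data:

```python
# Converting stated odds against success into a probability
odds_against = 3720
p_success = 1 / (odds_against + 1)
print(f"P(success) = {p_success:.5f}")

# A toy Bayesian combination: P(survive) assembled from priors about
# the pilot, rather than from thousands of observed asteroid runs.
# All of these numbers are made up for illustration.
p_exceptional_pilot = 0.01           # prior: pilots this good are rare
p_survive_given_exceptional = 0.10   # likelihood if the pilot is that good
p_survive_given_ordinary = 0.0001    # likelihood for an ordinary pilot
p_survive = (p_survive_given_exceptional * p_exceptional_pilot
             + p_survive_given_ordinary * (1 - p_exceptional_pilot))
print(f"P(survive) = {p_survive:.5f}")

# Bayes' theorem then lets us update: given that they survived,
# how likely is it the pilot was exceptional?
p_exceptional_given_survive = (p_survive_given_exceptional
                               * p_exceptional_pilot / p_survive)
print(f"P(exceptional | survived) = {p_exceptional_given_survive:.3f}")
```

With these made-up numbers, survival is rare (about 0.0011) but, conditional on surviving, the pilot is almost certainly exceptional (about 0.91) - exactly the kind of inversion Bayes' theorem buys you.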

(Side note: Successfully navigating an asteroid field actually wouldn't be that difficult. Check out this post from The Math Dude at Quick and Dirty Tips.)

And just as with the asteroids, K-2SO doesn't need to have the empirical odds that Jyn will use her "found" blaster on Cassian. Instead, he could use known information on Jyn's proclivity toward violence, rates at which convicted criminals use guns, and even probability of a weapon being fired in emotional situations or probability that Cassian will piss Jyn off somehow. K-2SO could use whatever priors make sense, and use that information to derive this "very high" probability.

Hopefully you're as excited as I am about seeing The Last Jedi!

May the Force be with you.

Saturday, December 9, 2017

Must Read: "Cat Person" by Kristen Roupenian

A friend and fellow writer shared this story on Facebook: "Cat Person" by Kristen Roupenian, published in The New Yorker. It's excellently written, and captures some feelings I imagine are very close to home for many women, myself included.

The author, Kristen Roupenian

And if you enjoyed the story as much as I did, I recommend also checking out Deborah Treisman's Q&A with Roupenian:
Your story in this week’s issue, “Cat Person,” is both an excruciating bad-date story and, I think, a kind of commentary on how people get to know each other, or don’t, through electronic communication. Where did the idea for the story come from?

Especially in the early stages of dating, there’s so much interpretation and inference happening that each interaction serves as a kind of Rorschach test for us. We decide that it means something that a person likes cats instead of dogs, or has a certain kind of artsy tattoo, or can land a good joke in a text, but, really, these are reassuring self-deceptions. Our initial impression of a person is pretty much entirely a mirage of guesswork and projection. When I started writing the story, I had the idea of a person who had adopted all these familiar signifiers as a kind of camouflage, but was something else—or nothing at all—underneath.

Do you think that the connection that these two form through texting is a genuine one?

I think it’s genuine enough as far as it goes, but it doesn’t go very far. That Robert is smart and witty is true, but does the fact that someone’s smart and witty mean that he won’t murder you (as Margot wonders more than once), or assault you, or say something nasty to you if you reject him? Of course it doesn’t, and the vertigo that Margot feels at several points in the story is the recognition of that uncertainty: it’s not that she knows that Robert is bad—because if she knew that she would be on solid ground—but that she doesn’t know anything at all.

Anyone Else Want to Be a Cop in New Zealand Now?

New Zealand has just released a hilarious police recruitment video:

The best part? The video features actual New Zealand police officers:
Constable Zion Leaupepe gets the first speaking role in the video, and she’s joined by more than 70 of her colleagues, with police commissioner Mike Bush making a quick appearance as himself. There’s even a police cat briefly glimpsed in the climactic chase for a surprising four-legged crook. And in an especially cheeky touch, the end credits refuse to name the members of the AOS (Armed Offenders Squad) who took part in the shoot.

Friday, December 8, 2017

Videos to Help You Get Through Your Friday

It's been a long week. So I've got some videos lined up to help me get through the day. First up, Postmodern Jukebox brings us this awesome Motown-style cover of "Tomorrow" from Annie; the amazingly talented Shoshana Bean sings lead, and Toni Scruggs and Tiffany Smith provide backup (plus a short interjection of "Hard Knock Life"):

The Room gets songified:

Two members of my choir talk about singing Messiah (our first concert is a week from tomorrow!):

Jenny Nicholson talks about playing The Last Jedi Bingo:

Thursday, December 7, 2017

Today's Links

I've got a long day ahead of me today, including a conference call this evening until around 7:30. But here are the links I have sitting open that I'll read/watch/do later:

And I Say I Never Win Anything

Yesterday at work, our building had an ugly sweater party. I entered a drawing, thinking I wouldn't win. But today, I got to work to find a voicemail and email telling me I'd won a gingerbread house! Here it is:

It smells amazing.

Wednesday, December 6, 2017

And the Winner Is...

Time magazine announced the winner of the 2017 Person of the Year: the #MeToo movement:

Time refers to the women behind the movement as "The Silence Breakers." And though this movement has received widespread attention this year, the hashtag was actually started 10 years ago by Tarana Burke.
#MeToo rose to prominence as a social media campaign in the wake of high-profile accusations against Hollywood producer Harvey Weinstein. After actress Alyssa Milano popularized the hashtag, thousands of women began sharing their stories about the pervasive damage wrought by sexual harassment and by "open secrets" about abuse.

The movement's empowering reach could be seen in the platform on which Time announced its choice: the Today show. It was just one week ago that NBC fired the morning program's longtime and powerful co-host, Matt Lauer, over a detailed complaint of "inappropriate sexual behavior in the workplace."

While the most high-profile #MeToo stories have come from women and men who work in the movies and media, the Time article also features women who work hourly jobs, some of whom want to remain anonymous. The magazine's cover portrait includes strawberry picker Isabel Pascual, lobbyist Adama Iwu and former Uber engineer Susan Fowler along with Ashley Judd and Taylor Swift.

Winning Books on Goodreads

Goodreads just announced their winners of Best Books 2017:
There is surprisingly little overlap between this list and Amazon's Top 100. You might remember I lamented that Amazon didn't include The Radium Girls, which won History & Biography here, or What It Means When a Man Falls From the Sky, which was nominated for Best Fiction here, but did not win. In fact, the only books from this list that made Amazon's list were Little Fires Everywhere (2), The Sun and Her Flowers (60), and The Hate U Give (21).

I've added many of these books to my reading list. In fact, The Radium Girls has been sitting on my to-read shelf for months now, and I keep picking up Sleeping Beauties in the bookstore, only to put it back down and tell myself not to buy it until I'm ready to read it. 

I may have a book problem, but I'm kind of okay with it...

Monday, December 4, 2017

The End of Classical Statistics As We Know It? Probably Not

Brian Caffo just released a video discussing something I've been thinking about as I learn more about Artificial Intelligence - will AI take over classical statistics? Do I need to be worried about my job being performed by AI in the future? Check the video out here:

There are a variety of reasons why I think statistics and statisticians/psychometricians will remain useful, even as technology advances. Not only are statistical results more parsimonious and easier to wrap one's head around than machine learning results, but many of the decisions statisticians and psychometricians make are still somewhat subjective.

If I give a dataset to 12 statisticians, along with the questions I want to answer and hypotheses I want to test, I'm likely to get 12 different sets of analyses. Some of the decisions we make are more art than science, or even simply a matter of preference. Some statisticians put more trust in the robustness of statistical results to deviations from the key assumptions. There are a variety of explanations as to why the 12 statisticians would come back with 12 different analysis approaches, though some approaches may be more justifiable than others.

Enjoy the video and let me know your thoughts and reactions!

Sunday, December 3, 2017

Statistics Sunday: Practice Effects and Modern Testing Approaches

In their ground-breaking book, Cook and Campbell introduced us to threats to validity. Remember that validity refers to truth and comes in four flavors: internal, external, construct, and statistical conclusion. You can learn more about validity at the link above, but as a brief refresher:

  • Internal validity - The effects (dependent variable) observed in the study are caused by the independent variable. Maximizing internal validity means isolating these two variables, so that you can show a true causal relationship between them.
  • External validity - The findings of the study can be generalized. The more control you have over the situation, the higher your internal validity; but this results in lower external validity, because it's difficult to generalize from a highly controlled environment to a less controlled environment.
  • Construct validity - The variables measured in the study actually represent the underlying constructs. We can't hold a tape measure up to your brain to find out your cognitive ability; we have to give you a measure of cognitive ability, which may or may not truly measure the underlying construct.
  • Statistical conclusion validity - The statistical analyses used to draw conclusions have been correctly applied and interpreted. If, for instance, you don't quite meet the assumptions of a test, you weaken your statistical conclusion validity. (That doesn't mean your findings aren't true, just that the probability that they're true is lower than if you fully met the assumptions.)
It would be impossible to design a study that maximizes all four types of validity. You could probably maximize a couple of them at once, but internal and external validity, for instance, involve a trade-off. And any methodological approach you take is going to impact one or more types of validity. The aspects of a design that decrease a type of validity are called threats to validity.

Usually, what we want to get at in our study is a causal relationship between two things. At least, those of us from fields that focus on experimentation (where independent variables can be manipulated) are interested in establishing causal relationships. We have different methods we use to try to establish cause. One is to take a sample of people, randomly assign them to some level of our independent variable, and measure the effect of the IV on the dependent variable.
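A between-subjects design like this can be sketched in a few lines of Python (the participant labels and group names here are hypothetical, just for illustration):

```python
import random

# Hypothetical example: randomly assign 20 participants to two
# levels of the independent variable (treatment vs. control).
participants = [f"P{i:02d}" for i in range(1, 21)]
random.seed(42)  # seeded only so the sketch is reproducible
random.shuffle(participants)

# Split the shuffled list in half: first 10 to treatment, rest to control
groups = {"treatment": participants[:10], "control": participants[10:]}

print(len(groups["treatment"]), len(groups["control"]))  # prints "10 10"
```

Shuffling before splitting is what makes the assignment random; every participant ends up in exactly one group.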

The problem with this approach, of course, is that the people in the 2 or more experimental groups are different from each other. We do many things to try to equalize groups, but we can never truly know our groups are equivalent.

So another way to handle this is to have one group of people, deliver the different levels of the IV in a randomized order, and measure the dependent variable after each level. The problem with this approach is that we could have carryover effects: we don't know whether the observed value of the dependent variable is due to the intervention participants just received or to one they received before that. We can't wipe a person's memory between conditions.
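A within-subjects version, with the order of conditions randomized independently for each participant, might look like this (the condition names are made up for illustration):

```python
import random

# Hypothetical within-subjects design: each participant receives all
# three levels of the IV in an independently randomized order, so that
# carryover effects are spread across conditions rather than confounded
# with one fixed presentation order.
conditions = ["low dose", "medium dose", "high dose"]  # made-up labels

rng = random.Random(2017)  # seeded only so the sketch is reproducible

def randomized_order(rng):
    """Return the conditions in a freshly shuffled order."""
    order = conditions.copy()
    rng.shuffle(order)
    return order

orders = {f"P{i}": randomized_order(rng) for i in range(1, 6)}
for pid, order in orders.items():
    print(pid, order)
```

Randomizing the order doesn't eliminate carryover effects for any one person, but across the sample it keeps any single ordering from being systematically tied to the outcome.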

In fact, any time you expose a person to the same measure more than once, you're going to see differences in scores due to practice. If your intervention is meant to improve performance, simply being exposed to the measure will produce improvements, regardless of whether the person received the intervention.

However, there could be a way around this using modern testing approaches, specifically computer adaptive testing (CAT). I'm planning to write a longer post describing how CATs work, but the short answer is that a CAT selects the next item based on your response to the previous item: if you answer correctly, you get a more difficult item; if you answer incorrectly, you get an easier one.

CATs also draw from large item banks, so the easy item you receive might be totally different from the easy item I receive, even if the two are at the same level of difficulty. This means you're highly unlikely to see the same item twice, especially as your estimated ability moves up or down. That's not to say you won't observe any practice effects, but CAT helps reduce them.
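As a rough illustration of the idea (not a real IRT-based implementation; the item bank, the difficulty scale, and the simple step-up/step-down rule here are all simplifying assumptions), here's a toy sketch of adaptive item selection:

```python
def pick_item(bank, administered, target):
    """Choose the unused item whose difficulty is closest to the target."""
    available = [i for i in bank if i["id"] not in administered]
    return min(available, key=lambda i: abs(i["difficulty"] - target))

def run_cat(bank, responses, start=0.0, step=0.5):
    """Administer items, stepping target difficulty up after a correct
    response and down after an incorrect one. Returns the item IDs given."""
    target, administered, trajectory = start, set(), []
    for correct in responses:
        item = pick_item(bank, administered, target)
        administered.add(item["id"])
        trajectory.append(item["id"])
        target = target + step if correct else target - step
    return trajectory

# Hypothetical item bank: ids with difficulties on a logit-like scale
bank = [{"id": n, "difficulty": d}
        for n, d in enumerate([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])]

# Correct, correct, incorrect: items step up, then back down
print(run_cat(bank, [True, True, False]))  # prints [2, 3, 4]
```

Because already-administered items are excluded from selection, a test-taker never sees the same item twice in a single session, which is the property that helps dampen practice effects.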

Obviously, CAT can't be used for everything, and using it requires, among other things, access to computers (which may limit how many people you can test at once, depending on available resources), the ability to install testing software on those computers, and a large bank of items. But I imagine that, as CAT becomes more and more common, we're liable to see more studies using it.

What has been your experience with computer adaptive testing? Or practice effects?