Thursday, September 28, 2017

Statistical Sins: Nicolas Cage Movies Are Making People Drown and More Spurious Correlations

As I posted yesterday, I attended an all-day data science conference online. I have about 11 pages of typed notes and a bunch of screenshots I need to weed through, but I'm hoping to post more about the conference, my thoughts and what I learned, in the coming days.

At work, I'm knee-deep in my Content Validation Study. More on that later as well.

In the meantime, for today's (late) Statistical Sins, here's a great demonstration of why correlation does not necessarily imply anything (let alone causation). I can't believe I didn't discover this site before now: Spurious Correlations. Here are some of my favorites:

As I mentioned in a previous post, a correlation - even a large correlation - can be obtained completely by chance. Statistics are based entirely on probabilities, and there's always a probability that we can draw the wrong conclusion. In fact, in some situations, that probability may be very high (even higher than our established Type I error rate). 

This is a possibility we always have to accept; we may conduct a study and find significant results completely by chance. So we never want to take a finding in isolation too seriously. It has to be further studied and replicated. This is why we have the scientific method, which encourages transparency of methods and analysis approach, critique by other scientists, and replication.

But then there are times we just run analyses willy-nilly, looking for a significant finding. When it's done for the purpose of the Spurious Correlations website, it's hilarious. But it's often done in the name of science. As the examples above should demonstrate, we must be very careful when we go fishing for relationships in the data. The analyses we use only tell us the probability of finding a relationship at least that large by chance (or, more specifically, if the null hypothesis is actually true). They don't tell us whether the relationship is real, no matter how small the p-value. When we knowingly cherry-pick findings and run correlations at random, we invite spurious correlations into our scientific undertaking.
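To make the fishing problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: 40 made-up variables of pure random noise with 10 observations each (roughly like ten yearly values), and the textbook two-tailed critical value of t for 8 degrees of freedom. None of it comes from the Spurious Correlations site or its data. Because the variables are unrelated by construction, every "significant" correlation it finds is spurious.

```python
import itertools
import math
import random

N_POINTS = 10   # observations per variable (assumed, e.g. ten yearly values)
T_CRIT = 2.306  # two-tailed critical t for alpha = .05 with df = N_POINTS - 2 = 8


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fish_for_correlations(n_variables=40, seed=2017):
    """Correlate every pair of unrelated noise variables; keep the 'significant' pairs."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(n_variables)]
    df = N_POINTS - 2
    # Smallest |r| that reaches p < .05, solved from t = r * sqrt(df / (1 - r^2)).
    r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + df)
    hits = []
    n_tests = 0
    for i, j in itertools.combinations(range(n_variables), 2):
        n_tests += 1
        r = pearson_r(data[i], data[j])
        if abs(r) > r_crit:
            hits.append((i, j, r))
    return n_tests, r_crit, hits


if __name__ == "__main__":
    n_tests, r_crit, hits = fish_for_correlations()
    print(f"{n_tests} correlations tested, {len(hits)} 'significant' (|r| > {r_crit:.3f})")
    if hits:
        strongest = max(hits, key=lambda hit: abs(hit[2]))
        print("strongest spurious r:", round(strongest[2], 3))
```

With 40 variables there are 780 pairs, so at a .05 Type I error rate we should expect about 39 "significant" correlations on average, even though nothing is related to anything. The exact count depends on the seed.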

This approach violates a certain kind of validity, often called statistical conclusion validity. We maximize this kind of validity when we apply the proper statistical analyses to the data and the question. Abiding by the assumptions of the statistic we apply is up to us. The statistics don't know. We're on the honor system here, as scientists. Applying a correlation or any statistic without any kind of prior justification to examine that relationship violates assumptions of the test.

So I'll admit, as interested as I am in the field of data science, I'm also a bit concerned about the high use of exploratory data analysis. I know there are some controls in place to reduce spurious conclusions, such as using separate training and test data, so I'm sure as I find out more about this field, I'll become more comfortable with some of these approaches. More on that as my understanding develops.
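As a rough illustration of why holding out data helps, here is a small Python sketch. It is a simplification under stated assumptions, not how any particular data science workflow does it: the outcome and all 100 predictors are pure random noise, the first half of the observations plays the role of the exploration (training) data, and the second half plays the role of the test data. The function name and all parameters are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def explore_then_confirm(n_predictors=100, n_half=10, repeats=30, seed=928):
    """Pick the best-looking predictor on one half of the data, then re-check it on the other."""
    rng = random.Random(seed)
    n = 2 * n_half
    results = []
    for _ in range(repeats):
        outcome = [rng.gauss(0, 1) for _ in range(n)]
        predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
        # Exploration half: go fishing and keep the predictor with the largest |r|.
        best = max(predictors, key=lambda p: abs(pearson_r(p[:n_half], outcome[:n_half])))
        r_explore = pearson_r(best[:n_half], outcome[:n_half])
        # Held-out half: the same predictor, evaluated on data it was not selected on.
        r_confirm = pearson_r(best[n_half:], outcome[n_half:])
        results.append((r_explore, r_confirm))
    return results


if __name__ == "__main__":
    results = explore_then_confirm()
    mean_explore = sum(abs(a) for a, _ in results) / len(results)
    mean_confirm = sum(abs(b) for _, b in results) / len(results)
    print(f"mean |r| on exploration half: {mean_explore:.2f}")
    print(f"mean |r| on held-out half:    {mean_confirm:.2f}")
```

The predictor that wins the fishing expedition looks impressive on the data used to choose it, but because the relationship is pure chance, it tends to shrink sharply on the held-out half. That is the sense in which a separate test set guards against spurious conclusions.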


  1. The causation in these correlations is quite obvious to me.
    1. The more nerdy kids who play arcade video games, the more those kids get interested in computers and seek degrees. Although I suppose there would be a lag there.
    2. The more films Cage puts out, the more people are either so distracted thinking about them or feel impervious to harm that they fall into ponds.
    3. The longer you stand still, the more likely a spider will bite you. So if you are standing still longer to spell longer words...well, clearly spelling bees are dangerous.
