Friday, November 16, 2018

Great Post on Using Small Sample Sizes to Make Decisions

It's been a busy month with little time for blogging, but I'm planning to get back on track soon. For now, here's a great post on the benefits of using small samples to inform decisions:
When it comes to statistics, there are a lot of misconceptions floating around. Even people who have scientific backgrounds subscribe to some of these common misconceptions. One misconception that affects measurement in virtually every field is the perceived need for a large sample size before you can get useful information from a measurement.

[I]f you can learn something useful using the limited data you have, you’re one step closer to measuring anything you need to measure — and thus making better decisions. In fact, it is in those very situations where you have a lot of uncertainty, that a few samples can reduce uncertainty the most. In other words, if you know almost nothing, almost anything will tell you something.
The article describes two approaches: the rule of five (drawing conclusions from a random sample of five) and the urn of mystery (the idea that even a single case drawn from a population can tell you something about the makeup of that population). The rule of five seems best suited to estimating a continuous value (in the post's example, the average commute time of workers in a company), while the urn of mystery seems best suited to determining whether a population is predominantly one of two types (in the post's example, whether an urn contains predominantly marbles of a certain color).
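
To make the two rules a bit more concrete, here is a rough Python sketch that checks the numbers behind them. As I understand Hubbard's statement of it, the rule of five is a claim about the median: there is a 93.75% chance that the population median falls between the smallest and largest of five randomly chosen values. The urn of mystery claim is that, if every proportion between 0% and 100% is equally likely beforehand, a single randomly drawn marble matches the majority color 75% of the time. The function names, the simulated commute times, and the trial counts below are my own assumptions for illustration and are not taken from the post.

```python
import random
import statistics


def rule_of_five_exact(n=5):
    """Chance that the population median lies between the min and max of n random draws."""
    # Each draw falls below the median with probability 1/2, so the sample
    # misses the median only if all n draws land on the same side of it.
    return 1 - 2 * 0.5 ** n


def rule_of_five_simulated(population, n=5, trials=20_000, seed=0):
    """Estimate the same chance by repeatedly sampling n values without replacement."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials


def urn_of_mystery_simulated(trials=50_000, seed=0):
    """Estimate how often a single marble matches the urn's majority color."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        # Assumption: the share of green marbles is unknown, uniform on [0, 1).
        share_green = rng.random()
        drew_green = rng.random() < share_green
        majority_green = share_green > 0.5
        if drew_green == majority_green:
            matches += 1
    return matches / trials


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical commute times in minutes for 10,000 workers (made-up, skewed data).
    commutes = [rng.lognormvariate(3.3, 0.5) for _ in range(10_000)]
    print(rule_of_five_exact())              # exact value for n = 5
    print(rule_of_five_simulated(commutes))  # should land close to the exact value
    print(urn_of_mystery_simulated())        # should land close to 0.75
```

Both results depend on the selection being truly random, and the urn result also depends on starting with no idea at all about the proportion. If either assumption fails, the percentages above no longer apply.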

Obviously, there are times when you need more data. But you're far better off making decisions with data (even very little) than with none at all.

1 comment:

  1. I appreciated the link to Hubbard's blog post. His work is new to me.

    Per his article, I'm not sure I want to trust most conclusions derived from 5:10,000 people, or in such an instance say I know more about whatever topic is under investigation than I did prior to the "analysis." I believe the danger arises when that's all (or most) of the quantitative information incorporated in a decision-making process, or when those 5 people/sources don't meet the criterion of strict random selection--which Hubbard points out is not a high probability event when humans do the picking. I doubt, for example, that picking 5 Facebook posts about a specific topic (e.g., human-caused climate change) is likely to tell me much about the median "factfulness" of 10,000 Facebook posts on that topic.

    There are more than a few politicians who already use numbers inanely. They justify their pronouncements even though they "know almost nothing [while asserting] almost anything will tell you something." I prefer politicians (or others; think trolls) not add to the chaos. All that accomplishes is reduction of the utility of valid inferences drawn from carefully selected samples (see

    Overall, I would say Hubbard's antecedent ("[I]f you can learn something useful using the limited data you have ...") has, in practice, a limited degree of attachment to the consequent ("thus making better decisions"). I fear most people blithely leave out the middle step ("you're one step closer to measuring anything you need to measure").

    In another vein, here's a quote from a comment on an article (about the utility of numbers and judgment in science, interestingly) linked to via Andrew Gelman's blog yesterday. I thought you might enjoy it: "For a lot of people writing is an agony; it’s a part of what we do as scholars that they least enjoy. For me writing is a complete and total joy, and if I’m not writing I’m miserable. I have always written a lot. For years, before I wrote for The New Yorker, I wrote an op-ed every day as practice and shoved it in a drawer. It’s not about being published, it’s about the desire to constantly be writing. It’s such a strongly felt need that if it was something socially maladaptive it would be considered a vice." Jill Lepore.

    Thanks for blogging,