Wednesday, August 16, 2017

Statistical Sins: Reinventing the Wheel - Some Open Data Resources

For today's Statistical Sins post, I'm doing things a little differently. Rather than discussing a specific study or piece of media about a study, I'm going to talk about a general trend. There's all this great data out there that could be used to answer questions, but I still see study after study collecting primary data.

Secondary data is a great way to save resources, answer questions and test hypotheses with large samples (sometimes even random samples), and practice statistical analysis.


To quickly define terms, primary data is the term used to describe data you collect yourself, then analyze and write about. Secondary data is a general term for data collected by someone else (that is, you weren't involved in that data collection) that you can use for your own purposes. Secondary data could be anything from a correlation matrix in a published journal article to a huge dataset containing responses from a government survey. And just as primary data can be qualitative, quantitative, or a little of both, so can secondary data.

We really don't have a good idea of how much data is floating around out there that researchers could use. But here are some good resources that can get you started on exploring what open data (data that is readily accessible online or that can be obtained through an application form) are available:
  • Open Science Framework - I've blogged about this site before; it lets you store your own data (and control how open it is) and access other open data
  • Data.gov - The federal government's open data site, which not only has federal data, but also links to state, city, and county sites that offer open data as well
  • Global Open Data Index - To find open data from governments around the world
  • Open Data Handbook - This site helps you understand the nature of open data and helps you to make any data you've collected open, but there's also a resources tab that offers some open data sources
  • Project Open Data - Filled with great resources to help you on your open data journey, including some tools to convert data from one form (e.g., JSON files) to an easier-to-use form (e.g., CSV)
  • Open Access Button - Enter a journal article you're reading, and this site will help you find or request the data
  • GitHub Open Data - Another open data option for some fun datasets, such as this dataset of Scrabble tournament games
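If you'd rather not rely on a separate tool for the JSON-to-CSV conversion mentioned above, here is a minimal sketch of the idea using only Python's standard library. It assumes the JSON is an array of flat (non-nested) records, which is not true of every open dataset. The function name `json_records_to_csv` and the sample records are made up for illustration and don't come from any of the sites listed.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    Assumes every record is a simple key/value object with no nesting.
    """
    records = json.loads(json_text)

    # Collect column names in first-seen order so no field is dropped
    # when records have different keys.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = '[{"id": 1, "word": "quiz", "score": 22}, {"id": 2, "word": "jinx", "player": "A"}]'
print(json_records_to_csv(sample))
```

Nested JSON (objects inside objects) would need to be flattened first, so treat this as a starting point rather than a general converter.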
And there's also lots of great data out there on social media. Accessing that data often involves interacting with the social media platform's API (application programming interface). Here's more information about Twitter's API; Twitter, in general, is a great social media data resource, because most tweets are public. I highly recommend this book if you want to learn more about mining social media data:
