Sunday, October 1, 2017

Statistics Sunday: Free Data Science and Statistics Resources

I'm working on building up a list of free resources for data science and/or statistics. Through the data science conference and a book I recently finished, I've learned about some awesome resources already - I know there are more out there, but I wanted to share what I've found so far. This will remain a living document that I'll continue to update as I discover more resources.


Data Science E-Books 

Many of these books are statistically oriented, but a big part of data science involves drawing conclusions from data, so the line between the list below and the next list on statistics resources may be a bit blurry.
  • Analyze Survey Data for Free edited by Anthony Joseph Damico - this edited online resource, which assumes knowledge of R, offers step-by-step instructions for exploring online survey data; entries are contributed by different users and some entries are still awaiting a contributor if you're so inclined!
  • Think Python by Allen B. Downey - an introduction to one of the most popular programming languages for data science, Python
  • Think Stats: Exploratory Data Analysis in Python by Allen B. Downey - an intro to stats and probability using Python, written by the same author as Think Python above; while this book is meant to introduce statistics to programmers, it could also be a good way for statisticians to get their feet wet in Python
  • Deep Learning by Ian Goodfellow, Yoshua Bengio, & Aaron Courville - a free e-book on machine learning, specifically deep learning
  • R for Data Science by Garrett Grolemund & Hadley Wickham - this book teaches you how to pull data into R, then clean, model, and visualize it; it was definitely talked up at the data science conference (thanks to a reader for sharing the link to the free e-book version!)
  • Ten Signs of Data Science Maturity by Peter Guerra & Kirk Borne - Borne's was one of my favorite presentations from the data science conference I attended; this e-book highlights what indicates an organization is ready to venture into data science 
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, & Jerome Friedman - predictive modeling and machine learning approaches
  • An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani - covers many of the same topics as Elements above, but geared more toward beginners in statistical learning; if these are new concepts for you, read this book before Elements of Statistical Learning
  • Python Programming WikiBook - another introduction to Python, which also includes extensions into other programming languages and additional resources/links
  • R Programming WikiBook - an introduction to programming in R, another popular programming language for data science
  • School of Data Handbook - this handbook, which goes along with the courses available through School of Data, offers recipes for scraping, cleaning, and filtering data to get you started on your data science journey
Also, there's a new R package - dslabs, described here - designed to help teach data science.

Statistics E-Books and Resources

  • Peter Miksza has created some Shiny apps that visualize statistical concepts and let you manipulate different factors (e.g., type of data) to see how they affect the resulting visualization.
  • Correlation and Causation: The Trouble with Story Telling by Lee Baker - a sort of follow-up to my previous discussion of spurious correlations, this book discusses the notion of probability and alternative explanations for correlations
  • The Probability Cheatsheet by William Chen - technically not an e-book; it's a short PDF document that summarizes key probability concepts, like Simpson's paradox, the Law of Large Numbers, and conditional probability
  • OpenIntro Statistics by David M. Diez, Christopher D. Barr, & Mine Çetinkaya-Rundel - a free introductory statistics textbook and additional statistical resources
  • Think Bayes: Bayesian Statistics Made Simple by Allen B. Downey - yet another free e-book from Downey (see Think Python and Think Stats above), introducing Bayesian statistics through Python code rather than mathematical notation (helpful if math notation isn't how you prefer to learn stats; not everyone does); because it relies on Python for computer-aided analysis, this book also straddles the statistics-data science line
  • Research and Statistical Support Services Short Courses by Richard Herrington & Jonathan Starkweather - also not exactly an e-book: this site, part of the R&SS at University of North Texas, contains multiple short documents teaching the basics of statistical software, and a few other computer tools that could aid in research
  • How to Share Data with a Statistician by Jeff Leek - this GitHub document describes how to format data to be shared with a statistician, in order to facilitate efficient and timely analysis
  • Introduction to Applied Bayesian Statistics and Estimation for Social Scientists by Scott M. Lynch - an introduction to Bayesian analysis and the use of what are called MCMC (Markov chain Monte Carlo) methods; this book starts with a refresher of classical statistics before introducing the Bayesian notion of probability
  • Learning Statistics with R by Daniel Navarro - what started off as lecture notes for an introductory statistics class taught with R became an e-book; there's even an R package (lsr) to go along with the book
Do you have any free resources you would recommend?

Like free stuff? Here are some free meta-analysis tools.
