Data Science E-Books
Many of these books are statistically oriented, but then again, a big part of data science involves drawing conclusions from data. Hence, the line between the list below and the next list on statistics resources may be a bit blurry.
- Analyze Survey Data for Free edited by Anthony Joseph Damico - this edited online resource, which assumes knowledge of R, offers step-by-step instructions for exploring online survey data; entries are contributed by different users, and some entries are still awaiting a contributor, if you're so inclined!
- Think Python by Allen B. Downey - an introduction to one of the most popular programming languages for data science, Python
- Think Stats: Exploratory Data Analysis in Python by Allen B. Downey - an intro to stats and probability using Python, written by the same author as Think Python above; while this book is meant to introduce statistics to programmers, it could also be a good way for statisticians to get their feet wet in Python
- Deep Learning by Ian Goodfellow, Yoshua Bengio, & Aaron Courville - a free e-book on machine learning, specifically deep learning
- R for Data Science by Garrett Grolemund & Hadley Wickham - this book teaches you how to pull data into R, then clean, model, and visualize it; this book was definitely talked up at the data science conference (thanks to a reader for sharing the link to the free e-book version!)
- Ten Signs of Data Science Maturity by Peter Guerra & Kirk Borne - Borne's was one of my favorite presentations from the data science conference I attended; this e-book highlights what indicates an organization is ready to venture into data science
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, & Jerome Friedman - predictive modeling and machine learning approaches
- An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani - covers many of the same topics as Elements above, but geared more toward beginners in statistical learning; if these are new concepts for you, read this book before The Elements of Statistical Learning
- Python Programming WikiBook - another introduction to Python, which also includes extensions into other programming languages and additional resources/links
- R Programming WikiBook - an introduction to programming in R, another popular programming language for data science
- School of Data Handbook - this handbook, which goes along with the courses available through School of Data, offers recipes for scraping, cleaning, and filtering data to get you started on your data science journey
Statistics E-Books
- Correlation and Causation: The Trouble with Story Telling by Lee Baker - a sort of follow-up to my previous discussion of spurious correlations, this book discusses the notion of probability and alternative explanations for correlations
- The Probability Cheatsheet by William Chen - technically not an e-book; it's a short PDF document that summarizes key probability concepts, like Simpson's paradox, the Law of Large Numbers, and conditional probability
- OpenIntro Statistics by David M. Diez, Christopher D. Barr, & Mine Çetinkaya-Rundel - a free introductory statistics textbook and additional statistical resources
- Think Bayes: Bayesian Statistics Made Simple by Allen B. Downey - yet another free e-book from Downey (see Think Python and Think Stats above), introducing Bayesian statistics through Python code rather than mathematical notation (handy if notation-heavy treatments aren't your preferred way to learn stats; not everyone likes them); because it relies on Python for computer-aided analysis, this book also straddles the statistics-data science line
- Research and Statistical Support Services Short Courses by Richard Herrington & Jonathan Starkweather - also not exactly an e-book: this site, part of the R&SS at the University of North Texas, contains multiple short documents teaching the basics of statistical software, as well as a few other computer tools that could aid in research
- How to Share Data with a Statistician by Jeff Leek - this GitHub document describes how to format data to be shared with a statistician, in order to facilitate efficient and timely analysis
- Introduction to Applied Bayesian Statistics and Estimation for Social Scientists by Scott M. Lynch - an introduction to Bayesian analysis and the use of Markov chain Monte Carlo (MCMC) methods; this book starts with a refresher on classical statistics before introducing the Bayesian notion of probability
- Learning Statistics with R by Daniel Navarro - what started off as lecture notes for an introductory statistics class taught with R became an e-book; there's even an R package (lsr) to go along with the book
Like free stuff? Here are some free meta-analysis tools.