Allen Downey wrote great short books that introduce statistics in a fun way, working through examples and exercises in Python. They are freely available to boot.
### Update!
A new version of Think Stats has been released, and we switched over to the new one. It's a massive overhaul and includes many great new features for us, most obviously the very convenient integration of packages that are staples in the bootcamp: pandas, scipy, and statsmodels.
Think Stats is a great book for refreshing or learning the most critical statistics topics and for gaining experience in applying statistical analysis to problems using Python. It covers typical statistics topics from a modern, simulation- and coding-based perspective. The chapters are quite concise and easy to read.
You can download a PDF or read the book online here. Of course, if you are so inclined, you can also buy a hard copy on Amazon, but that is not necessary.
The older 1st edition is still available here: linky
We will be focusing on the first 9 chapters. You can go through these chapters in 6 to 9 hours (depending on how familiar you are with statistics and Python). Please do so. If it takes somewhat longer, that's fine. However, don't get stuck too long on a single chapter. This preparation will help you a lot, and it will provide a good initial exposure to using Python for data analysis, but it is not your only chance to internalize these topics. If at any point you feel overwhelmed, don't worry. You don't need to master all of this in prework.
Five of the six required exercises are also from this book. It would be a good idea to tackle these exercises as you work your way through the book. For example, the first exercise is 2.4 at the end of Chapter 2. The best time to start working on it is right after you read Chapter 2.
If you finish the required exercises with time to spare, we suggest working on some of the optional ones. These may take longer, but they will definitely improve your skills and bootcamp experience. Take your time with these exercises and don't rush them. If you're short on time, don't push yourself to finish any of them. Completing even a single one of these is an excellent bonus.
Another important subject is the Bayesian approach to probability, where probability is treated as a state of knowledge rather than the expected frequency of events. This Bayesian approach is used in a lot of data science applications. Luckily, Downey used the same method as in Think Stats to write another free introductory book on Bayes with Python-based examples and exercises. It is called Think Bayes and you can find it here.
Please read the first two chapters of this book. They will work both as a reinforcement of probability (covered by Think Stats as well) and as an introduction to the Bayesian framework.
The last required exercise is from Think Bayes, and two optional exercises are also Bayesian problems.
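To get a feel for what "probability as a state of knowledge" means before you open the book, here is a minimal sketch of a single Bayesian update in plain Python. The two-bowl setup, the numbers, and the helper name `bayes_update` are assumptions made up for this illustration; this is not code from Think Bayes, which builds its own tools for this kind of calculation.

```python
from fractions import Fraction


def bayes_update(priors, likelihoods):
    """Return the posterior for each hypothesis.

    priors:      {hypothesis: P(hypothesis)} before seeing the data
    likelihoods: {hypothesis: P(data | hypothesis)}
    """
    # Multiply each prior by the likelihood of the observed data.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Normalize so the posterior probabilities sum to 1.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical setup: two cookie bowls, picked with equal probability.
priors = {"bowl_1": Fraction(1, 2), "bowl_2": Fraction(1, 2)}

# Assumed contents: bowl 1 is 3/4 vanilla, bowl 2 is 1/2 vanilla.
# We draw one cookie and it turns out to be vanilla.
likelihoods = {"bowl_1": Fraction(3, 4), "bowl_2": Fraction(1, 2)}

posterior = bayes_update(priors, likelihoods)
print(posterior["bowl_1"])  # 3/5
```

Before seeing the cookie, both bowls were equally plausible. After seeing a vanilla cookie, our belief shifts toward bowl 1 (3/5 versus 2/5). Nothing about the bowls changed; only our knowledge did.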