- Think Stats - Think Stats is an introduction to Probability and Statistics for Python programmers.
- An Introduction to Statistical Learning - This book provides an introduction to statistical learning methods. Website contains a free PDF. Excellent resource!
- Introduction to Statistical Learning Video Series - a YouTube playlist of videos accompanying the chapters of the ISL book.
- An Introduction to Statistical Learning with Python Code - this repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013). The book was originally written with examples in R; Jordi Warmenhoven "translates" them into Python code.
- The Elements of Statistical Learning - A more advanced book, but also a great resource. You can download a PDF version for free.
- The Probability and Statistics Cookbook - The probability and statistics cookbook is a succinct representation of various topics in probability theory and statistics. It provides a comprehensive mathematical reference reduced to its essence, rather than aiming for elaborate explanations. Source; GitHub Repo.
- Khan Academy's Statistics and Probability classes are a pretty good introduction to statistics and probability.
- CS 229 ― Machine Learning Notes - "cheat sheets" for Stanford University's CS 229 - Machine Learning course.
- Statistics Cheatsheet - from Stanford University's CME 106 - Introduction to Probability and Statistics for Engineers course.
- Probability Course on Khan Academy
- Basics of Probability for Data Science
- Probability Cheatsheet - from Stanford University's CME 106 - Introduction to Probability and Statistics for Engineers course.
- STAT 414 Intro Probability Theory (Penn State)
- Think Bayes - this is a great book on Bayesian statistics. Free PDF can be downloaded as well.
- Probabilistic Programming & Bayesian Methods for Hackers - an intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view.
- The GitHub repo for the book can be found HERE. There's also a NBViewer version of the book.
- Bayesian Reasoning and Machine Learning - a book about Bayesian reasoning and machine learning, by David Barber from the Department of Computer Science at University College London. An ebook can be downloaded from here.
- Doing Bayesian Data Analysis - Python/PyMC3 - This repository contains Python/PyMC3 code for a selection of models and figures from the book 'Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan', Second Edition, by John Kruschke (2015).
- Statistical Rethinking with Python and PyMC3 - Statistical Rethinking is an incredibly good introductory book on Bayesian statistics. It follows a Jaynesian and practical approach, with very good examples and clear explanations. This repository ports the book's code (originally in R and Stan) to PyMC3.
- Bayes Rules! An Introduction to Bayesian Modeling with R - an online book with the goal of "making modern Bayesian thinking, modeling, and computing accessible to a broad audience."
- Basic Statistics in Python: Descriptive Statistics
- Basic Statistics in Python: Probability
- Chris Albon's Statistics Tutorials
- An Introduction to Statistical Learning with Python Code - this repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013). The book was originally written with examples in R; Jordi Warmenhoven "translates" them into Python code.
- STAT 415 Intro Mathematical Statistics (Penn State) - great resource for more details on statistical inference. Its preceding course, STAT 414 Intro Probability Theory, is also worth checking out.
- Comprehensive & Practical Inferential Statistics Guide for Data Science
- Computer-age Statistical Inference - A 2016 book that covers various topics in statistical inference that are relevant in this data-science era, with scalable techniques applicable to large datasets. A free PDF can be downloaded from here.
- Khan Academy Lesson on Confidence Intervals.
- Confidence Intervals Notes - from Stanford University's CME 106 - Introduction to Probability and Statistics for Engineers course.
- StackOverflow Link to Confidence Interval in SciPy.
- Khan Academy Lesson on Hypothesis Testing.
- Hypothesis Testing Notes - from Stanford University's CME 106 - Introduction to Probability and Statistics for Engineers course.
- Introduction to Linear Regression Analysis - Duke University Statistical Forecasting class
- Linear Regression Concepts - from the Machine Learning Cheatsheet.
- Evaluating linear relationships - "How to use scatterplots, correlation coefficients, and linear regression effectively"
- Wikipedia: Mean Square Error
- Wikipedia: Residual Sum of Squares
- Regression Metrics
- Stanford Statistical Learning - Ridge Regression - a video lecture from Stanford's Statistical Learning class.
- Stanford Statistical Learning - Lasso Regression - a video lecture from Stanford's Statistical Learning class.
- Logistic Regression Walkthrough
- Logistic Regression Video Walkthrough
- Odds Ratio Explanation
- Logistic Regression Concepts - from the Machine Learning Cheatsheet.
- A Detailed Introduction to K-Nearest Neighbor (KNN) Algorithm
- K-Nearest Neighbors: Dangerously Simple
- Classification Metrics
- An Introduction to Confusion Matrix Terminology - simple guide to confusion matrix terminology, from Data School.
- Making Sense of the Confusion Matrix - a video based on the confusion matrix guide.
- Precision, Recall, Sensitivity and Specificity - a useful discussion on confusion matrices.
- Performance Metrics for Classification problems in Machine Learning
- Data School's Video and Transcript on ROC/AUC
- A Deeper Introduction to ROC
- Rahul Patwari's video on ROC Curves and Sensitivity-Specificity tradeoffs
- Understanding the Bias-Variance Tradeoff - an excellent (even must-read) article on the bias-variance tradeoff.
- Bias-Variance Tradeoff Notes
- Accurately Measuring Model Prediction Error - another excellent article from Scott Fortmann-Roe.
- Bias-Variance Tradeoff Lecture - a great lecture on the bias-variance tradeoff from Caltech's Machine Learning Course - CS 156.
- Stanford Lecture on Cross Validation
- Model Selection
- Simplicity vs Complexity in Machine Learning — Finding the Right Balance - a blog post that explains the Bias-Variance Tradeoff in terms of model complexity/simplicity, discusses overfitting and underfitting and gives an applied example.
- Random Forests Guide
- How does randomization in a random forest work?
- "Decision Trees - Decoded" - a blog post on tree-based methods, covering Decision Trees, Random Forests and Boosting.
- "Bagging - Unraveled" - a blog post on Bagging or Bootstrap Aggregating.
- Random Forests for Complete Beginners - "the definitive guide to Random Forests and Decision Trees."
- Support Vector Machine (SVM) Tutorial - learning SVMs from examples; an excellent blog post on Support Vector Machines.
- How Support Vector Machines Work - a video with an introduction to how support vector machines work.
- A Tutorial on Clustering Algorithms
- An Introduction to K-means Clustering Analysis - contains a theoretical explanation, pseudocode and Python code examples.
- K-means and Hierarchical Clustering - clustering tutorial from Andrew Moore's CS class at Carnegie Mellon. (There are additional tutorials at https://www.autonlab.org.)
- Visualizing K-Means Clustering - a theoretical as well as a visual overview of the K-Means algorithm.
- Visualizing DBSCAN Clustering - same as the previous link, but for the DBSCAN clustering algorithm.
- Stanford Walkthrough of Hierarchical Clustering
- Clustering Notes/Cheat Sheet
- Dimension Reduction Notes/Cheat Sheet
- A One-Stop Shop for Principal Component Analysis - a blog post with an excellent introduction to Principal Component Analysis.
- Principal Component Analysis Explained Visually