This is a collection of working Jupyter notebooks with associated datasets (mostly from Kaggle) that demonstrate exploratory data analysis (EDA), data cleaning, model building, validation, grid search for hyperparameter optimization, feature importances, and plotting. The logistic regression and random forest classifier notebook was the capstone project for my Google Advanced Data Analytics course and includes business recommendations at the end.
- Naive Bayes classifier (naive-bayes-confusion-matrix.ipynb)
- Linear regression with hypothesis testing (linear-regression-anova-hypothesis-test.ipynb)
- K-means clustering (unsupervised) with inertia and silhouette scoring (Kmeans-inertia-and-silhouette-score.ipynb)
- Decision tree classifier with grid search and feature importance plotting (decision_tree_grid_search_feature_importances.ipynb)
- Logistic regression and random forest classifier capstone project (capstone-logistic-random-forest-classifier.ipynb)
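
For readers unfamiliar with the two K-means metrics named above, here is a minimal sketch of what inertia and the silhouette score measure. It is not taken from the notebook: the function names, the toy points, and the hand-picked centroids are all illustrative assumptions, and it uses only the Python standard library so the arithmetic stays visible.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (assumes at least 2 clusters)."""
    # Group the points by cluster label.
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    scores = []
    for p, k in zip(points, labels):
        own = clusters[k]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so only the divisor changes)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: lowest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != k
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_centroids = {0: (0.0, 1.0), 1: (10.0, 1.0)}

print("inertia:", inertia(toy_points, toy_labels, toy_centroids))
print("silhouette:", round(silhouette_score(toy_points, toy_labels), 3))
```

Lower inertia means tighter clusters, but it always decreases as the number of clusters grows, which is why it is usually read alongside the silhouette score. The silhouette score ranges from -1 to 1, and values near 1 indicate well-separated clusters.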