tekewin/data-science


Data Science demo notebooks and files

A collection of working Jupyter notebooks, with their datasets (mostly from Kaggle), that demonstrate exploratory data analysis (EDA), data cleaning, model building, validation, grid search for hyperparameter tuning, feature importances, and plotting. The logistic regression and random forest classifier notebook was the capstone project for my Google Advanced Data Analytics course and ends with business recommendations.

Models Used:

  • Naive Bayes classifier (naive-bayes-confusion-matrix.ipynb)
  • Linear regression with ANOVA and hypothesis testing (linear-regression-anova-hypothesis-test.ipynb)
  • K-means clustering (unsupervised) with inertia and silhouette scoring (Kmeans-inertia-and-silhouette-score.ipynb)
  • Decision tree classifier with grid search and feature importance plotting (decision_tree_grid_search_feature_importances.ipynb)
  • Logistic regression and random forest classifier capstone project (capstone-logistic-random-forest-classifier.ipynb)
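
For readers new to the two K-means metrics named above, here is a minimal sketch of how inertia and the silhouette score can be computed by hand. It is not the code from the notebook: the function names, the toy points, and the fixed cluster assignments are all assumptions made for illustration, and it uses only the Python standard library (Python 3.8+ for `math.dist`).

```python
import math


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    # Group the points by cluster label.
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    scores = []
    for p, k in zip(points, labels):
        own = clusters[k]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != k
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated clusters.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 0, 1, 1]
centroids = [(0, 0.5), (10, 10.5)]  # mean of each cluster

print("inertia:", inertia(points, labels, centroids))
print("silhouette:", round(silhouette_score(points, labels), 3))
```

Lower inertia means tighter clusters, but it always falls as the number of clusters grows, which is why it is usually read together with the silhouette score: values near 1 indicate well-separated clusters, and values near 0 indicate overlapping ones.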