This repo contains the notebooks used to Obtain, Scrub, Explore, Model, and iNterpret (OSEMN) the data from the Pump It Up competition hosted by DrivenData. The task is to build a machine learning model that predicts whether a well in Tanzania is functional, not functional, or functional but in need of repair.
This repo contains:
- CSV files for the data, both preprocessed and processed.
- the test set from the competition, used to submit an entry.
- a notebook that contains all the cleaning and exploratory analysis.
- a folder of notebooks that build models for the ternary classification.
- a folder containing my work on the problem as a binary classification of functional or needs repair.
- an executive summary presentation showcasing my final models and my recommendations for those looking to invest in repairing wells.
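As a rough illustration of how the ternary target relates to the binary one, the sketch below collapses three labels into two. The label strings, the `TERNARY_TO_BINARY` mapping, and the `to_binary` helper are all assumptions made for illustration; in particular, grouping non-functional wells with those that need repair is a guess and may not match what the notebooks actually do.

```python
# Hypothetical label strings; check them against the actual label file.
# Assumption: "not functional" wells are grouped with wells needing repair.
TERNARY_TO_BINARY = {
    "functional": "functional",
    "non functional": "needs repair",
    "functional needs repair": "needs repair",
}


def to_binary(labels):
    """Collapse three-class well labels into two classes."""
    return [TERNARY_TO_BINARY[label] for label in labels]


sample = ["functional", "non functional", "functional needs repair"]
print(to_binary(sample))
```

A plain dictionary keeps the mapping explicit, so an unexpected label raises a `KeyError` instead of being silently misclassified.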
I've also been working with this data in Tableau and digging a bit deeper into how best to classify the wells. This is one of the visualizations I created to showcase the areas affected by wells that need repair and the size of the population affected.
- Blog about competing for the first time
- Blog about Tableau visualization