Note: This is a copy of the original project repository. The original repository is kept private and is available upon request.
- Student Name: Zhi Hern Tom
- Due Date: Friday 16th of August 2021 11:59:00 am (AEST).
- Report Link: https://www.overleaf.com/read/xpddwvkstgfg
This project performs a quantitative analysis of the New York City Taxi and Limousine Commission (TLC) Trip Record Data. The dataset covers trips taken in various types of taxi and for-hire vehicle services in the New York City area.
- Language: Python 3.8.3
- Packages / Libraries: pandas, pyspark, numpy, sklearn, geopandas, matplotlib, folium
- NYC TLC: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- NOAA Weather: https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094728/detail
raw_data
: Contains all the raw data files.
preprocessed_data
: Contains all the preprocessed data files.
plots
: Output and save all your figures here.
code
: Keep all notebooks and scripts in this folder. Ensure that you have a notebook for each stage of the code. Here are the instructions:
- run preprocessing.ipynb to download and preprocess data.
- run visualisation.ipynb for visualisation and exploratory data analysis.
- run modelling.ipynb for machine learning modelling.
deprecated
: A folder to store "old code".
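
The folder layout and notebook order described above can be sketched in Python as follows. This is a minimal illustration, not part of the original repository: the helper names `ensure_layout` and `notebook_commands` are hypothetical, and the sketch assumes the notebooks live directly inside `code` and that the `jupyter nbconvert` command-line tool is installed.

```python
from pathlib import Path

# Folder names taken from the layout described above.
FOLDERS = ["raw_data", "preprocessed_data", "plots", "code", "deprecated"]

# Notebook run order taken from the instructions above.
NOTEBOOKS = ["preprocessing.ipynb", "visualisation.ipynb", "modelling.ipynb"]


def ensure_layout(root):
    """Create any missing project folders under `root` and return their paths."""
    root = Path(root)
    paths = [root / name for name in FOLDERS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def notebook_commands(root):
    """Build one `jupyter nbconvert` command per notebook, in run order.

    Assumption: the notebooks sit directly in `<root>/code`.
    """
    code_dir = Path(root) / "code"
    return [
        ["jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", str(code_dir / name)]
        for name in NOTEBOOKS
    ]


# Example usage (commented out because it executes the notebooks,
# and the first one downloads data):
# import subprocess
# ensure_layout(".")
# for cmd in notebook_commands("."):
#     subprocess.run(cmd, check=True)
```

Running the stages in this fixed order matters because each later notebook presumably reads the files written by the earlier ones. Opening the notebooks interactively and running them one after another achieves the same result.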