This project performs exploratory data analysis (EDA) on the Olympics dataset and presents the results visually. Built with Python, Jupyter Notebook, and Streamlit, it analyzes medal tallies, country-specific performance, and athlete statistics. The primary goal is to let users explore historical Olympic data in an interactive web app.
athlete_events.csv: Contains information about athletes, including names, countries, events, and medal records.
noc_regions.csv: Maps National Olympic Committee (NOC) codes to their regions/countries.
EDA: Conducted in Jupyter Notebook to clean and analyze the dataset.
Streamlit App: Interactive app for visualizing the data.
Helper and Preprocessor Scripts: Scripts used to process and structure data for the Streamlit app.
Olympics120analysis.ipynb: Contains data cleaning, analysis, and visualizations.
app.py: Streamlit app for interactive data exploration.
preprocessor.py: Preprocessing functions for data merging and transformation.
helper.py: Contains functions for querying and aggregating data for app visualization.
Clone this repository and navigate to the project directory. Install required dependencies:
pip install -r requirements.txt
Ensure that athlete_events.csv and noc_regions.csv are in the project directory.
- Jupyter Notebook: Open the .ipynb notebook to view the exploratory analysis.
- Streamlit App: Run the app:
streamlit run app.py
- Data Cleaning:
  - Filtered data for only "Summer" Olympics.
  - Removed duplicates and merged athlete data with regional codes.
- Medal Tally Calculation:
  - Aggregated medal counts by country for each year.
- Visualizations:
  - Line charts for participation trends over time.
  - Heatmaps for events per sport over the years.
  - Distribution plots for athlete ages by medal type.
- Functions:
  - fetch_year_country(): Fetches medal tally for a specific year and/or country.
  - most_successful(): Identifies top medalists in each sport.
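The medal tally step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `medal_tally_by_year` is hypothetical, the column names (`Year`, `region`, `Sport`, `Event`, `Medal`) are assumed to match the merged dataset, and the rule of counting a team medal once per event rather than once per team member is an assumption about how the tally should behave.

```python
import pandas as pd


def medal_tally_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Count Gold/Silver/Bronze medals per region and year (hypothetical helper)."""
    # Keep only rows where a medal was actually won.
    medals = df.dropna(subset=["Medal"])
    # Assumption: a team medal should count once per event, not once per athlete.
    medals = medals.drop_duplicates(
        subset=["Year", "region", "Sport", "Event", "Medal"]
    )
    tally = (
        medals.groupby(["Year", "region", "Medal"])
        .size()
        .unstack("Medal", fill_value=0)
        # Guarantee all three medal columns exist, in a fixed order.
        .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
        .reset_index()
    )
    tally["Total"] = tally[["Gold", "Silver", "Bronze"]].sum(axis=1)
    return tally
```

Grouping on both `Year` and `region` keeps one row per country per edition, so the same table can later be filtered by year, by country, or summed across years.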
Sidebar Options:
- Medal Tally: View medal counts by year and country.
- Overall Analysis: Summary statistics including total editions, sports, athletes, and a timeline of events and participants.
- Country-wise Analysis: Examine medal tallies and top athletes for selected countries.
Key Features:
- Interactive Graphs: Includes line plots and heatmaps using Plotly and Seaborn.
- Data Selection: Filters by year, country, and sport.
- fetch_year_country(): Retrieves medal data based on year and country.
- medal_tally(): Aggregates medal counts.
- country_year_list(): Generates lists of unique years and countries for selection.
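As a rough illustration of how the selection lists and the year/country filter might fit together, here is a minimal sketch. It is not the code in helper.py: the `"Overall"` sentinel meaning "no filter", the `Year` and `region` column names, and the function name `filter_by_year_country` are all assumptions made for the example.

```python
import pandas as pd

ALL = "Overall"  # assumed sentinel meaning "do not filter on this field"


def country_year_list(df: pd.DataFrame):
    """Build sorted dropdown options for years and countries."""
    years = sorted(df["Year"].unique().tolist())
    countries = sorted(df["region"].dropna().unique().tolist())
    return [ALL] + years, [ALL] + countries


def filter_by_year_country(df: pd.DataFrame, year=ALL, country=ALL) -> pd.DataFrame:
    """Return the rows matching the chosen year and/or country (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if year != ALL:
        mask &= df["Year"] == year
    if country != ALL:
        mask &= df["region"] == country
    return df[mask]
```

In a Streamlit sidebar, the two lists would typically feed `st.sidebar.selectbox`, and the selected values would be passed straight to the filter.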
preprocess(): Merges athlete_events and noc_regions datasets and handles data preprocessing for analysis.
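A minimal sketch of what such a preprocessing step could look like, based on the cleaning steps described in this README (Summer filter, merge with regional codes, duplicate removal). The `Season` and `NOC` column names and the function signature are assumptions, and the real preprocessor.py may do more.

```python
import pandas as pd


def preprocess(athletes: pd.DataFrame, regions: pd.DataFrame) -> pd.DataFrame:
    """Filter to Summer Games, attach region names, and drop duplicate rows."""
    # Keep only Summer Olympics rows (assumes a "Season" column).
    summer = athletes[athletes["Season"] == "Summer"]
    # Left merge so athletes with an unmapped NOC code are kept.
    merged = summer.merge(regions, on="NOC", how="left")
    return merged.drop_duplicates().reset_index(drop=True)


# Typical usage, assuming both CSV files sit in the project directory:
# athletes = pd.read_csv("athlete_events.csv")
# regions = pd.read_csv("noc_regions.csv")
# df = preprocess(athletes, regions)
```

A left merge is used here so that rows are not silently lost when a NOC code has no entry in noc_regions.csv; those rows simply get a missing region.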
Python: Core language for data manipulation and app development.
Pandas, NumPy: Data handling and preprocessing.
Plotly, Seaborn, Matplotlib: Data visualization.
Streamlit: Web app framework.