This project provides exploratory data analysis (EDA) and visual insights for the Olympics dataset. Using Python, Jupyter Notebook, and Streamlit, it analyzes participation patterns, medal tallies, country-specific performance, and athlete statistics. The primary goal is to let users explore historical Olympic data in an interactive web app.
athlete_events.csv: Contains information about athletes, including names, countries, events, and medal records.
noc_regions.csv: Maps National Olympic Committee (NOC) codes to their respective regions/countries.
EDA: Conducted in Jupyter Notebook to clean and analyze the dataset.
Streamlit App: Interactive app for visualizing the data.
Helper and Preprocessor Scripts: Scripts used to process and structure data for the Streamlit app.
Olympics120analysis.ipynb: Contains data cleaning, analysis, and visualizations.
app.py: Streamlit app for interactive data exploration.
preprocessor.py: Preprocessing functions for data merging and transformation.
helper.py: Contains functions for querying and aggregating data for app visualization.
Clone this repository and navigate to the project directory. Install required dependencies:
pip install -r requirements.txt
Ensure that athlete_events.csv and noc_regions.csv are in the project directory.
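A quick way to confirm the data files are in place before launching anything is a small check like the one below. This is only a sketch: the file names come from this README, while the function name `missing_data_files` and the default directory are assumptions made for illustration.

```python
from pathlib import Path

# File names are taken from this README; the directory argument is an assumption.
REQUIRED_FILES = ("athlete_events.csv", "noc_regions.csv")


def missing_data_files(project_dir="."):
    """Return the required CSV files that are not present in project_dir."""
    base = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("Both data files found.")
```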
- Jupyter Notebook: Open the .ipynb notebook to view the exploratory analysis.
- Streamlit App: Run the app:
streamlit run app.py
- Data Cleaning:
. Filtered the data to keep only "Summer" Olympics.
. Removed duplicates and merged athlete data with regional codes.
- Medal Tally Calculation:
. Aggregated medal counts by country for each year.
- Visualizations:
. Line charts for participation trends over time.
. Heatmaps for events per sport over the years.
. Distribution plots for athlete ages by medal type.
- Functions:
. fetch_year_country(): Fetches medal tally for a specific year and/or country.
. most_successful(): Identifies top medalists in each sport.
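The medal tally step can be illustrated with a minimal pandas sketch. It is not the project's actual code: the sample rows are made up, the column names (`NOC`, `Year`, `Event`, `Medal`) are assumed to match athlete_events.csv, and `medal_counts` is a hypothetical name. The sketch assumes each team medal should count once even though every team member has their own row.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed to match athlete_events.csv.
rows = [
    # Two members of the same relay team: one team medal, two athlete rows.
    {"Name": "A", "NOC": "USA", "Year": 2000, "Event": "Relay", "Medal": "Gold"},
    {"Name": "B", "NOC": "USA", "Year": 2000, "Event": "Relay", "Medal": "Gold"},
    {"Name": "C", "NOC": "USA", "Year": 2000, "Event": "100m", "Medal": "Silver"},
    {"Name": "D", "NOC": "IND", "Year": 2000, "Event": "100m", "Medal": "Bronze"},
    {"Name": "E", "NOC": "IND", "Year": 2004, "Event": "100m", "Medal": None},
]
df = pd.DataFrame(rows)


def medal_counts(df):
    """Count medals per (Year, NOC), counting each team medal once."""
    medals = df.dropna(subset=["Medal"])
    # One row per medal awarded rather than per athlete on the team.
    medals = medals.drop_duplicates(subset=["NOC", "Year", "Event", "Medal"])
    tally = (
        medals.groupby(["Year", "NOC", "Medal"])
        .size()
        .unstack("Medal", fill_value=0)
        .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    )
    tally["Total"] = tally.sum(axis=1)
    return tally.reset_index()


tally = medal_counts(df)
print(tally)
```

Deduplicating on event and medal is an approximation: it would also merge the rare case where one country wins two medals of the same type in a single event.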
Sidebar Options:
- Medal Tally: View medal counts by year and country.
- Overall Analysis: Summary statistics including total editions, sports, athletes, and a timeline of events and participants.
- Country-wise Analysis: Examine medal tallies and top athletes for selected countries.
Key Features
- Interactive Graphs: Includes line plots and heatmaps using Plotly and Seaborn.
- Data Selection: Filters by year, country, and sport.
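A heatmap of events per sport over the years needs a sport-by-year table as input. The sketch below shows one way to build such a table with pandas; the rows are made up and the column names (`Year`, `Sport`, `Event`) are assumed to match athlete_events.csv.

```python
import pandas as pd

# Made-up rows; column names are assumed to match athlete_events.csv.
df = pd.DataFrame(
    {
        "Year": [2000, 2000, 2000, 2004, 2004],
        "Sport": ["Athletics", "Athletics", "Swimming", "Athletics", "Swimming"],
        "Event": ["100m", "200m", "100m Free", "100m", "100m Free"],
    }
)

# Keep each (Year, Sport, Event) once so athletes in the same event
# do not inflate the counts.
events = df.drop_duplicates(subset=["Year", "Sport", "Event"])

# Rows are sports, columns are years, cells are the number of distinct events.
grid = (
    events.pivot_table(index="Sport", columns="Year", values="Event", aggfunc="count")
    .fillna(0)
    .astype(int)
)
print(grid)
```

A table shaped like `grid` can then be passed to `seaborn.heatmap(grid, annot=True)` to draw the chart.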
- fetch_year_country(): Retrieves medal data based on year and country.
- medal_tally(): Aggregates medal counts.
- country_year_list(): Generates lists of unique years and countries for selection.
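To make the selection and filtering idea behind these helpers concrete, here is a rough sketch. The function names `selection_lists` and `filter_tally`, the `region` column, and the "Overall" sentinel are all assumptions for illustration, not the signatures used in helper.py.

```python
import pandas as pd

# Made-up tally rows; the "region" and "Year" column names are assumptions.
tally = pd.DataFrame(
    {
        "Year": [2000, 2000, 2004, 2004],
        "region": ["USA", "India", "USA", "India"],
        "Gold": [3, 0, 2, 1],
    }
)

ALL = "Overall"  # assumed sentinel meaning "no filter"


def selection_lists(df):
    """Build sorted dropdown options, with the 'no filter' sentinel first."""
    years = [ALL] + sorted(df["Year"].drop_duplicates().tolist())
    countries = [ALL] + sorted(df["region"].dropna().drop_duplicates().tolist())
    return years, countries


def filter_tally(df, year=ALL, country=ALL):
    """Return rows matching the chosen year and/or country."""
    mask = pd.Series(True, index=df.index)
    if year != ALL:
        mask &= df["Year"] == year
    if country != ALL:
        mask &= df["region"] == country
    return df[mask]


years, countries = selection_lists(tally)
print(years, countries)
print(filter_tally(tally, year=2004, country="India"))
```

Leaving both arguments at the sentinel returns the full table, so one function covers all four combinations of year and country selection.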
preprocess(): Merges athlete_events and noc_regions datasets and handles data preprocessing for analysis.
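The merge described here could look roughly like the following. This is a sketch under assumptions, not the contents of preprocessor.py: the two small frames stand in for the CSV files, the column names (`Season`, `NOC`, `region`) are assumed to match them, and `preprocess_sketch` is a hypothetical name.

```python
import pandas as pd

# Made-up stand-ins for athlete_events.csv and noc_regions.csv;
# column names (Season, NOC, region) are assumed to match the real files.
athletes = pd.DataFrame(
    {
        "Name": ["A", "A", "B", "C"],
        "NOC": ["USA", "USA", "IND", "NOR"],
        "Season": ["Summer", "Summer", "Summer", "Winter"],
        "Year": [2000, 2000, 2000, 2002],
    }
)
regions = pd.DataFrame(
    {"NOC": ["USA", "IND", "NOR"], "region": ["USA", "India", "Norway"]}
)


def preprocess_sketch(athletes, regions):
    """Keep Summer rows, attach region names, and drop exact duplicate rows."""
    summer = athletes[athletes["Season"] == "Summer"]
    # Left join keeps athletes whose NOC code has no match in the region table.
    merged = summer.merge(regions, on="NOC", how="left")
    return merged.drop_duplicates().reset_index(drop=True)


clean = preprocess_sketch(athletes, regions)
print(clean)
```

A left join is used so that athlete rows are never dropped silently; unmatched NOC codes simply end up with a missing `region`.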
Python: Core language for data manipulation and app development.
Pandas, NumPy: Data handling and preprocessing.
Plotly, Seaborn, Matplotlib: Data visualization.
Streamlit: Web app framework.