This repository is dedicated to football data analysis, showcasing various Jupyter notebooks that detail the handling and visualization of football data from multiple perspectives. Through these projects, we explore different facets of football analytics, including creating sophisticated pass maps, evaluating player performance metrics, and more, using Python and libraries like Matplotlib and mplsoccer.
| Field | Tasks | Planned Date | Status | Artefacts |
|---|---|---|---|---|
| Data Processing and Visualization | Develop Pass Map Template | 2024-03-01 | Done ✅ | code, article |
| | Scraping Static FBref Data | 2024-03-01 | Done ✅ | code |
| | Develop Team Radar Template | 2024-03-08 | Done ✅ | code, article |
| | Integrate Distribution on Radar Template | 2024-03-12 | Done ✅ | code |
| | Shots and Goals map | 2024-03-19 | Done ✅ | code, article |
| | xT map by zones | 2024-06-01 | Planned 🔜 | |
| | Calculating xT based on transition matrix | 2024-06-01 | Planned 🔜 | |
| | Create Player Templates | ❓ | To do | |
| | Interpolate Carries on Event Data | 2024-06-01 | Planned 🔜 | |
| | Identify Possession Chains | 2024-06-01 | Planned 🔜 | |
| | Visualize Dynamic Metric Changes (xT, xG) | 2024-06-01 | Planned 🔜 | |
| | TBD | TBD | TBD | |
| Data Management | Design data warehouse architecture | ❓ | To do | |
| | Load historical data into the warehouse | ❓ | To do | |
| | Launch regular data loading processes | ❓ | To do | |
| | Quality Assurance | ❓ | To do | |
| | TBD | TBD | TBD | |
| Advanced Analytics | Build xT transition matrix | 2024-03-01 | Done ✅ | article |
| | Build an Up-to-Date baseline VAEP Model | ❓ | To do | |
| | Increase the quality of VAEP | ❓ | To do | |
| | TBD | | | |
| Automation & Integration | Pilot a Twitter Bot for automated posting | ❓ | To do | |
| | Pilot a Telegram Bot for automated posting | ❓ | To do | |
| | TBD | | | |
The `Pass_map` directory contains Jupyter notebooks and datasets used for creating advanced pass maps (passing network maps). Key components include:
- `1. Pass map creating v1.20240221.ipynb`: notebook for pass map visualization.
- `data/`: raw event data files for the match Man City 1:1 Chelsea | Premier League | Season 2023-2024 | 2024-02-17.
- `img/`: the resulting `pass_map.jpeg` and the reference map from The Athletic template, `theAthletic pass map.jpeg`.
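A passing network reduces to two aggregates: an average position for each player (the nodes) and a pass count for each pair of players (the edges). The sketch below computes both from a list of event dicts. The keys (`type`, `outcome`, `player`, `recipient`, `x`, `y`, `end_x`, `end_y`) are hypothetical and would need mapping to the columns of the raw files in `data/`. It averages each player's pass origins and reception points, which is one common convention and not necessarily the one used in the notebook.

```python
from collections import Counter, defaultdict
from statistics import fmean

def passing_network(events, min_passes=1):
    """Aggregate successful passes into passing-network nodes and edges.

    Each event is assumed to be a dict with the hypothetical keys
    'type', 'outcome', 'player', 'recipient', 'x', 'y', 'end_x', 'end_y'.
    Returns:
      nodes: {player: (mean_x, mean_y)} averaged over pass origins (as passer)
             and pass end points (as recipient)
      edges: {(player_a, player_b): passes}; the pair is sorted so that
             A->B and B->A passes are merged into one undirected link
    """
    touches = defaultdict(list)
    pair_counts = Counter()
    for ev in events:
        if ev["type"] != "Pass" or ev["outcome"] != "Successful":
            continue  # only completed passes build the network
        touches[ev["player"]].append((ev["x"], ev["y"]))
        touches[ev["recipient"]].append((ev["end_x"], ev["end_y"]))
        pair_counts[tuple(sorted((ev["player"], ev["recipient"])))] += 1
    nodes = {
        player: (fmean(x for x, _ in pts), fmean(y for _, y in pts))
        for player, pts in touches.items()
    }
    # Drop weak links so the map is not cluttered by one-off passes
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_passes}
    return nodes, edges
```

The node positions can then be drawn as markers and the edge counts used as line widths, for example with mplsoccer's `Pitch.scatter` and `Pitch.lines`.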
For a detailed walkthrough of the pass map creation process, check out my Medium articles:
- Article 1: Passing networks with expected threat (xT) layer. Walking through popular templates. Explaining the details.
- Article 2: A Detailed Guide to Creating Advanced Pass Maps with Python and Matplotlib
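The xT layer mentioned in the first article builds on expected threat, which values each pitch zone by how likely possession there is to lead to a goal. In Karun Singh's formulation, a zone's value is the probability of shooting times the probability of scoring, plus the probability of moving the ball times the transition-weighted xT of the destination zones; the values are found by iterating until they stop changing. This is a minimal sketch that assumes the per-zone probabilities and the transition matrix have already been estimated from event data; it is not the repository's implementation.

```python
def expected_threat(shoot, goal, move, transition, n_iter=100, tol=1e-9):
    """Value iteration for expected threat (xT) over a flattened grid of zones.

    shoot[z]:         probability that the action in zone z is a shot
    goal[z]:          probability that a shot from zone z is scored
    move[z]:          probability that the action in zone z is a move (pass/carry)
    transition[z][w]: probability that a move from z ends successfully in w
                      (rows may sum to < 1; the remainder is possession lost)
    """
    n = len(shoot)
    xt = [0.0] * n
    for _ in range(n_iter):
        new = [
            shoot[z] * goal[z]
            + move[z] * sum(transition[z][w] * xt[w] for w in range(n))
            for z in range(n)
        ]
        # Stop once no zone changes by more than the tolerance
        converged = max(abs(a - b) for a, b in zip(new, xt)) < tol
        xt = new
        if converged:
            break
    return xt
```

For example, with a midfield zone that only passes (half into itself, half into the box) and a box zone that shoots half the time at a 20% conversion rate, both zones converge to an xT of 0.1.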
The `Scraping_fbref_static_data` directory supports collecting comprehensive football statistics from FBref for the top 5 European leagues. It includes data for the last five seasons and up-to-date statistics for the current season (as of March 2, 2024).
Key components:
- `utils/`: a Python utility file with the functions needed for data scraping and manipulation.
- `notebooks/`: a Jupyter notebook that guides users through the scraping process (based on /~https://github.com/parth1902/Scrape-FBref-data/blob/master/Scrape_FBref.ipynb).
- `img/`: screenshots from the FBref website showing the tables and statistics being collected, to help understand the data's structure and content.
- `data/old_seasons/`: historical data for the top 5 European leagues from the 2018-2019 to the 2022-2023 season:
  - `top5_leagues_keeper_2018_2019__2022_2023.csv`: goalkeeper statistics for the last five seasons.
  - `top5_leagues_outfields_2018_2019__2022_2023.csv`: outfield player statistics.
  - `top5_leagues_team_2018_2019__2022_2023.csv`: team-level statistics.
  - `top5_leagues_team_vs_2018_2019__2022_2023.csv`: team versus team statistics.
- `data/current_season/{date}/`: the latest season's data, structured as follows:
  - `top5_leagues_keeper_2023_2024.csv`: current season goalkeeper statistics.
  - `top5_leagues_outfields_2023_2024.csv`: outfield player statistics.
  - `top5_leagues_team_2023_2024.csv`: team-level statistics.
  - `top5_leagues_team_vs_2023_2024.csv`: team versus team statistics.
Data Collection Time (MacBook Air M1, 8 GB): collecting the entire dataset for the last five seasons takes approximately 1.5 hours, while updating the current season's data takes about 20 minutes (about 4 minutes for a single league, or about 1 minute if you only need, for example, the outfield player data). The process can be sped up with multiprocessing.
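Because the five leagues are scraped independently, one way to parallelize collection is to run one task per league. A sketch using the standard library's `concurrent.futures`; `scrape_league` is a hypothetical stand-in for whatever per-league function the helpers in `utils/` expose. Threads suit network-bound work such as HTTP requests, and `ProcessPoolExecutor` is a drop-in replacement if the per-league work is CPU-heavy (it then needs a picklable, module-level function).

```python
from concurrent.futures import ThreadPoolExecutor

# League names used as labels only; the scraper's own identifiers may differ.
LEAGUES = ("Premier League", "La Liga", "Serie A", "Bundesliga", "Ligue 1")

def scrape_all(scrape_league, leagues=LEAGUES, max_workers=3):
    """Run a per-league scraping function concurrently.

    scrape_league is a hypothetical callable (league name -> scraped data)
    standing in for the repository's own helpers in utils/.
    Returns {league: data} in the same order as `leagues`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with leagues
        return dict(zip(leagues, pool.map(scrape_league, leagues)))
```

Keeping `max_workers` small is deliberate: FBref may block clients that send requests too quickly, so heavy parallelism can end up slower than a sequential run.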
Data Utilization: we recommend using the data already provided for the past five seasons and refreshing only the current season's data.
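Following that recommendation, a minimal loader that concatenates a historical file with the newest current-season snapshot might look like this. It assumes the `{date}` folder names sort chronologically (for example ISO dates such as `2024-03-02`) and that both CSVs share a single, identical header row; the function names are hypothetical.

```python
import csv
from pathlib import Path

def latest_snapshot(current_season_dir: Path) -> Path:
    """Return the newest data/current_season/{date} folder.

    Assumes the {date} folder names sort chronologically (e.g. ISO dates).
    """
    snapshots = sorted(p for p in current_season_dir.iterdir() if p.is_dir())
    if not snapshots:
        raise FileNotFoundError(f"no snapshot folders in {current_season_dir}")
    return snapshots[-1]

def read_rows(path: Path) -> list[dict]:
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def load_combined(data_dir: Path, table: str = "outfields") -> list[dict]:
    """Concatenate 2018-19..2022-23 history with the latest 2023-24 snapshot.

    table is one of "keeper", "outfields", "team", "team_vs".
    """
    history = data_dir / "old_seasons" / f"top5_leagues_{table}_2018_2019__2022_2023.csv"
    current = latest_snapshot(data_dir / "current_season") / f"top5_leagues_{table}_2023_2024.csv"
    return read_rows(history) + read_rows(current)
```

Usage: `load_combined(Path("data"), "team")` returns one list of row dicts covering all six seasons.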
The `Team_radar` directory contains Jupyter notebooks and Python (`.py`) modules with utility functions for building a StatsBomb-style template for team radars. The key components include:
- `notebooks/1. Team radar and statistics table.ipynb`: notebook for team radar visualization with a statistics table.
- `notebooks/2. Team radar and distribution.ipynb`: notebook for team radar visualization with distributions.
- `img/`: the resulting radar images, statistics table images and distribution images.
- `utils/`: modules with utility functions for creating the radar map.
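StatsBomb-style radars give every axis its own range, so each raw metric has to be mapped onto a 0 to 1 radius before plotting. A minimal sketch, assuming each axis is bounded by the 5th and 95th percentiles of the metric across all teams (an illustrative choice; the modules in `utils/` may use different bounds), with values outside the range clipped to the edge and "lower is better" metrics inverted:

```python
def percentile(values, q):
    """Linearly interpolated percentile, q in 0..100 (numpy's default method)."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def radar_axis(values, lower_q=5, upper_q=95, lower_is_better=False):
    """Return a function mapping a raw metric value to a 0..1 radar radius.

    The 5th/95th percentile bounds are an assumption for illustration.
    """
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant metric

    def scale(v):
        r = min(max((v - lo) / span, 0.0), 1.0)  # clip outliers to the edge
        return 1.0 - r if lower_is_better else r  # e.g. goals conceded

    return scale
```

Building one scaler per metric from the league-wide values keeps every team's radar on the same axes, which is what makes two radars comparable.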
The data used for creating the template and plotting up-to-date statistics for teams are provided by the `Scraping_fbref_static_data` directory in the same repository.
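The distribution view in `2. Team radar and distribution.ipynb` answers a related question: where does a team sit among all teams for a given metric? A sketch that computes percentile ranks from the rows of a team-level CSV; the `Squad` column and the metric names are assumptions about the scraped files, and ties are handled by averaging strict and weak ranks, the same convention as `scipy.stats.percentileofscore(kind="mean")`.

```python
def percentile_rank(values, v):
    """Percentile of v within values, averaging strict (<) and weak (<=) ranks."""
    n = len(values)
    below = sum(x < v for x in values)
    at_or_below = sum(x <= v for x in values)
    return 100.0 * (below + at_or_below) / (2 * n)

def team_percentiles(rows, team, metrics, team_col="Squad"):
    """Percentile rank of one team for each metric across all rows.

    rows: list of dicts read from a team-level CSV (values may be strings);
    team_col and the metric names are assumed column names.
    """
    team_row = next((r for r in rows if r[team_col] == team), None)
    if team_row is None:
        raise KeyError(f"team {team!r} not found")
    return {
        m: percentile_rank([float(r[m]) for r in rows], float(team_row[m]))
        for m in metrics
    }
```

For four teams with values 1, 2, 3 and 4, the team on 3 lands at the 62.5th percentile: two teams are strictly below it and three are at or below it.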
For a detailed walkthrough of the process of creating team radars, check out my Medium articles:
- Article 1: Create a StatsBomb-Inspired Template for Team Radar Comparison Using Free Data from FBRef