This repository applies concepts from the "Business, Economic, and Financial Data" course at the University of Padua to time series of visitor counts at Italian museums.
Predict monthly visitors for two Italian museums by leveraging time series forecasting techniques.
The dataset was assembled manually from multiple sources:
- Main variable (museum visitors time series) from Visit Piemonte
- External regressors: Google Trends, Turin weather data, and some custom-built variables (number of school holidays, COVID closures, and renovation periods).
- Basic Baseline: Predicts either the historical mean or the value observed one year earlier, serving as a simple benchmark.
- Advanced Baseline (SoTA): Uses TimeGPT as a state-of-the-art reference model for time series forecasting.
- Holt-Winters' exponential smoothing with additive seasonality
- Generalized Additive Model (GAM)
- SARIMA
- SARIMAX
- Generalized Bass Model (GBM)
- Time-series linear regression
- Lasso regression
- Boosting (Gradient Boosting and XGBoost)
Combinations of methods are also used, such as GBM + SARIMAX, where the GBM first models the trend and SARIMAX then models the residuals.
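The two-stage idea (fit a trend first, then model what is left over) can be illustrated with a minimal sketch. This is not the repository's implementation: a least-squares linear trend stands in for the GBM and an AR(1) model stands in for SARIMAX, purely to keep the example dependency-free. All function names here are hypothetical.

```python
def fit_linear_trend(y):
    """Least-squares line y ~ a + b*t for t = 0..n-1 (stand-in for the GBM trend)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def fit_ar1(resid):
    """AR(1) coefficient without intercept (stand-in for SARIMAX on residuals)."""
    num = sum(resid[i] * resid[i - 1] for i in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    return num / den if den else 0.0


def two_stage_forecast(y, horizon):
    """Stage 1: fit and extrapolate the trend. Stage 2: forecast the residuals."""
    a, b = fit_linear_trend(y)
    resid = [v - (a + b * t) for t, v in enumerate(y)]
    phi = fit_ar1(resid)
    n = len(y)
    last_resid = resid[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        last_resid = phi * last_resid          # AR(1) residual forecast
        forecasts.append(a + b * (n - 1 + h) + last_resid)
    return forecasts
```

The final forecast is the sum of the extrapolated trend and the forecast residual, which is the same additive structure as the GBM + SARIMAX combination described above.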
Time-series cross-validation, used in combination with AIC.
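As a rough illustration of time-series cross-validation, the sketch below uses an expanding training window (rolling forecast origin), so that each fold is evaluated only on observations that come after its training data. The window sizes, the `forecaster(train, horizon)` callback signature, and the function names are assumptions for this example; how the repository combines the cross-validation score with AIC is not specified here and is not shown.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    origin = initial_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


def cv_rmse(y, forecaster, initial_train, horizon, step=1):
    """Pooled RMSE of `forecaster` over all rolling-origin folds.

    `forecaster(train, horizon)` must return a list of `horizon` predictions.
    """
    squared_errors = []
    for train_idx, test_idx in rolling_origin_splits(len(y), initial_train, horizon, step):
        train = [y[i] for i in train_idx]
        preds = forecaster(train, horizon)
        squared_errors.extend((y[i] - p) ** 2 for i, p in zip(test_idx, preds))
    return (sum(squared_errors) / len(squared_errors)) ** 0.5
```

Unlike ordinary k-fold cross-validation, no fold ever trains on data from the future of its test window, which matters for autocorrelated monthly series.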
- RMSE
- MAPE
- AIC
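For reference, the three metrics and the two basic baseline forecasts can be written in a few lines. This is a minimal sketch using the standard definitions, with hypothetical function names; a 12-month season is assumed for the "same value as last year" baseline because the series is monthly. MAPE is undefined when an actual value is zero, so the sketch assumes strictly non-zero actuals.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no zero in `actual`."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def mean_forecast(train, horizon):
    """Baseline: predict the training mean at every step."""
    return [sum(train) / len(train)] * horizon


def seasonal_naive_forecast(train, horizon, season=12):
    """Baseline: repeat the value observed one season (assumed 12 months) earlier."""
    n = len(train)
    return [train[n - season + (h % season)] for h in range(horizon)]
```

RMSE and MAPE are computed on held-out forecasts, whereas AIC is computed from a fitted model's likelihood and penalizes the number of parameters, so it is only comparable between models fitted to the same data.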
Five models outperform both baselines: Holt-Winters exponential smoothing, TSLM, SARIMA, XGBoost, and Gradient Boosting.
Eleven models outperform both baselines, with SARIMAX showing exceptional performance (0.271 RMSE vs 1.085 of TimeGPT).
Finally, the effects of COVID were analyzed by interpolating the resulting outliers using two approaches:
- Use a good forecasting model.
- Replace each month using the historical monthly mean.
Neither approach gave much of an improvement over the previous models.
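The second approach (historical monthly mean) can be sketched as follows. The function name, the representation of months as integers 1 to 12, and the choice to exclude the affected observations when computing each month's mean are assumptions made for this example, not details taken from the repository.

```python
def replace_with_monthly_mean(values, months, affected):
    """Replace affected observations with the mean of the same calendar month.

    values:   monthly visitor counts
    months:   calendar month (1-12) of each observation, aligned with `values`
    affected: set of indices to replace (e.g. COVID-period months)
    """
    sums, counts = {}, {}
    for i, (v, m) in enumerate(zip(values, months)):
        if i not in affected:                  # assumed: affected months are excluded
            sums[m] = sums.get(m, 0.0) + v
            counts[m] = counts.get(m, 0) + 1
    cleaned = list(values)
    for i in affected:
        m = months[i]
        if counts.get(m):                      # leave as-is if no clean history exists
            cleaned[i] = sums[m] / counts[m]
    return cleaned
```

For example, with `values=[10, 20, 0, 30]`, `months=[3, 4, 3, 3]`, and `affected={2}`, the zero is replaced by the mean of the other two March observations.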