Advanced hybrid recommendation system combining collaborative filtering and content-based approaches with temporal awareness and contextual personalization.
- Hybrid Architecture: Combines matrix factorization (ALS) and semantic content-based filtering
- Temporal Weighting: Exponential decay of tag relevance (λ=0.002)
- BM25 Transformation: Non-linear weighting of implicit feedback
- Dynamic Blending: Context-aware balance between CF and CB components
- Online Learning: Partial updates for real-time adaptation
- Stemmed N-gram Features: Enhanced text processing with Snowball stemmer
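As an illustration of the temporal weighting listed above, here is a minimal sketch of exponential decay with λ = 0.002. The time unit (days) and the function name `tag_decay_weights` are assumptions made for this example; the project may measure tag age differently.

```python
import numpy as np

DECAY_RATE = 0.002  # λ from the feature list; "per day" is an assumed unit


def tag_decay_weights(age_days, decay_rate=DECAY_RATE):
    """Return exp(-λ · age) for each tag age (hypothetical helper)."""
    age = np.asarray(age_days, dtype=float)
    return np.exp(-decay_rate * age)


# A tag applied today, one year ago, and ten years ago.
weights = tag_decay_weights([0, 365, 3650])

# Age at which a tag's weight drops to one half under this decay rate.
half_life = np.log(2) / DECAY_RATE
```

Under the per-day assumption, the half-life works out to roughly 347 days, so a tag applied about a year ago counts for about half as much as one applied today.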
graph TB
subgraph "Data Sources"
A[User-Artist<br>Interactions] --> B[CSR Matrix]
C[User-Tag<br>Timestamps] --> D[Temporal Decay<br>Calculation]
E[Artist Metadata] --> F[TF-IDF Corpus]
end
subgraph "Collaborative Filtering"
B --> G[BM25 Weighting]
G --> H[ALS Model]
H --> I((128 Latent<br>Factors))
end
subgraph "Content-Based Filtering"
D --> J[Weighted Tags]
F --> K[Stemmed TF-IDF]
J --> K
K --> L[Artist Similarity<br>Matrix]
end
subgraph "Hybrid Blending"
M[User Context] --> N{Dynamic Alpha<br>Calculation}
H --> O[CF Scores]
L --> P[CB Scores]
O --> Q[Score Blending]
P --> Q
N --> Q
Q --> R[Top-N Recommendations]
end
subgraph "Online Learning"
S[New Interactions] --> T[CSR Matrix Update]
T --> U[Partial ALS Retrain]
U --> H
end
style A stroke-width:2px
style C stroke-width:2px
style E stroke-width:2px
style H stroke-width:2px
style K stroke-width:2px
style Q stroke-width:2px
style T stroke-width:2px
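The BM25 Weighting stage in the diagram can be sketched as follows. This is a common BM25 form for interaction matrices, written against coordinate triples (row, column, value) so that it needs only NumPy; the function name and the `k1` and `b` values are assumptions, not settings taken from this project.

```python
import numpy as np


def bm25_weight_triples(rows, cols, vals, n_users, n_items, k1=100.0, b=0.8):
    """Apply a BM25-style transform to (user, artist, play count) triples.

    Hypothetical helper: k1 controls saturation, b controls how strongly
    a user's total play count normalises their individual counts.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=float)

    # IDF-like term: artists played by fewer users receive a larger weight.
    users_per_item = np.bincount(cols, minlength=n_items)
    idf = np.log(n_users) - np.log1p(users_per_item)

    # Length normalisation by each user's total play count.
    row_sums = np.bincount(rows, weights=vals, minlength=n_users)
    length_norm = (1.0 - b) + b * row_sums / row_sums.mean()

    # Saturating transform: large counts grow sub-linearly.
    return vals * (k1 + 1.0) / (k1 * length_norm[rows] + vals) * idf[cols]


# Toy data: 4 users, 3 artists.
rows = [0, 0, 1, 2, 3]
cols = [0, 1, 1, 2, 2]
vals = [5000, 10, 300, 20, 3000]
weighted = bm25_weight_triples(rows, cols, vals, n_users=4, n_items=3)
```

The transform compresses the gap between very large and very small play counts, which is the non-linear weighting of implicit feedback described in the feature list. The `implicit` package listed in the requirements ships its own helper, `implicit.nearest_neighbours.bm25_weight`; whether this project calls it or a custom variant is not stated here.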
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Requirements:
- implicit==0.7.2 (requires a C/C++ compiler when no prebuilt wheel is available)
- scipy>=1.10.1
- scikit-learn>=1.2.2
- pandas>=2.0.3
- numpy>=1.24.3
- nltk>=3.8.1
Place the Last.fm dataset files in /lastfmdata:
user_artists.dat
artists.dat
tags.dat
user_taggedartists.dat
user_taggedartists-timestamps.dat
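Before an interaction file such as user_artists.dat can feed a sparse matrix, raw IDs have to be mapped to contiguous row and column indices. The sketch below shows one way to do that with the standard library only. It assumes a tab-separated file with a `userID`, `artistID`, `weight` header row; `load_interactions` is a hypothetical helper, not part of this project.

```python
import csv
import io


def load_interactions(handle):
    """Parse tab-separated userID/artistID/weight rows into index triples."""
    reader = csv.DictReader(handle, delimiter="\t")
    user_index, artist_index = {}, {}
    rows, cols, vals = [], [], []
    for record in reader:
        # setdefault assigns the next free index the first time an ID is seen.
        u = user_index.setdefault(int(record["userID"]), len(user_index))
        a = artist_index.setdefault(int(record["artistID"]), len(artist_index))
        rows.append(u)
        cols.append(a)
        vals.append(float(record["weight"]))
    return rows, cols, vals, user_index, artist_index


# In-memory stand-in for a real file; the values are made up for illustration.
sample = io.StringIO(
    "userID\tartistID\tweight\n"
    "2\t51\t13883\n"
    "2\t52\t11690\n"
    "3\t51\t500\n"
)
rows, cols, vals, user_index, artist_index = load_interactions(sample)
```

The resulting `rows`, `cols`, and `vals` lists map directly onto the arguments of a coordinate-format sparse matrix constructor, and the two dictionaries allow translating between dataset IDs and matrix positions.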
import pandas as pd
from hybrid_recommender import HybridRecommender

# Initialize recommender
recommender = HybridRecommender(data_path="lastfmdata/")

# Get recommendations as (artist name, score) pairs
recommendations = recommender.recommend(user_id=2, top_n=10)
"""
[('The Beatles', 0.872),
 ('Radiohead', 0.855),
 ('Pink Floyd', 0.841), ...]
"""

# Update model with new interactions
new_data = pd.DataFrame({
    'userID': [2, 2],
    'artistID': [123, 456],
    'weight': [5000, 3000]
})
recommender.partial_fit(new_data)
- Collaborative Filtering
  - BM25-weighted implicit feedback
  - Alternating Least Squares optimization
  - CSR matrix format for efficient operations
- Content-Based Filtering
  - Temporal-weighted tag aggregation
  - Stemmed bigram TF-IDF vectors
  - Cosine similarity index
- Hybrid Engine
  - Dynamic weighting (α) based on user activity
  - Contextual blending of three signals:
    - ALS predicted scores
    - Personal tag relevance
    - Favorite artist similarity
- Sparse matrix operations (CSR format)
- Cached similarity indices
- Batched partial updates
- Memory-efficient resizing
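The dynamic weighting and three-signal blending listed under Hybrid Engine could look roughly like the sketch below. The saturating form of α, the `midpoint`, `floor`, `ceiling`, and `tag_share` parameters, and the min-max normalisation are all assumptions chosen for illustration; the project's actual formula is not given here.

```python
import numpy as np


def dynamic_alpha(n_interactions, midpoint=20.0, floor=0.2, ceiling=0.9):
    """Weight on the CF signal; grows with user activity (hypothetical form)."""
    frac = n_interactions / (n_interactions + midpoint)
    return floor + (ceiling - floor) * frac


def min_max(scores):
    """Rescale scores to [0, 1] so the signals are comparable."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def blend(cf_scores, tag_scores, sim_scores, n_interactions, tag_share=0.5):
    """Blend ALS scores, tag relevance, and favorite-artist similarity."""
    alpha = dynamic_alpha(n_interactions)
    # Content-based part: mix of the two content signals (assumed equal split).
    cb = tag_share * min_max(tag_scores) + (1 - tag_share) * min_max(sim_scores)
    return alpha * min_max(cf_scores) + (1 - alpha) * cb


# Three candidate artists scored by each signal (made-up numbers).
cf = [0.9, 0.1, 0.5]
tags = [0.0, 1.0, 0.2]
sims = [0.1, 0.8, 0.3]

cold_user = blend(cf, tags, sims, n_interactions=1)
active_user = blend(cf, tags, sims, n_interactions=500)
```

With these toy inputs, a near-cold-start user is ranked mostly by the content-based signals, while a highly active user is ranked mostly by the ALS scores, which is the behaviour the dynamic α is meant to produce.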
- Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets
- Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond
- Cremonesi, P., et al. (2010). Performance of Recommender Algorithms on Top-N Recommendation Tasks
MIT License (see LICENSE file)