Lately, I have become interested in natural language processing (NLP), the branch of machine learning that allows us to "translate" human language into a representation that machines can process. I am drawn to the idea that an AI can not only count how often terms appear in a text but also interpret the context of what is being said.
I have dabbled in language processing before, for example in my Fake News Detector project. There, I used NLP to measure the frequency of key terms in a large dataset and, based on that, built a linear classification model to categorize new articles as True or False. In the same project, I also applied basic Sentiment Analysis to test the hypothesis that Fake News is generally written from a very negative perspective.
Now, in this project, I am delving deeper into natural language processing, this time through Mood Analysis. I use an NLP model from Hugging Face called Emotion English DistilRoBERTa-base, a fine-tuned version of DistilRoBERTa (itself a distilled version of RoBERTa, a well-known model that represents each word in light of its surrounding context) that classifies text into mood labels (joy, anger, disgust, fear…).
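To make the classification step concrete, here is a minimal sketch of how a model like this can be called through the Hugging Face `transformers` pipeline. The Hub id `j-hartmann/emotion-english-distilroberta-base` is my assumption about which checkpoint is meant, and the helper names `load_classifier` and `label_moods` are hypothetical, chosen only for illustration; the notebooks may organize this differently.

```python
from typing import Callable, Dict, List

# Assumed Hugging Face Hub id for "Emotion English DistilRoBERTa-base".
MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def load_classifier():
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so label_moods can be used or tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def label_moods(
    synopses: List[str],
    classifier: Callable[[List[str]], List[Dict]],
) -> List[str]:
    """Return the top predicted mood label for each synopsis.

    `classifier` is expected to return one {"label": ..., "score": ...}
    dict per input text, which is what the pipeline does by default.
    """
    predictions = classifier(synopses)
    return [prediction["label"] for prediction in predictions]
```

With the real model, usage would look like `label_moods(list_of_synopses, load_classifier())`. Passing the classifier in as an argument keeps the labeling logic independent of the model, so it can be tried out with a stand-in function before downloading anything.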
With this project, I primarily aim to analyze the synopsis of each entry in the Netflix catalog and classify the titles into different moods. This way, I can offer users a suggestion based on how they feel.
This project is developed through three notebooks:
- Data Preprocessing and EDA: Preprocessing the dataset so that it contains the information the user will receive when requesting a suggestion from the catalog. Additionally, exploring the data to gain a detailed understanding of the catalog's characteristics.
- Mood Analysis: Applying NLP to the synopses of the titles through an ML pipeline that classifies the entries according to moods.
- Netflix Recommender: Creating the code for the recommender, which offers titles based on the inputs provided by the user.
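As a rough illustration of the recommender step, the sketch below filters a mood-labeled catalog and picks a few titles at random. It is only a sketch under assumptions: the function name `recommend`, the `title` and `mood` keys, and the random sampling strategy are all hypothetical, and the actual notebook may take more user inputs into account.

```python
import random
from typing import Dict, List, Optional


def recommend(
    catalog: List[Dict[str, str]],
    mood: str,
    n: int = 3,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick up to `n` random titles whose mood label matches `mood`.

    Each catalog entry is assumed to be a dict with hypothetical
    "title" and "mood" keys, the latter filled in by the mood analysis.
    """
    matches = [entry["title"] for entry in catalog if entry["mood"] == mood]
    rng = random.Random(seed)  # a fixed seed makes the picks reproducible
    return rng.sample(matches, k=min(n, len(matches)))


# Placeholder catalog for illustration only (not real data).
catalog = [
    {"title": "Title A", "mood": "joy"},
    {"title": "Title B", "mood": "fear"},
    {"title": "Title C", "mood": "joy"},
]
print(recommend(catalog, "joy", n=2))
```

If no title matches the requested mood, the function returns an empty list, so the caller can decide how to respond, for example by suggesting a different mood.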