This repository contains the code and documentation for a project that classifies AI-related Reddit discussions as positive, negative, or neutral in sentiment. The project had two main objectives: to compare several machine learning models and to explore how sentiment in these discussions changes over time. To that end, conventional machine learning techniques were combined with an LSTM model.
The project's key achievements include:
- Implementing a comprehensive sentiment analysis framework for AI-related Reddit discussions.
- Using multiple libraries, such as Scikit-Learn, to build and train sentiment classification models.
- Conducting a thorough comparative analysis of different machine learning models to determine their effectiveness in sentiment classification.
- Exploring and visualizing sentiment trends over time to gain insights into how sentiments have evolved in AI-related discussions.
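
As a rough illustration of the trend analysis mentioned above, the sketch below groups labeled posts by calendar month and computes the share of each sentiment class. It is not the project's actual pipeline: the function name `monthly_sentiment_shares`, the `(ISO date, label)` record layout, and the sample data are all hypothetical, chosen only to show the aggregation step.

```python
from collections import Counter, defaultdict
from datetime import date

LABELS = ("negative", "neutral", "positive")


def monthly_sentiment_shares(records):
    """Return {"YYYY-MM": {label: share}} for (iso_date, label) records.

    Hypothetical helper: the record layout is an assumption, not the
    repository's real data format.
    """
    counts = defaultdict(Counter)
    for iso_date, label in records:
        month = date.fromisoformat(iso_date).strftime("%Y-%m")
        counts[month][label] += 1

    shares = {}
    for month in sorted(counts):  # chronological order for plotting
        total = sum(counts[month].values())
        shares[month] = {lab: counts[month][lab] / total for lab in LABELS}
    return shares


# Made-up sample data, for illustration only.
sample = [
    ("2023-01-03", "positive"),
    ("2023-01-17", "negative"),
    ("2023-01-20", "positive"),
    ("2023-01-25", "neutral"),
    ("2023-02-02", "neutral"),
    ("2023-02-09", "positive"),
]
shares = monthly_sentiment_shares(sample)
for month, dist in shares.items():
    print(month, dist)
```

The resulting per-month shares can be passed to any plotting library to draw one line per sentiment class.
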
The following models were developed and evaluated:
- Logistic Regression: Achieved an F1 score of 0.71, with the positive class reaching 0.77.
- Linear SVC: Achieved an F1 score of 0.73.
- BERT: Achieved a competitive F1 score of 0.80.
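
To make the reported metric concrete, here is a minimal sketch of how a per-class F1 score and an unweighted (macro) average can be computed for a three-class problem. The text above does not say which averaging scheme produced the overall scores, so the macro average here is an assumption; the helper names and the toy labels are hypothetical and are not the project's results.

```python
def f1_per_class(y_true, y_pred, labels):
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Define F1 as 0.0 when the label never appears in truth or predictions.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


# Toy labels, for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive", "negative"]

per_class = f1_per_class(y_true, y_pred, ["negative", "neutral", "positive"])
print(per_class)
print(round(macro_f1(per_class), 4))
```

Scikit-Learn's `sklearn.metrics.f1_score` offers the same quantities through its `average` parameter (`None` for per-class scores, `"macro"` for the unweighted mean).
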
The project leveraged a range of technologies and skills, including but not limited to:
- Scikit-Learn: Used for building machine learning models and conducting analysis.
- Keras: Employed to construct and train the LSTM model for sentiment classification.
- VADER: A lexicon- and rule-based tool used for sentiment analysis of text.
- Long Short-Term Memory (LSTM): A type of recurrent neural network well-suited for sequence-based tasks.
- BERT: A bidirectional transformer pre-trained using a combination of masked language modeling and next sentence prediction.
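
VADER reports a `compound` score between -1 and 1 for each text, and a common convention is to map it to three classes with a cutoff of 0.05 on either side of zero. The sketch below shows only that mapping step. Whether this project used these thresholds is an assumption, and `label_from_compound` is a hypothetical helper, not part of the VADER library.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The 0.05 cutoff follows a common convention; it is an assumption here,
    not a value confirmed by this project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# With the vaderSentiment package, the score would typically come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
for score in (0.6, 0.0, -0.3):
    print(score, label_from_compound(score))
```

Labels produced this way can serve as targets for the supervised models listed above.
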