This project is a sentiment classifier for IMDb movie reviews. It uses a pre-trained GloVe word embedding model and a Bidirectional LSTM network to classify reviews as positive or negative.
- Loads IMDb movie reviews for training, validation, and testing.
- Uses GloVe embeddings for enhanced text representation.
- Trains a Bidirectional LSTM (Long Short-Term Memory) model to classify reviews as positive or negative.
- Achieves high accuracy on both validation and test sets.
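As a rough illustration of how GloVe vectors can feed an embedding layer, here is a minimal sketch in plain Python. It is not the code in src/main.py: the function names load_glove and build_embedding_matrix are hypothetical, and it assumes a word_index that maps each word to an integer from 1 to the vocabulary size, with row 0 reserved for padding. It relies only on the standard GloVe text format, in which each line holds a token followed by space-separated floats.

```python
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text lines of the form 'word v1 v2 ... vN' into a dict."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Build a (vocab_size + 1) x dim matrix; row i holds the vector for index i.

    Assumption: indices in word_index run from 1 to len(word_index).
    Words without a GloVe vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix
```

In practice the lines would come from an open GloVe file, for example `with open(path, encoding="utf-8") as f: vectors = load_glove(f)`, and the resulting matrix would be converted to an array and used as the initial weights of the embedding layer.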
- Clone this repository:
git clone https://github.com/sminerport/IMDbSentimentClassifier.git
cd IMDbSentimentClassifier
- Install dependencies:
pip install -r requirements.txt
- Run the model:
python src/main.py
The model uses IMDb review data split into training, validation, and test sets. These files are stored in the data/ directory and are managed with Git Large File Storage (Git LFS) to optimize storage and download efficiency.
To access the data files, install Git LFS before cloning if you haven't already. You can download it from the Git LFS website (https://git-lfs.com).
# Enable Git LFS for your user account (run once, after installing the Git LFS package)
git lfs install
Then, clone the repository as usual:
git clone https://github.com/sminerport/IMDbSentimentClassifier.git
cd IMDbSentimentClassifier
If you’ve already cloned the repository without Git LFS, run the following command to pull the LFS files:
git lfs pull
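If the data files were cloned without Git LFS, they exist on disk only as small text pointers, which can cause confusing parse errors at load time. The sketch below is one way to detect that situation before training. It is not part of the repository: the function names is_lfs_pointer and find_unfetched are hypothetical. It relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file at path looks like an unfetched Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched(data_dir):
    """Return the files under data_dir that are still LFS pointers, sorted by path."""
    return sorted(
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_unfetched("data")` from the repository root should return an empty list once `git lfs pull` has completed; any paths it returns are files whose real contents still need to be fetched.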
To train the model:
python src/main.py
The script downloads the GloVe embeddings automatically when it starts and deletes them after evaluation to save space.
Below is a snapshot of the model's training and validation accuracy and loss across epochs:
This image provides a visual summary of the training process. Each epoch displays the model's accuracy and loss on both the training and validation sets, showing the progression as the model improves over time.
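For readers who prefer text to a screenshot, the same per-epoch summary can be produced from the training history. The sketch below assumes the script trains with Keras `model.fit` and `metrics=["accuracy"]`, in which case the returned `History.history` is a dict with the keys `accuracy`, `val_accuracy`, `loss`, and `val_loss`. That is an assumption about src/main.py, and the helper names epoch_rows and best_epoch are hypothetical.

```python
def epoch_rows(history, keys=("accuracy", "val_accuracy", "loss", "val_loss")):
    """Format one summary line per epoch from a Keras-style history dict."""
    n_epochs = len(history[keys[0]])
    return [
        f"epoch {i + 1}: " + ", ".join(f"{k}={history[k][i]:.4f}" for k in keys)
        for i in range(n_epochs)
    ]


def best_epoch(history, metric="val_accuracy"):
    """Return (1-based epoch number, value) for the highest value of metric.

    On ties, the earliest epoch wins.
    """
    values = history[metric]
    best = max(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]
```

With a history object returned by `model.fit`, `print("\n".join(epoch_rows(history.history)))` would list every epoch, and `best_epoch(history.history)` would point to the epoch with the highest validation accuracy.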
The script deletes the GloVe embeddings and the saved model (best_model.keras) after evaluation to conserve storage. If you'd like to keep these files, set the cleanup variable to False in the script.
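The cleanup behavior described above can be pictured with a small sketch like the following. It is an illustration under assumptions, not the code in src/main.py: the function name clean_up and the idea of passing an explicit list of paths are hypothetical, and only the cleanup flag and the best_model.keras file name come from this README.

```python
import shutil
from pathlib import Path


def clean_up(paths, cleanup=True):
    """Delete each existing file or directory in paths when cleanup is True.

    Returns the list of paths that were actually removed.
    """
    removed = []
    if not cleanup:
        return removed  # keep everything on disk
    for p in map(Path, paths):
        if p.is_dir():
            shutil.rmtree(p)  # e.g. an extracted embeddings directory
        elif p.exists():
            p.unlink()  # e.g. best_model.keras
        else:
            continue  # nothing to delete at this path
        removed.append(p)
    return removed
```

Called with `cleanup=False`, the function leaves every file in place, which matches the documented way to keep the embeddings and the saved model.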
- To adjust storage usage, toggle the cleanup variable in the script.
- requirements.txt is generated by running pip freeze > requirements.txt in a Colab environment or your local environment.
This project is licensed under the MIT License. See the LICENSE file for more details.