The aim of this project is to investigate whether adding recurrent components to the Decomposable Attention Model (DAM) [1] can enhance the performance of the overall network. The proposed enhancements are additionally constrained to use fewer trainable parameters than other RNN-based models, in order to determine whether a performance gain can be achieved with only a minimal increase in parameters.
Note:
- To run the project, see INSTRUCTIONS for installing the necessary packages.
- To fully understand the project scope and see the results, please refer to the PDF in the repo.
This project extends Parikh et al.'s Decomposable Attention Model (DAM) [1] by adding recurrent components to the original architecture. The following models were defined (a minimal sketch of how a recurrent encoder slots into DAM follows the list):
- DAM
- DAM-INTRA (DAM with intra-sentence attention)
- DAM-BiLSTM (DAM with bidirectional LSTM)
- DAM-BiGRU (DAM with bidirectional GRU)
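The sketch below shows one way a bidirectional recurrent encoder can be placed before DAM's attend step. It assumes a TensorFlow/Keras implementation (the saved weights are HDF5 files); the `encode` helper, layer sizes and hyperparameter values are illustrative assumptions, not the exact configuration used in this repo.

```python
# Minimal sketch (assuming a TensorFlow/Keras implementation) of how a recurrent
# encoder can precede the decomposable attention steps. All names, layer sizes and
# hyperparameters below are illustrative assumptions.
from tensorflow.keras import layers

MAX_LEN, VOCAB, EMBED_DIM, HIDDEN = 42, 20000, 300, 200  # assumed hyperparameters

def encode(recurrent=None):
    """Premise/hypothesis encoder: embedding followed by an optional BiLSTM/BiGRU."""
    inp = layers.Input(shape=(MAX_LEN,))
    x = layers.Embedding(VOCAB, EMBED_DIM)(inp)
    if recurrent == "bilstm":
        x = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(x)
    elif recurrent == "bigru":
        x = layers.Bidirectional(layers.GRU(HIDDEN, return_sequences=True))(x)
    else:  # plain DAM: a feed-forward projection of the embeddings
        x = layers.TimeDistributed(layers.Dense(HIDDEN, activation="relu"))(x)
    return inp, x

premise_in, premise_enc = encode("bigru")      # DAM-BiGRU variant
hypothesis_in, hypothesis_enc = encode("bigru")
# The attend, compare and aggregate steps of DAM [1] would operate on
# premise_enc and hypothesis_enc from here on.
```

In a setup like this, the number of trainable parameters added by the recurrent variants is governed mainly by the hidden size, which is where the parameter-budget constraint mentioned above would be enforced.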
- Stores the zipped SNLI archive.
- Stores the unzipped SNLI train, validation and test sets.
- Populated when running dataset-utils.ipynb.
- Stores the visualised architectures of each model.
- Contains HDF5 files holding the best weights obtained while training the models.
- These files are overwritten when running model-trainer.ipynb and loaded when running model-evaluator.ipynb.
- Notebook used to observe the distribution of the SNLI dataset.
- This is also used to explain decisions taken throughout the project, such as the choice of maximum sequence lengths; a short sketch of that check follows this item.
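As an illustration of how a maximum sequence length might be chosen from the data, the hedged sketch below reads the SNLI training split and prints token-count percentiles; the file path and whitespace tokenisation are simplifying assumptions, not necessarily what the notebook does.

```python
# Hedged sketch: inspect sentence-length percentiles in the SNLI training split to
# inform the choice of maximum sequence length. Path and tokenisation are assumptions.
import json
import numpy as np

lengths = []
with open("data/snli/snli_1.0/snli_1.0_train.jsonl") as f:
    for line in f:
        example = json.loads(line)
        lengths.append(len(example["sentence1"].split()))
        lengths.append(len(example["sentence2"].split()))

# A cut-off near the 99th percentile keeps almost all tokens while bounding padding.
print(np.percentile(lengths, [50, 95, 99]))
```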
- Used to retrieve the SNLI dataset and unzip the necessary train, validation and test files; an illustrative download-and-extract sketch follows.
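For reference, a minimal sketch of that retrieval step; the download URL and local paths are assumptions and may differ from what the notebook actually uses.

```python
# Minimal sketch: download the SNLI archive and extract the train/dev/test JSONL
# splits. URL and paths are assumptions, not necessarily those used by the notebook.
import os
import urllib.request
import zipfile

SNLI_URL = "https://nlp.stanford.edu/projects/snli/snli_1.0.zip"  # assumed source
ZIP_PATH = "data/snli_1.0.zip"
OUT_DIR = "data/snli"

os.makedirs("data", exist_ok=True)
if not os.path.exists(ZIP_PATH):
    urllib.request.urlretrieve(SNLI_URL, ZIP_PATH)

with zipfile.ZipFile(ZIP_PATH) as archive:
    for name in archive.namelist():
        if name.endswith(".jsonl"):  # keep only the dataset splits
            archive.extract(name, OUT_DIR)
```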
- Used to evaluate the models saved in the saved-models directory; a short metrics sketch follows this list.
- Quantitative Evaluation
  - Classification Report
  - Confusion Matrix
  - ROC Curve
  - Precision-Recall Curve
- Qualitative Evaluation
  - Observation of Incorrect Predictions
  - Custom Input Predictions
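The quantitative metrics above can be produced with scikit-learn along the lines of the sketch below; the label array, probability array and class names are placeholders standing in for the test labels and model outputs the notebook actually loads.

```python
# Hedged sketch of the quantitative evaluation using scikit-learn. y_true, y_prob
# and the class names are placeholders for the real test labels and model outputs.
import numpy as np
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_curve, auc, average_precision_score)

classes = ["entailment", "neutral", "contradiction"]
y_true = np.array([0, 1, 2, 1])          # placeholder gold labels
y_prob = np.array([[0.8, 0.1, 0.1],      # placeholder predicted probabilities
                   [0.2, 0.6, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.5, 0.3, 0.2]])
y_pred = y_prob.argmax(axis=1)

print(classification_report(y_true, y_pred, target_names=classes))
print(confusion_matrix(y_true, y_pred))

# ROC and precision-recall statistics are computed one-vs-rest per class.
for i, name in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_true == i, y_prob[:, i])
    print(name, "ROC AUC:", auc(fpr, tpr),
          "average precision:", average_precision_score(y_true == i, y_prob[:, i]))
```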
- Used to train the models and display their architectures.
- Displays accuracy and loss for the training and validation sets after training, as sketched below.
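A minimal sketch of that accuracy/loss display, assuming a Keras History object returned by model.fit(); the history key names depend on the Keras version, and the notebook's actual plotting code may differ.

```python
# Minimal sketch: plot training vs. validation accuracy and loss from a Keras
# History object. Key names ("accuracy" vs "acc") depend on the Keras version.
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.legend()
    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.legend()
    plt.show()

# Example usage: plot_history(model.fit(x_train, y_train, validation_data=(x_val, y_val)))
```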
[1] A. P. Parikh, O. Täckström, D. Das, and J. Uszkoreit, "A Decomposable Attention Model for Natural Language Inference," arXiv:1606.01933 [cs], June 2016.