Using Long Short-Term Memory (LSTM) models to detect sarcastic content in text.
A 100,000-sample subset of Princeton's SARC 2.0 corpus for sarcasm detection.
(Reference: nlp.cs.princeton.edu/SARC/2.0/)
Since the project focuses on the influence of conversational context, only the 'label', 'comment' and 'parent_comment' columns are kept (the 'author' column is also saved but not actually used); a loading sketch follows the list below.
- train.csv: training set (80,000 samples).
- val.csv: validation set (10,000 samples).
- test.csv: test set (10,000 samples).
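For reference, a minimal loading sketch with pandas, assuming the CSV headers match the column names described above:

```python
import pandas as pd

# Load the training split and keep only the columns the models consume;
# 'author' is present in the files but unused.
train_df = pd.read_csv("train.csv")
train_df = train_df[["label", "comment", "parent_comment"]]
print(train_df.shape)  # expected: (80000, 3)
```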
Text Data Preprocessing: TorchText (0.4.0)
Pre-trained word vectors: GloVe.6B.300d
Model Implementation: PyTorch (1.3.1)
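A rough sketch of what a preprocessing pipeline built on these dependencies could look like, using the legacy TorchText 0.4.0 API; the CSV column order, field options, and batch size are assumptions, not taken from load_data.py:

```python
import torch
from torchtext import data
from torchtext.vocab import GloVe

TEXT = data.Field(lower=True, batch_first=True)
LABEL = data.LabelField(dtype=torch.float)

# The field list must match the CSV column order; this order is an assumption.
fields = [("label", LABEL), ("comment", TEXT),
          ("parent_comment", TEXT), ("author", None)]  # 'author' is skipped

train_set, val_set, test_set = data.TabularDataset.splits(
    path=".", train="train.csv", validation="val.csv", test="test.csv",
    format="csv", fields=fields, skip_header=True)

# Build vocabularies on the training split and attach GloVe.6B.300d vectors.
TEXT.build_vocab(train_set, vectors=GloVe(name="6B", dim=300))
LABEL.build_vocab(train_set)

train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train_set, val_set, test_set), batch_size=64,
    sort_key=lambda ex: len(ex.comment))
```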
Four LSTM models are implemented, covering every combination of with/without conversational context and with/without an attention mechanism.
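For illustration, here is a minimal sketch of the attention variant; the layer sizes and the exact attention formulation are assumptions, not the repository's definitive architecture. The context variants would additionally encode 'parent_comment' and combine the two representations before the final classifier.

```python
import torch
import torch.nn as nn

class AttnLSTM(nn.Module):
    """LSTM sarcasm classifier with simple additive attention over time steps."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)  # one score per time step
        self.fc = nn.Linear(hidden_dim, 1)    # binary sarcasm logit

    def forward(self, tokens):                          # (batch, seq_len)
        outputs, _ = self.lstm(self.embedding(tokens))  # (batch, seq_len, hidden)
        weights = torch.softmax(self.attn(outputs), dim=1)
        context = (weights * outputs).sum(dim=1)        # attention-weighted sum
        return self.fc(context).squeeze(1)              # (batch,) logits
```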
- Download all .csv and .py files.
- Run training.py; change parameters/models in training.py and load_data.py.
- Results are printed to the console (a rough sketch of the training loop follows this list).
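The core of training.py presumably looks roughly like the following, continuing from the sketches above; the epoch count, optimizer, and learning rate are assumptions:

```python
import torch
import torch.nn as nn

model = AttnLSTM(vocab_size=len(TEXT.vocab))
model.embedding.weight.data.copy_(TEXT.vocab.vectors)  # initialize with GloVe
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for epoch in range(5):
    model.train()
    for batch in train_iter:
        optimizer.zero_grad()
        logits = model(batch.comment)      # batch-first tensors from the Field
        loss = criterion(logits, batch.label)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```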
Precision:
- LSTM: 69.39%
- LSTM (context): 65.40%
- LSTM (attention): 69.73%
- LSTM (attention + context): 66.49%
- Try separating comments and parent_comments into different networks (using Seq2Seq models); see the sketch below.
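A speculative sketch of that idea as a dual-encoder variant (a full Seq2Seq model would additionally feed one encoder's states into the other); all names and sizes here are hypothetical:

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Encode 'comment' and 'parent_comment' with separate LSTM encoders."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.comment_enc = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.parent_enc = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, comment, parent):    # each: (batch, seq_len)
        _, (h_c, _) = self.comment_enc(self.embedding(comment))
        _, (h_p, _) = self.parent_enc(self.embedding(parent))
        combined = torch.cat([h_c[-1], h_p[-1]], dim=1)  # final hidden states
        return self.fc(combined).squeeze(1)
```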