Official repository for the Adversarial Attacks against Abuse (AAA) evaluation tool. AAA is a new evaluation metric for abuse detection systems that better captures a model's performance on certain classes of hard-to-classify microposts and, for example, penalises systems that are biased towards low-level lexical features.
With Docker
Within the Adversifier directory, run the following command:
docker build -f AAA-Dockerfile -t aaa .
The AAA tool works in two steps:
- Generating the AAA data files from your training and test sets.
- Reading your answer files and computing the AAA score and sub-scores.
The AAA data files are generated from your training and test sets. Both are expected to be tab-separated files with the following format:
post_text label
Labels need to be binary, with 1 corresponding to the abusive class, and 0 to the non-abusive class, e.g.:
This is an abusive message 1
This is a non-abusive message 0
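For instance, here is a minimal sketch of how a split could be written into this two-column format; the file name and the example rows are placeholders for your own data:
# Placeholder example rows: replace with your own (post, label) pairs.
train_rows = [
    ('This is an abusive message', 1),
    ('This is a non-abusive message', 0),
]

# Write the two-column, tab-separated format expected by gen.py.
# Posts are assumed not to contain tab or newline characters.
with open('my_train.tsv', 'w', encoding='utf-8') as f:
    for post, label in train_rows:
        f.write('{}\t{}\n'.format(post, label))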
To generate the AAA data files, run the following command:
docker run --mount type=bind,source=$AAA_FILE_DIR,target=/aaa/input aaa python3 gen.py --dataset_name $DATASET_NAME --train $TRAINING_SET --test $TEST_SET
where $AAA_FILE_DIR is the absolute path to the directory containing your datasets (for example, "$(pwd)"/mydata), $TRAINING_SET and $TEST_SET are the filenames of the training and test data files (to be placed inside $AAA_FILE_DIR), and $DATASET_NAME is a string identifier for the dataset.
The tool will create the ${AAA_FILE_DIR}/aaa_files directory, containing the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files will have the same format as your input datasets:
post_text label
In order to evaluate your model with the AAA tool, create an $ANSWER_FILE_DIR directory containing the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files are expected to have the following format:
post_text label your_model_prediction
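As a reference, here is a hedged sketch of how such answer files could be produced from the generated AAA files; the predict_batch function and the directory paths are placeholders for your own model and setup:
import os

# The six tab-separated files produced by gen.py.
AAA_FILES = ['corr_a_to_a.tsv', 'corr_n_to_n.tsv', 'f1_o.tsv',
             'flip_n_to_a.tsv', 'hashtag_check.tsv', 'quoting_a_to_n.tsv']

def predict_batch(posts):
    # Placeholder: replace with your model's inference code.
    # Must return one binary prediction (1 = abusive, 0 = non-abusive) per post.
    return [0 for _ in posts]

input_dir = 'mydata/aaa_files'       # directory created by gen.py
output_dir = 'mydata/answer_files'   # directory to use as $ANSWER_FILE_DIR
os.makedirs(output_dir, exist_ok=True)

for filename in AAA_FILES:
    with open(os.path.join(input_dir, filename), encoding='utf-8') as f:
        # Each generated line is post_text <TAB> label.
        rows = [line.rstrip('\n').rsplit('\t', 1) for line in f if line.strip()]
    preds = predict_batch([post for post, _ in rows])
    with open(os.path.join(output_dir, filename), 'w', encoding='utf-8') as f:
        for (post, label), pred in zip(rows, preds):
            f.write('{}\t{}\t{}\n'.format(post, label, pred))
The resulting directory can then be mounted as $ANSWER_FILE_DIR (Docker), or its files can be copied to the expected answer-files location in the other workflows.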
To evaluate the answer files, run the following command:
docker run --mount type=bind,source=$ANSWER_FILE_DIR,target=/aaa/output/answer_files aaa python3 eval.py --dataset_name $DATASET_NAME
where $ANSWER_FILE_DIR is the absolute path to the directory containing your answer files (for example, "$(pwd)"/mydata/aaa_files), and $DATASET_NAME is a string identifier for the dataset. Scores are stored in the $ANSWER_FILE_DIR/results.tsv file.
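If you want to inspect the scores programmatically, a sketch like the following could work, assuming results.tsv stores one metric per tab-separated row; the path is a placeholder and the exact layout of results.tsv is not specified here:
import csv

# Hypothetical path: point this at $ANSWER_FILE_DIR/results.tsv.
with open('mydata/aaa_files/results.tsv', encoding='utf-8') as f:
    for row in csv.reader(f, delimiter='\t'):
        if row:
            print(row)  # e.g. a metric name followed by its score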
With Binder
Demo video: aaa_binder_demo.mov
The AAA data files are generated from your training and test sets. Both are expected to be tab-separated files with the following format:
post_text label
Labels need to be binary, with 1 corresponding to the abusive class, and 0 to the non-abusive class, e.g.:
This is an abusive message 1
This is a non-abusive message 0
To generate the AAA data files:
- Click on the Binder badge at the top of this page.
- Upload your training and test sets into the input folder.
- Open a new terminal and run the following command:
python gen.py --dataset_name $DATASET_NAME --train $TRAINING_SET --test $TEST_SET
where $TRAINING_SET and $TEST_SET are the filenames of the training and test data files (to be placed inside the input folder), and $DATASET_NAME is a string identifier for the dataset.
The tool will create the input/aaa_files directory, containing the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files will have the same format as your input datasets:
post_text label
In order to evaluate your model with the AAA tool, upload your answer files into the output/answer_files directory. The tool expects the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files are expected to have the following format:
post_text label your_model_prediction
To evaluate the answer files, run the following command in the terminal:
python eval.py --dataset_name $DATASET_NAME
where $DATASET_NAME is a string identifier for the dataset. Scores are stored in the output/answer_files/results.tsv file.
Old-school way
Within the Adversifier directory, run the following command:
./setup.sh
All file paths (e.g., data files) are specified within the info/info.py file. Customise this file to meet your needs.
To run the AAA tool on your model with a generic dataset, you can choose between two strategies:
- two-step pipeline: first query the tool to generate the AAA files from your data files, then make a second query to evaluate your answer files.
- one-step pipeline: a single query that generates the new instances and evaluates your model. Besides your training and test sets, it requires you to provide your model's predictor.
You'll need to provide:
- the training and test sets, as tab-separated files in the format:
post_text label
Labels need to be binary, with 1 corresponding to the abusive class, and 0 to the non-abusive class, e.g.:
This is an abusive message 1
This is a non-abusive message 0
To generate the AAA data files, create a directory named input within the Adversifier directory, and copy your training and test sets into it. Then run the following command:
python3 gen.py --dataset_name $DATASET_NAME --train $TRAINING_SET --test $TEST_SET
where $TRAINING_SET and $TEST_SET are the filenames of the training and test data files, and $DATASET_NAME is a string identifier for the dataset.
The tool will create the input/aaa_files directory, containing the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files have the following format:
post_text label
In order to evaluate your model with the AAA tool, create a directory named output/answer_files containing the following tab-separated files:
corr_a_to_a.tsv
corr_n_to_n.tsv
f1_o.tsv
flip_n_to_a.tsv
hashtag_check.tsv
quoting_a_to_n.tsv
All files are expected to have the following format:
post_text label your_model_prediction
To evaluate the answer files, run the following command:
python3 eval.py --dataset_name $DATASET_NAME
where $DATASET_NAME is a string identifier for the dataset. Scores are stored in the output/answer_files/results.tsv file.
You'll need to provide:
- the training and test sets, in the format specified below.
- your model's predictor: a function that takes as input a list of arguments, the first one being a list of NON-pre-processed posts, and returns a list of binary predictions.
Here is an example:
from AAAdversifier import AAAdversifier
adversifier = AAAdversifier()
train_data, test_data = load_your_data()
adversifier.aaa('your_model_name', your_model.predictor, train_data, test_data)
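For reference, here is a hedged sketch of what such a predictor might look like, assuming the tool calls it with a single list of arguments whose first element is the list of posts; the keyword-based decision rule is only a toy placeholder for your model's inference:
def my_predictor(args):
    # Assumption: args is a list of arguments and args[0] is the
    # list of NON-pre-processed posts.
    posts = args[0]
    # Toy decision rule as a placeholder for your model's inference;
    # return one binary prediction (1 = abusive, 0 = non-abusive) per post.
    return [1 if 'abusive' in post.lower() else 0 for post in posts]
A function like this could be passed in place of your_model.predictor in the call above.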
Check main.py for usage examples. Scores are stored in the output/answer_files/results.tsv file.
For the AAA tool to run, you'll need to provide both a training and test set. Both sets should be in the form:
data_split = [list of posts, list of labels, list of any extra information your model might use]
Therefore, the i-th element of each list contains the information for the i-th instance in the split. Labels are assumed to be binary, with 1 corresponding to the abusive class and 0 to the non-abusive class.
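As an illustration only, a data_split in this form could be built from a two-column tab-separated file like the ones described earlier; the file names are placeholders:
def load_split(path):
    posts, labels = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue
            # Each line is post_text <TAB> label; split on the last tab.
            post, label = line.rstrip('\n').rsplit('\t', 1)
            posts.append(post)
            labels.append(int(label))
    # Third list: any extra information your model might use (none here).
    return [posts, labels, [None] * len(posts)]

train_data = load_split('my_train.tsv')
test_data = load_split('my_test.tsv')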
For details on how to replicate the experiments in the AAA paper, see the instructions below.
Within the Adversifier directory, run the following command:
./setup.sh
To replicate our results with the BERTMOZ or BERTKEN models, you'll also need to install the transformers library:
pip3 install transformers
All file paths (e.g., data files, model checkpoints) are specified within the info/info.py file. Customise this file to meet your needs.
To replicate the experiments reported in the AAA paper, download the data files and model checkpoints as described below, and run the following command:
python3 main.py
For the AAA tool to run, you'll need to provide both a training and test set. Both sets should be in the form:
data_split = [list of posts, list of labels, list of any extra information your model might use]
Therefore, the i-th element of each list contains the information for the i-th instance in the split. Labels are assumed to be binary, with 1 corresponding to the abusive class and 0 to the non-abusive class.
To run the AAA tool on the Waseem et al., 2018 dataset, download the tweets through the Twitter API and put them in DATA/waseem_data.tsv. The tab-separated file should have the following header (and format):
tweet_id tweet_text label
You can then call the utils.get_waseem_data function, which returns a dictionary with keys {'train', 'test'} and the corresponding data_split as values.
Splits are created using stratified sampling to split tweets from each class into training (80%), validation (10%) and test (10%) sets. The corresponding ids can be found in the waseem_train_ids.csv, waseem_val_ids.csv and waseem_test_ids.csv files within the DATA directory.
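For example, here is a hedged sketch of how the training split could be reconstructed from these id files, assuming each id file lists one tweet id per line and DATA/waseem_data.tsv follows the header above:
import csv

# Assumed layout: one tweet id per line in the id file.
with open('DATA/waseem_train_ids.csv', encoding='utf-8') as f:
    train_ids = {line.strip() for line in f if line.strip()}

# DATA/waseem_data.tsv is tab-separated with header tweet_id, tweet_text, label.
train_rows = []
with open('DATA/waseem_data.tsv', encoding='utf-8') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        if row['tweet_id'] in train_ids:
            train_rows.append((row['tweet_text'], row['label']))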
Note that the utils.get_waseem_data function maps the "sexism", "racism" and "both" labels to the abusive class, and the "neither" label to the non-abusive class.
To run the AAA tool on the Davidson et al., 2017 dataset, download the davidson_data.csv file and add it to the DATA directory. You can then call the utils.get_davidson_data function, which returns a dictionary with keys {'train', 'test'} and the corresponding data_split as values.
Splits are created using stratified sampling to split tweets from each class into training (80%), validation (10%) and test (10%) sets. The corresponding ids can be found in the davidson_train_ids.csv, davidson_val_ids.csv and davidson_test_ids.csv files within the DATA directory.
Note that the utils.get_davidson_data function maps the "hate speech" and "offensive" labels to the abusive class, and the "neither" label to the non-abusive class.
We provide code and checkpoints for the SVM, BERTMOZ and BERTKEN models trained on the Waseem et al., 2018 and Davidson et al., 2017 datasets.
To replicate our experiments on the Waseem et al., 2018 dataset, you'll need to download the following checkpoints. You can download all the checkpoints from here (3.01 GB), or run the following command:
from utils import download_checkpoints
download_checkpoints('waseem-18')
Alternatively, you can download the checkpoints of interest from the following list. Add all the files to the models directory, or modify the info/info.py file accordingly.
The weights of our SVM model can be downloaded at:
The weights of our re-implementation of BERTMOZ (Mozafari et al., 2019) can be downloaded at:
- mozafari_waseem.pt
- mozafari_waseem_nh.pt (variant of the BERTMOZ model that fully discards hashtag content)
The weights of BERTKEN (Kennedy et al., 2020) can be downloaded at:
To replicate our experiments on the Davidson et al., 2017 dataset, you'll need to download the following checkpoints. You can download all the checkpoints from here (1.91 GB), or run the following command:
from utils import download_checkpoints
download_checkpoints('davidson-17')
Alternatively, you can download the checkpoints of interest from the following list. Add all the files to the models directory, or modify the info/info.py file accordingly.
The weights of our SVM model can be downloaded at:
The weights of our re-implementation of BERTMOZ (Mozafari et al., 2019) can be downloaded at:
The weights of BERTKEN (Kennedy et al., 2020) can be downloaded at:
If you use our tool or metric, please cite our paper:
@inproceedings{calabrese-etal-2021-aaa,
author = {Agostina Calabrese and
Michele Bevilacqua and
Bj{\"{o}}rn Ross and
Rocco Tripodi and
Roberto Navigli},
editor = {Clare Hooper and
Matthew Weber and
Katrin Weller and
Wendy Hall and
Noshir Contractor and
Jie Tang},
title = {{AAA:} Fair Evaluation for Abuse Detection Systems Wanted},
booktitle = {WebSci '21: 13th {ACM} Web Science Conference 2021, Virtual Event,
United Kingdom, June 21-25, 2021},
pages = {243--252},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3447535.3462484},
doi = {10.1145/3447535.3462484},
timestamp = {Thu, 24 Jun 2021 15:05:45 +0200},
biburl = {https://dblp.org/rec/conf/websci/CalabreseBRTN21.bib}
}