huggingface · lhoestq · Sep 21, 2021 · Aug 2, 2021 · Sep 7, 2021 · Sep 10, 2021
diff --git a/datasets/sem_eval_2018_task_1/README.md b/datasets/sem_eval_2018_task_1/README.md
@@ -0,0 +1,219 @@
+---
+annotations_creators:
+- crowdsourced
+language_creators:
+- found
+languages:
+- en
+- ar
+- es
+licenses:
+- unknown
+multilinguality:
+- multilingual
+pretty_name: 'SemEval-2018 Task 1: Affect in Tweets'
+size_categories:
+- 1K<n<10K
+source_datasets:
+- original
+task_categories:
+- text-classification
+task_ids:
+- multi-label-classification
+- text-classification-other-emotion-classification
+---
+
+# Dataset Card for SemEval-2018 Task 1: Affect in Tweets
+
+## Table of Contents
+- [Table of Contents](#table-of-contents)
+- [Dataset Description](#dataset-description)
+  - [Dataset Summary](#dataset-summary)
+  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
+  - [Languages](#languages)
+- [Dataset Structure](#dataset-structure)
+  - [Data Instances](#data-instances)
+  - [Data Fields](#data-fields)
+  - [Data Splits](#data-splits)
+- [Dataset Creation](#dataset-creation)
+  - [Curation Rationale](#curation-rationale)
+  - [Source Data](#source-data)
+  - [Annotations](#annotations)
+  - [Personal and Sensitive Information](#personal-and-sensitive-information)
+- [Considerations for Using the Data](#considerations-for-using-the-data)
+  - [Social Impact of Dataset](#social-impact-of-dataset)
+  - [Discussion of Biases](#discussion-of-biases)
+  - [Other Known Limitations](#other-known-limitations)
+- [Additional Information](#additional-information)
+  - [Dataset Curators](#dataset-curators)
+  - [Licensing Information](#licensing-information)
+  - [Citation Information](#citation-information)
+  - [Contributions](#contributions)
+
+## Dataset Description
+
+- **Homepage:** https://competitions.codalab.org/competitions/17751
+- **Repository:**
+- **Paper:** http://saifmohammad.com/WebDocs/semeval2018-task1.pdf
+- **Leaderboard:**
+- **Point of Contact:** https://www.saifmohammad.com/
+
+### Dataset Summary
+
+Tasks: We present an array of tasks where systems have to automatically determine the intensity of emotions (E) and intensity of sentiment (aka valence V) of the tweeters from their tweets. (The term tweeter refers to the person who has posted the tweet.) We also include a multi-label emotion classification task for tweets. For each task, we provide separate training and test datasets for English, Arabic, and Spanish tweets. The individual tasks are described below:
+
+1. EI-reg (an emotion intensity regression task): Given a tweet and an emotion E, determine the intensity of E that best represents the mental state of the tweeter, as a real-valued score between 0 (least E) and 1 (most E).
+Separate datasets are provided for anger, fear, joy, and sadness.
+
+2. EI-oc (an emotion intensity ordinal classification task): Given a tweet and an emotion E, classify the tweet into one of four ordinal classes of intensity of E that best represents the mental state of the tweeter.
+Separate datasets are provided for anger, fear, joy, and sadness.
+
+3. V-reg (a sentiment intensity regression task): Given a tweet, determine the intensity of sentiment or valence (V) that best represents the mental state of the tweeter, as a real-valued score between 0 (most negative) and 1 (most positive).
+
+4. V-oc (a sentiment analysis, ordinal classification, task): Given a tweet, classify it into one of seven ordinal classes, corresponding to various levels of positive and negative sentiment intensity, that best represents the mental state of the tweeter.
+
+5. E-c (an emotion classification task): Given a tweet, classify it as 'neutral or no emotion' or as one, or more, of eleven given emotions that best represent the mental state of the tweeter.
+
+Here, E refers to emotion, EI refers to emotion intensity, V refers to valence or sentiment intensity, reg refers to regression, oc refers to ordinal classification, c refers to classification.
+
+Together, these tasks encompass various emotion and sentiment analysis tasks. You are free to participate in any number of tasks and on any of the datasets.
+
+**Currently, only subtask 5 (E-c) is available on the Hugging Face Dataset Hub.**
+
+### Supported Tasks and Leaderboards
+
+### Languages
+
+English, Arabic and Spanish
+
+## Dataset Structure
+
+### Data Instances
+
+An example from the `subtask5.english` config is:
+
+```
+{'ID': '2017-En-21441',
+ 'Tweet': "“Worry is a down payment on a problem you may never have'. \xa0Joyce Meyer.  #motivation #leadership #worry",
+ 'anger': False,
+ 'anticipation': True,
+ 'disgust': False,
+ 'fear': False,
+ 'joy': False,
+ 'love': False,
+ 'optimism': True,
+ 'pessimism': False,
+ 'sadness': False,
+ 'surprise': False,
+ 'trust': True}
+ ```
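A minimal sketch of how a record like the one above might be obtained. It assumes the Hugging Face `datasets` library is installed, that the loader is published under the directory name used in this PR (`sem_eval_2018_task_1`), and that the installed library version still supports script-based loaders. `config_name` and `load_subtask5` are hypothetical helper names, not part of the PR.

```python
# Config names follow the "subtask5.<language>" pattern used in dataset_infos.json.
LANGUAGES = ("english", "arabic", "spanish")


def config_name(language):
    """Build the config name for one language, e.g. 'subtask5.english'."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return f"subtask5.{language}"


def load_subtask5(language):
    """Load all splits of one config (hypothetical helper; needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the dataset id matches the directory name in this PR.
    return load_dataset("sem_eval_2018_task_1", config_name(language))
```

Under these assumptions, `load_subtask5("english")["train"][0]` would be expected to return a dictionary with the keys shown in the example above.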
+
+### Data Fields
+
+For every config of subtask 5:
+- ID: string id of the tweet
+- Tweet: text content of the tweet as a string
+- anger: boolean, True if anger represents the mental state of the tweeter
+- anticipation: boolean, True if anticipation represents the mental state of the tweeter
+- disgust: boolean, True if disgust represents the mental state of the tweeter
+- fear: boolean, True if fear represents the mental state of the tweeter
+- joy: boolean, True if joy represents the mental state of the tweeter
+- love: boolean, True if love represents the mental state of the tweeter
+- optimism: boolean, True if optimism represents the mental state of the tweeter
+- pessimism: boolean, True if pessimism represents the mental state of the tweeter
+- sadness: boolean, True if sadness represents the mental state of the tweeter
+- surprise: boolean, True if surprise represents the mental state of the tweeter
+- trust: boolean, True if trust represents the mental state of the tweeter
+
+Note that the test set has no labels, and therefore all labels are set to False.
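Since each emotion is stored as its own boolean column, multi-label training code usually needs the columns collapsed into one vector. The following is a sketch only: the column order is taken from the field list above, the record is the English example from Data Instances with its tweet text left out, and `to_multi_hot` and `active_emotions` are hypothetical helper names.

```python
# Emotion columns, in the order used in the Data Fields list.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]


def to_multi_hot(record):
    """Return the eleven boolean emotion flags of one record as 0/1 integers."""
    return [int(record[name]) for name in EMOTIONS]


def active_emotions(record):
    """Return the names of the emotions that are set to True."""
    return [name for name in EMOTIONS if record[name]]


# Labels of the English example from Data Instances (tweet text omitted here).
example = {
    "ID": "2017-En-21441",
    "anger": False, "anticipation": True, "disgust": False, "fear": False,
    "joy": False, "love": False, "optimism": True, "pessimism": False,
    "sadness": False, "surprise": False, "trust": True,
}

print(to_multi_hot(example))     # [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(active_emotions(example))  # ['anticipation', 'optimism', 'trust']
```

An all-zero vector is ambiguous on its own. In the test split it only means that no labels are available, as noted above. In the train and dev splits it presumably corresponds to 'neutral or no emotion'.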
+
+### Data Splits
+
+|                            | Train  | Dev   | Test |
+| -----                      | ------ | ----- | ---- |
+| English                    | 6,838  |  886  | 3,259|
+| Arabic                     | 2,278  |  585  | 1,518|
+| Spanish                    | 3,561  |  679  | 2,854|
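As a quick arithmetic check, the split sizes in the table can be summed per language and overall. The numbers are copied from the table; nothing else is assumed.

```python
# Split sizes copied from the Data Splits table: (train, dev, test).
SPLITS = {
    "English": (6838, 886, 3259),
    "Arabic": (2278, 585, 1518),
    "Spanish": (3561, 679, 2854),
}

# Total number of tweets per language and across all three languages.
per_language = {lang: sum(sizes) for lang, sizes in SPLITS.items()}
total = sum(per_language.values())

for lang, count in per_language.items():
    print(f"{lang}: {count}")
print(f"All languages: {total}")
```

The table sums to 10,983 English, 4,381 Arabic, and 7,094 Spanish tweets, 22,458 in total. That is nine fewer than the 22,467 quoted in the loader's description string; the PR does not say where the difference comes from.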
+
+
+## Dataset Creation
+
+### Curation Rationale
+
+### Source Data
+
+Tweets
+
+#### Initial Data Collection and Normalization
+
+#### Who are the source language producers?
+
+Twitter users.
+
+### Annotations
+
+#### Annotation process
+
+We presented one tweet at a time to the annotators and asked which of the following options best described the emotional state of the tweeter:
+- anger (also includes annoyance, rage)
+- anticipation (also includes interest, vigilance)
+- disgust (also includes disinterest, dislike, loathing)
+- fear (also includes apprehension, anxiety, terror)
+- joy (also includes serenity, ecstasy)
+- love (also includes affection)
+- optimism (also includes hopefulness, confidence)
+- pessimism (also includes cynicism, no confidence)
+- sadness (also includes pensiveness, grief)
+- surprise (also includes distraction, amazement)
+- trust (also includes acceptance, liking, admiration)
+- neutral or no emotion
+
+Example tweets were provided in advance with examples of suitable responses.
+
+On the Figure Eight task settings, we specified that we needed annotations from seven people for each tweet. However, because of the way the gold tweets were set up, they were annotated by more than seven people. The median number of annotations was still seven. In total, 303 people annotated between 10 and 4,670 tweets each. A total of 174,356 responses were obtained.
+
+Mohammad, S., Bravo-Marquez, F., Salameh, M., & Kiritchenko, S. (2018). SemEval-2018 task 1: Affect in tweets. Proceedings of the 12th International Workshop on Semantic Evaluation, 1–17. https://doi.org/10.18653/v1/S18-1001
+
+#### Who are the annotators?
+
+Crowdworkers on Figure Eight.
+
+### Personal and Sensitive Information
+
+## Considerations for Using the Data
+
+### Social Impact of Dataset
+
+### Discussion of Biases
+
+### Other Known Limitations
+
+## Additional Information
+
+### Dataset Curators
+
+Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh and Svetlana Kiritchenko
+
+### Licensing Information
+
+See the official [Terms and Conditions](https://competitions.codalab.org/competitions/17751#learn_the_details-terms_and_conditions)
+
+### Citation Information
+
+@InProceedings{SemEval2018Task1,
+ author = {Mohammad, Saif M. and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},
+ title = {SemEval-2018 {T}ask 1: {A}ffect in Tweets},
+ booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},
+ address = {New Orleans, LA, USA},
+ year = {2018}} 
+
+### Contributions
+
+Thanks to [@maxpel](https://github.com/maxpel) for adding this dataset.
diff --git a/datasets/sem_eval_2018_task_1/dataset_infos.json b/datasets/sem_eval_2018_task_1/dataset_infos.json
@@ -0,0 +1 @@
+{"subtask5.english": {"description": " SemEval-2018 Task 1: Affect in Tweets: SubTask 5: Emotion Classification.\n This is a dataset for multilabel emotion classification for tweets.\n 'Given a tweet, classify it as 'neutral or no emotion' or as one, or more, of eleven given emotions that best represent the mental state of the tweeter.'\n It contains 22467 tweets in three languages manually annotated by crowdworkers using Best\u2013Worst Scaling.\n", "citation": "@InProceedings{SemEval2018Task1,\n author = {Mohammad, Saif M. and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},\n title = {SemEval-2018 {T}ask 1: {A}ffect in Tweets},\n booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},\n address = {New Orleans, LA, USA},\n year = {2018}}\n", "homepage": "https://competitions.codalab.org/competitions/17751", "license": "", "features": {"ID": {"dtype": "string", "id": null, "_type": "Value"}, "Tweet": {"dtype": "string", "id": null, "_type": "Value"}, "anger": {"dtype": "bool", "id": null, "_type": "Value"}, "anticipation": {"dtype": "bool", "id": null, "_type": "Value"}, "disgust": {"dtype": "bool", "id": null, "_type": "Value"}, "fear": {"dtype": "bool", "id": null, "_type": "Value"}, "joy": {"dtype": "bool", "id": null, "_type": "Value"}, "love": {"dtype": "bool", "id": null, "_type": "Value"}, "optimism": {"dtype": "bool", "id": null, "_type": "Value"}, "pessimism": {"dtype": "bool", "id": null, "_type": "Value"}, "sadness": {"dtype": "bool", "id": null, "_type": "Value"}, "surprise": {"dtype": "bool", "id": null, "_type": "Value"}, "trust": {"dtype": "bool", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "sem_eval2018_task1", "config_name": "subtask5.english", "version": {"version_str": "1.1.0", "description": null, "major": 1, "minor": 1, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 809768, "num_examples": 
6838, "dataset_name": "sem_eval2018_task1"}, "test": {"name": "test", "num_bytes": 384519, "num_examples": 3259, "dataset_name": "sem_eval2018_task1"}, "validation": {"name": "validation", "num_bytes": 104660, "num_examples": 886, "dataset_name": "sem_eval2018_task1"}}, "download_checksums": {"http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/English/2018-E-c-En-train.zip": {"num_bytes": 359408, "checksum": "7a64a0ffc7d54505ae6556d17d37ad56bd8817ef5724c6e3782909e3a3bca0ae"}, "http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/English/2018-E-c-En-dev.zip": {"num_bytes": 48375, "checksum": "3279ba27452162b1ce0f58b23442ca3fb57c749c3dae7944cbda3ea0984c8a1e"}, "http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/AIT2018-TEST-DATA/semeval2018englishtestfiles/2018-E-c-En-test.zip": {"num_bytes": 174899, "checksum": "9afa650190d749561749348e360fd1fc0d0a80c5f374d12cc5ef4b9a9ffc4430"}}, "download_size": 582682, "post_processing_size": null, "dataset_size": 1298947, "size_in_bytes": 1881629}, "subtask5.spanish": {"description": " SemEval-2018 Task 1: Affect in Tweets: SubTask 5: Emotion Classification.\n This is a dataset for multilabel emotion classification for tweets.\n 'Given a tweet, classify it as 'neutral or no emotion' or as one, or more, of eleven given emotions that best represent the mental state of the tweeter.'\n It contains 22467 tweets in three languages manually annotated by crowdworkers using Best\u2013Worst Scaling.\n", "citation": "@InProceedings{SemEval2018Task1,\n author = {Mohammad, Saif M. 
and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},\n title = {SemEval-2018 {T}ask 1: {A}ffect in Tweets},\n booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},\n address = {New Orleans, LA, USA},\n year = {2018}}\n", "homepage": "https://competitions.codalab.org/competitions/17751", "license": "", "features": {"ID": {"dtype": "string", "id": null, "_type": "Value"}, "Tweet": {"dtype": "string", "id": null, "_type": "Value"}, "anger": {"dtype": "bool", "id": null, "_type": "Value"}, "anticipation": {"dtype": "bool", "id": null, "_type": "Value"}, "disgust": {"dtype": "bool", "id": null, "_type": "Value"}, "fear": {"dtype": "bool", "id": null, "_type": "Value"}, "joy": {"dtype": "bool", "id": null, "_type": "Value"}, "love": {"dtype": "bool", "id": null, "_type": "Value"}, "optimism": {"dtype": "bool", "id": null, "_type": "Value"}, "pessimism": {"dtype": "bool", "id": null, "_type": "Value"}, "sadness": {"dtype": "bool", "id": null, "_type": "Value"}, "surprise": {"dtype": "bool", "id": null, "_type": "Value"}, "trust": {"dtype": "bool", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "sem_eval2018_task1", "config_name": "subtask5.spanish", "version": {"version_str": "1.1.0", "description": null, "major": 1, "minor": 1, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 362549, "num_examples": 3561, "dataset_name": "sem_eval2018_task1"}, "test": {"name": "test", "num_bytes": 288692, "num_examples": 2854, "dataset_name": "sem_eval2018_task1"}, "validation": {"name": "validation", "num_bytes": 67259, "num_examples": 679, "dataset_name": "sem_eval2018_task1"}}, "download_checksums": {"http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/Spanish/2018-E-c-Es-train.zip": {"num_bytes": 156975, "checksum": "28547e933b3087b8a82d7997e15021ef2f3680f6a1b134ca41766ce44034a276"}, 
"http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/Spanish/2018-E-c-Es-dev.zip": {"num_bytes": 30152, "checksum": "399cd39ae7dc00b11b2f319dfbb9360614e86c92898318fdfd06af46a81f5ebe"}, "http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/AIT2018-TEST-DATA/semeval2018spanishtestfiles/2018-E-c-Es-test.zip": {"num_bytes": 126924, "checksum": "3909e38a167ec40250b0b78f254e03fc3fb79ac7790bce6b695ef273a1d289d1"}}, "download_size": 314051, "post_processing_size": null, "dataset_size": 718500, "size_in_bytes": 1032551}, "subtask5.arabic": {"description": " SemEval-2018 Task 1: Affect in Tweets: SubTask 5: Emotion Classification.\n This is a dataset for multilabel emotion classification for tweets.\n 'Given a tweet, classify it as 'neutral or no emotion' or as one, or more, of eleven given emotions that best represent the mental state of the tweeter.'\n It contains 22467 tweets in three languages manually annotated by crowdworkers using Best\u2013Worst Scaling.\n", "citation": "@InProceedings{SemEval2018Task1,\n author = {Mohammad, Saif M. 
and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},\n title = {SemEval-2018 {T}ask 1: {A}ffect in Tweets},\n booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},\n address = {New Orleans, LA, USA},\n year = {2018}}\n", "homepage": "https://competitions.codalab.org/competitions/17751", "license": "", "features": {"ID": {"dtype": "string", "id": null, "_type": "Value"}, "Tweet": {"dtype": "string", "id": null, "_type": "Value"}, "anger": {"dtype": "bool", "id": null, "_type": "Value"}, "anticipation": {"dtype": "bool", "id": null, "_type": "Value"}, "disgust": {"dtype": "bool", "id": null, "_type": "Value"}, "fear": {"dtype": "bool", "id": null, "_type": "Value"}, "joy": {"dtype": "bool", "id": null, "_type": "Value"}, "love": {"dtype": "bool", "id": null, "_type": "Value"}, "optimism": {"dtype": "bool", "id": null, "_type": "Value"}, "pessimism": {"dtype": "bool", "id": null, "_type": "Value"}, "sadness": {"dtype": "bool", "id": null, "_type": "Value"}, "surprise": {"dtype": "bool", "id": null, "_type": "Value"}, "trust": {"dtype": "bool", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "sem_eval2018_task1", "config_name": "subtask5.arabic", "version": {"version_str": "1.1.0", "description": null, "major": 1, "minor": 1, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 414458, "num_examples": 2278, "dataset_name": "sem_eval2018_task1"}, "test": {"name": "test", "num_bytes": 278715, "num_examples": 1518, "dataset_name": "sem_eval2018_task1"}, "validation": {"name": "validation", "num_bytes": 105452, "num_examples": 585, "dataset_name": "sem_eval2018_task1"}}, "download_checksums": {"http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/Arabic/2018-E-c-Ar-train.zip": {"num_bytes": 142792, "checksum": "cd25acadaf262e1e8dfb27c4d12f392ccb9caf648933a183fc0c83255a86f4a1"}, 
"http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/E-c/Arabic/2018-E-c-Ar-dev.zip": {"num_bytes": 37428, "checksum": "177e1eee9967cd5dd4b4853ef0cde694b9c20a7b4eb8bfbcb82b11d53cbd30f9"}, "http://saifmohammad.com/WebDocs/AIT-2018/AIT2018-DATA/AIT2018-TEST-DATA/semeval2018arabictestfiles/2018-E-c-Ar-test.zip": {"num_bytes": 97606, "checksum": "4f1fc9f082c08c29b0acec180ebcb10ff425b96c117d8aa86a13ea092fce59f3"}}, "download_size": 277826, "post_processing_size": null, "dataset_size": 798625, "size_in_bytes": 1076451}}
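The size fields in `dataset_infos.json` appear to be related by simple sums: `download_size` matches the total of the per-file `num_bytes` under `download_checksums`, `dataset_size` matches the total of the per-split `num_bytes`, and `size_in_bytes` matches the two added together. The relationship is inferred from the numbers rather than stated in the PR. The sketch below copies the figures from the JSON above and checks it; `sizes_consistent` is a hypothetical helper name.

```python
# Figures copied from dataset_infos.json; "splits" are train, test, validation
# num_bytes and "downloads" are the train, dev, test archive num_bytes.
INFOS = {
    "subtask5.english": {
        "splits": [809768, 384519, 104660],
        "downloads": [359408, 48375, 174899],
        "download_size": 582682,
        "dataset_size": 1298947,
        "size_in_bytes": 1881629,
    },
    "subtask5.spanish": {
        "splits": [362549, 288692, 67259],
        "downloads": [156975, 30152, 126924],
        "download_size": 314051,
        "dataset_size": 718500,
        "size_in_bytes": 1032551,
    },
    "subtask5.arabic": {
        "splits": [414458, 278715, 105452],
        "downloads": [142792, 37428, 97606],
        "download_size": 277826,
        "dataset_size": 798625,
        "size_in_bytes": 1076451,
    },
}


def sizes_consistent(info):
    """Check that the three aggregate size fields equal the sums of their parts."""
    return (
        sum(info["downloads"]) == info["download_size"]
        and sum(info["splits"]) == info["dataset_size"]
        and info["download_size"] + info["dataset_size"] == info["size_in_bytes"]
    )


results = {name: sizes_consistent(info) for name, info in INFOS.items()}
print(results)
```

All three configs pass this check, so a hand-edited split size or archive size would show up as a mismatch in one of the aggregate fields.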
diff --git a/datasets/sem_eval_2018_task_1/dummy/subtask5.arabic/1.1.0/dummy_data.zip b/datasets/sem_eval_2018_task_1/dummy/subtask5.arabic/1.1.0/dummy_data.zip
diff --git a/datasets/sem_eval_2018_task_1/dummy/subtask5.english/1.1.0/dummy_data.zip b/datasets/sem_eval_2018_task_1/dummy/subtask5.english/1.1.0/dummy_data.zip
diff --git a/datasets/sem_eval_2018_task_1/dummy/subtask5.spanish/1.1.0/dummy_data.zip b/datasets/sem_eval_2018_task_1/dummy/subtask5.spanish/1.1.0/dummy_data.zip