
Further update task domains & categories; Add Utility Scripts #436

Merged
17 commits, Oct 18, 2021

Conversation

garyhlai
Contributor

  • update domains and categories for tasks in mctao and cosmosqa
  • add script for auto-adding domains for all task files of a given dataset
  • modify the test script to allow testing only a range of tasks (instead of ALL tasks)
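The range-testing change described above can be sketched as a small filtering helper plus `argparse` wiring; this is a minimal illustration, not the checked-in `src/test_all.py` (the function name `select_tasks` and the `tasks/` layout assumed here are hypothetical):

```python
import argparse
import glob
import os
import re

def select_tasks(task_dir, begin, end):
    """Return paths of task files whose number falls within [begin, end]."""
    selected = []
    for path in sorted(glob.glob(os.path.join(task_dir, "task*.json"))):
        # task files are assumed to be named like task005_something.json
        match = re.match(r"task(\d+)_", os.path.basename(path))
        if match and begin <= int(match.group(1)) <= end:
            selected.append(path)
    return selected

# hypothetical CLI wiring, mirroring the documented `--task 5 10` usage
parser = argparse.ArgumentParser()
parser.add_argument("--task", nargs=2, type=int, default=[1, 10**6],
                    help="first and last task number to test")
args = parser.parse_args([])  # the real script would parse sys.argv
```

With this shape, `--task 5 10` narrows the format checks to `task005` through `task010` instead of every file in `tasks/`.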

README.md Outdated
@@ -68,6 +68,7 @@ We would appreciate any external contributions! 🙏
* If you're building your tasks based on existing datasets and their crowdsourcing templates, see these [guidelines](doc/crowdsourcing.md).
* Add your task to [our list of tasks](tasks/README.md).
* To make sure that your addition is formatted correctly, run the tests: `> python src/test_all.py`
* To only test the formatting of a range of tasks, run `> python src/test_all.py --task <begin_task_number> <end_task_number>`. For example, running `> python src/test_all.py --task 5 10` will run the test from task005 to task010.
Contributor

Nice! Make it a sub-item of the earlier item? (i.e., indent it to the right).

@@ -7,9 +7,9 @@
"mctaco"
],
"Categories": [
"Question Generation"
"Contextual Question Generation"
Contributor

This also involves temporal reasoning:
Reasoning -> Temporal Reasoning

@@ -7,9 +7,9 @@
"mctaco"
],
"Categories": [
"Answer Generation"
"Answer Generation -> Commonsense Question Answering"
Contributor

Also involves temporal reasoning.

Contributor

(this applies to most of the tasks here)

Contributor Author

So would the "Categories" for this be

"Categories": [
"Answer Generation -> Commonsense Question Answering",
"Reasoning -> Temporal Reasoning",
"Reasoning -> Commonsense Reasoning"
]

since temporal reasoning can be considered a type of commonsense reasoning too?

What I'm trying to get at is whether some categories are mutually exclusive (e.g. "Reasoning -> Temporal Reasoning" vs. "Reasoning -> Commonsense Reasoning")?

Contributor

Yeah, your list of categories makes sense to me.
Basically, we should try to indicate all the category labels that apply to each task.

@@ -3038,5 +3038,8 @@
],
"Instruction_language": [
"English"
],
"Domains": [
Contributor

We're not changing its task type?

Contributor Author

would this be "Categories": ["Classification", "Reasoning -> Commonsense Reasoning", "Reasoning -> Logical Reasoning"]?

Contributor

I am not sure about "Logical Reasoning" (which is meant to indicate conclusions made using logic).

@@ -7,7 +7,7 @@
"cosmosqa"
],
"Categories": [
"Question Generation"
"Contextual Question Generation"
Contributor

One might say that this also involves Reasoning -> Commonsense Reasoning according to the instructions.

Contributor

(applies to more tasks below).

@@ -0,0 +1,48 @@
import json
Contributor

since this is an ad hoc function (used only for this annotation), I would prefer if we don't check this in.

Contributor Author
@garyhlai commented Oct 13, 2021

I added it because it seems that it could be useful for other people working on #4 who would likely need it.
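For reference, such a domain-annotation helper could look like the following; this is a minimal sketch under assumptions (the function name `add_domains`, matching dataset files by filename substring, and the in-place rewrite are illustrative choices, not the exact script in the PR):

```python
import glob
import json
import os

def add_domains(task_dir, dataset, domains):
    """Set the "Domains" field for every task file belonging to one dataset."""
    for path in sorted(glob.glob(os.path.join(task_dir, "task*.json"))):
        if dataset not in os.path.basename(path):
            continue  # only touch files of the requested dataset
        with open(path, encoding="utf-8") as f:
            task = json.load(f)
        task["Domains"] = domains  # overwrite (or create) the Domains list
        with open(path, "w", encoding="utf-8") as f:
            json.dump(task, f, indent=4, ensure_ascii=False)
```

A call like `add_domains("tasks", "mctaco", [...])` would then stamp the same domain list onto every MCTACO task file, which is the repetitive step the script automates.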

@garyhlai
Contributor Author

garyhlai commented Oct 13, 2021

@danyaljj all your concerns should be addressed in my latest push. I have updated all task files accordingly. Note that I modified the task-hierarchy.md:

- Grammar Error
  -  Grammar Error -> Grammar Error Correction
  -  Grammar Error -> Grammar Error Detection

because task021 seems like a "grammar" task as opposed to simply "classification".

Do you encourage me to further modify the task-hierarchy.md when suitable, or should I keep it fixed?

@danyaljj
Contributor

Do you encourage me to further modify the task-hierarchy.md when suitable, or should I keep it fixed?

Yes, that's alright (and somewhat encouraged)! We should continue massaging the hierarchy so that it best describes the space of our tasks.

@danyaljj
Contributor

Overall, the changes look good to me. Leaving it for others to review, before merging it.

@swarooprm
Contributor

swarooprm commented Oct 13, 2021

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

task001 and 002 (Quoref) need to have the categories "Answer Generation -> Contextual Question Answering" and "Coreference -> Entity Coreference"; they can also potentially have the "Text Span Selection" category.

I assume the domains are taken from the associated paper.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

@danyaljj
Contributor

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

@swarooprm I am pretty sure CosmosQA questions come with paragraphs.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

Let's document this as an issue.

@swarooprm
Contributor

swarooprm commented Oct 13, 2021

CosmosQA has context, but the answer does not lie within the context (similar to MCTACO)
e.g. https://instructions.apps.allenai.org/dataset_viewer?file=subtask024_cosmosqa_answer_generation.json
Context: I was told, in person over the phone, that my shoes were on their way. They have my money. I have no shoes. Question: What may happen after the call?
Answer: I will return the shoes and get back the money.

@danyaljj
Contributor

CosmosQA has context, but answer does not lie within the context (similar to MCTACO) e.g. https://instructions.apps.allenai.org/dataset_viewer?file=subtask024_cosmosqa_answer_generation.json Context: I was told, in person over the phone, that my shoes were on their way. They have my money. I have no shoes. Question: What may happen after the call? Answer: I will return the shoes and get back the money.

Assuming that "context" in Answer Generation -> Contextual Question Answering refers to the existence of a context paragraph, this tag certainly applies to CosmosQA.

Whether the answer is in a given context or not is a different issue (extractive vs. abstractive answer generation). We can create other tags to separate these two.

@garyhlai
Contributor Author

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

task001 and 002 (Quoref) need to have category: "Answer Generation -> Contextual Question Answering" and "Coreference -> Entity Coreference", they also can potentially have "Text Span Selection category".

I assume the domains are taken from the associated paper.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

@swarooprm Done. What you said about task002 makes sense -- task001 is a bit different, though, because it's question generation. I have updated all tasks accordingly.

@swarooprm
Contributor

Looks good @ghlai9665
Based on @danyaljj's comment, here are my suggestions:

  • Let's change MCTACO and CosmosQA categories to "contextual question answering" (in place of "Open Question Answering")
  • In the task hierarchy, create two new categories
    • answer generation-->extractive
    • answer generation-->abstractive.
  • Add "answer generation-->extractive" for Quoref and "answer generation-->abstractive" for MCTACO and CosmosQA, as answers to these questions cannot be extracted from the context.

@garyhlai
Contributor Author

Looks good @ghlai9665 Based on @danyaljj's comment, here are my suggestions:

  • Let's change MCTACO and CosmosQA categories to "contextual question answering" (in place of "Open Question Answering")

  • In the task hierarchy, create two new categories

    • answer generation-->extractive
    • answer generation-->abstractive.
  • Add "answer generation-->extractive" for Quoref and "answer generation-->abstractive" for MCTACO and CosmosQA, as answers to these questions cannot be extracted from the context.

Then it looks like every task would either be answer generation-->extractive or answer generation-->abstractive. So for task021, it's a grammar task so it would be answer generation-->extractive?

@swarooprm
Contributor

task021 is a classification task, so no need to add this category ("answer generation-->extractive").
Any category of answer generation (answer generation--->X) should be added only to answer generation tasks. Let's not add it to other tasks such as question generation, classification etc.
Do you agree? @danyaljj

@garyhlai
Contributor Author

@swarooprm Yeah, technically anything can be reformulated in the format of QA, and classification can be seen as answer generation. So I'm not sure about this either @danyaljj

@@ -8,6 +8,8 @@
- `Answer Generation -> Fill in the Blank`
- `Answer Generation -> Multiple Choice Question Answering`
- `Answer Generation -> Open Question Answering`
- `Answer Generation -> Extractive`
- `Answer Generation -> Abstractive`
Contributor
@danyaljj commented Oct 14, 2021

Since "extractive" and "abstractive" are only defined in the context of "contextual qa", it might make sense to make them nested:

  • Answer Generation -> Contextual Question Answering
    • Answer Generation -> Contextual Question Answering -> Extractive
    • Answer Generation -> Contextual Question Answering -> Abstractive

What do you all think? @ghlai9665 @swarooprm

Contributor

yes, makes sense to me.

@danyaljj
Contributor

danyaljj commented Oct 14, 2021

task021 is a classification task ...

I tend to agree w/ Swaroop on this. The task is formulated like a classification.

@danyaljj
Contributor

Part of the confusion might be due to the naming of Answer Generation (which sounds a bit open-ended). A better naming is probably Question Answer Generation or simply Question Answering.

@garyhlai
Contributor Author

garyhlai commented Oct 15, 2021

I feel like categories related to QA can be better structured. I think we can do something like:

  • Question Answering
    • Question Answering -> Answer Generation
      - Question Answering -> Answer Generation -> Commonsense Question Answering
      - Question Answering -> Answer Generation -> Contextual Question Answering
      - Question Answering -> Answer Generation -> Contextual Question Answering -> Extractive
      - Question Answering -> Answer Generation -> Contextual Question Answering -> Abstractive

      - Question Answering -> Answer Generation -> Fill in the Blank
      - Question Answering -> Answer Generation -> Multiple Choice Question Answering
      - Question Answering -> Answer Generation -> Open Question Answering
      - Question Answering -> Answer Generation -> Incorrect Answer Generation
    • Question Answering -> Question Generation
      - Question Answering -> Question Generation -> Contextual Question Generation
      - Question Answering -> Question Generation -> Question Composition

This should cover most of the QA-related categories, and we've agreed that classification categories would be mutually exclusive with these QA categories. What do you think? @swarooprm @danyaljj

@danyaljj
Contributor

danyaljj commented Oct 15, 2021

I mostly like your structure, except that I wouldn't make Question Generation a sub-category of Question Answering.
How about this? And I think you can merge Answer Generation and Question Answering.

  • Question Answering
    • Question Answering -> Commonsense Question Answering
    • Question Answering -> Contextual Question Answering
      - Question Answering -> Contextual Question Answering -> Extractive
      - Question Answering -> Contextual Question Answering -> Abstractive
    • Question Answering -> Fill in the Blank
    • Question Answering -> Multiple Choice Question Answering
    • Question Answering -> Open Question Answering
    • Question Answering -> Incorrect Answer Generation
  • Question Generation
    • Question Generation -> Contextual Question Generation
    • Question Generation -> Question Composition

I am actually not sure about the following:

we've agreed that classification categories would be mutually exclusive with these QA categories

A task might be QA and classification at the same time.

@swarooprm
Contributor

I mostly like your structure, except that I wouldn't make Question Generation a sub-category of Question Answering. How about this? And I think you can merge Answer Generation and Question Answering.

  • Question Answering

    • Question Answering -> Commonsense Question Answering
    • Question Answering -> Contextual Question Answering
      • Question Answering -> Contextual Question Answering -> Extractive
      • Question Answering -> Contextual Question Answering -> Abstractive
    • Question Answering -> Fill in the Blank
    • Question Answering -> Multiple Choice Question Answering
    • Question Answering -> Open Question Answering
    • Question Answering -> Incorrect Answer Generation
  • Question Generation

    • Question Generation -> Contextual Question Generation
    • Question Generation -> Question Composition

I am actually not sure about the following:

we've agreed that classification categories would be mutually exclusive with these QA categories

A task might be QA and classification at the same time.

Looks good to me.

@garyhlai
Contributor Author

@danyaljj @swarooprm The task hierarchy and all task files are updated accordingly. Let me know how everything looks and I'll continue with the rest of the tasks.

@danyaljj
Contributor

Thanks!

For task001_quoref_question_generation, we currently have:

    "Categories": [
        "Contextual Question Generation",
        "Coreference -> Entity Coreference",
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Since that is about generating questions (not answering them), we should drop "Question Answering -> Contextual Question Answering -> Extractive" and add "Question Generation -> Contextual Question Generation".

@garyhlai
Contributor Author

garyhlai commented Oct 18, 2021

Thanks!

For task001_quoref_question_generation, we currently have:

    "Categories": [
        "Contextual Question Generation",
        "Coreference -> Entity Coreference",
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Since that is about generating questions (not answering them), we should drop "Question Answering -> Contextual Question Answering -> Extractive" and add "Question Generation -> Contextual Question Generation".

Good catch! I updated the test to extract the categories better from Task-Hierarchy -- now it raises a warning when it's just Contextual Question Generation instead of Question Generation -> Contextual Question Generation.
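The validation described here could work by collecting the full category paths from task-hierarchy.md and warning whenever a task file uses a bare leaf name; a minimal sketch (the assumption that hierarchy entries are written in backticks, and the helper names, are illustrative, not the actual test code):

```python
import re

def full_paths_from_hierarchy(markdown_text):
    """Collect every backticked category path from a task-hierarchy file."""
    return set(re.findall(r"`([^`]+)`", markdown_text))

def category_warnings(categories, valid_paths):
    """Warn when a task uses a bare leaf name instead of its full path."""
    warnings = []
    for cat in categories:
        if cat in valid_paths:
            continue  # already a full path from the hierarchy
        # find full paths whose last segment matches the bare name
        matches = [p for p in valid_paths if p.endswith("-> " + cat)]
        if matches:
            warnings.append(f'"{cat}" should be written as "{matches[0]}"')
    return warnings
```

Under this scheme, a task tagged with just `Contextual Question Generation` would trigger a warning pointing at `Question Generation -> Contextual Question Generation`, matching the behavior described above.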

@danyaljj
Contributor

Let's set another convention: if a category is used, let's not include its parent category. Say, the question is of type "Question Answering -> Contextual Question Answering -> Abstractive", let's not mention its parent category of "Question Answering -> Contextual Question Answering".

@garyhlai
Contributor Author

@danyaljj You mean just in the Task-Hierarchy file or in every task file? Wouldn't it feel a little strange (and not too informative) to just say "Abstractive" in the category for the task files?

@danyaljj
Contributor

@danyaljj You mean just in the Task-Hierarchy file or in every task file? Wouldn't it feel a little strange (and not too informative) to just say "Abstractive" in the category for the task files?

Looks like I didn't explain myself properly: What I'm saying is that, instead of writing, say:

    "Categories": [
        "Question Answering -> Contextual Question Answering, 
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

drop the parent category and write:

    "Categories": [
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Does that make sense?
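This convention can also be enforced mechanically; a minimal sketch (the function name is hypothetical) that drops any category which is an ancestor of another entry in the same list:

```python
def drop_parent_categories(categories):
    """Remove any category that is an ancestor of another category in the list."""
    kept = []
    for cat in categories:
        # a parent is redundant when some other entry extends it with "-> ..."
        if any(other.startswith(cat + " ->") for other in categories if other != cat):
            continue
        kept.append(cat)
    return kept
```

Running such a pass over the "Categories" list of each task file would leave only the most specific label on each branch, which is exactly the convention proposed here.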

@@ -7,7 +7,6 @@
"quoref"
],
"Categories": [
"Question Answering -> Contextual Question Answering",
Contributor

+1

@@ -7,7 +7,6 @@
"drop"
],
"Categories": [
"Question Answering",
Contributor

👍

@@ -7,7 +7,10 @@
"mctaco"
],
"Categories": [
"Incorrect Answer Generation"
"Question Answering -> Incorrect Answer Generation",
"Question Answering -> Contextual Question Answering -> Abstractive",
Contributor

Should this be here? Didn't have it for tasks/task008_mctaco_wrong_answer_generation_transient_stationary.json

Contributor Author
@garyhlai commented Oct 18, 2021

You're right. Just fixed. Thanks!

@danyaljj
Contributor

Looks good to me, thanks!

Haven't heard back from @swarooprm yet and I am not sure when we will. I think we should move on with the PR. If there is anything missing, we can address it in another one.

@danyaljj danyaljj merged commit ff5f77d into allenai:master Oct 18, 2021