Update Hierarchy for Task 059-081 #474
Conversation
garyhlai commented on Oct 23, 2021 (edited)
- updated the domains & categories for tasks 059-081
- files for tasks 063-064 are missing
- I temporarily removed task 078 because the definition seems wrong (it does not match the instances)
- some nontrivial reorganization of the task hierarchy
Comments regarding the modifications to the task hierarchy: some really good additions!
Regarding Domains:
Task category assignment also looks good, some comments:
Feel free to let me know if you have any questions @ghlai9665
@@ -4,10 +4,13 @@
     "Daniel Khashabi"
   ],
   "Source": [
-    "ropes"
+    "ropes (https://allennlp.org/ropes)"
Please add the link to the paper as well. https://arxiv.org/pdf/1908.05852.pdf
That link includes the dataset download link + link to the paper. I don't see any particular reason to have both?
So that's fine.
@@ -80,7 +80,6 @@ Name | Summary | Category | Domain | Input Language | Output Language
`task075_squad1.1_answer_generation` | Generate answers to SQuAD 1.1 questions. | Answer Generation -> Contextual Question Answering | Wikipedia | English | English
`task076_splash_correcting_sql_mistake` | Correct the mistake in a given SQL statement based on feedback. | Structured Query Generation, Text Modification
`task077_splash_explanation_to_sql` | Generate a SQL statement based on a description of what the SQL statement does. | Structured Query Generation
-`task078_splash_sql_to_explanation` | Give a natural language description of what a given SQL statement does. | Structured Query Classification
Shouldn't we update the README based on the new categories?
There's a script, auto_update_readme.py, that we can run once we've finalized all the categories. I'm not modifying the README for now to reduce review overhead.
"Text Generation -> Long Text Generation -> Contextual Text Generation", | ||
"Reasoning -> Causal Reasoning", | ||
"Reasoning -> Qualitative Reasoning", | ||
"Reasoning -> Commonsense Reasoning" | ||
], |
I think adding a "Relation detection" category makes sense here.
This is not a "relation detection" task if you look at the instance outputs, even though it requires reasoning about relationships (covered by "Reasoning -> Qualitative Reasoning" and "Reasoning -> Commonsense Reasoning"). @swarooprm what do you think? We probably shouldn't add categories that only describe one part of the task, since that complicates training, right?
Yes, I think "Qualitative Reasoning" already covers the relation part (similar to https://arxiv.org/pdf/1811.08048.pdf), and I suggest we not add "relation detection" here (we have some tasks in the repo where the relation needs to be detected explicitly; we can add "relation detection" there).
@@ -3,10 +3,12 @@
     "Neeraj Varshney"
   ],
   "Categories": [
-    "Classification"
+    "Classification",
I think it's not classification. It's Multiple Choice Question Answering.
The two are not mutually exclusive. Actually, almost all multiple choice question answering should count as classification as well. @swarooprm what do you think?
Yeah! They are similar. Although, to avoid redundancy, I suggest using classification only when the classes have a definition by themselves (e.g., verification).
Considering how the tasks will be used (examining cross-task generalization), we are not too worried about redundancy as long as the category is apt.
Yes, redundancies are OK as long as they perfectly fit the task; let's also add Multiple Choice Question Answering here. Good that @yeganehkordi pointed it out.
@yeganehkordi good point bringing it up. After discussing with @swarooprm: in classification, the definition of the classes is fixed; in multiple choice QA, the answer options (the definitions of the classes) vary from question to question, so the two are almost mutually exclusive. In this case, even if the options are labeled "Option 1" and "Option 2", their definitions differ per instance. So this is multiple choice QA, not classification.
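To make the fixed-classes vs. per-question-options distinction concrete, here is a minimal, hypothetical Python sketch. The field names loosely mirror the task JSON files, but the instances, label sets, and category strings are invented purely for illustration:

```python
# Hypothetical illustration of the distinction discussed above.

# Classification: the label set is fixed across all instances.
classification_task = {
    "Categories": ["Classification"],
    "Instances": [
        {"input": "Statement: The sky is green.", "output": ["False"]},
        {"input": "Statement: Water boils at 100C at sea level.", "output": ["True"]},
    ],  # every instance chooses from the same fixed labels: True / False
}

# Multiple choice QA: the "classes" (answer options) change per question,
# even though they are always named "Option 1" / "Option 2".
multiple_choice_task = {
    "Categories": ["Question Answering -> Multiple Choice Question Answering"],
    "Instances": [
        {"input": "Question: Capital of France? Option 1: Paris. Option 2: Lyon.",
         "output": ["Option 1"]},
        {"input": "Question: Largest planet? Option 1: Mars. Option 2: Jupiter.",
         "output": ["Option 2"]},
    ],  # the meaning of "Option 1" differs from instance to instance
}
```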
Sounds good! I agree.
@@ -6,7 +6,9 @@
     "Abductive NLI, https://arxiv.org/pdf/1908.05739.pdf"
   ],
   "Categories": [
     "Answer Generation"
Maybe add Abductive Reasoning as well (just a guess based on the dataset name).
agree!
"Reasoning -> Commonsense Reasoning", | ||
"Reasoning -> Abductive Reasoning", | ||
"Classification", | ||
"Question Answering -> Incorrect Answer Generation" |
I've added Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation before. I think we need to separate incorrect answer generation tasks. Also, to me, it's not a classification task.
I like this suggestion. We have two kinds of incorrect answer generation: (1) generating an incorrect answer, and (2) choosing an incorrect answer from the given options.
Let's add Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation here.
Also, please update the task hierarchy file to incorporate the two types of incorrect answer generation.
   ],
   "Categories": [
-    "Incorrect Answer Generation"
+    "Question Answering -> Incorrect Answer Generation",
I suggest adding Question Answering -> Incorrect Answer Generation -> Comprehensive Incorrect Answer Generation
I agree that Question Answering -> Incorrect Answer Generation can be made more granular, but I don't know what Comprehensive Incorrect Answer Generation means. Symmetrically to task080, we can change this task to Question Answering -> Incorrect Answer Generation -> Commonsense Question Answering.
Maybe we could have:
Question Answering -> Incorrect Answer Generation
- Question Answering -> Incorrect Answer Generation -> Commonsense Question Answer Generation
- Question Answering -> Incorrect Answer Generation -> Contextual Question Answer Generation
- Question Answering -> Incorrect Answer Generation -> Multiple Choice Answer Generation
Yes, good point.
Also, see if we need more granularity in Contextual Answer Generation here, e.g. the following:
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Extractive
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Abstractive
Note that the more depth we have in our hierarchy tree, the better it will be for our experiments and also for our paper.
Yeah, I think we can err on the side of redundancies for now, especially because each task can have multiple categories.
I think structured text generation should be its own category because sentence generation is unstructured. So let's say the task is to generate a SQL statement; it can be
I think the "Contextual" in
Sounds good
Sounds good to me.
Why would this be "long text generation"? Most outputs seem pretty short. Btw, do you think this can be "Command Execution" as well?
Trying to understand why this is "Text Comparison". Is it because you're comparing Option 1 and Option 2, or you're comparing the option to the rest of the story?
I feel like "programming" is more of a domain than a task in itself
I thought this task is actually best defined by semantic parsing ("the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning")? It asks you to convert the natural language to code. What does "Program Synthesis" mean? Could you elaborate / provide an example or paper? I think maybe instead of
What does "Textual Entailment" mean exactly? In particular, I'm not sure if we should have it for task070 (classification)
I'm not sure about "Programming/Code->Strings" because it doesn't sound like a task. How about "Text Generation -> Code Execution"?
Would
Ok, makes sense.
Yes, but do you see there can be a "Paragraph generation" task without context?
What I meant was that we need a text generation category here and a templated text generation subcategory (since this does not exist in the task hierarchy). I agree that the inputs are relatively short; let's add Text Generation -> Sentence Generation -> Templated Text Generation?
Both
Agree, you are right.
I am not in favor of adding translation, since it typically refers to converting from one language (say, English) to another (say, French).
Textual Entailment: my point here is that in order to generate incorrect answers, one must be aware of the correct answers, so the skill "textual entailment" is also necessary for task070 (we made a similar decision for task055, where we added multihop reasoning for incorrect answer generation).
Perfect, let's add that.
How about Text Generation -> Long Text Generation -> Contextual Text Generation -> Instruction Generation? Also, once we fix the hierarchy.md file, let's keep all categories in alphabetical order (using something like this in the UI or the sort() function in Python). This will help other contributors search and find categories easily.
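For the alphabetical ordering, a minimal Python sketch of what that sort could look like. The category list below is just a sample from this PR, and whether this lives in auto_update_readme.py or a separate script is left open:

```python
# Minimal sketch: sort a list of category strings alphabetically.
categories = [
    "Reasoning -> Commonsense Reasoning",
    "Classification",
    "Question Answering -> Incorrect Answer Generation",
]

# sorted() returns a new alphabetically ordered list; key=str.lower makes the
# ordering case-insensitive so differently cased entries sort together.
categories = sorted(categories, key=str.lower)
print("\n".join(categories))
```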
There are too many comments here.
Yeah, can you do tomorrow at 5 pm Pacific time? @swarooprm
Yes, that works. I will send you an invite @ghlai9665
@ghlai9665,
“textual entailment” carries a broader sense of inferring conclusions.
Everything looks good. This PR is ready to be merged once you resolve the merge conflicts (I am not able to merge it in the UI; you have to do it on the command line) @ghlai9665