Update Hierarchy for Task 059-081 #474

Merged
merged 17 commits into from
Oct 29, 2021

garyhlai
Contributor

@garyhlai garyhlai commented Oct 23, 2021

  • updated the domains & categories for task 059-081
  • files for task 063-064 are missing
  • I temporarily removed task 078 because its definition seems wrong (it does not match the instances)
  • some nontrivial reorganization of the task hierarchy

@garyhlai changed the title from "Update Hierarchy for Task 059-084" to "Update Hierarchy for Task 059-081" on Oct 23, 2021
@swarooprm
Contributor

swarooprm commented Oct 25, 2021

Comments regarding modification to task hierarchy: some really good additions!
Some comments:

  • With the addition of Question Answering -> Multihop Question Answering, we now have some redundancy between the question answering and reasoning categories, since we also have Reasoning -> Multihop Reasoning. I think this is fine, but we should mention it upfront in the paper.

  • Can we keep Text Generation -> Structured Text Generation within Text Generation -> Sentence Generation, so the new category would look like Text Generation -> Sentence Generation -> Structured Text Generation?

  • Do we need Text Generation -> Long Text Generation -> Contextual Text Generation and Text Generation -> Long Text Generation -> Paragraph Generation both? I guess one will work (since this is long text, contextual should always be a paragraph, no?)

Regarding Domains:

  • I don't think we should have very granular domain names such as SQL etc., because this way the list will be huge. I would say let's create a domain "Program/Code" and there if you want we can have sub-categories like SQL, Stack Overflow.
    Code->language->SQL
    Code->language->Python
    Code->repo->Stackoverflow
    Code->repo->Github

  • Similarly, let's not make "Conceptnet" a separate domain. My suggestion is to have the following category for Conceptnet:
    Commonsense -> Concepts and Relations
    Then we have to change the existing categories to:
    Commonsense -> Concepts and Relations -> Social Commonsense
    Commonsense -> Concepts and Relations -> Physical Commonsense
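
To make the proposed domain nesting concrete, here is a minimal sketch that folds arrow-separated paths into a nested dictionary. The function name `build_tree` and the idea of storing the hierarchy as a nested dict are assumptions for illustration only; the repository may represent the hierarchy differently.

```python
from typing import Dict, Iterable

# A tree is a dict mapping a node name to its subtree (illustrative type alias).
Tree = Dict[str, "Tree"]


def build_tree(paths: Iterable[str], sep: str = "->") -> Tree:
    """Fold 'A -> B -> C' style paths into a nested dict (hypothetical helper)."""
    root: Tree = {}
    for path in paths:
        node = root
        for part in path.split(sep):
            name = part.strip()  # tolerate 'A->B' as well as 'A -> B'
            if name:
                node = node.setdefault(name, {})
    return root


# Domain paths taken from the suggestions in this thread.
domain_paths = [
    "Code->language->SQL",
    "Code->language->Python",
    "Code->repo->Stackoverflow",
    "Code->repo->Github",
    "Commonsense -> Concepts and Relations -> Social Commonsense",
    "Commonsense -> Concepts and Relations -> Physical Commonsense",
]
domain_tree = build_tree(domain_paths)
```

With this shape, `domain_tree["Code"]["language"]` holds the `SQL` and `Python` leaves, so adding another language only extends one branch instead of growing the top-level domain list.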

@swarooprm swarooprm mentioned this pull request Oct 25, 2021
@swarooprm
Contributor

Task category assignment also looks good, some comments:

  • task061: Add "Text Span Selection"
  • task 062: we need to have a "text generation" category as well, say "text generation->long text generation-> Contextual Text Generation->templated text generation" (we need to add this category).
  • task 065: Add "Text Comparison"
  • task067-072: Add "Reasoning -> Abductive Reasoning" and "Textual Entailment"
  • task 068: Add "Question Answering -> Incorrect Answer Generation"
  • task 069-070: Add "Text Comparison"
  • task076: Wondering if we should have a category "Programming" and this task will come under "Programming->Syntax" (the other category you have in the task is good)
  • task 077: This task is loosely associated with semantic parsing. I am fine if we keep it; however, I would say let's also create a category "Program Synthesis", which better represents this task.
  • task 079: Along with the category you have, we can also have a category "Programming/Code->Strings".
  • task 080: Also add Text Generation -> Long Text Generation -> Contextual Text Generation. (Wondering if we should have multiple subcategories here: Text Generation -> Long Text Generation -> Contextual Text Generation -> Story Generation; Text Generation -> Long Text Generation -> Contextual Text Generation -> Process Description (applicable to this task); and Text Generation -> Long Text Generation -> Contextual Text Generation -> Templated Text Generation (task062).)
  • task081: Also, add "Reasoning -> Spatial Reasoning" since it's there in task080

Feel free to let me know if you have any questions @ghlai9665

@@ -4,10 +4,13 @@
"Daniel Khashabi"
],
"Source": [
-"ropes"
+"ropes (https://allennlp.org/ropes)"
Contributor

Please add the link to the paper as well. https://arxiv.org/pdf/1908.05852.pdf

Contributor Author

That link includes the dataset download link + link to the paper. I don't see any particular reason to have both?

Contributor

So that's fine.

@@ -80,7 +80,6 @@ Name | Summary | Category | Domain | Input Language | Output Language
`task075_squad1.1_answer_generation` | Generate answers to SQuAD 1.1 questions. | Answer Generation -> Contextual Question Answering | Wikipedia | English | English
`task076_splash_correcting_sql_mistake` | Correct the mistake in a given SQL statement based on feedback. | Structured Query Generation, Text Modification
`task077_splash_explanation_to_sql` | Generate a SQL statement based on a description of what the SQL statement does. | Structured Query Generation
`task078_splash_sql_to_explanation` | Give a natural language description of what a given SQL statement does. | Structured Query Classification
Contributor

Shouldn't we update the README based on the new categories?

Contributor Author

There's a script, auto_update_readme.py, that we can run once we have finalized all the categories and such. I'm not modifying the README for now, to reduce review overhead.

"Text Generation -> Long Text Generation -> Contextual Text Generation",
"Reasoning -> Causal Reasoning",
"Reasoning -> Qualitative Reasoning",
"Reasoning -> Commonsense Reasoning"
],
Contributor

I think adding a "Relation detection" category makes sense here.

Contributor Author

This is not a "relation detection" task if you look at the instance outputs, even though it requires reasoning about relationships (covered by "Reasoning -> Qualitative Reasoning" and "Reasoning -> Commonsense Reasoning"). @swarooprm what do you think? We probably shouldn't add categories that describe only one part of the task, as that complicates training, right?

Contributor

Yes, I think "Qualitative Reasoning" already covers the relation part (similar to https://arxiv.org/pdf/1811.08048.pdf), and I suggest we not add "relation detection" here (we have some tasks in the repo where a relation needs to be detected explicitly; we can add "relation detection" there).

@@ -3,10 +3,12 @@
"Neeraj Varshney"
],
"Categories": [
-"Classification"
+"Classification",
Contributor

I think it's not classification. It's Multiple Choice Question Answering.

Contributor Author

The two are not mutually exclusive. Actually, almost all multiple choice question answering should be classification as well. @swarooprm what do you think?

Contributor

Yeah! They are similar. Although to avoid redundancy, I suggest considering classification when our classes have a definition by themselves (i.e., verification).

Contributor Author

Considering how the tasks will be used (examining cross-task generalization), we are not too worried about redundancy as long as the category is apt

Contributor

Yes, redundancies are OK as long as they fit the task perfectly; let's also add Multiple Choice Question Answering here. Good that @yeganehkordi pointed it out.

Contributor Author

@yeganehkordi, good point bringing it up. After discussion with @swarooprm: in classification, the definition of the classes is fixed; in Multiple Choice QA, the answer options (the definitions of the classes) vary from question to question. They're almost mutually exclusive. So in this case, even if the labels are always option 1 and option 2, their definitions differ from question to question. So this is Multiple Choice QA, not classification.
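
As a rough illustration of that rule of thumb, the sketch below checks whether the option texts are the same for every instance. The helper names and the toy option lists are hypothetical, not taken from the repository; the point is only that the comparison is on the option texts, not on labels such as "option 1" and "option 2".

```python
from typing import Hashable, Iterable, Sequence


def has_fixed_label_set(option_sets: Iterable[Sequence[Hashable]]) -> bool:
    """Return True when every instance offers the same set of answer options."""
    distinct = {frozenset(options) for options in option_sets}
    return len(distinct) <= 1


def suggest_category(option_sets: Iterable[Sequence[Hashable]]) -> str:
    """Heuristic only: fixed classes suggest Classification, varying ones MCQA."""
    if has_fixed_label_set(option_sets):
        return "Classification"
    return "Multiple Choice Question Answering"


# Toy data (assumed for illustration): the option TEXTS per instance.
verification_options = [["True", "False"], ["False", "True"]]
story_ending_options = [
    ["She went home.", "She bought a cat."],
    ["He apologized.", "He left town."],
]
```

Here `suggest_category(verification_options)` yields "Classification", while `suggest_category(story_ending_options)` yields "Multiple Choice Question Answering", because the meaning of each option changes from question to question.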

Contributor

Sounds good! I agree.

@@ -6,7 +6,9 @@
"Abductive NLI, https://arxiv.org/pdf/1908.05739.pdf"
],
"Categories": [
"Answer Generation"
Contributor

Maybe add Abductive Reasoning as well (just a guess based on the dataset name).

Contributor Author

agree!

"Reasoning -> Commonsense Reasoning",
"Reasoning -> Abductive Reasoning",
"Classification",
"Question Answering -> Incorrect Answer Generation"
Contributor

I've added Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation before. I think we need to separate incorrect answer generation tasks.
Also, to me, it's not a classification task.

Contributor

I like this suggestion. We have two kinds of incorrect answer generation: (1) where we generate an incorrect answer, and (2) where we choose an incorrect answer from the given options.
Let's add Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation here.
Also, please update the task hierarchy file so that it incorporates both types of incorrect answer generation.
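
One way to keep task files and the hierarchy file in sync is a small check that every category listed in a task exists in the hierarchy. The sketch below is an assumption about how such a check could look; the function names and the hierarchy excerpt are illustrative, and the real hierarchy file format may differ.

```python
from typing import Iterable, List, Set


def normalize(path: str) -> str:
    """Normalize arrow spacing so 'A->B' and 'A -> B' compare equal."""
    return " -> ".join(part.strip() for part in path.split("->"))


def known_categories(hierarchy_paths: Iterable[str]) -> Set[str]:
    """Collect every hierarchy path together with all of its prefixes."""
    known: Set[str] = set()
    for path in hierarchy_paths:
        parts = normalize(path).split(" -> ")
        for depth in range(1, len(parts) + 1):
            known.add(" -> ".join(parts[:depth]))
    return known


def missing_categories(task_categories: Iterable[str],
                       hierarchy_paths: Iterable[str]) -> List[str]:
    """Return the task categories that do not appear in the hierarchy."""
    known = known_categories(hierarchy_paths)
    return [c for c in task_categories if normalize(c) not in known]


# Hypothetical excerpt of the hierarchy, using category names from this thread.
hierarchy = [
    "Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation",
    "Reasoning -> Abductive Reasoning",
    "Reasoning -> Commonsense Reasoning",
]
task_categories = [
    "Reasoning -> Commonsense Reasoning",
    "Question Answering->Incorrect Answer Generation",
    "Incorrect Answer Generation",  # old flat label, no longer in the hierarchy
]
unknown = missing_categories(task_categories, hierarchy)
```

In this example only the old flat label "Incorrect Answer Generation" is reported, since the fully qualified path is a prefix of an entry in the hierarchy.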

],
"Categories": [
-"Incorrect Answer Generation"
+"Question Answering -> Incorrect Answer Generation",
Contributor

I suggest adding Question Answering -> Incorrect Answer Generation -> Comprehensive Incorrect Answer Generation

Contributor Author

@garyhlai garyhlai Oct 26, 2021

I agree that Question Answering -> Incorrect Answer Generation can be made more granular, but I don't know what Comprehensive Incorrect Answer Generation means. Symmetrical to task080, we can change this task to Question Answering -> Incorrect Answer Generation -> Commonsense Question Answering

Contributor

Maybe we could have:
Question Answering -> Incorrect Answer Generation

  • Question Answering -> Incorrect Answer Generation -> Commonsense Question Answer Generation
  • Question Answering -> Incorrect Answer Generation -> Contextual Question Answer Generation
  • Question Answering -> Incorrect Answer Generation -> Multiple Choice Answer Generation

Contributor

@swarooprm swarooprm Oct 26, 2021

yes good point.
Also, see if we need more granularities in Contextual Answer Generation here e.g. the following:
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Extractive
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Abstractive

Note that the deeper our hierarchy tree is, the better it will be for our experiments and for our paper.

@garyhlai
Contributor Author

Comments regarding modification to task hierarchy: some really good additions! Some comments:

  • With the addition of Question Answering -> Multihop Question Answering, we now have some redundancy between the question answering and reasoning categories, since we also have Reasoning -> Multihop Reasoning. I think this is fine, but we should mention it upfront in the paper.

Yeah, I think we can err on the side of redundancies for now, especially because each task can have multiple categories.

  • Can we keep Text Generation -> Structured Text Generation within Text Generation -> Sentence Generation, so the new category would look like Text Generation -> Sentence Generation-> Structured Text Generation?

I think structured text generation should be its own category because sentence generation is unstructured. So let's say the task is to generate a SQL, it can be Text Generation -> Structured Text Generation -> Code. If the task is to generate a table from a piece of unstructured text, it can be Text Generation -> Structured Text Generation -> Table

  • Do we need Text Generation -> Long Text Generation -> Contextual Text Generation and Text Generation -> Long Text Generation -> Paragraph Generation both? I guess one will work (since this is long text, contextual should always be a paragraph, no?)

I think the "Contextual" in Contextual Text Generation refers to the context based on which the output text is generated. e.g. background paragraph in task 059

Regarding Domains:

  • I don't think we should have very granular domain names such as SQL etc., because this way the list will be huge. I would say let's create a domain "Program/Code" and there if you want we can have sub-categories like SQL, Stack Overflow.
    Code->language->SQL
    Code->language->Python
    Code->repo->Stackoverflow
    Code->repo->Github

Sounds good

  • Similarly, let's not make "Conceptnet" a separate domain. My suggestion is to have the following category for Conceptnet:
    Commonsense -> Concepts and Relations
    Then we have to change the existing categories to:
    Commonsense -> Concepts and Relations -> Social Commonsense
    Commonsense -> Concepts and Relations -> Physical Commonsense

Sounds good to me.

@garyhlai
Contributor Author

garyhlai commented Oct 26, 2021

  • task 062: we need to have a "text generation" category as well, say "text generation->long text generation-> Contextual Text Generation->templated text generation" (we need to add this category).

Why would this be "long text generation"? Most outputs seem pretty short. Btw, do you think this can be "Command Execution" as well?

  • task 065: Add "Text Comparison"

Trying to understand why this is "Text Comparison". Is it because you're comparing Option 1 and Option 2, or you're comparing the option to the rest of the story?

  • task076: Wondering if we should have a category "Programming" and this task will come under "Programming->Syntax" (the other category you have in the task is good)

I feel like "programming" is more of a domain than a task in itself

  • task 077: This task is loosely associated with semantic parsing. I am fine if we keep it, however, I would say let's also create a category "Program Synthesis" which better represent this task.

I thought this task is actually best defined by semantic parsing ("the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning")? It asks you to convert the natural language to code.

What does "Program Synthesis" mean? Could you elaborate / provide an example or paper?

I think maybe instead of Program Synthesis, we can have:
Translation -> Natural Language to Natural Language
Translation -> Code to Natural Language
Translation -> Natural Language to Code (the applicable one for this particular task)
Translation -> Code to Code

task067-072: Add "Reasoning -> Abductive Reasoning" and "Textual Entailment"

What does "Textual Entailment" mean exactly? In particular, I'm not sure if we should have it for task070 (classification)

  • task 079: Along with the category you have, we can also have a category "Programming/Code->Strings".

I'm not sure about "Programming/Code->Strings" because it doesn't sound like a task. How about "Text Generation -> Code Execution"

  • task 080: Also add Text Generation -> Long Text Generation -> Contextual Text Generation. (Wondering if we should have multiple subcategories here: Text Generation -> Long Text Generation -> Contextual Text Generation -> Story Generation; Text Generation -> Long Text Generation -> Contextual Text Generation -> Process Description (applicable to this task); and Text Generation -> Long Text Generation -> Contextual Text Generation -> Templated Text Generation (task062).)

Would Text Generation -> Long Text Generation -> Contextual Text Generation -> Process Description be so specific that no other tasks would fall under it ("Process Description")?

@swarooprm
Contributor

swarooprm commented Oct 26, 2021

  • Can we keep Text Generation -> Structured Text Generation within Text Generation -> Sentence Generation, so the new category would look like Text Generation -> Sentence Generation-> Structured Text Generation?

I think structured text generation should be its own category because sentence generation is unstructured. So let's say the task is to generate a SQL, it can be Text Generation -> Structured Text Generation -> Code. If the task is to generate a table from a piece of unstructured text, it can be Text Generation -> Structured Text Generation -> Table

Ok, makes sense.

  • Do we need Text Generation -> Long Text Generation -> Contextual Text Generation and Text Generation -> Long Text Generation -> Paragraph Generation both? I guess one will work (since this is long text, contextual should always be a paragraph, no?)

I think the "Contextual" in Contextual Text Generation refers to the context based on which the output text is generated. e.g. background paragraph in task 059

Yes, but do you think there can be a "Paragraph Generation" task without context?
My point is: all paragraph generation tasks need a context (which can be as long as a background paragraph or as short as a word, as in task029), so Text Generation -> Long Text Generation -> Contextual Text Generation is enough and we may not need Text Generation -> Long Text Generation -> Paragraph Generation, as the latter is redundant for long text generation.
Now I am wondering whether we should remove the category Text Generation -> Long Text Generation -> Contextual Text Generation altogether, if all long text generation is contextual text generation. What do you think @ghlai9665?

@swarooprm
Contributor

swarooprm commented Oct 26, 2021

  • task 062: we need to have a "text generation" category as well, say "text generation->long text generation-> Contextual Text Generation->templated text generation" (we need to add this category).

Why would this be "long text generation"? Most outputs seem pretty short. Btw, do you think this can be "Command Execution" as well?

What I meant was that we need a text generation category here and a templated text generation subcategory (since this does not exist in the task hierarchy). I agree that inputs are relatively short; let's add text generation -> sentence generation -> templated text generation?
"Command Execution" also fits here (good suggestion); maybe we can add a new subcategory that is applicable to this task.

  • task 065: Add "Text Comparison"

Trying to understand why this is "Text Comparison". Is it because you're comparing Option 1 and Option 2, or you're comparing the option to the rest of the story?

Both

@swarooprm
Contributor

  • task076: Wondering if we should have a category "Programming" and this task will come under "Programming->Syntax" (the other category you have in the task is good)

I feel like "programming" is more of a domain than a task in itself

Agree, you are right.

  • task 077: This task is loosely associated with semantic parsing. I am fine if we keep it, however, I would say let's also create a category "Program Synthesis" which better represent this task.

I thought this task is actually best defined by semantic parsing ("the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning")? It asks you to convert the natural language to code.

Semantic Parsing is fine here as a category.

What does "Program Synthesis" mean? Could you elaborate / provide an example or paper?

Program synthesis: here
Paper

I think maybe instead of Program Synthesis, we can have:
Translation -> Natural Language to Natural Language
Translation -> Code to Natural Language
Translation -> Natural Language to Code (the applicable one for this particular task)
Translation -> Code to Code

I am not in favor of adding Translation, since it is typically used for converting from one natural language (say, English) to another (say, French).

task067-072: Add "Reasoning -> Abductive Reasoning" and "Textual Entailment"

What does "Textual Entailment" mean exactly? In particular, I'm not sure if we should have it for task070 (classification)

Textual Entailment: here. My point is: in order to generate incorrect answers, one must be aware of the correct answers, so the skill "textual entailment" is also necessary for task070 (we made a similar decision for task055, where we added multihop reasoning for incorrect answer generation).

@swarooprm
Contributor

swarooprm commented Oct 26, 2021

  • task 079: Along with the category you have, we can also have a category "Programming/Code->Strings".

I'm not sure about "Programming/Code->Strings" because it doesn't sound like a task. How about "Text Generation -> Code Execution"

Perfect, let's add that.

  • task 080: Also add Text Generation -> Long Text Generation -> Contextual Text Generation. (Wondering if we should have multiple subcategories here: Text Generation -> Long Text Generation -> Contextual Text Generation -> Story Generation; Text Generation -> Long Text Generation -> Contextual Text Generation -> Process Description (applicable to this task); and Text Generation -> Long Text Generation -> Contextual Text Generation -> Templated Text Generation (task062).)

Would Text Generation -> Long Text Generation -> Contextual Text Generation-> Process Description be too specific that no other tasks would fall under this ("Process Description")?

How about Text Generation -> Long Text Generation -> Contextual Text Generation-> Instruction generation?
My point is: the type of text generated here is different from a typical long text generation which usually involves stories, Q&A etc., here output is a set of instructions
I am also fine if we don't add anything new here.

Also, once we fix the hierarchy.md file, let's keep all categories in alphabetical order (by using something like this in the UI or the sort() function in Python). This will help other contributors search for and find categories easily.
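
A minimal sketch of the alphabetical ordering, assuming the categories are available as arrow-separated path strings (the actual layout of hierarchy.md is not shown in this thread, and `sort_categories` is a hypothetical helper). Sorting on the list of path components keeps siblings grouped under their parent.

```python
from typing import Iterable, List


def normalize(path: str) -> str:
    """Normalize arrow spacing so 'A->B' and 'A -> B' compare equal."""
    return " -> ".join(part.strip() for part in path.split("->"))


def sort_categories(paths: Iterable[str]) -> List[str]:
    """Deduplicate and sort category paths, case-insensitively, level by level."""
    unique = {normalize(p) for p in paths}
    return sorted(unique, key=lambda p: [s.casefold() for s in p.split(" -> ")])


# Category names taken from this discussion; the order is deliberately shuffled.
categories = [
    "Text Generation -> Sentence Generation",
    "Reasoning -> Multihop Reasoning",
    "Question Answering->Multihop Question Answering",
    "Reasoning -> Abductive Reasoning",
    "Reasoning -> Multihop Reasoning",  # duplicate, removed by the set
]
sorted_categories = sort_categories(categories)
```

The result lists "Question Answering -> Multihop Question Answering" first, then the two Reasoning entries (Abductive before Multihop), then "Text Generation -> Sentence Generation".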

@swarooprm
Contributor

swarooprm commented Oct 26, 2021

There are too many comments here.
If you think it's easier/quicker to discuss over a call, I am open to scheduling calls periodically to discuss and finalize the task category assignments @ghlai9665

@garyhlai
Contributor Author

There are too many comments here. If you think it's easier/quicker to discuss over call, I am open to scheduling calls periodically to discuss task category assignments and finalize @ghlai9665

Yeah can you do tomorrow 5 pm pacific time? @swarooprm

@swarooprm
Contributor

There are too many comments here. If you think it's easier/quicker to discuss over call, I am open to scheduling calls periodically to discuss task category assignments and finalize @ghlai9665

Yeah can you do tomorrow 5 pm pacific time? @swarooprm

Yes, that works. I will send you an invite @ghlai9665

@swarooprm
Contributor

swarooprm commented Oct 28, 2021

@ghlai9665,
the term for generating a natural language description of code is "code summarization".
I will get back to you on "textual entailment" soon.

@swarooprm
Contributor

@ghlai9665, the term for generating a natural language description of code is "code summarization".
I will get back to you on "textual entailment" soon.

“textual entailment” carries a broader sense of inferring conclusions.
An entailment could be through deduction, abduction, analogy, induction, etc.
So, let's remove "Textual Entailment" as a separate top-level category and keep it as an intermediate category in the reasoning chain:
Reasoning -> Textual Entailment -> Deductive Reasoning
Reasoning -> Textual Entailment -> Abductive Reasoning
Reasoning -> Textual Entailment -> Analogical Reasoning
Reasoning -> Textual Entailment -> Inductive Reasoning
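
If existing task files already use the shorter paths (for example "Reasoning -> Abductive Reasoning"), the re-parenting could be applied mechanically. The sketch below is only an illustration under that assumption; `insert_intermediate` is a hypothetical helper, and the set of affected children is taken from the four paths listed above.

```python
from typing import List, Set


def insert_intermediate(path: str, parent: str, new_node: str,
                        children: Set[str]) -> str:
    """Insert new_node between parent and any of the listed direct children."""
    parts = [part.strip() for part in path.split("->")]
    if len(parts) >= 2 and parts[0] == parent and parts[1] in children:
        parts.insert(1, new_node)
    return " -> ".join(parts)


# The four reasoning types named in the comment above.
ENTAILMENT_CHILDREN = {
    "Deductive Reasoning",
    "Abductive Reasoning",
    "Analogical Reasoning",
    "Inductive Reasoning",
}

old_paths: List[str] = [
    "Reasoning -> Abductive Reasoning",
    "Reasoning -> Spatial Reasoning",  # not an entailment child, left as-is
    "Reasoning->Deductive Reasoning",
]
new_paths = [
    insert_intermediate(p, "Reasoning", "Textual Entailment", ENTAILMENT_CHILDREN)
    for p in old_paths
]
```

Running the helper a second time leaves already-migrated paths unchanged, because the second component is then "Textual Entailment" rather than one of the listed children.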

@swarooprm
Contributor

Everything looks good.
One typo:
Program Synethsis--> Program Synthesis

This PR is ready to be merged once you resolve the merge conflicts (I am not able to merge it in the UI; you have to do it from the command line) @ghlai9665

@swarooprm swarooprm merged commit ca26f30 into allenai:master Oct 29, 2021
3 participants