Update Hierarchy for Task 059-081 #474
Conversation
garyhlai commented on Oct 23, 2021 (edited)
- updated the domains & categories for tasks 059-081
- files for tasks 063-064 are missing
- I temporarily removed task 078 because the definition seems wrong (it does not match the instances)
- some nontrivial reorganization of the task hierarchy
Comments regarding the modifications to the task hierarchy: some really good additions!
Regarding Domains:
Task category assignment also looks good, some comments:
Feel free to let me know if you have any questions @ghlai9665
@@ -4,10 +4,13 @@
     "Daniel Khashabi"
   ],
   "Source": [
-    "ropes"
+    "ropes (https://allennlp.org/ropes)"
Please add the link to the paper as well. https://arxiv.org/pdf/1908.05852.pdf
That link includes the dataset download link + link to the paper. I don't see any particular reason to have both?
So that's fine.
@@ -80,7 +80,6 @@ Name | Summary | Category | Domain | Input Language | Output Language
`task075_squad1.1_answer_generation` | Generate answers to SQuAD 1.1 questions. | Answer Generation -> Contextual Question Answering | Wikipedia | English | English
`task076_splash_correcting_sql_mistake` | Correct the mistake in a given SQL statement based on feedback. | Structured Query Generation, Text Modification
`task077_splash_explanation_to_sql` | Generate a SQL statement based on a description of what the SQL statement does. | Structured Query Generation
-`task078_splash_sql_to_explanation` | Give a natural language description of what a given SQL statement does. | Structured Query Classification
Shouldn't we update the README based on the new categories?
There's a script, auto_update_readme.py, that we can run once we've finalized all the categories. I'm not modifying the README for now to reduce review overhead.
"Text Generation -> Long Text Generation -> Contextual Text Generation", | ||
"Reasoning -> Causal Reasoning", | ||
"Reasoning -> Qualitative Reasoning", | ||
"Reasoning -> Commonsense Reasoning" | ||
], |
I think adding a "Relation detection" category makes sense here.
This is not a "relation detection" task if you look at the instance outputs, even though it requires reasoning about relationships (covered by "Reasoning -> Qualitative Reasoning" and "Reasoning -> Commonsense Reasoning"). @swarooprm what do you think? We probably shouldn't add categories that only describe one part of the task, since that complicates training, right?
Yes, I think "Qualitative Reasoning" already covers the relation part (similar to https://arxiv.org/pdf/1811.08048.pdf), and I suggest we not add "relation detection" here (we have some tasks in the repo where the relation needs to be detected explicitly; we can add "relation detection" there).
@@ -3,10 +3,12 @@
     "Neeraj Varshney"
   ],
   "Categories": [
-    "Classification"
+    "Classification",
I think it's not classification. It's Multiple Choice Question Answering.
The two are not mutually exclusive. Actually, almost all multiple choice question answering should count as classification as well. @swarooprm what do you think?
Yeah! They are similar. Although, to avoid redundancy, I suggest using classification only when the classes have a definition by themselves (e.g., verification).
Considering how the tasks will be used (examining cross-task generalization), we are not too worried about redundancy as long as the category is apt.
Yes, redundancies are OK as long as they perfectly fit the task; let's also add Multiple Choice Question Answering here. Good that @yeganehkordi pointed it out.
@yeganehkordi good point bringing it up. After discussing with @swarooprm: in classification, the definition of the classes is fixed; in multiple choice QA, the answer options (the definitions of the classes) vary from question to question, so the two are almost mutually exclusive. In this case, even if the options are labeled "Option 1" and "Option 2", their definitions differ per instance. So this is multiple choice QA, not classification.
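To make the fixed-classes vs. per-question-options distinction concrete, here is a minimal, hypothetical Python sketch. The field names loosely mirror the task JSON files, but the instances, label sets, and category strings are invented purely for illustration:

```python
# Hypothetical illustration of the distinction discussed above.

# Classification: the label set is fixed across all instances.
classification_task = {
    "Categories": ["Classification"],
    "Instances": [
        {"input": "Statement: The sky is green.", "output": ["False"]},
        {"input": "Statement: Water boils at 100C at sea level.", "output": ["True"]},
    ],  # every instance chooses from the same fixed labels: True / False
}

# Multiple choice QA: the "classes" (answer options) change per question,
# even though they are always named "Option 1" / "Option 2".
multiple_choice_task = {
    "Categories": ["Question Answering -> Multiple Choice Question Answering"],
    "Instances": [
        {"input": "Question: Capital of France? Option 1: Paris. Option 2: Lyon.",
         "output": ["Option 1"]},
        {"input": "Question: Largest planet? Option 1: Mars. Option 2: Jupiter.",
         "output": ["Option 2"]},
    ],  # the meaning of "Option 1" differs from instance to instance
}
```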
Sounds good! I agree.
@@ -6,7 +6,9 @@
     "Abductive NLI, https://arxiv.org/pdf/1908.05739.pdf"
   ],
   "Categories": [
     "Answer Generation"
Maybe add Abductive Reasoning as well (just a guess based on the dataset name).
agree!
"Reasoning -> Commonsense Reasoning", | ||
"Reasoning -> Abductive Reasoning", | ||
"Classification", | ||
"Question Answering -> Incorrect Answer Generation" |
I've added Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation before. I think we need to separate incorrect answer generation tasks. Also, to me, it's not a classification task.
I like this suggestion. We have two kinds of incorrect answer generation: (1) generating an incorrect answer, and (2) choosing an incorrect answer from the given options.
Let's add Question Answering -> Incorrect Answer Generation -> Multiple Choice Incorrect Answer Generation here.
Also, please update the task hierarchy file to incorporate the two types of incorrect answer generation.
   ],
   "Categories": [
-    "Incorrect Answer Generation"
+    "Question Answering -> Incorrect Answer Generation",
I suggest adding Question Answering -> Incorrect Answer Generation -> Comprehensive Incorrect Answer Generation
I agree that Question Answering -> Incorrect Answer Generation can be made more granular, but I don't know what Comprehensive Incorrect Answer Generation means. Symmetrically to task080, we can change this task to Question Answering -> Incorrect Answer Generation -> Commonsense Question Answering.
Maybe we could have:
Question Answering -> Incorrect Answer Generation
- Question Answering -> Incorrect Answer Generation -> Commonsense Question Answer Generation
- Question Answering -> Incorrect Answer Generation -> Contextual Question Answer Generation
- Question Answering -> Incorrect Answer Generation -> Multiple Choice Answer Generation
Yes, good point.
Also, see if we need more granularity in Contextual Answer Generation here, e.g. the following:
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Extractive
Question Answering -> Incorrect Answer Generation -> Contextual Question Answering -> Abstractive
Note that the more depth we have in our hierarchy tree, the better it will be for our experiments and also for our paper.
Yeah, I think we can err on the side of redundancies for now, especially because each task can have multiple categories.
I think structured text generation should be its own category because sentence generation is unstructured. So let's say the task is to generate a SQL statement; it can be
I think the "Contextual" in
Sounds good
Sounds good to me.
Why would this be "long text generation"? Most outputs seem pretty short. Btw, do you think this can be "Command Execution" as well?
Trying to understand why this is "Text Comparison". Is it because you're comparing Option 1 and Option 2, or you're comparing the option to the rest of the story?
I feel like "programming" is more of a domain than a task in itself
I thought this task is actually best defined by semantic parsing ("the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning")? It asks you to convert the natural language to code. What does "Program Synthesis" mean? Could you elaborate / provide an example or paper? I think maybe instead of
What does "Textual Entailment" mean exactly? In particular, I'm not sure if we should have it for task070 (classification)
I'm not sure about "Programming/Code->Strings" because it doesn't sound like a task. How about "Text Generation -> Code Execution"?
Would
Ok, makes sense.
Yes, but do you see there can be a "Paragraph generation" task without context?
What I meant was that we need a text generation category here and a templated text generation subcategory (since this does not exist in the task hierarchy). I agree that the inputs are relatively short; let's add Text Generation -> Sentence Generation -> Templated Text Generation?
Both
Agree, you are right.
I am not in favor of adding translation, since it typically refers to converting from one language (say, English) to another (say, French).
Textual Entailment: my point here is that in order to generate incorrect answers, one must be aware of the correct answers, so the skill "textual entailment" is also necessary for task070 (we made a similar decision for task055, where we added multihop reasoning for incorrect answer generation).
Perfect, let's add that.
How about Text Generation -> Long Text Generation -> Contextual Text Generation -> Instruction Generation? Also, once we fix the hierarchy.md file, let's keep all categories in alphabetical order (using something like this in the UI or the sort() function in Python). This will help other contributors search and find categories easily.
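For the alphabetical ordering, a minimal Python sketch of what that sort could look like. The category list below is just a sample from this PR, and whether this lives in auto_update_readme.py or a separate script is left open:

```python
# Minimal sketch: sort a list of category strings alphabetically.
categories = [
    "Reasoning -> Commonsense Reasoning",
    "Classification",
    "Question Answering -> Incorrect Answer Generation",
]

# sorted() returns a new alphabetically ordered list; key=str.lower makes the
# ordering case-insensitive so differently cased entries sort together.
categories = sorted(categories, key=str.lower)
print("\n".join(categories))
```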
There are too many comments here.
Yeah, can you do tomorrow at 5 pm Pacific time? @swarooprm
Yes, that works. I will send you an invite @ghlai9665
@ghlai9665,
“textual entailment” carries a broader sense of inferring conclusions.
Everything looks good. This PR is ready to be merged once you resolve the merge conflicts (I am not able to merge it in the UI; you have to do it on the command line) @ghlai9665