Update categories and domains for tasks 029 - 058 #459
@danyaljj could you link me to the dataset paper for task 043 and task 044? I can't seem to find a dataset named "essential" or "essential terms" anywhere.
Oh good point. The data was extracted from the experiments in this paper: Please add the pointer to the task files.
Looks good. Some comments:
Yeah, agreed that we should do something about it. There is a discussion on this here: #443 (comment)
I don't think we should have "Question Answering -> Contextual Question Answering -> Extractive" for task031 and task032 because the output is a question, not an answer, unlike task033 (output is an answer).
While most tasks themselves are not multihop per se, they require multihop reasoning ability to carry out.
I agree that "Verification" should be made more granular, but for task 039, I think "Verification" should just be removed (I don't see how this is "verification") -- I only kept it there because I thought there was a good reason that you guys put it there. I will keep annotating the remaining tasks, keeping these discussions in mind, and massage subcategories of "Verification" accordingly.
@danyaljj @swarooprm could you also link me to the dataset paper for task 045 - 047? (Dataset "miscellaneous")
The other two are not tied to any particular works, unfortunately.
What are their domains? Or do I need to look at the instances to guess?
@swarooprm @danyaljj I updated "Verification" categories in 6445613 based on discussion #443. Some of the multirc tasks fit under these new "Verification" categories. Let me know what you think.
I was actually referring to task029 and task030 (instead of task031 and 032). Sorry for the typo!
I am not sure. E.g. task038 is about combining 2 facts in the input; can we call it multihop?
Well, when we created the Natural Instructions v1 dataset by covering the various intermediate steps involved in a data creation process, we thought of having the 'verification' category, as this is an important part of the data creation process. Also, we did not have enough tasks to create a more granular category that better represents task039. But now that we have lots of tasks and better task categories, I think it's fine to drop the 'verification' category from task039.
So definitely not task030 for the same reasons I mentioned: output is a cloze-style question. I had the same hesitation about task029 but decided not to put "Question Answering" because it is not answering a question from the input -- it is given a context word (not a question), and asked to generate questions (with answers attached to them).
That looks like multihop to me? What would be the definition of multihop?
Sounds good to me!
I am fine if we don't add the categories for task029 and task030. Re: multihop:
I am actually not sure what the discussion is. But my understanding is that:
Let me know if this helps.
551d829 to 372c3fc
Thanks @danyaljj, I think we have the same understanding. I have also updated the rest of the tasks in the PR (it's no longer WIP). @swarooprm What do you think? Can this be merged now?
"Scientific reasoning" is a good addition. "Text Comparison" is the appropriate category for task039.
Task hierarchy will help design our experiments and will be a very sensitive component in our paper, so I think the discussions at this point will be very helpful later :)
doc/task-hierarchy.md
@@ -65,6 +76,7 @@
- `Reasoning -> Reasoning with Symbols`: Tasks where symbols represent various things e.g. if X is the number of apples in the freeze today morning and Y is the number remaining after I ate a few apples, X-Y is the number of apples I ate.
- `Reasoning -> Spatial Reasoning`
- `Reasoning -> Temporal Reasoning`
- `Reasoning -> Scientific Reasoning`
Not sure what this means. Could you define it?
The type of reasoning required to be able to answer questions related to science
Right, but I am still not sure what that means.
Are you sure this is not a "domain" (rather than a reasoning type)?
So in order to answer a question like "Question: A student riding a bicycle observes that it moves faster on a smooth road than on a rough road. This happens because the smooth road has (A) less gravity (B) more gravity (C) less friction (D) more friction?", what do you think would be the type of reasoning, if not scientific?
Ok, talked with @swarooprm and decided that this would be Reasoning -> Qualitative Reasoning. Science would be a domain like you said (it's already included). I've updated everything and this PR should be good to go. @danyaljj what do you think?
Done.
Yeah for sure @swarooprm. I guess it's a bit hard for me to determine whether to put down a category sometimes without having gone through the modeling stage. In particular, I'm not sure how we will use the multiple categories for each task and the category hierarchy (as I understand it, for the first paper, each task only had one category and there was no "category hierarchy", so the train-valid split seemed more straightforward). More discussion about how these new categories & hierarchy will be used would be really helpful in carving out some heuristics for the annotation process.
- `Classification -> Verification -> Sufficient Information Verification`: Verify whether a text contains sufficient information to answer a question
- `Classification -> Verification -> Grammar Verification`: Verify whether a text is grammatical
- `Classification -> Verification -> Relevance Verification`
- `Classification -> Verification -> Answer Verification`: Verify whether a text answers the question
nice!
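For reference, the arrow-delimited category paths quoted from doc/task-hierarchy.md above can be read as a tree. The following is only a minimal sketch of that reading, assuming each path is a plain string split on `->`; the helper names `build_tree` and `leaves_under` are hypothetical and are not part of the repository.

```python
# Illustrative only: paths copied from the hunks quoted in this thread.
CATEGORY_PATHS = [
    "Classification -> Verification -> Sufficient Information Verification",
    "Classification -> Verification -> Grammar Verification",
    "Classification -> Verification -> Relevance Verification",
    "Classification -> Verification -> Answer Verification",
    "Reasoning -> Spatial Reasoning",
    "Reasoning -> Temporal Reasoning",
]


def split_path(path):
    """Split 'A -> B -> C' into ['A', 'B', 'C']."""
    return [part.strip() for part in path.split("->")]


def build_tree(paths):
    """Build a nested dict where each key is a category and each value its children."""
    tree = {}
    for path in paths:
        node = tree
        for part in split_path(path):
            node = node.setdefault(part, {})
    return tree


def leaves_under(tree, prefix):
    """Return the full paths of all leaf categories below the given prefix."""
    parts = split_path(prefix)
    node = tree
    for part in parts:
        node = node[part]  # raises KeyError if the prefix is unknown

    found = []

    def walk(current, trail):
        if not current:  # no children: this is a leaf category
            found.append(" -> ".join(trail))
        for name, child in current.items():
            walk(child, trail + [name])

    walk(node, parts)
    return found


tree = build_tree(CATEGORY_PATHS)
verification_leaves = leaves_under(tree, "Classification -> Verification")
```

With a structure like this, a task tagged with several category paths could be grouped at any level of the hierarchy, which is one possible input to the train-valid split question raised above.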
@ghlai9665 Very good question. Will you be available sometime to chat? @danyaljj can also join if he is available during that time. I think it will be easier to discuss over chat than here.
Sure! Anytime after 5 pm California time works for me. I'm also wide open this weekend. Let me know if neither of those works.
Let's chat at 5.30 pm PST today. I will send you an invite.
sounds good
…On Thu, Oct 21, 2021 at 3:19 PM Swaroop Mishra ***@***.***> wrote:
Let's chat at 5.30 pm PST today. I will send you invite.
Looks good!
From my side, this PR is good to go after the minor fix above @danyaljj
# Conflicts:
#   tasks/task044_essential_terms_identifying_essential_words.json
#   tasks/task045_miscellaneous_sentence_paraphrasing.json
#   tasks/task046_miscellaneous_question_typing.json
#   tasks/task048_multirc_question_generation.json
#   tasks/task049_multirc_questions_needed_to_answer.json
#   tasks/task054_multirc_write_correct_answer.json
#   tasks/task055_multirc_write_incorrect_answer.json
#   tasks/task056_multirc_classify_correct_answer.json
#   tasks/task057_multirc_classify_incorrect_answer.json
#   tasks/task058_multirc_question_answering.json
Looks great! Thanks to both of you!