Create prompt_task_map.json #737
Conversation
doc/prompt_task_map.json
Outdated
],
"social_i_qa": [
    "task384_socialiqa_question_classification",
    "task580_socialiqa_answer_generation"
Hmm ... isn't social_i_qa a question answering task?
If so, I am confused about why it is mapped to an "answer_generation" task.
Yes, I mapped our tasks to the datasets. They have created the following task types from social_i_qa:
- answer verification (task384_socialiqa_question_classification)
- multiple-choice question answering (task580_socialiqa_answer_generation)
- contextual question answering without options (missing task)
- question generation from the given context and answer (missing task)
(They have one prompt for answering with the option index and one for answering with the option string; I just added one of them as a missing task. A rough sketch of the resulting entry is shown below.)
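For illustration only, in the current dataset-to-tasks format the social_i_qa entry could eventually look roughly like this once the missing tasks are created; the taskXXX names are hypothetical placeholders, not real task files:

```json
"social_i_qa": [
    "task384_socialiqa_question_classification",
    "task580_socialiqa_answer_generation",
    "taskXXX_socialiqa_answer_generation_no_options",
    "taskXXX_socialiqa_question_generation"
]
```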
Could you share the pointers to these?
- answer verification (task384_socialiqa_question_classification)
- multiple-choice question answering (task580_socialiqa_answer_generation)
Yeah:
- answer verification (task384_socialiqa_question_classification) -> data in sheet, task
- multiple-choice question answering (task580_socialiqa_answer_generation) -> data in sheet, task

You can see the prompts in the Hosted version of PromptSource.
I've added the prompt name and id as a key in the json files.
Also, I can change this file to a prompt-to-task map.
I've added the prompt name and id as a key in the json files.

Maybe I'm missing something here. Does the name of these json files tell us whether they correspond to socialiqa_question_classification or socialiqa_answer_generation?
My understanding is that here we have a 1-to-2 mapping (as opposed to a 1-to-1 mapping). It's possible that I am missing something, in which case, help me see it! :)
No, you're right! I think I can use a better architecture here.
The script gets the name of the dataset and generates a task for each prompt, and the names of the json files are based on the dataset and prompt names (e.g., task_socialiqa_Generate_answer).
The mapping is based on the datasets and our tasks, and it doesn't yet map the tasks to the prompts. So I need to add the prompt-task correspondence.
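A minimal sketch of what that prompt-task correspondence could look like for one dataset. "Generate answer" is taken from the file-name example above; the other prompt name is a placeholder, not a confirmed PromptSource name:

```json
"social_i_qa": {
    "<question classification prompt name>": "task384_socialiqa_question_classification",
    "Generate answer": "task580_socialiqa_answer_generation"
}
```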
Yeah exactly.
I've updated the file to map the tasks to the prompts. I'll add the T0p and T0pp datasets tomorrow.
Hi! I am sorry for my absence @danyaljj @yeganehkordi.
Yes, the list of missing tasks is in the spreadsheet (based on the dataset and prompt names).
@yeganehkordi Question: are you including only the T0 training datasets here? It seems some common datasets (e.g., COPA, BoolQ, WSC, etc.) are missing. Those are used for training T0p and T0pp as well as for evaluation. Could we add them?
Yes, these are only the T0 training datasets. I'll add the T0p and T0pp training datasets.
Thanks! And also the test tasks mentioned here. We had their mapping in our working doc before, but it's not in a prompt-to-task format.
Will do.
I think the organization of datasets, NI tasks, and PS prompts is much clearer now. I left some comments for minor fixes. @Palipoor Could you work on the missing datasets?
]
}
],
"hotpot_qa/distractor": [
T0 casts hotpot_qa as a closed-book QA task. Can we add a hotpot_qa/closed_book placeholder here, and add the task together with the other missing tasks?
Can you elaborate more? They don't have any closed_book prompt. Are you suggesting adding this task without a prompt?
It seems they used the kilt version of hotpotqa, according to the list here.
Thanks! I'll add it.
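For example, the placeholder could be added as an empty entry next to the existing hotpot_qa subset until the missing task exists. The key name below is an assumption based on the comment above, not a confirmed dataset name:

```json
"kilt_tasks/hotpotqa": []
```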
doc/prompt_task_map.json
Outdated
],
"winogrande/winogrande_l": [
],
"winogrande/winogrande_debiased": [
For winogrande, could you confirm that the xl/xs/s/m/l/debiased settings only differ in their training set sizes? I think we can remove the others and only keep this debiased setting.
Yes, their only difference is the size of the training set.
Sure, will do.
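If the other settings are dropped, the winogrande block would collapse to a single key, roughly like this sketch (task list left empty, as in the current draft):

```json
"winogrande/winogrande_debiased": []
```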
]
}
],
"super_glue/copa": [
Does task828_copa_commonsense_cause_effect or task827_copa_commonsense_reasoning correspond to this? We have been using these two tasks for our current evaluation.
There is a slight difference between our task and their tasks. Our task is to choose the completion that is the cause or effect of the first sentence, while their instances explicitly specify whether the completion should be the cause or the effect.
So, I think our task is more general and probably more difficult.
Got it. Are we going to add a task similar to theirs?
Yeah, I will.
],
"wiki_hop/masked": [
],
"adversarial_qa/adversarialQA": [
I think it's fine to keep all four settings for adversarial_qa here. But when we add the missing tasks later, please just use this adversarialQA setting. Or, we could probably just drop the other three now to avoid confusion?
I don't know the setting of our task, and I'm not sure whether it is adversarial_qa or not.
If we don't need all the subsets, we can change the instances of this task and just keep adversarial_qa.
]
}
],
"qasc": [
We have qasc tasks, right? task040_qasc_question_generation, task041_qasc_answer_generation.
Yes, but we don't have any shared task with them.
],
"wiqa": [
],
"cosmos_qa": [
I noticed that you included the question generation tasks for some QA datasets. This is good because I remember the original T0 also uses such prompts. But for some QA datasets, you didn't include the question generation task (e.g., task023_cosmosqa_question_generation can be included here for cosmos_qa). Can we add all of them if we already have them in our current data?
I've only added the tasks that had an equivalent prompt. In this case, we have a question generation task, but they don't have a question generation prompt for cosmos_qa.
Merging this PR since we have iterated over it several times now. If anything is missing, let's address it in another PR.
Here is the map of the shared tasks between our tasks and the training datasets of the T0 model.
They trained the model on 35 datasets. We have 31 shared tasks and many missing tasks, which I've listed in the spreadsheet.
Also, for the paws, duorc, amazon_us_reviews, and hotpotqa datasets, our tasks don't have a specific subset, so we may need to add them again.
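To make the subset point concrete, some dataset keys in the map carry a subset suffix while others do not, which is why tasks built without a specific subset may need to be redone. A hedged sketch (task lists elided; the specific subset names shown are assumptions for illustration, not necessarily the ones T0 used):

```json
{
    "paws/labeled_final": [],
    "duorc/SelfRC": [],
    "amazon_us_reviews/Wireless_v1_00": [],
    "hotpot_qa/distractor": [],
    "qasc": []
}
```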