
Further update task domains & categories; Add Utility Scripts #436

Merged
17 commits, Oct 18, 2021

Conversation

garyhlai
Contributor

  • update domains and categories for tasks in mctao and cosmosqa
  • add script for auto-adding domains for all task files of a given dataset
  • modify the test script to allow testing only a range of tasks (instead of ALL tasks)
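The range-testing change described above can be sketched as a small filtering helper plus `argparse` wiring; this is a minimal illustration, not the checked-in `src/test_all.py` (the function name `select_tasks` and the `tasks/` layout assumed here are hypothetical):

```python
import argparse
import glob
import os
import re

def select_tasks(task_dir, begin, end):
    """Return paths of task files whose number falls within [begin, end]."""
    selected = []
    for path in sorted(glob.glob(os.path.join(task_dir, "task*.json"))):
        # task files are assumed to be named like task005_something.json
        match = re.match(r"task(\d+)_", os.path.basename(path))
        if match and begin <= int(match.group(1)) <= end:
            selected.append(path)
    return selected

# hypothetical CLI wiring, mirroring the documented `--task 5 10` usage
parser = argparse.ArgumentParser()
parser.add_argument("--task", nargs=2, type=int, default=[1, 10**6],
                    help="first and last task number to test")
args = parser.parse_args([])  # the real script would parse sys.argv
```

With this shape, `--task 5 10` narrows the format checks to `task005` through `task010` instead of every file in `tasks/`.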

README.md Outdated
@@ -68,6 +68,7 @@ We would appreciate any external contributions! 🙏
* If you're building your tasks based on existing datasets and their crowdsourcing templates, see these [guidelines](doc/crowdsourcing.md).
* Add your task to [our list of tasks](tasks/README.md).
* To make sure that your addition is formatted correctly, run the tests: `> python src/test_all.py`
* To only test the formatting of a range of tasks, run `> python src/test_all.py --task <begin_task_number> <end_task_number>`. For example, running `> python src/test_all.py --task 5 10` will run the test from task005 to task010.
Contributor

Nice! Make it a sub-item of the earlier item? (i.e., indent it to the right).

@@ -7,9 +7,9 @@
"mctaco"
],
"Categories": [
"Question Generation"
"Contextual Question Generation"
Contributor

This also involves temporal reasoning:
Reasoning -> Temporal Reasoning

@@ -7,9 +7,9 @@
"mctaco"
],
"Categories": [
"Answer Generation"
"Answer Generation -> Commonsense Question Answering"
Contributor

Also involves temporal reasoning.

Contributor

(this applies to most of the tasks here)

Contributor Author

So would the "Categories" for this be

"Categories": [
"Answer Generation -> Commonsense Question Answering",
"Reasoning -> Temporal Reasoning",
"Reasoning -> Commonsense Reasoning"
]

since temporal reasoning can be considered a type of commonsense reasoning too?

What I'm trying to get at is whether some categories are mutually exclusive (e.g. "Reasoning -> Temporal Reasoning" vs. "Reasoning -> Commonsense Reasoning")?

Contributor

Yeah, your list of categories makes sense to me.
Basically, we should try to indicate all the category labels that apply to each task.

@@ -3038,5 +3038,8 @@
],
"Instruction_language": [
"English"
],
"Domains": [
Contributor

We're not changing its task type?

Contributor Author

would this be "Categories": ["Classification", "Reasoning -> Commonsense Reasoning", "Reasoning -> Logical Reasoning"]?

Contributor

I am not sure about "Logical Reasoning" (which is meant to indicate conclusions made using logic).

@@ -7,7 +7,7 @@
"cosmosqa"
],
"Categories": [
"Question Generation"
"Contextual Question Generation"
Contributor

One might say that this also involves Reasoning -> Commonsense Reasoning according to the instructions.

Contributor

(applies to more tasks below).

@@ -0,0 +1,48 @@
import json
Contributor

since this is an ad hoc function (used only for this annotation), I would prefer if we don't check this in.

Contributor Author
@garyhlai commented Oct 13, 2021

I added it because it seems that it could be useful for other people working on #4 who would likely need it.
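For reference, such a domain-annotation helper could look like the following; this is a minimal sketch under assumptions (the function name `add_domains`, matching dataset files by filename substring, and the in-place rewrite are illustrative choices, not the exact script in the PR):

```python
import glob
import json
import os

def add_domains(task_dir, dataset, domains):
    """Set the "Domains" field for every task file belonging to one dataset."""
    for path in sorted(glob.glob(os.path.join(task_dir, "task*.json"))):
        if dataset not in os.path.basename(path):
            continue  # only touch files of the requested dataset
        with open(path, encoding="utf-8") as f:
            task = json.load(f)
        task["Domains"] = domains  # overwrite (or create) the Domains list
        with open(path, "w", encoding="utf-8") as f:
            json.dump(task, f, indent=4, ensure_ascii=False)
```

A call like `add_domains("tasks", "mctaco", [...])` would then stamp the same domain list onto every MCTACO task file, which is the repetitive step the script automates.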

@garyhlai
Contributor Author

garyhlai commented Oct 13, 2021

@danyaljj all your concerns should be addressed in my latest push. I have updated all task files accordingly. Note that I modified the task-hierarchy.md:

- Grammar Error
  -  Grammar Error -> Grammar Error Correction
  -  Grammar Error -> Grammar Error Detection

because task021 seems like a "grammar" task as opposed to simply "classification".

Do you encourage me to further modify the task-hierarchy.md when suitable, or should I keep it fixed?

@danyaljj
Contributor

Do you encourage me to further modify the task-hierarchy.md when suitable, or should I keep it fixed?

Yes, that's alright (and somewhat encouraged)! We should continue massaging the hierarchy so that it best describes the space of our tasks.

@danyaljj
Contributor

Overall, the changes look good to me. Leaving it for others to review, before merging it.

@swarooprm
Contributor

swarooprm commented Oct 13, 2021

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

task001 and 002 (Quoref) need to have the categories "Answer Generation -> Contextual Question Answering" and "Coreference -> Entity Coreference"; they can also potentially have the "Text Span Selection" category.

I assume the domains are taken from the associated paper.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

@danyaljj
Contributor

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

@swarooprm I am pretty sure CosmosQA questions come with paragraphs.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

Let's document this as an issue.

@swarooprm
Contributor

swarooprm commented Oct 13, 2021

CosmosQA has context, but the answer does not lie within the context (similar to MCTACO)
e.g. https://instructions.apps.allenai.org/dataset_viewer?file=subtask024_cosmosqa_answer_generation.json
Context: I was told, in person over the phone, that my shoes were on their way. They have my money. I have no shoes. Question: What may happen after the call?
Answer: I will return the shoes and get back the money.

@danyaljj
Contributor

CosmosQA has context, but answer does not lie within the context (similar to MCTACO) e.g. https://instructions.apps.allenai.org/dataset_viewer?file=subtask024_cosmosqa_answer_generation.json Context: I was told, in person over the phone, that my shoes were on their way. They have my money. I have no shoes. Question: What may happen after the call? Answer: I will return the shoes and get back the money.

Assuming that "context" in Answer Generation -> Contextual Question Answering refers to the existence of a context paragraph, this tag certainly applies to CosmosQA.

Whether the answer is in a given context or not is a different issue (extractive vs. abstractive answer generation). We can create other tags to separate these two.

@garyhlai
Contributor Author

@ghlai9665 We need to have the category "Answer Generation -> Open Question Answering" for all answer generation tasks in MCTACO and CosmosQA as the answer to those questions does not lie in the associated context.

task001 and 002 (Quoref) need to have category: "Answer Generation -> Contextual Question Answering" and "Coreference -> Entity Coreference", they also can potentially have "Text Span Selection category".

I assume the domains are taken from the associated paper.

On a different note: In hindsight, I feel we missed creating some tasks e.g. MCTACO is originally an MCQ dataset, but we don't have an MCQ task associated with MCTACO. Should we have some action items there? @danyaljj

@swarooprm Done. What you said about task002 makes sense -- task001 is a bit different, though, because it's question generation. I have updated all tasks accordingly.

@swarooprm
Contributor

Looks good @ghlai9665
Based on @danyaljj's comment, here are my suggestions:

  • Let's change MCTACO and CosmosQA categories to "contextual question answering" (in place of "Open Question Answering")
  • In the task hierarchy, create two new categories
    • answer generation-->extractive
    • answer generation-->abstractive.
  • Add "answer generation-->extractive" for Quoref and "answer generation-->abstractive" for MCTACO and CosmosQA, as answers to these questions cannot be extracted from the context.

@garyhlai
Contributor Author

Looks good @ghlai9665 Based on @danyaljj's comment, here are my suggestions:

  • Let's change MCTACO and CosmosQA categories to "contextual question answering" (in place of "Open Question Answering")

  • In the task hierarchy, create two new categories

    • answer generation-->extractive
    • answer generation-->abstractive.
  • Add "answer generation-->extractive" for Quoref and "answer generation-->abstractive" for MCTACO and CosmosQA, as answers to these questions cannot be extracted from the context.

Then it looks like every task would either be answer generation-->extractive or answer generation-->abstractive. So for task021, it's a grammar task so it would be answer generation-->extractive?

@swarooprm
Contributor

task021 is a classification task, so no need to add this category ("answer generation-->extractive").
Any category of answer generation (answer generation--->X) should be added only to answer generation tasks. Let's not add it to other tasks such as question generation, classification etc.
Do you agree? @danyaljj

@garyhlai
Contributor Author

@swarooprm Yeah, technically anything can be reformulated in the format of QA, and classification can be seen as answer generation. So I'm not sure about this either @danyaljj

@@ -8,6 +8,8 @@
- `Answer Generation -> Fill in the Blank`
- `Answer Generation -> Multiple Choice Question Answering`
- `Answer Generation -> Open Question Answering`
- `Answer Generation -> Extractive`
- `Answer Generation -> Abstractive`
Contributor
@danyaljj commented Oct 14, 2021

Since "extractive" and "abstractive" are only defined in the context of "contextual qa", it might make sense to make them nested:

  • Answer Generation -> Contextual Question Answering
    • Answer Generation -> Contextual Question Answering -> Extractive
    • Answer Generation -> Contextual Question Answering -> Abstractive

What do you all think? @ghlai9665 @swarooprm

Contributor

yes, makes sense to me.

@danyaljj
Contributor

danyaljj commented Oct 14, 2021

task021 is a classification task ...

I tend to agree w/ Swaroop on this. The task is formulated like a classification.

@danyaljj
Contributor

Part of the confusion might be due to the naming of Answer Generation (which sounds a bit open-ended). A better naming is probably Question Answer Generation or simply Question Answering.

@garyhlai
Contributor Author

garyhlai commented Oct 15, 2021

I feel like categories related to QA can be better structured. I think we can do something like:

  • Question Answering
    • Question Answering -> Answer Generation
      - Question Answering -> Answer Generation -> Commonsense Question Answering
      - Question Answering -> Answer Generation -> Contextual Question Answering
      - Question Answering -> Answer Generation -> Contextual Question Answering -> Extractive
      - Question Answering -> Answer Generation -> Contextual Question Answering -> Abstractive

      - Question Answering -> Answer Generation -> Fill in the Blank
      - Question Answering -> Answer Generation -> Multiple Choice Question Answering
      - Question Answering -> Answer Generation -> Open Question Answering
      - Question Answering -> Answer Generation -> Incorrect Answer Generation
    • Question Answering -> Question Generation
      - Question Answering -> Question Generation -> Contextual Question Generation
      - Question Answering -> Question Generation -> Question Composition

This should cover most of the QA-related categories, and we've agreed that classification categories would be mutually exclusive with these QA categories. What do you think? @swarooprm @danyaljj

@danyaljj
Contributor

danyaljj commented Oct 15, 2021

I mostly like your structure, except that I wouldn't make Question Generation a sub-category of Question Answering.
How about this? And I think you can merge Answer Generation and Question Answering.

  • Question Answering
    • Question Answering -> Commonsense Question Answering
    • Question Answering -> Contextual Question Answering
      - Question Answering -> Contextual Question Answering -> Extractive
      - Question Answering -> Contextual Question Answering -> Abstractive
    • Question Answering -> Fill in the Blank
    • Question Answering -> Multiple Choice Question Answering
    • Question Answering -> Open Question Answering
    • Question Answering -> Incorrect Answer Generation
  • Question Generation
    • Question Generation -> Contextual Question Generation
    • Question Generation -> Question Composition

I am actually not sure about the following:

we've agreed that classification categories would be mutually exclusive with these QA categories

A task might be QA and classification at the same time.

@swarooprm
Contributor

I mostly like your structure, except that I wouldn't make Question Generation a sub-category of Question Answering. How about this? And I think you can merge Answer Generation and Question Answering.

  • Question Answering

    • Question Answering -> Commonsense Question Answering
    • Question Answering -> Contextual Question Answering
      • Question Answering -> Contextual Question Answering -> Extractive
      • Question Answering -> Contextual Question Answering -> Abstractive
    • Question Answering -> Fill in the Blank
    • Question Answering -> Multiple Choice Question Answering
    • Question Answering -> Open Question Answering
    • Question Answering -> Incorrect Answer Generation
  • Question Generation

    • Question Generation -> Contextual Question Generation
    • Question Generation -> Question Composition

I am actually not sure about the following:

we've agreed that classification categories would be mutually exclusive with these QA categories

A task might be QA and classification at the same time.

Looks good to me.

@garyhlai
Contributor Author

@danyaljj @swarooprm The task hierarchy and all task files are updated accordingly. Let me know how everything looks and I'll continue with the rest of the tasks.

@danyaljj
Contributor

Thanks!

For task001_quoref_question_generation, we currently have:

    "Categories": [
        "Contextual Question Generation",
        "Coreference -> Entity Coreference",
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Since that is about generating questions (not answering them), we should drop "Question Answering -> Contextual Question Answering -> Extractive" and add "Question Generation -> Contextual Question Generation".

@garyhlai
Contributor Author

garyhlai commented Oct 18, 2021

Thanks!

For task001_quoref_question_generation, we currently have:

    "Categories": [
        "Contextual Question Generation",
        "Coreference -> Entity Coreference",
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Since that is about generating questions (not answering them), we should drop "Question Answering -> Contextual Question Answering -> Extractive" and add "Question Generation -> Contextual Question Generation".

Good catch! I updated the test to extract the categories better from Task-Hierarchy -- now it raises a warning when it's just Contextual Question Generation instead of Question Generation -> Contextual Question Generation.
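The validation described here could work by collecting the full category paths from task-hierarchy.md and warning whenever a task file uses a bare leaf name; a minimal sketch (the assumption that hierarchy entries are written in backticks, and the helper names, are illustrative, not the actual test code):

```python
import re

def full_paths_from_hierarchy(markdown_text):
    """Collect every backticked category path from a task-hierarchy file."""
    return set(re.findall(r"`([^`]+)`", markdown_text))

def category_warnings(categories, valid_paths):
    """Warn when a task uses a bare leaf name instead of its full path."""
    warnings = []
    for cat in categories:
        if cat in valid_paths:
            continue  # already a full path from the hierarchy
        # find full paths whose last segment matches the bare name
        matches = [p for p in valid_paths if p.endswith("-> " + cat)]
        if matches:
            warnings.append(f'"{cat}" should be written as "{matches[0]}"')
    return warnings
```

Under this scheme, a task tagged with just `Contextual Question Generation` would trigger a warning pointing at `Question Generation -> Contextual Question Generation`, matching the behavior described above.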

@danyaljj
Contributor

Let's set another convention: if a category is used, let's not include its parent category. Say, the question is of type "Question Answering -> Contextual Question Answering -> Abstractive", let's not mention its parent category of "Question Answering -> Contextual Question Answering".

@garyhlai
Contributor Author

@danyaljj You mean just in the Task-Hierarchy file or in every task file? Wouldn't it feel a little strange (and not too informative) to just say "Abstractive" in the category for the task files?

@danyaljj
Contributor

@danyaljj You mean just in the Task-Hierarchy file or in every task file? Wouldn't it feel a little strange (and not too informative) to just say "Abstractive" in the category for the task files?

Looks like I didn't explain myself properly: What I'm saying is that, instead of writing, say:

    "Categories": [
        "Question Answering -> Contextual Question Answering, 
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

drop the parent category and write:

    "Categories": [
        "Question Answering -> Contextual Question Answering -> Extractive"
    ],

Does that make sense?
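This convention can also be enforced mechanically; a minimal sketch (the function name is hypothetical) that drops any category which is an ancestor of another entry in the same list:

```python
def drop_parent_categories(categories):
    """Remove any category that is an ancestor of another category in the list."""
    kept = []
    for cat in categories:
        # a parent is redundant when some other entry extends it with "-> ..."
        if any(other.startswith(cat + " ->") for other in categories if other != cat):
            continue
        kept.append(cat)
    return kept
```

Running such a pass over the "Categories" list of each task file would leave only the most specific label on each branch, which is exactly the convention proposed here.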

@@ -7,7 +7,6 @@
"quoref"
],
"Categories": [
"Question Answering -> Contextual Question Answering",
Contributor

+1

@@ -7,7 +7,6 @@
"drop"
],
"Categories": [
"Question Answering",
Contributor

👍

@@ -7,7 +7,10 @@
"mctaco"
],
"Categories": [
"Incorrect Answer Generation"
"Question Answering -> Incorrect Answer Generation",
"Question Answering -> Contextual Question Answering -> Abstractive",
Contributor

Should this be here? Didn't have it for tasks/task008_mctaco_wrong_answer_generation_transient_stationary.json

Contributor Author
@garyhlai commented Oct 18, 2021

You're right. Just fixed. Thanks!

@danyaljj
Contributor

Looks good to me, thanks!

Haven't heard back from @swarooprm yet and I am not sure when we will. I think we should move on with the PR. If there is anything missing, we can address it in another one.

@danyaljj danyaljj merged commit ff5f77d into allenai:master Oct 18, 2021