Tasks 1407-1417: DART, ajgt_twitter_ar, and youtube_caption_corrections. #429

Merged: 4 commits merged into allenai:master on Oct 31, 2021

RushangKaria (Contributor)

Created a set of tasks from the DART, ajgt_twitter_ar, and youtube_caption_corrections datasets available on Hugging Face.

Tasks

  • DART
    • Question Generation
    • Sentence Similarity Classification
    • Text Generation
    • Relationship Extraction
    • Subject/Predicate/Object Identification
  • ajgt_twitter_ar
    • Sentiment Classification
  • youtube_caption_corrections
    • Grammar Correction
    • Incorrect Grammar Classification
    • Slot Filling
    • Slot Generation

swarooprm changed the title from "Tasks 1334-1344: DART, ajgt_twitter_ar, and youtube_caption_corrections." to "Tasks 1407-1417: DART, ajgt_twitter_ar, and youtube_caption_corrections." on Oct 12, 2021
danyaljj (Contributor)

Please make sure to set the "Categories": [""] and "Domains": [""] fields according to the project's task taxonomy.

swarooprm (Contributor)

@Mehrad0711, wondering if you can help review these tasks?

Mehrad0711 (Contributor)

Things look good in general, although I have some suggestions:

task1334_dart_question_generation:
Definition: "specified in the at least" --> "specified in at least"
Definition: "such that a it is a well-formed question" --> "such that it is a well-formed question"

"The input is a list of triplets of the form [subject, predicate, object] and the output is a list of questions that are sentences based on the triplets but with the subject and/or object replaced with slots."
It seems all outputs are single sentences, not lists? Also, slots need to be defined. By slots, do you mean blanks?

Consider adding a line that says: "All input triplets need not be used in the question for it to be valid."

task1336_dart_text_generation:
Do all input triplets need to be used in the question for it to be valid? Please indicate that in the Definition.
Is a question valid if it contains more information than what is provided in the input triplets?

I noticed input triplets like this: ['[TABLECONTEXT]', '[TITLE]', 'Olivia McKoy']. Do the bracketed tokens ([.]) indicate special tokens or placeholders? Please describe what they are in the definition.

task1342_youtube_caption_corrections_grammar_correction:
The positive examples could be more varied; all three seem to have the same mistakes. Also, in the "explanation", you may want to describe what was fixed.
In the definition, adding more examples of the grammar mistakes that need to be fixed would help the model understand what needs attention. I see that the other task has "unknown" and "special" errors; you could add some examples of those.

task1343_youtube_caption_corrections_incorrect_grammar_classification:
Please expand on what (unknown error) and (special error) each represent. Which errors are classified as unknown vs. special?

task1344_youtube_caption_corrections_slot_filling:
I'm not sure this is a reasonable task. Many words could fill the slots and achieve the objective, but the output considers only one of them correct. We may want to drop this task.

swarooprm (Contributor)

@RushangKaria, could you update your files based on reviewer comments?

RushangKaria (Contributor, Author)

Hi @swarooprm

Yes, I'll make the necessary changes and push an update. Thanks for the reviews!

swarooprm (Contributor)

@RushangKaria, a gentle reminder!

RushangKaria (Contributor, Author) commented on Oct 29, 2021

@swarooprm Sorry, got caught up in some other stuff. Should have an update soon.

RushangKaria (Contributor, Author)

Made the changes. Some files have no matching or relevant domain, so I've left those without a domain.

Mehrad0711 (Contributor)

Thanks for making the changes. LGTM now!

aarunku5 merged commit fbb3c31 into allenai:master on Oct 31, 2021
aarunku5 added a commit that referenced this pull request on Oct 31, 2021
swarooprm pushed a commit that referenced this pull request on Oct 31, 2021
swarooprm (Contributor)

@RushangKaria, can you rename the files to 1407-1417 so that we can go ahead and merge this PR?

RushangKaria (Contributor, Author)

Hi @swarooprm,

Yes, I'll rename the files and create a new PR for it.

RushangKaria deleted the dart_and_others branch on November 1, 2021, 23:36
RushangKaria restored the dart_and_others branch on November 1, 2021, 23:36
RushangKaria added a commit to RushangKaria/natural-instructions-expansion that referenced this pull request on Nov 1, 2021
Renamed the files in PR allenai#429 to match the reserved filenames.

This PR only changes the filenames from allenai#429.
RushangKaria deleted the dart_and_others branch on November 1, 2021, 23:38
aarunku5 pushed a commit that referenced this pull request on Nov 1, 2021
Renamed the files in PR #429 to match the reserved filenames.

This PR only changes the filenames from #429.