Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task103 facts to story #34

Merged
merged 16 commits into from
Jul 24, 2021
Merged

Task103 facts to story #34

merged 16 commits into from
Jul 24, 2021

Conversation

Mihir3009
Copy link
Contributor

Added new task 103: Facts to Story. Please review.

@danyaljj
Copy link
Contributor

danyaljj commented Jul 22, 2021

Thanks for the PR!

Few very minor suggestions:

  • For "Sentence Generation": mention that the concept names are separated by "#".
  • Merge "There are 3 to 5 concepts in each concept set. " with the first sentence: "you are given concept set (with 3 to 5 concepts".
  • We need to use all the provided concepts in the output generation, right? If so, we can create a negative example that uses k-1 concepts (from a concept set of size k).
  • I see that some instances have more than one gold output (which is good!) If the dataset has more instances than 3.5k, I would suggest selecting a subset of instances that a higher number of gold outputs per instance.

Long text generation:

  • Merge "In this task, five ordered key facts are given. All the given facts are expressed in natural language." --> more concise
  • There are a bunch of <sep><cls> tokens; drop.

Please update the readme of task names as well.

@Mihir3009
Copy link
Contributor Author

I updated the task 102 file - Sentence Generation. Please review.

I have a question regarding Task 103 - Long text generation:
When you say "There are a bunch of "" tokens; drop.". Can you please explain it more? I didn’t get the point. So that I can review and resubmit.

@danyaljj
Copy link
Contributor

danyaljj commented Jul 23, 2021

When you say "There are a bunch of "" tokens; drop.". Can you please explain it more? I didn’t get the point. So that I can review and resubmit.

Sorry, I miss-typed. There are a bunch of non-natural(<cls> or other special tokens) that we should drop.

@Mihir3009
Copy link
Contributor Author

Got it.

I updated both the files according to your suggestions. Please review. Thanks!

Copy link
Contributor

@swarooprm swarooprm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just a minor suggestion:
Change the file names that indicate the source dataset and the category e.g. if source for both tasks are 'commongen', then task name should be taskNNN_.json (e.g. 'task001_quoref_question_generation.json')

@Mihir3009
Copy link
Contributor Author

I renamed the file and uploaded the final ones.

@swarooprm
Copy link
Contributor

Accept from my side!

@danyaljj danyaljj merged commit 62e00f5 into allenai:master Jul 24, 2021
@danyaljj danyaljj mentioned this pull request Jul 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants