
Adding adversarialQA dataset #1714

Merged: 4 commits merged on Jan 13, 2021

Conversation

maxbartolo
Contributor

Adding the adversarialQA dataset (https://adversarialqa.github.io/) from Beat the AI (https://arxiv.org/abs/2002.00293)

@thomwolf
Member

thomwolf commented Jan 8, 2021

Oh that's a really cool one, we'll review/merge it soon!

In the meantime, do you have any specific positive or negative feedback on the process of adding a dataset, Max?
Did you follow the instructions in the detailed step-by-step guide?

@maxbartolo
Contributor Author

Thanks Thom, been a while, hope all is well!

Yes, I followed the step-by-step instructions and found them pretty straightforward. The only things I wasn't sure about were what should go into the YAML tags field of the dataset card, and whether there was a list somewhere of the possible supported tasks (maybe akin to the metrics list?). I found the rest very intuitive, and the automated metadata and dummy data generation very handy. Thanks!

@thomwolf
Member

thomwolf commented Jan 9, 2021

Good point! Pinging @yjernite so he can improve this part!

@yjernite
Member

yjernite commented Jan 11, 2021

@maxbartolo cool addition!

For the YAML tags, you should use the tagging app we provide to choose from a drop-down menu:
https://github.com/huggingface/datasets-tagging

The process is described toward the end of the step-by-step guide; do you have any suggestions for making it easier to find?

Otherwise, the dataset card is really cool, thanks for making it so complete!

@maxbartolo
Contributor Author

maxbartolo commented Jan 11, 2021

@yjernite

Thanks, YAML tags added. I think my main issue was with the flow of the step-by-step guide. For example, the card creator is introduced in Step 4, right after creating an empty directory for your dataset. The first field it asks for is the YAML tags, which (at least for me) came at the very end of the process.

I'd suggest having the guide structured in the same order as the creation process. For me it was something like:

  • Step 1: Preparing your env
  • Step 2: Write the loading/processing code
  • Step 3: Automatically generate dummy data and dataset_infos.json
  • Step 4: Tag the dataset
  • Step 5: Write the dataset card using the card creator
  • Step 6: Open a Pull Request on the main HuggingFace repo and share your work!!

Thanks again!

@lhoestq
Member

lhoestq left a comment


Looks all good to me!
Thank you for adding it :) And a good dataset card as well.

I just made one minor change to the dummy_data.zip files: I removed unused files to make them lighter.

@lhoestq lhoestq merged commit 0d4a686 into huggingface:master Jan 13, 2021
eusip pushed a commit to eusip/datasets that referenced this pull request Jan 21, 2021
* Adding adversarialQA dataset

* Added YAML tags

* reduce dummy_data.zip files sizes

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

4 participants