dataset(aslg_pc12): initial loading script #731

Merged: 5 commits merged on Oct 28, 2020

AmitMY (Contributor) commented Oct 14, 2020

This contains the only part of this corpus that is currently public.

The rest of the corpus has not yet been made public, but this sample is still used by researchers.

lhoestq (Member) left a comment

Thanks for adding this one :)

To fix the CI, you'll need to specify the encoding in the open() calls and also add the dummy data.
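A minimal sketch of what passing an explicit encoding looks like, assuming the script reads two line-aligned files (glosses and English sentences) downloaded from the two hardcoded URLs. The function name `generate_examples` and the field names `gloss` and `text` are hypothetical and may not match aslg_pc12.py:

```python
from typing import Dict, Iterator, Tuple


def generate_examples(gloss_path: str, text_path: str) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (key, example) pairs from two line-aligned files.

    Hypothetical sketch: the field names "gloss" and "text" are assumptions.
    """
    # Passing encoding="utf-8" explicitly makes decoding independent of the
    # platform's default locale encoding (e.g. cp1252 on Windows).
    with open(gloss_path, encoding="utf-8") as gloss_file, open(text_path, encoding="utf-8") as text_file:
        # zip() pairs line i of one file with line i of the other.
        for idx, (gloss, text) in enumerate(zip(gloss_file, text_file)):
            yield idx, {"gloss": gloss.strip(), "text": text.strip()}
```

Note that zip() silently stops at the end of the shorter file, so this sketch assumes both files have the same number of lines.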

Review comments on datasets/aslg_pc12/aslg_pc12.py (2 threads, outdated and resolved)

AmitMY (Contributor, Author) commented Oct 16, 2020

Thanks @lhoestq
Are there any guidelines for the dummy data?
In this particular case for example, the dataset fetches from two hardcoded URLs.
Do I just head -n 10 both files and zip them?

lhoestq (Member) commented Oct 16, 2020

Yes, the idea is just to have a few examples to properly test the script and make sure it keeps working in the long run.
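The "head -n 10 both files and zip them" approach can be sketched in Python as below. This is a hypothetical helper, not part of the datasets library: the names given to files inside the archive are placeholders, and the names the loading script actually expects should come from the documented dummy-data command.

```python
import itertools
import zipfile
from typing import Mapping


def make_dummy_zip(sources: Mapping[str, str], out_zip: str, n_lines: int = 10) -> None:
    """Write the first n_lines of each source file into a zip archive.

    Hypothetical helper: `sources` maps the name a file should have inside
    the archive to the local path of the full downloaded file.
    """
    with zipfile.ZipFile(out_zip, "w") as archive:
        for arcname, path in sources.items():
            with open(path, encoding="utf-8") as f:
                # Equivalent of `head -n 10`: keep only the first n_lines lines.
                head = "".join(itertools.islice(f, n_lines))
            # writestr() encodes str data as UTF-8 inside the archive.
            archive.writestr(arcname, head)
```

Truncating both files to the same number of lines keeps them aligned, so the loading script still sees matching gloss/text pairs in the dummy data.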

And FYI there's a command to help you name the dummy data files correctly. More info in the documentation here

AmitMY (Contributor, Author) commented Oct 28, 2020

@lhoestq It passes all tests.

lhoestq (Member) left a comment

Awesome, thanks!

lhoestq merged commit c3b76f5 into huggingface:master on Oct 28, 2020

2 participants