Update downloading from MinIO to accommodate for new structure #1304

Closed
PGijsbers opened this issue Jan 8, 2024 · 1 comment · Fixed by #1314

@PGijsbers
Collaborator

Previously, each dataset had its own bucket: https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the number of buckets and favor hosting many objects in a hierarchical structure, so we now use prefixes to divide the dataset objects into separate subdirectories: https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq
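To make the new layout concrete, here is a minimal sketch of how a client might build the new-style URL from a dataset ID. The helper name `new_style_parquet_url` is hypothetical and not part of openml-python. The only case confirmed above is dataset 61; how IDs above 9999 map to prefixes is not stated here, so the split used below (first directory is `id // 10000`, second is the ID zero-padded to four digits) is an assumption.

```python
# Host taken from the example URLs in this issue.
BASE_URL = "https://openml1.win.tue.nl"


def new_style_parquet_url(dataset_id: int, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: build the new-style object URL for a dataset.

    ASSUMPTION: the first directory is ``dataset_id // 10000`` and the second
    is the dataset ID itself, both zero-padded to at least four digits. Only
    the result for dataset 61 is confirmed by the example in this issue.
    """
    if dataset_id < 0:
        raise ValueError("dataset_id must be non-negative")
    prefix = f"{dataset_id // 10_000:04d}"
    subdir = f"{dataset_id:04d}"
    return f"{base_url}/datasets/{prefix}/{subdir}/dataset_{dataset_id}.pq"


print(new_style_parquet_url(61))
# https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq
```

If the server already reports the full parquet URL for each dataset, using that value directly would avoid hard-coding the prefix scheme on the client.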

I started work on it here: https://github.com/openml/openml-python/tree/fix/new_minio
It works but is ugly and I didn't run any tests.
Just trying to get it to work for now so Taniya and Prabhant can continue on with their deep learning integration, but we need to integrate this in the next release.

@PGijsbers
Collaborator Author

Jos is currently working on bringing MinIO and Parquet to the test server. I assume that will also have the "new style" buckets and object prefixes. @josvandervelde, is this correct, and can you ping here when that is done?

@eddiebergman @LennartPurucker it would be great if one of you could pick this up once Jos has indicated that MinIO/Parquet is available on the test server.
