Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: update fetching a bucket from MinIO #1314

Merged
merged 3 commits into from
Jan 17, 2024
Merged

Fix: update fetching a bucket from MinIO #1314

merged 3 commits into from
Jan 17, 2024

Conversation

eddiebergman
Copy link
Collaborator

@eddiebergman eddiebergman commented Jan 12, 2024

Closes: #1304

Branch by @PGijsbers, just created a PR from it


Previously, each dataset had their own bucket:
https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the amount of buckets and favor hosting many objects in hierarchical structure, so we now have instead some prefixes to divide up the dataset objects into separate subdirectories:

https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq

This commit has bypassed pre-commit. Tests should be updated too.

Reference Issue

What does this PR implement/fix? Explain your changes.

How should this PR be tested?

Any other comments?

Previously, each dataset had their own bucket:
  https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the amount of buckets and
favor hosting many objects in hierarchical structure, so
we now have instead some prefixes to divide up the
dataset objects into separate subdirectories:

  https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq

This commit has bypassed pre-commit. Tests should be
updated too.
@LennartPurucker LennartPurucker self-requested a review January 12, 2024 11:13
Copy link
Contributor

@LennartPurucker LennartPurucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the tests, we assume something breaks on the server side now.

Production-based tests like test_get_data_array do not finish. Is MinIO down? @josvandervelde

@eddiebergman
Copy link
Collaborator Author

Hi @josvandervelde,

Just pinging this issue again. I reran the tests to see where it fails which you can find here: /~https://github.com/openml/openml-python/actions/runs/7527320791

If you need to run them manually, you can do so from Actions -> Test -> Run Workflow. Otherwise, committing to this branch will also trigger them :)

@eddiebergman
Copy link
Collaborator Author

They seemed to pass, I'm going to try create an empty commit to have it display here

@codecov-commenter
Copy link

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (b06ecee) 38.26% compared to head (64df490) 84.19%.
Report is 2 commits behind head on develop.

❗ Current head 64df490 differs from pull request most recent head 96b72dc. Consider uploading reports for the commit 96b72dc to get more accurate results

Files Patch % Lines
openml/_api_calls.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1314       +/-   ##
============================================
+ Coverage    38.26%   84.19%   +45.93%     
============================================
  Files           38       38               
  Lines         5240     5239        -1     
============================================
+ Hits          2005     4411     +2406     
+ Misses        3235      828     -2407     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eddiebergman
Copy link
Collaborator Author

Seems the only issue was uploading the code coverage with a server error, otherwise this seems all good now :)

@PGijsbers
Copy link
Collaborator

Hi @LennartPurucker you still have a blocking review here. are you ok with a merge?

@LennartPurucker
Copy link
Contributor

Yup, sorry, this is fine to merge now. And we can do a release afterwards @eddiebergman

@PGijsbers PGijsbers merged commit a1cb66b into develop Jan 17, 2024
9 of 10 checks passed
@PGijsbers PGijsbers deleted the fix/new_minio branch January 17, 2024 14:26
eddiebergman added a commit that referenced this pull request Jan 18, 2024
* Update fetching a bucket from MinIO

Previously, each dataset had their own bucket:
  https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the amount of buckets and
favor hosting many objects in hierarchical structure, so
we now have instead some prefixes to divide up the
dataset objects into separate subdirectories:

  https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq

This commit has bypassed pre-commit. Tests should be
updated too.

* ci: Trigger ci

* ci: Add some files to .gitignore

---------

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
LennartPurucker added a commit that referenced this pull request Jan 18, 2024
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.3.0 → 23.7.0](psf/black@23.3.0...23.7.0)
- [github.com/pycqa/flake8: 6.0.0 → 6.1.0](PyCQA/flake8@6.0.0...6.1.0)

* Raise correct TypeError and improve type check

* Type check with isinstance instead of type() ==

* Docker enhancement #1277 (#1278)

* Add multiple build platforms

* Automatically update Dockerhub description

* Launch Python instead of Bash by default

* Change `omlp` directory name to less cryptic `openml`

* Change directory to `openml` for running purpose of running script

For mounted scripts, instructions say to mount them to `/openml`,
so we have to `cd` before invoking `python`.

* Update readme to reflect updates (python by default, rename dirs)

* Add branch/code for doc and test examples as they are required

* Ship docker images with readme

* Only update readme on release, also try build docker on PR

* Update the toc descriptions

* fix: carefully replaced minio_url with parquet_url (#1280)

* carefully replaced minio with parquet

* fix: corrected some mistakes

* fix: restored the instances of minio

* fix: updated the documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add #1280

I used a `next` header instead of a specific version since we don't know if it will be 0.15.0 or 0.14.2. We can change it before the next release.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
Co-authored-by: Lennart Purucker <lennart.purucker@uni-siegen.de>

* Pytest/utils (#1269)

* Extract mocked_perform_api_call because its independent of object

* Remove _multiprocess_can_split_ as it is a nose directive

and we use pytest

* Convert test list all

* Add markers and refactor test_list_all_for_tasks for pytest

* Add cache marker

* Converted remainder of tests to pytest

* Documented remaining Attributes of classes and functions (#1283)

Add documentation and type hints for the remaining attributes of classes and functions.

---------

Co-authored-by: Lennart Purucker <contact@lennart-purucker.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#1281)

updates:
- [github.com/psf/black: 23.7.0 → 23.11.0](psf/black@23.7.0...23.11.0)
- [github.com/pre-commit/mirrors-mypy: v1.4.1 → v1.7.0](pre-commit/mirrors-mypy@v1.4.1...v1.7.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#1291)

updates:
- [github.com/psf/black: 23.11.0 → 23.12.1](psf/black@23.11.0...23.12.1)
- [github.com/pre-commit/mirrors-mypy: v1.7.0 → v1.8.0](pre-commit/mirrors-mypy@v1.7.0...v1.8.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Bump actions/checkout from 3 to 4 (#1284)

Bumps [actions/checkout](/~https://github.com/actions/checkout) from 3 to 4.
- [Release notes](/~https://github.com/actions/checkout/releases)
- [Changelog](/~https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump docker/login-action from 2 to 3 (#1285)

Bumps [docker/login-action](/~https://github.com/docker/login-action) from 2 to 3.
- [Release notes](/~https://github.com/docker/login-action/releases)
- [Commits](docker/login-action@v2...v3)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump docker/metadata-action from 4 to 5 (#1286)

Bumps [docker/metadata-action](/~https://github.com/docker/metadata-action) from 4 to 5.
- [Release notes](/~https://github.com/docker/metadata-action/releases)
- [Upgrade guide](/~https://github.com/docker/metadata-action/blob/master/UPGRADE.md)
- [Commits](docker/metadata-action@v4...v5)

---
updated-dependencies:
- dependency-name: docker/metadata-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump docker/setup-qemu-action from 2 to 3 (#1287)

Bumps [docker/setup-qemu-action](/~https://github.com/docker/setup-qemu-action) from 2 to 3.
- [Release notes](/~https://github.com/docker/setup-qemu-action/releases)
- [Commits](docker/setup-qemu-action@v2...v3)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump docker/build-push-action from 4 to 5 (#1288)

Bumps [docker/build-push-action](/~https://github.com/docker/build-push-action) from 4 to 5.
- [Release notes](/~https://github.com/docker/build-push-action/releases)
- [Commits](docker/build-push-action@v4...v5)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add more type annotations (#1261)

* Add more type annotations

* add to progress rst

---------

Co-authored-by: Lennart Purucker <git@lennart-purucker.com>

* Bump docker/setup-buildx-action from 2 to 3 (#1292)

Bumps [docker/setup-buildx-action](/~https://github.com/docker/setup-buildx-action) from 2 to 3.
- [Release notes](/~https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v2...v3)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/setup-python from 4 to 5 (#1293)

Bumps [actions/setup-python](/~https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](/~https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Rework Tagging Tests for New Server Specification  (#1294)

* rework tagging test adjusted for new server specification

* update progress.rst

* ci: Update tooling (#1298)

* ci: Migrate everything to pyproject.toml

* style: Apply ruff fixes

* style: Apply (more) ruff fixes

* style(ruff): Add some file specific ignores

* ci: Fix build with setuptools

* ci(ruff): Allow prints in cli.py

* fix: Circular import

* test: Use raises(..., match=)

* fix(cli): re-add missing print statements

* fix: Fix some over ruff-ized code

* add make check to documentation

* ci: Change to `build` for sdist

* ci: Add ruff to `"test"` deps

---------

Co-authored-by: Lennart Purucker <git@lennart-purucker.com>

* ci: Disable 3.6 tests (#1302)

* fix: Chipping away at ruff lints (#1303)

* fix: Chipping away at ruff lints

* fix: return lockfile path

* Update openml/config.py

* Update openml/runs/functions.py

* Update openml/tasks/functions.py

* Update openml/tasks/split.py

* Update openml/utils.py

* Update openml/utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update openml/config.py

* Update openml/testing.py

* Update openml/utils.py

* Update openml/config.py

* Update openml/utils.py

* Update openml/utils.py

* add concurrency to workflow calls

* adjust docstring

* adjust docstring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Lennart Purucker <contact@lennart-purucker.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lennart Purucker <git@lennart-purucker.com>

* Tagging constraints (#1305)

* update tagging constraints

* openml python tests

* small fix

* [pre-commit.ci] pre-commit autoupdate (#1306)

updates:
- /~https://github.com/charliermarsh/ruff-pre-commit/~https://github.com/astral-sh/ruff-pre-commit
- [github.com/astral-sh/ruff-pre-commit: v0.1.5 → v0.1.11](astral-sh/ruff-pre-commit@v0.1.5...v0.1.11)
- [github.com/python-jsonschema/check-jsonschema: 0.27.1 → 0.27.3](python-jsonschema/check-jsonschema@0.27.1...0.27.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Linting Everything - Fix All mypy and ruff Errors (#1307)

* style: Fix linting split.py

* typing: Fix mypy errors split.py

* typing: data_feature

* typing: trace

* more linting fixes

* typing: finish up trace

* typing: config.py

* typing: More fixes on config.py

* typing: setup.py

* finalize runs linting

* typing: evaluation.py

* typing: setup

* ruff fixes across different files and mypy fixes for run files

* typing: _api_calls

* adjust setup files' linting and minor ruff changes

* typing: utils

* late night push

* typing: utils.py

* typing: tip tap tippity

* typing: mypy 78, ruff ~200

* refactor output format name and minor linting stuff

* other: midway merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typing: I'm runnign out of good messages

* typing: datasets

* leinting for flows and some ruff changes

* no more mypy errors

* ruff runs and setups

* typing: Finish off mypy and ruff errors

Co-authored-by: Bilgecelik <38037323+Bilgecelik@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style: File wide ignores of PLR0913

This is because the automated pre-commit.ci bot which made automatic
commits and pushes would think the `noqa` on the individualy overloaded
functions was not needed. After removing the `noqa`, the linter then
raised the issue

---------

Co-authored-by: eddiebergman <eddiebergmanhs@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bilgecelik <38037323+Bilgecelik@users.noreply.github.com>

* ci: Remove Python 3.6/7 (#1308)

* ci: remove 3.7 patch (#1309)

* Make Test Work Again After Ruff and Linter Changes (#1310)

* mark production tests

* make production test run

* fix test bug -1/N

* add retry raise again after refactor

* fix str dict representation

* test: Fix non-writable home mocks

* testing: not not a change

* testing: trigger CI

* typing: Update typing

* ci: Update testing matrix

* testing: Fixup run flow error check

* ci: Manual dispatch, disable double testing

* ci: Prevent further ci duplication

* ci: Add concurrency checks to all

* ci: Remove the max-parallel on test ci

There are a lot less now and they cancel previous
puhes in the same pr now so it shouldn't be a problem anymore

* testing: Fix windows path generation

* add pytest for server state

* add assert cache state

* some formatting

* fix with cache fixture

* finally remove th finally

* doc: Fix link

* update test matrix

* doc: Update to just point to contributing

* add linkcheck ignore for test server

---------

Co-authored-by: eddiebergman <eddiebergmanhs@gmail.com>

* Add Feature Descriptions Rebase Clean (#1316)

Co-authored-by: Jan van Rijn <janvanrijn@gmail.com>

* Make Class Label Retrieval More Lenient (#1315)

* mark production tests

* make production test run

* fix test bug -1/N

* add retry raise again after refactor

* fix str dict representation

* test: Fix non-writable home mocks

* testing: not not a change

* testing: trigger CI

* typing: Update typing

* ci: Update testing matrix

* testing: Fixup run flow error check

* ci: Manual dispatch, disable double testing

* ci: Prevent further ci duplication

* ci: Add concurrency checks to all

* ci: Remove the max-parallel on test ci

There are a lot less now and they cancel previous
puhes in the same pr now so it shouldn't be a problem anymore

* testing: Fix windows path generation

* add pytest for server state

* add assert cache state

* some formatting

* fix with cache fixture

* finally remove th finally

* doc: Fix link

* update test matrix

* doc: Update to just point to contributing

* add linkcheck ignore for test server

* add special case for class labels that are dtype string

* fix bug and add test

* formatting

---------

Co-authored-by: eddiebergman <eddiebergmanhs@gmail.com>

* [pre-commit.ci] pre-commit autoupdate (#1318)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.1.11 → v0.1.13](astral-sh/ruff-pre-commit@v0.1.11...v0.1.13)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix: update fetching a bucket from MinIO (#1314)

* Update fetching a bucket from MinIO

Previously, each dataset had their own bucket:
  https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the amount of buckets and
favor hosting many objects in hierarchical structure, so
we now have instead some prefixes to divide up the
dataset objects into separate subdirectories:

  https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq

This commit has bypassed pre-commit. Tests should be
updated too.

* ci: Trigger ci

* ci: Add some files to .gitignore

---------

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* Update progress.rst for minor release (#1319)

* Update progress.rst for minor release

* update version number

* fix typo

* Prepare Develop for Merge with Main (#1321)

Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Devansh Varshney (देवांश वार्ष्णेय) <varshney.devansh614@gmail.com>
Co-authored-by: Lennart Purucker <lennart.purucker@uni-siegen.de>
Co-authored-by: Vishal Parmar <vishalm524112@gmail.com>
Co-authored-by: Lennart Purucker <contact@lennart-purucker.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: Lennart Purucker <git@lennart-purucker.com>
Co-authored-by: janvanrijn <janvanrijn@gmail.com>
Co-authored-by: Lennart Purucker <purucker@cs.uni-freiburg.de>
Co-authored-by: Bilgecelik <38037323+Bilgecelik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update downloading from MinIO to accommodate for new structure
4 participants