Check environment supports target device in Dataset constructor #243

oliverholworthy · 2023-03-13T09:40:35Z

Add checks of the environment to make sure the target device is supported based on the cpu parameter.

Current behaviour

cpu=False is treated as equivalent to cpu=None.

cpu=None or cpu=False
- will use the GPU device if cudf is available, otherwise the host CPU
cpu=True
- use the CPU host device
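
The behaviour listed above can be summarised in a minimal sketch. The helper name `choose_device_current` and the `cudf_available` flag are hypothetical and used only for illustration; this is not the actual code in `Dataset`.

```python
def choose_device_current(cpu, cudf_available):
    """Hypothetical helper mirroring the behaviour before this PR.

    `cudf_available` stands in for "cudf could be imported" (an assumption
    made for this sketch, not a real attribute).
    """
    if cpu:
        # cpu=True: always use the host CPU
        return "cpu"
    # cpu=None and cpu=False take the same path:
    # use the GPU if cudf is available, otherwise fall back to the host CPU
    return "gpu" if cudf_available else "cpu"
```

Note that `cpu=False` never fails here; it silently falls back to the CPU when cudf is missing.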

After this PR

cpu=False now means we want to use the GPU device only; if for some reason we can't, we want to know about it and why.

cpu=None
- will use the GPU device if available
cpu=False
- use the GPU device
- raises RuntimeError if either no device is detected with pynvml or cudf is not available
cpu=True
- use the CPU host device
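
A minimal sketch of the three cases after this PR follows. The function name `choose_device` and the two boolean flags are hypothetical; in particular, falling back to the CPU when `cpu=None` and no GPU is usable is an assumption of this sketch, and the error messages are illustrative rather than the ones raised by `Dataset`.

```python
def choose_device(cpu, gpu_detected, cudf_available):
    """Hypothetical helper mirroring the behaviour described for this PR.

    gpu_detected:   stands in for "a device was detected with pynvml"
    cudf_available: stands in for "cudf could be imported"
    """
    if cpu is True:
        # Explicit request for the host CPU
        return "cpu"
    if cpu is False:
        # Explicit request for the GPU: fail early and say why
        if not gpu_detected:
            raise RuntimeError("cpu=False was requested but no GPU device was detected")
        if not cudf_available:
            raise RuntimeError("cpu=False was requested but cudf is not available")
        return "gpu"
    # cpu=None: use the GPU if it is usable, otherwise the host CPU (assumed fallback)
    return "gpu" if (gpu_detected and cudf_available) else "cpu"
```

With this shape, only an explicit `cpu=False` can raise; `cpu=None` keeps the "best available device" behaviour.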

Motivation

Make the device specified for a Dataset clearer.

Failing earlier when cudf or a device is unavailable reduces the chance of failures later on, when we try to use functions in merlin.core.dispatch that then fail with an error that may be less clear.

karlhigley · 2023-03-13T12:56:11Z

This LGTM, but wonder if it’s worth adding some tests? They wouldn’t all be able to run in every environment but between the CPU and GPU CI, we should be able to test the four (?) possibilities.

karlhigley · 2023-03-21T23:21:55Z

I can't completely tell if the downstream failures are related, but they look like they could be

oliverholworthy · 2023-03-22T11:54:05Z

> I can't completely tell if the downstream failures are related, but they look like they could be

Yes, the errors in NVTabular/dataloader are related. There are two more places that need to change (both are in test code where we pass cpu=False in an environment without GPUs):

karlhigley · 2023-03-22T15:19:12Z

Looks like there are still some downstream issues in NVT and the dataloaders, but less than there used to be 👍🏻

…n detected (#236)
* Run tests in GPU environment with no GPUs visible
* Update TensorTable tests with checks for HAS_GPU
* Remove unused `_HAS_GPU` variable from `test_utils`
* Wrap cupy/cudf imports in HAS_GPU check in `compat`
* Update tests to use HAS_GPU from compat module
* Reformat test_tensor_table.py
* Move HAS_GPU import to compat module
* Add pynvml dependency
* Update functions in `dispatch` to not use HAS_GPU
* Raise RuntimeError in Dataset if we can't run on GPU when cpu=False
* Update `convert_data` to handle unavailable cudf and dask_cudf
* Remove use of `HAS_GPU` from dispatch
* Keep cudf and cupy values representing presence of package
* Revert changes to `dataset.py`. Now part of #243
* Revert changes to `dispatch.py`. Now part of #244
* Use branch-name action for branch selection
* Remove unused ref_type variable
* Extend reason in `test_tensor_column.py`
  Co-authored-by: Karl Higley <kmhigley@gmail.com>
* Extend reason in `tests/unit/table/test_tensor_column.py`
  Co-authored-by: Karl Higley <kmhigley@gmail.com>
* Remove cudf import from compat. Now unrelated to this PR
* Remove use of branch-name action. `docker` not available in runner
* Add HAS_GPU checks with cupy to support env without visible devices
* Correct value of empty visible devices
* Update deps for GPU envs to match others
* Update get_lib to account for missing visible GPU
* Check HAS_GPU in `make_df` to handle visible GPU devices
* Update Dataset to handle default case when no visible GPUs are found
* Update fixtures to handle cudf with no visible devices
* Update tests to handle case of no visible GPUs
---------
Co-authored-by: Karl Higley <kmhigley@gmail.com>
Co-authored-by: Karl Higley <karlb@nvidia.com>

Use HAS_GPU as part of Dataset device choice

aa9f3b1

oliverholworthy added the bug Something isn't working label Mar 13, 2023

oliverholworthy added this to the Merlin 23.03 milestone Mar 13, 2023

oliverholworthy self-assigned this Mar 13, 2023

oliverholworthy added 4 commits March 13, 2023 11:22

Update import of HAS_GPU in dataset

8ce44aa

Raise RuntimeError in Dataset if we can't run on GPU when cpu=False

db1bb98

Update error message in case of failed Dataset initialization

f55393a

Set default value of cpu to None instead of False in fixture

d51365f

oliverholworthy changed the title ~~Use HAS_GPU as part of Dataset device choice~~ Check environment supports target device in Dataset constructor Mar 13, 2023

oliverholworthy added enhancement New feature or request and removed bug Something isn't working labels Mar 13, 2023

This was referenced Mar 13, 2023

Update default value of cpu to None in dataset fixture NVIDIA-Merlin/NVTabular#1779

Merged

Update default value of cpu to None in dataset fixture NVIDIA-Merlin/systems#294

Merged

oliverholworthy mentioned this pull request Mar 13, 2023

Support Dataset cpu-mode in environment with GPUs that have not been detected #236

Merged

oliverholworthy added a commit to oliverholworthy/core that referenced this pull request Mar 13, 2023

Revert changes to dataset.py. Now part of NVIDIA-Merlin#243

2390db4

karlhigley and others added 5 commits March 15, 2023 13:04

Merge branch 'main' into dataset-cpu-detect-gpu

765f0f0

Merge branch 'main' into dataset-cpu-detect-gpu

3a68692

Merge branch 'main' into dataset-cpu-detect-gpu

e412afa

Merge branch 'main' into dataset-cpu-detect-gpu

fdfe122

Merge branch 'main' into dataset-cpu-detect-gpu

0471fb4

This was referenced Mar 22, 2023

Update default value of cpu to None in dataset fixture NVIDIA-Merlin/dataloader#114

Merged

Use None as default value of cpu in test_column_similarity NVIDIA-Merlin/NVTabular#1787

Merged

oliverholworthy mentioned this pull request Mar 22, 2023

Use None as default value of cpu in test_torch_dataloader NVIDIA-Merlin/NVTabular#1788

Merged

Add test of expected Dataset cpu argument behaviour

92a0454

Merge branch 'main' into dataset-cpu-detect-gpu

663b35a

karlhigley approved these changes Mar 22, 2023

View reviewed changes

karlhigley merged commit 9d1467c into NVIDIA-Merlin:main Mar 22, 2023

oliverholworthy mentioned this pull request Mar 29, 2023

Run with import without gpu #261

Merged
