-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check environment supports target device in Dataset constructor #243
Check environment supports target device in Dataset constructor #243
Conversation
HAS_GPU
as part of Dataset device choice
This LGTM, but wonder if it’s worth adding some tests? They wouldn’t all be able to run in every environment but between the CPU and GPU CI, we should be able to test the four (?) possibilities. |
I can't completely tell if the downstream failures are related, but they look like they could be |
Yes, the errors in NVTabular/dataloader are related. There are two more places that need to change (both are from test code where we pass cpu=False in environment without GPUs): |
Looks like there are still some downstream issues in NVT and the dataloaders, but less than there used to be 👍🏻 |
…n detected (#236) * Run tests in GPU environment with no GPUs visible * Update TensorTable tests with checks for HAS_GPU * Remove unused `_HAS_GPU` variable from `test_utils` * Wrap cupy/cudf imports in HAS_GPU check in `compat` * Update tests to use HAS_GPU from compat module * Reformat test_tensor_table.py * Move HAS_GPU import to compat module * Add pynvml dependency * Update functions in `dispatch` to not use HAS_GPU * Raise RuntimeError in Dataset if we can't run on GPU when cpu=False * Update `convert_data` to handle unavailable cudf and dask_cudf * Remove use of `HAS_GPU` from dispatch * Keep cudf and cupy values representing presence of package * Revert changes to `dataset.py`. Now part of #243 * Revert changes to `dispatch.py`. Now part of #244 * Use branch-name action for branch selection * Remove unused ref_type variable * Extend reason in `test_tensor_column.py` Co-authored-by: Karl Higley <kmhigley@gmail.com> * Extend reason in `tests/unit/table/test_tensor_column.py` Co-authored-by: Karl Higley <kmhigley@gmail.com> * Remove cudf import from compat. Now unrelated to this PR * Remove use of branch-name action. `docker` not available in runner * Add HAS_GPU checks with cupy to support env without visible devices * Correct value of empty visible devices * Update deps for GPU envs to match others * Update get_lib to account for missing visible GPU * Check HAS_GPU in `make_df` to handle visible GPU devices * Update Dataset to handle default case when no visible GPUs are found * Update fixtures to handle cudf with no visible devices * Update tests to handle case of no visible GPUs --------- Co-authored-by: Karl Higley <kmhigley@gmail.com> Co-authored-by: Karl Higley <karlb@nvidia.com>
Add checks of the environment to make sure the target device is supported based on the
cpu
parameter.Current behaviour
cpu=False
is treated as equivalent tocpu=None
.cpu=None
orcpu=False
After this PR
cpu=False
is changed to mean we'd like to use the GPU device only and if for some reason we can't, we'd like to know about this and why.Motivation
Making device specified for Dataset clearer.
Introducing earlier failure in the case of missing cudf or device availability reduces the chance of subsequent later on when we try to use functions in
merlin.core.dispatch
that then fail with an error that may be less clear.