This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility #20443
Description
This PR makes RTC (as invoked by our Python unittests and other model scripts) work with CUDA enhanced compatibility.
As such, it is an extension of PR #19364, which brought that functionality to the C++ backend. This PR keeps test_operator_gpu.py::test_cuda_rtc from failing on systems that rely on CUDA enhanced compatibility, though those systems may not be part of upstream CI at present.
The changes of this PR add a backend function, MXGetMaxSupportedArch(), and a frontend function, get_rtc_compile_opts().
Our current approach to RTC in Python code, which might fail under CUDA enhanced compatibility:
With this PR, the new approach that succeeds under CUDA enhanced compatibility:
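The choice behind the new approach can be sketched in plain Python. This is an illustration under stated assumptions, not the code from this PR: `choose_arch_option` is a hypothetical helper, `device_cc` and `max_supported_cc` are assumed to be integer compute capabilities (for example 80 for an A100), and `can_emit_sass` is an assumed flag saying whether the installed toolkit's nvrtc can produce SASS directly. The sketch assumes SASS is preferred whenever the toolkit can target the device's arch, and that PTX for the newest toolkit-supported arch is the fallback otherwise.

```python
def choose_arch_option(device_cc, max_supported_cc, can_emit_sass=True):
    """Return a one-element list holding a --gpu-architecture option.

    Hypothetical helper for illustration only.
    device_cc:        compute capability of the target GPU, e.g. 80
    max_supported_cc: highest arch the installed toolkit can target
    can_emit_sass:    assumed flag, True if nvrtc can compile to SASS
    """
    if device_cc <= 0 or max_supported_cc <= 0:
        raise ValueError("compute capabilities must be positive integers")

    # Never ask the toolkit for an arch newer than it knows about.
    target_cc = min(device_cc, max_supported_cc)

    # Compile all the way to SASS only when the toolkit can target the
    # device's own arch; otherwise emit PTX and leave the final
    # translation to the driver's JIT compiler.
    use_sass = can_emit_sass and device_cc <= max_supported_cc
    kind = "sm" if use_sass else "compute"

    return ["--gpu-architecture={}_{}".format(kind, target_cc)]


if __name__ == "__main__":
    print(choose_arch_option(80, 86))  # toolkit knows the device arch
    print(choose_arch_option(86, 80))  # device newer than the toolkit
```

A list in this shape is what a caller would pass along as the compile options when building an RTC module, so the call site does not need to know which of the two forms applies.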
get_rtc_compile_opts() will return a list of options that is most appropriate for the system and the GPU context. Currently this is a single option of the form --gpu-architecture=compute_NN or --gpu-architecture=sm_NN, as needed.
Background
Starting with CUDA 11.1, a user can accept minor release upgrades of the CUDA toolkit (potentially picking up support for a newer GPU arch) without upgrading the driver (per https://docs.nvidia.com/deploy/cuda-compatibility/index.html). In such cases, the toolkit's nvrtc compile toolchain should not only compile CUDA code to PTX, but also further translate the PTX to SASS, since the driver would be unable to JIT-compile to SASS for the newer GPU arch. This is controlled by the nvrtc compiler option used: for example, to compile to SASS for the Ampere A100, the option is --gpu-architecture=sm_80. To compile only to PTX, and so rely on the driver's ability to JIT-compile to SASS, the option is --gpu-architecture=compute_80.
Checklist
Essentials