[FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility #20443

DickJC123 · 2021-07-12T02:52:26Z

Description

This PR makes RTC (as invoked by our Python unittests and other model scripts) work with CUDA enhanced compatibility.
As such, it is an extension of PR #19364, which brought that functionality to the C++ backend. This PR keeps test_operator_gpu.py::test_cuda_rtc from failing on systems that rely on CUDA enhanced compatibility, though those systems may not be part of upstream CI at present.

The changes of this PR are:

Break the calculation of the max supported arch out into a separate function, GetMaxSupportedArch(), and enhance it to use nvrtcGetSupportedArchs() if CUDA_VERSION >= 11.2.
Wrap GetMaxSupportedArch() as MXGetMaxSupportedArch() and add it to the C API.
Use MXGetMaxSupportedArch() in a newly created Python utility function, get_rtc_compile_opts(ctx).
Enhance test_cuda_rtc to use this new function.
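
For readers who want the shape of the first change without opening src/common/rtc.cc, here is a rough Python sketch of the logic described above: ask nvrtc for its supported archs when the toolkit is 11.2 or newer, otherwise fall back to a value derived from the toolkit version. This is not the backend code (which is C++). The function name max_supported_arch, the query_supported_archs callable (standing in for nvrtcGetSupportedArchs()), and the fallback table values are all illustrative assumptions.

```python
from typing import Callable, Sequence

# Illustrative fallback table (an assumption, not copied from the PR):
# (exclusive upper bound on CUDA_VERSION, highest arch assumed for those toolkits).
# CUDA_VERSION is encoded as major * 1000 + minor * 10, so 11.2 is 11020.
_FALLBACK_MAX_ARCH = [
    (10000, 72),
    (11000, 75),
    (11010, 80),
    (11020, 86),
]


def max_supported_arch(cuda_version: int,
                       query_supported_archs: Callable[[], Sequence[int]]) -> int:
    """Return the highest GPU arch (e.g. 86) the toolkit can compile for.

    query_supported_archs is a hypothetical stand-in for a call to
    nvrtcGetSupportedArchs(); it is only consulted for CUDA >= 11.2.
    """
    if cuda_version >= 11020:
        archs = query_supported_archs()
        if not archs:
            raise RuntimeError("nvrtc reported no supported archs")
        return max(archs)
    # Older toolkits cannot be queried, so use the static table.
    for upper_bound, arch in _FALLBACK_MAX_ARCH:
        if cuda_version < upper_bound:
            return arch
    raise AssertionError("unreachable: versions >= 11020 are handled above")
```

For example, max_supported_arch(11020, lambda: [52, 60, 70, 80, 86]) takes the query path, while max_supported_arch(10020, lambda: []) never calls the query and uses the table.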

Our current approach to RTC in Python code, which might fail under CUDA enhanced compatibility:

module = mx.rtc.CudaModule(source)

With this PR, the new approach that succeeds under CUDA enhanced compatibility:

ctx = mx.gpu(0)  # or any other GPU context
module = mx.rtc.CudaModule(source, options=get_rtc_compile_opts(ctx))

get_rtc_compile_opts() returns the list of options most appropriate for the system and the GPU context. Currently this is a single option of the form --gpu-architecture=compute_NN or --gpu-architecture=sm_NN, as needed.
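
The description does not spell out how the choice between the two option forms is made, so the following is only a sketch under an assumed rule: if the toolkit supports the device's arch, compile all the way to SASS for it (sm_NN); otherwise emit PTX for the newest arch the toolkit knows (compute_NN) and leave the final translation to the driver. The helper name choose_arch_option and its two integer parameters are hypothetical; in the PR the inputs would come from the GPU context and from MXGetMaxSupportedArch().

```python
from typing import List


def choose_arch_option(device_cc: int, max_supported_cc: int) -> List[str]:
    """Pick an nvrtc --gpu-architecture option (assumed selection rule).

    device_cc:        compute capability of the target GPU, e.g. 80 for A100.
    max_supported_cc: highest arch the installed toolkit can compile for.
    """
    if device_cc > max_supported_cc:
        # The toolkit does not know this GPU: generate PTX for the newest
        # arch it does know and rely on the driver to JIT-compile to SASS.
        return ["--gpu-architecture=compute_{}".format(max_supported_cc)]
    # The toolkit knows this GPU: compile to SASS so the driver, which may
    # be older than the toolkit, does not need to JIT-compile anything.
    return ["--gpu-architecture=sm_{}".format(device_cc)]
```

The result is a list so that it can be passed directly as the options argument, and so that further options could be appended later without changing callers.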

Background

Starting with CUDA 11.1, a user can accept minor release upgrades of the CUDA toolkit (potentially picking up support for a newer GPU arch) without upgrading the driver (per https://docs.nvidia.com/deploy/cuda-compatibility/index.html). In such cases, the toolkit's nvrtc toolchain should not only compile CUDA code to PTX, but also translate the PTX to SASS, since the older driver would be unable to JIT-compile to SASS for the newer GPU arch. This is controlled by the nvrtc compiler option used: for example, to compile to SASS for the Ampere A100, the option is --gpu-architecture=sm_80. To compile only to PTX, and so rely on the driver's ability to JIT-compile to SASS, the option is --gpu-architecture=compute_80.

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
[~] All changes have test coverage [Verified privately, but ideally upstream's CI would have systems that stress this PR]
Code is well-documented

mxnet-bot · 2021-07-12T02:52:29Z

Hey @DickJC123, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, website, miscellaneous, unix-cpu, centos-cpu, windows-cpu, unix-gpu, sanity, edge, windows-gpu, centos-gpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

src/common/rtc.cc

ptrendx

LGTM

Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts()

eb4799f

DickJC123 requested review from eric-haibin-lin and szha as code owners July 12, 2021 02:52

mseth10 added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Jul 12, 2021

DickJC123 requested a review from ptrendx July 12, 2021 02:53

mseth10 added the pr-work-in-progress (PR is still work in progress) and pr-awaiting-testing (PR is reviewed and waiting CI build and test) labels, and removed the pr-awaiting-testing and pr-work-in-progress labels Jul 12, 2021

Fix rtc options vector handling

9241deb

mseth10 added the pr-work-in-progress (PR is still work in progress) and pr-awaiting-testing (PR is reviewed and waiting CI build and test) labels, and removed the pr-awaiting-testing and pr-work-in-progress labels Jul 12, 2021

Fix get_cuda_compute_capability(ctx) on Windows

7fa6f4b

mseth10 added the pr-work-in-progress label (PR is still work in progress) and removed the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Jul 13, 2021

ptrendx reviewed Jul 13, 2021

src/common/rtc.cc (resolved)

mseth10 added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) and removed the pr-work-in-progress label (PR is still work in progress) Jul 13, 2021

ptrendx approved these changes Jul 13, 2021

mseth10 added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) and removed the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Jul 13, 2021

ptrendx merged commit 8fd17ce into apache:master Jul 13, 2021
