This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[2.0] [BACKPORT] of [1.x][FEATURE] CUDA graphs support (#19142) #20324

Merged
merged 33 commits into from
Mar 18, 2022

Conversation

DickJC123
Contributor

This PR ports the CUDA Graphs support (by default off, enabled by MXNET_ENABLE_CUDA_GRAPHS=1) to 2.0. For gluon models with static_shape and static_alloc both true, enabling CUDA Graphs can significantly reduce CPU overheads, leading to end-to-end perf improvements. See PR #19142 for additional details.
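As a rough illustration of the opt-in described above, the sketch below shows one way a user might turn the feature on. Only the `MXNET_ENABLE_CUDA_GRAPHS` variable and the `static_alloc`/`static_shape` hybridize flags come from this PR; the helper names `cuda_graphs_env` and `hybridize_for_cuda_graphs` are hypothetical, and setting the variable in the process environment before MXNet starts is an assumption made to stay on the safe side.

```python
import os


def cuda_graphs_env(base_env=None):
    """Return a copy of an environment mapping with CUDA Graphs enabled.

    Hypothetical helper. The variable name is taken from the PR description;
    the feature is off unless it is set to 1.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENABLE_CUDA_GRAPHS"] = "1"
    return env


def hybridize_for_cuda_graphs(net):
    """Hybridize a Gluon HybridBlock with both static flags set to True.

    Hypothetical helper. Per the PR description, CUDA Graphs applies to
    Gluon models with static_alloc and static_shape both true.
    """
    net.hybridize(static_alloc=True, static_shape=True)
    return net
```

A launcher could then start a worker with something like `subprocess.run([sys.executable, "infer.py"], env=cuda_graphs_env())`, where `infer.py` is a hypothetical script that builds the model on a GPU and calls `hybridize_for_cuda_graphs(net)` before the first forward pass.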

@mxnet-bot

Hey @DickJC123 , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-gpu, edge, centos-gpu, sanity, windows-cpu, unix-gpu, website, unix-cpu, centos-cpu, miscellaneous, clang]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added the pr-work-in-progress PR is still work in progress label Jun 1, 2021
@chinakook
Contributor

@DickJC123 Hi, guru. I've tested your commit on my Faster R-CNN model, and it's great. It really does give a 17% performance increase on inference, but it doesn't work with multithreading, such as one model shared across multiple threads: it gradually eats all the GPU memory.

@leezu
Contributor

leezu commented Aug 10, 2021

For 2.0, should CUDA Graphs be enabled by default?

Is there any update on "I'm still exploring the ways of automatically testing newly added operators in order for the feature to be able to be on by default, but I do not consider this the scope of this PR, as v1.x branch is not really supposed to get many more operators (I will do that in the PR to master). Generally this would involve testing operators with hybridize(static_alloc=True, static_shape=True) (which generally should be tested much more as right now testing of this functionality is really limited, even though it is widely used)." @ptrendx #19142 (comment)?

@chinakook
Contributor

@DickJC123 Hi, guru. I've tested your commit on my Faster R-CNN model, and it's great. It really does give a 17% performance increase on inference, but it doesn't work with multithreading, such as one model shared across multiple threads: it gradually eats all the GPU memory.

I've tried compiling MXNet with CUDA 11.6 and the latest cuDNN version. There are no bugs with CUDA Graphs.

ptrendx and others added 7 commits February 14, 2022 20:35
* Initial cherry-pick

* Store NodeAttrs in OpExecutor

* Do not allow stateful operations in CUDA graphs and provide mechanism
for marking ops as safe

* Guard against using ops with synchronization

* Cleaning

* Properly guard graphs

* Limit graphs to CUDA 10.2+

* Fix the compilation when graphs are not available

* Guarding the libcuda.so usage behind RTC compilation flag

* Document the env variables

* Add test

* Fix the test

* Use with_environment
@DickJC123 DickJC123 force-pushed the backport_cuda_graphs branch from 2755a6a to 64e8555 February 15, 2022 05:05
Member

@ptrendx ptrendx left a comment

LGTM, left a few questions.

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Mar 15, 2022
@mseth10 mseth10 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 16, 2022
@mseth10 mseth10 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Mar 18, 2022
@DickJC123
Contributor Author

Merging now based on the approval from @ptrendx, our successful experience with CUDA Graphs on the 1.x branch, and the fact that CUDA Graphs is off by default (enabled via MXNET_ENABLE_CUDA_GRAPHS=1).

@DickJC123 DickJC123 merged commit 5ab7f64 into apache:master Mar 18, 2022
Labels
pr-awaiting-review PR is waiting for code review
6 participants