[2.0] [BACKPORT] of [1.x][FEATURE] CUDA graphs support (#19142) #20324
Hey @DickJC123, thanks for submitting the PR.
CI supported jobs: [windows-gpu, edge, centos-gpu, sanity, windows-cpu, unix-gpu, website, unix-cpu, centos-cpu, miscellaneous, clang] Note:
@DickJC123 Hi, guru. I tested your commit on my Faster R-CNN model, and it works great: inference is 17% faster. However, it does not work with multi-threading, for example when one model is shared by multiple threads. In that case it gradually eats all the GPU memory.
For 2.0, should CUDA graphs be enabled by default? @ptrendx, is there any update on this point from #19142 (comment)? "I'm still exploring the ways of automatically testing newly added operators in order for the feature to be able to be on by default, but I do not consider this the scope of this PR, as v1.x branch is not really supposed to get many more operators (I will do that in the PR to master). Generally this would involve testing operators with hybridize(static_alloc=True, static_shape=True) (which generally should be tested much more as right now testing of this functionality is really limited, even though it is widely used)."
I compiled MXNet with CUDA 11.6 and the latest cuDNN version, and I did not run into any bugs with CUDA graphs.
* Initial cherry-pick
* Store NodeAttrs in OpExecutor
* Do not allow stateful operations in CUDA graphs and provide mechanism for marking ops as safe
* Guard against using ops with synchronization
* Cleaning
* Properly guard graphs
* Limit graphs to CUDA 10.2+
* Fix the compilation when graphs are not available
* Guarding the libcuda.so usage behind RTC compilation flag
* Document the env variables
* Add test
* Fix the test
* Use with_environment
2755a6a to 64e8555
LGTM, left a few questions.
Merging now based on the approval from @ptrendx, our successful experience with CUDA Graphs on the 1.x branch, and the fact that CUDA Graphs is off by default (enabled via MXNET_ENABLE_CUDA_GRAPHS=1).
This PR ports the CUDA Graphs support (off by default, enabled with MXNET_ENABLE_CUDA_GRAPHS=1) to 2.0. For Gluon models hybridized with both static_shape and static_alloc set to true, enabling CUDA Graphs can significantly reduce CPU overhead, leading to end-to-end performance improvements. See PR #19142 for additional details.
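
As a rough illustration of the setup described above, the following Python sketch sets MXNET_ENABLE_CUDA_GRAPHS=1 and hybridizes a Gluon model with static_alloc and static_shape both true. The helper names (enable_cuda_graphs, build_static_net), the two-layer network, and the input shape are hypothetical and are not part of this PR. The sketch assumes a GPU build of MXNet and the legacy `ctx=` and `mx.nd` APIs, and it sets the variable before importing MXNet as a precaution, which may not be strictly required.

```python
import os


def enable_cuda_graphs():
    """Switch on MXNet's CUDA Graphs support for the current process."""
    # Variable name and value are taken from the PR description.
    os.environ["MXNET_ENABLE_CUDA_GRAPHS"] = "1"


def build_static_net(device_id=0):
    """Return a small hybridized Gluon model and its GPU context.

    The two-layer network is a hypothetical stand-in for a real model.
    """
    # Imported lazily so that the environment variable is already set.
    import mxnet as mx
    from mxnet.gluon import nn

    ctx = mx.gpu(device_id)
    net = nn.HybridSequential()
    net.add(nn.Dense(64, activation="relu"), nn.Dense(10))
    net.initialize(ctx=ctx)
    # The PR description targets models hybridized with both flags true.
    net.hybridize(static_alloc=True, static_shape=True)
    return net, ctx


# Possible use on a machine with a GPU build of MXNet (assumed setup):
#   enable_cuda_graphs()
#   net, ctx = build_static_net()
#   import mxnet as mx
#   out = net(mx.nd.ones((8, 32), ctx=ctx))  # keep the input shape fixed
#   out.wait_to_read()
```

Keeping the input shape fixed across calls matches the static_shape=True setting; any speedup depends on the model and on how much of the runtime is CPU overhead.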