-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix launch bounds in spatial transformer #13188
Fix launch bounds in spatial transformer #13188
Conversation
@mxnet-label-bot add [pr-awaiting-review] |
@samskalicky @access2rohit @anirudh2290 could you please review it thanks! |
@mxnet-label-bot add [Operator] @apeforest for review |
f322d69
to
c184b07
Compare
@eric-haibin-lin @anirudh2290 @samskalicky can you please review this PR? |
@ptrendx was there a corresponding issue filed for the V100 CI failure? |
It was our internal (NVIDIA's) CI failure, not external MXNet's CI, so no, there was not an issue for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptrendx Changes look straightfoward, no issues but I dont quite understand the fix.
Is 1 thread the correct value? Why choose that value (ie. how about 2)?
Please add some comment regarding the reasoning for the 1 thread value.
Are there unit tests to check this change, how can we validate the correctness?
It is 1 block, not 1 thread. Basically what this change does is tell the compiler: I'm going to run this kernel with Actually, I just found an issue about spatial transformer giving exactly this error (that was "fixed" by changing the kernel and so luckily being below the threshold again): #11568 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samskalicky Can you check if your comment is addressed? Thanks |
@ptrendx Can you add a comment in the code to make it clear that these launch bounds are required to be set this way to avoid the "too many resources requested for launch" error? Maybe something like:
|
I added a comment with explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ptrendx for your help with this issue!
Is there anything else needed for this PR? |
@anirudh2290 Can you merge this PR if it looks good? |
Looks similar to a few fixes we've provided in the past when we had too many registers for a few kernels to run on a TX1. LGTM. |
* Fix launch bounds in spatial transformer * Adding explanation in comment.
* Fix launch bounds in spatial transformer * Adding explanation in comment.
Description
Without launch_bounds compiler is not required to use small enough number of registers to fit 1024 threads per block. Our internal CI with CUDA 10 build was failing on V100 because of this.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR:
Changes
Comments