-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-857] Add initial NVTX profiler implementation #12328
[MXNET-857] Add initial NVTX profiler implementation #12328
Conversation
ce4b518
to
60a5e52
Compare
print(sys.executable) | ||
|
||
print(os.path.realpath(__file__)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: launch subprocess that profiles the simple forward pass of network py. Verify that ranges are collected properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo done. Testing in CI to make sure it correctly detects ranges.
One thing that would make the profile more readable is different colors for different ranges. Here is an example how to do this: /~https://github.com/NVIDIA/DALI/blob/master/dali/common.h#L149 You could for example use the first letter of the range as a very simple hash to the list of predetermined colors (don't do anything more fancy like actual hashing of the name since that would only introduce overhead which you don't want during profiling). |
Great idea. Totally agree.
…On Fri, Aug 24, 2018, 8:46 PM Przemyslaw Tredak ***@***.***> wrote:
One thing that would make the profile more readable is different colors
for different ranges. Here is an example how to do this:
/~https://github.com/NVIDIA/DALI/blob/master/dali/common.h#L149
You could for example use the first letter of the range as a very simple
hash to the list of predetermined colors (don't do anything more fancy like
actual hashing of the name since that would only introduce overhead which
you don't want during profiling).
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#12328 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AHGTE8BdwlbGFLcacaCC2pyFUX6eu1FOks5uUEobgaJpZM4WLDxa>
.
|
@mxnet-label-bot [pr-work-in-progress] |
@KellenSunderland Is the PR still WIP? |
@vandanavk Yes, I'll loop around to this once I finish some other tasks. Does it cause problems for your team to leave it in a WIP state? For me it's quite handy to be able to have WIP PRs that I can put aside and resume later. |
@KellenSunderland It doesn't cause problems for our team. But we do try to go through all the stale PRs in the repo to make sure someone is working on it and PRs are getting timely reviewed. |
Ok still planning to loop around on this after checking out some code from another implementation to see if there's any lessons to learn there. |
@KellenSunderland thanks for the contribution! Any updates on progress/ETA? |
Hey @lupesko. Sorry still in my backlog. I've reviewed a few suggestions so no longer blocked on anyone. I just need to find the time to finish this off. |
@KellenSunderland pinging for update. |
@KellenSunderland ping |
@mxnet-label-bot add [pr-awaiting-testing] |
@KellenSunderland ping for update and trigger ci. thanks! |
This will be very helpful addition to profiler utility. @KellenSunderland - looking forward for the update. |
@KellenSunderland ping for the update. Thanks |
@stu1130 @sandeep-krishnamurthy @roywei @anirudh2290 @lupesko |
b3f48b5
to
53865e1
Compare
@mxnet-label-bot add[pr-awaiting-review] |
@mxnet-label-bot remove[pr-work-in-progress] |
Still making progress on this. I believe the last TODO is either use static linking or to make sure the nvExt library is copied to the lib path for testing on windows. It's probably best for the customer to use static linking, so I'll try to implement that. Edit: I think windows support is going to be problematic. If we reference NVTX in Windows by default NVIDIA recommends shipping a the dll with the application:
I think this would be more effort than it's worth for 99% of our Windows users. On Linux the library is installed and on the path for everyone with Cuda installed, so I don't have concerns turning NVTX on by default for profiling on that platform. |
72a740e
to
379e2c8
Compare
@sandeep-krishnamurthy ready for review if you've got a second. |
b43e422
to
6d96c14
Compare
LGTM. Will this be enabled by default (e.g. via pip)? |
Yes, it'll be on by default for CUDA builds. |
@eric-haibin-lin Any other questions about this PR? I've done a bunch of perf tests offline and don't see any regressions, but don't mind doing some additional investigations if you'd like. If you're ok with the change would you be able to set this PR to approved, I'm hesitant to merge without green checkboxes :-). |
if(NVTX_FOUND) | ||
include_directories(${NVTX_INCLUDE_DIRS}) | ||
list(APPEND mxnet_LINKER_LIBS ${NVTX_LIBRARIES}) | ||
add_definitions(-DMXNET_USE_NVTX=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add support in makefile too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending @szha 's comment
@KellenSunderland any update? look forward to this feature |
These extensions mark readable ranges in the NVIDIA Visual Profiler which helps show correlations between kernel launches and graph node executions. Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png The additional information enabled is in the 'Markers and Ranges' row.
This commit removes NVTX headers from the Amalgamation build process, but this is a CUDA/CMake only feature, so it's not relevant to Amalagamation builds.
6d96c14
to
4ad4cba
Compare
Thanks @eric-haibin-lin. Updated with a Makefile build. Sorry it took a while, just wanted to make sure I tested it well with that build method. |
@@ -80,6 +80,9 @@ ENABLE_CUDA_RTC = 1 | |||
# whether use CuDNN R3 library | |||
USE_CUDNN = 0 | |||
|
|||
# whether to use NVTX when profiling | |||
USE_NVTX = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KellenSunderland do you recommend enabling this in pip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did measurements with a few models and didn't see any performance deltas. I think it would be safe to enable in pip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as long as you are not running under nvprof nvtx calls are basically noops.
* [MXNET-857] Enable CUDA NVTX extensions for profiler These extensions mark readable ranges in the NVIDIA Visual Profiler which helps show correlations between kernel launches and graph node executions. Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png The additional information enabled is in the 'Markers and Ranges' row. * [MXNET-857] Add initial NVTX profiler implementation This commit removes NVTX headers from the Amalgamation build process, but this is a CUDA/CMake only feature, so it's not relevant to Amalagamation builds. * [MXNET-857] Use macro for NVTX specific code * [MXNET-857] Add integration test. * Turn on NVTX by default in Unix. * Fixed typos and added NTVX info to profiler.md * Add NVTX example to profiling tutorial * Add NVTX flags for make
Description
This PR builds on-top of current profiler support to allow profiling via NVIDIA's NVTX APIs. These extensions mark readable ranges in the NVIDIA Visual Profiler which helps show correlations between kernel launches and graph node executions.
Example shown here: https://user-images.githubusercontent.com/7443219/33946110-34296d18-e021-11e7-8d18-6d40b797405c.png (The additional information enabled is in the 'Markers and Ranges' row.)
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Notes: