
Extend current profiler for timeline and more features. #8542

Merged: 3 commits merged into PaddlePaddle:develop on Feb 27, 2018

Conversation


@panyx0718 panyx0718 commented Feb 24, 2018

The timeline has been a go-to place for TF developers when doing performance profiling.
It visualizes multi-device execution as separate time-series, and arrows can be generated
for cross-device data transfers. Additional features such as memory allocations/deallocations
are also very useful.

Some example screenshots were attached (images not reproduced here).

This is the first PR for the timeline feature.

  1. It collects CUDA kernel execution stats under user-defined names.
  2. It stores the stats in a proto for later analysis.

Near-term next steps:

  1. Some more cleanup, and collection of other CUDA events such as memcpy.
  2. Generating the timeline visualization from the protobuf.
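Step 2 could be sketched roughly as below: converting recorded events into the Chrome trace-event JSON format that chrome://tracing can load, with one row per device. This is a minimal illustration, assuming a simple `(name, device_id, start_us, end_us)` event record; the function name and record layout are hypothetical, not Paddle's actual proto schema.

```python
import json

def events_to_chrome_trace(events):
    """Convert (name, device_id, start_us, end_us) tuples to a Chrome
    trace-event JSON string loadable by chrome://tracing."""
    trace = {"traceEvents": []}
    for name, device_id, start_us, end_us in events:
        trace["traceEvents"].append({
            "name": name,              # user-defined kernel/op name
            "cat": "Op",
            "ph": "X",                 # "complete" event: timestamp + duration
            "ts": start_us,            # start time in microseconds
            "dur": end_us - start_us,  # duration in microseconds
            "pid": 0,
            "tid": device_id,          # one timeline row per device
        })
    return json.dumps(trace)

events = [("matmul", 0, 10, 50), ("memcpy_h2d", 1, 55, 60)]
print(events_to_chrome_trace(events))
```

Using "X" (complete) events keeps each record self-contained; paired "B"/"E" begin/end events are an alternative when durations are not known up front.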

@panyx0718 panyx0718 force-pushed the test branch 3 times, most recently from 6d85f5c to 6ceffd6 Compare February 24, 2018 13:31
@@ -0,0 +1,285 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
A reviewer (Contributor) commented:

2016 ==> 2018, the same below

panyx0718 (author) replied:

Done


qingqing01 commented Feb 26, 2018

  1. The API in /~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/platform/cuda_profiler.h can also be used to get a timeline, but only for CUDA kernels. So a new timeline tracer that covers any code is good.
  2. What is the difference between the CUPTI API and the NVTX API? NVTX also seems able to produce a timeline, which can be visualized with the NVIDIA Visual Profiler tool. PyTorch appears to use NVTX, and its code is simpler: /~https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/profiler.h

@panyx0718

For 1:
That API seems intended to profile the time of a code block and dump the result to a file. We want to profile the whole model execution and programmatically obtain the start/end of all activities, so I think it doesn't give the coverage we expect. Also, it writes the data directly to a file, whereas we want it returned from the API.

For 2:
Searching for NVTX leads to https://docs.nvidia.com/gameworks/..., so it may be designed for something else that happens to work with CUDA. Moreover, we don't want to depend on its NVIDIA visualization tool: we want to include customized CPU and memory data, and we also want to add our multi-machine information to our own visualization.
In contrast, CUPTI always ships with the CUDA library. Quoting the docs: "The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications."
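The "return from API" behaviour argued for above can be illustrated with a small sketch: a profiler that records the start/end of named, nestable activities in memory and hands the records back to the caller, instead of dumping them to a file. The names `record_event` and `profile` are hypothetical, not Paddle's actual API.

```python
import time
from contextlib import contextmanager

_records = []  # collected (name, start, end) tuples

@contextmanager
def record_event(name):
    """Record the wall-clock start/end of a named activity. Nestable."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _records.append((name, start, time.perf_counter()))

def profile(fn):
    """Run fn under profiling and return the collected records
    programmatically, rather than writing them to a file."""
    _records.clear()
    fn()
    return list(_records)

def model_step():
    with record_event("forward"):
        with record_event("matmul"):
            sum(i * i for i in range(1000))  # stand-in for real work

events = profile(model_step)
# Inner scopes exit first, so "matmul" is appended before "forward".
```

Because the records come back as plain data, the caller can aggregate them, merge multi-machine results, or feed them into a custom visualization, which is the flexibility a file-dumping API would not provide.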

@panyx0718 panyx0718 merged commit decaad5 into PaddlePaddle:develop Feb 27, 2018