
Extend current profiler for timeline and more features. #8542

Merged: 3 commits merged into PaddlePaddle:develop on Feb 27, 2018

Conversation


@panyx0718 panyx0718 commented Feb 24, 2018

The timeline has been a go-to place for TF developers when doing performance profiling.
It visualizes multi-device execution as separate time-series, and arrows can be generated
for cross-device data transfers. Additional features such as memory allocations/deallocations
are also very useful.

Some example screenshots were attached (images not reproduced here).

This is the first PR for the timeline feature.

  1. It collects CUDA kernel execution stats under user-defined names.
  2. It stores the stats in a proto for later analysis.

Near-term next steps:

  1. Some more cleanup, and collection of other CUDA events such as memcpy.
  2. Generating the timeline visualization from the protobuf.
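Step 2 could be sketched roughly as below: converting recorded events into the Chrome trace-event JSON format that chrome://tracing can load, with one row per device. This is a minimal illustration, assuming a simple `(name, device_id, start_us, end_us)` event record; the function name and record layout are hypothetical, not Paddle's actual proto schema.

```python
import json

def events_to_chrome_trace(events):
    """Convert (name, device_id, start_us, end_us) tuples to a Chrome
    trace-event JSON string loadable by chrome://tracing."""
    trace = {"traceEvents": []}
    for name, device_id, start_us, end_us in events:
        trace["traceEvents"].append({
            "name": name,              # user-defined kernel/op name
            "cat": "Op",
            "ph": "X",                 # "complete" event: timestamp + duration
            "ts": start_us,            # start time in microseconds
            "dur": end_us - start_us,  # duration in microseconds
            "pid": 0,
            "tid": device_id,          # one timeline row per device
        })
    return json.dumps(trace)

events = [("matmul", 0, 10, 50), ("memcpy_h2d", 1, 55, 60)]
print(events_to_chrome_trace(events))
```

Using "X" (complete) events keeps each record self-contained; paired "B"/"E" begin/end events are an alternative when durations are not known up front.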

@panyx0718 panyx0718 force-pushed the test branch 3 times, most recently from 6d85f5c to 6ceffd6 Compare February 24, 2018 13:31
@@ -0,0 +1,285 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
A reviewer (Contributor) commented:

2016 ==> 2018, the same below

panyx0718 (author) replied:

Done


qingqing01 commented Feb 26, 2018

  1. The API in /~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/platform/cuda_profiler.h can also be used to get a timeline, but only for CUDA kernels. So a new timeline tracer that covers any code is good.
  2. What is the difference between the CUPTI API and the NVTX API? NVTX also seems able to produce a timeline, which can be visualized with the NVIDIA Visual Profiler tool. PyTorch appears to use NVTX, and its code is simpler: /~https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/profiler.h

@panyx0718

For 1:
That API seems intended to profile the time of a code block and dump the result to a file. We want to profile the whole model execution and programmatically obtain the start/end of all activities, so I think it doesn't give the coverage we expect. Also, it writes the data directly to a file, whereas we want it returned from the API.

For 2:
Searching for NVTX leads to https://docs.nvidia.com/gameworks/..., so it may be designed for something else that happens to work with CUDA. Moreover, we don't want to depend on its NVIDIA visualization tool: we want to include customized CPU and memory data, and we also want to add our multi-machine information to our own visualization.
In contrast, CUPTI always ships with the CUDA library. Quoting the docs: "The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications."
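The "return from API" behaviour argued for above can be illustrated with a small sketch: a profiler that records the start/end of named, nestable activities in memory and hands the records back to the caller, instead of dumping them to a file. The names `record_event` and `profile` are hypothetical, not Paddle's actual API.

```python
import time
from contextlib import contextmanager

_records = []  # collected (name, start, end) tuples

@contextmanager
def record_event(name):
    """Record the wall-clock start/end of a named activity. Nestable."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _records.append((name, start, time.perf_counter()))

def profile(fn):
    """Run fn under profiling and return the collected records
    programmatically, rather than writing them to a file."""
    _records.clear()
    fn()
    return list(_records)

def model_step():
    with record_event("forward"):
        with record_event("matmul"):
            sum(i * i for i in range(1000))  # stand-in for real work

events = profile(model_step)
# Inner scopes exit first, so "matmul" is appended before "forward".
```

Because the records come back as plain data, the caller can aggregate them, merge multi-machine results, or feed them into a custom visualization, which is the flexibility a file-dumping API would not provide.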

@panyx0718 panyx0718 merged commit decaad5 into PaddlePaddle:develop Feb 27, 2018