Optimize NMS part 2 #14352

ptrendx · 2019-03-06T23:39:15Z

Description

This PR changes the batch_start calculation in the BoxNMSForward op to use a custom kernel, which is much faster than the mshadow-generated one. In the MaskRCNN model this reduces the runtime of that part from 20 ms to 2 us and speeds up single-GPU training by 20% in fp16 mode.
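The following is one plausible way to formulate the custom kernel. It is a sketch, not the code merged in this PR. It assumes that `valid_batch_id` holds the batch index of each valid box, sorted in non-decreasing order, and that `batch_start` has `num_batch + 1` entries, so batch b owns the range [batch_start[b], batch_start[b + 1]). The names `FindBatchStartStep`, `FindBatchStartKernel`, `FindBatchStartHost` and `NMS_HOST_DEVICE` are hypothetical. As in the kernel excerpt reviewed below, each thread compares its batch id only with its left neighbour's, so the whole computation is a single parallel pass.

```cpp
#include <cstdint>
#include <vector>

// Under nvcc the step function is callable from device code; with a plain
// C++ compiler the macro expands to nothing and the same code runs on the host.
#ifdef __CUDACC__
#define NMS_HOST_DEVICE __host__ __device__
#else
#define NMS_HOST_DEVICE
#endif

// Hypothetical per-element step: what one GPU thread (index tid) does.
// Assumes valid_batch_id[0..N) is sorted, holds values in [0, num_batch),
// and batch_start has num_batch + 1 entries.
NMS_HOST_DEVICE inline void FindBatchStartStep(int tid,
                                               const int32_t* valid_batch_id,
                                               int N, int num_batch,
                                               int32_t* batch_start) {
  const int32_t previous = tid > 0 ? valid_batch_id[tid - 1] : -1;
  const int32_t current = valid_batch_id[tid];
  // Every batch id in (previous, current] starts at tid; this also covers
  // batches that have no valid boxes at all.
  for (int32_t b = previous + 1; b <= current; ++b) batch_start[b] = tid;
  // The last element fills the trailing empty batches and the sentinel.
  if (tid == N - 1) {
    for (int32_t b = current + 1; b <= num_batch; ++b) batch_start[b] = N;
  }
}

#ifdef __CUDACC__
// One thread per element; no reduction or scan is needed.
__global__ void FindBatchStartKernel(const int32_t* valid_batch_id, int N,
                                     int num_batch, int32_t* batch_start) {
  const int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < N) {
    FindBatchStartStep(tid, valid_batch_id, N, num_batch, batch_start);
  }
}
#endif

// Host emulation: run every "thread" sequentially. Each batch_start entry is
// written by exactly one step, so the order does not matter. The
// zero-initialisation covers N == 0, where no step runs; a GPU caller would
// have to handle that case itself.
std::vector<int32_t> FindBatchStartHost(const std::vector<int32_t>& ids,
                                        int num_batch) {
  std::vector<int32_t> batch_start(num_batch + 1, 0);
  const int N = static_cast<int>(ids.size());
  for (int tid = 0; tid < N; ++tid) {
    FindBatchStartStep(tid, ids.data(), N, num_batch, batch_start.data());
  }
  return batch_start;
}
```

For example, `FindBatchStartHost({0, 0, 2, 2, 2, 3}, 5)` yields `{0, 2, 2, 5, 6, 6}`. Batch 1 is empty ([2, 2)), batch 3 holds one box ([5, 6)), and batch 4 is empty, ending at the sentinel 6.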

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

I'm pretty sure that on the CPU path a simple for loop would also be much better than the mshadow-generated kernel, but since I did not have experimental data to support that, I did not change it. FYI @zhreshold
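Here is a minimal, unbenchmarked sketch of the CPU "simple for loop" suggested above. It assumes that `ids` holds the batch index of each valid box, sorted in non-decreasing order with values in [0, num_batch). It also assumes the result has `num_batch + 1` entries, with the last one equal to the number of boxes, so batch b owns [result[b], result[b + 1]). The function name `BatchStartSequential` is illustrative only and does not exist in MXNet.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical single-pass CPU computation of batch start offsets.
std::vector<int32_t> BatchStartSequential(const std::vector<int32_t>& ids,
                                          int num_batch) {
  std::vector<int32_t> batch_start(num_batch + 1);
  const int32_t n = static_cast<int32_t>(ids.size());
  int32_t i = 0;
  for (int32_t b = 0; b <= num_batch; ++b) {
    // Advance past every box that belongs to an earlier batch.
    while (i < n && ids[i] < b) ++i;
    // i is now the first box whose batch id is >= b (or n if none).
    batch_start[b] = i;
  }
  return batch_start;
}
```

This makes one pass over the boxes and one over the batches (O(N + num_batch)) and needs no temporaries. Whether it actually beats the mshadow-generated code on a given CPU remains unmeasured, which matches the caveat in the comment.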

vandanavk · 2019-03-07T00:53:54Z

@mxnet-label-bot add [Operator, pr-awaiting-review]

arcadiaphy · 2019-03-07T06:12:58Z

src/operator/contrib/bounding_box-inl.cuh

+                                                 int num_batch) {
+  size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
+  if (tid < N) {
+    const int32_t previous = tid > 0 ? __ldg(valid_batch_id + tid - 1) : -1;


Using the __ldg intrinsic will fail to compile on some early CUDA architectures.

ptrendx · Mar 7, 2019

It will fail on sm_30 and earlier (i.e. Fermi and first-generation Kepler). I can put an #ifdef there, but do we care about those?

arcadiaphy · Mar 7, 2019

In the Makefile, sm_30 is listed in KNOWN_CUDA_ARCHS:
https://github.com/apache/incubator-mxnet/blob/master/Makefile#L385

ptrendx · Mar 7, 2019

Then we do ;-). I will introduce the guard, thanks!
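The guard agreed on above can be written as a compile-time check on `__CUDA_ARCH__`, which nvcc defines only while compiling device code for a specific architecture. The sketch below is an assumption about the shape of such a guard, not the code from commit 2b970af. The macro `NMS_LOAD_READONLY` and the helper `PreviousBatchId` are hypothetical names. On sm_35 and newer the macro uses `__ldg`. Everywhere else, including the host compilation pass and sm_30 builds, it falls back to an ordinary load, so one source file builds for every architecture in KNOWN_CUDA_ARCHS.

```cpp
#include <cstdint>

// __ldg loads through the read-only data cache. Per the review discussion it
// does not compile for sm_30 and earlier, so use it only in device code built
// for sm_35+, and fall back to a plain dereference otherwise.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
#define NMS_LOAD_READONLY(ptr) __ldg(ptr)
#else
#define NMS_LOAD_READONLY(ptr) (*(ptr))
#endif

// Neighbour lookup in the style of the reviewed kernel: batch id of the
// previous element, or -1 for the first element.
#ifdef __CUDACC__
__host__ __device__
#endif
inline int32_t PreviousBatchId(const int32_t* valid_batch_id, int tid) {
  return tid > 0 ? NMS_LOAD_READONLY(valid_batch_id + tid - 1) : -1;
}
```

A preprocessor guard keeps the choice free at runtime. The cost is that correctness on older GPUs depends on nvcc actually being invoked with the matching `-gencode` targets, which the Makefile's architecture list controls.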

* Optimize NMS part 2
* Guarding ldg intrinsics

Optimize NMS part 2

bade7a6

marcoabreu added the Operator and pr-awaiting-review (PR is waiting for code review) labels Mar 7, 2019

zhreshold approved these changes Mar 7, 2019


zhreshold mentioned this pull request Mar 7, 2019

add backgroud class in box_nms #14058 (Merged)

arcadiaphy reviewed Mar 7, 2019


Guarding ldg intrinsics

2b970af

zhreshold merged commit 838e256 into apache:master Mar 8, 2019

vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019

Optimize NMS part 2 (apache#14352)

fe19790


nswamy pushed a commit that referenced this pull request Apr 5, 2019

Optimize NMS part 2 (#14352)

d503bb4


haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019

Optimize NMS part 2 (apache#14352)

183d99d

