This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Speedup _contrib_index_copy #14359

Merged
merged 4 commits into from
Mar 17, 2019

Conversation

haojin2
Contributor

haojin2 commented Mar 7, 2019

Description

Rewrites the Map kernel of contrib.index_copy to speed up the index_copy operator for DGL usage.
Also includes a small fix to the operator registration that resolves a problem with the symbolic API.
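For readers unfamiliar with the operator, a plain-Python reference of what index_copy computes may help. This is a sketch assuming the documented semantics (row index[i] of the output comes from row i of new_tensor, and every other row comes from old_tensor). The gradient rule shown is an assumption about the operator (overwritten rows of old_tensor get zero gradient, and new_tensor gathers the matching rows of the output gradient), not code taken from this PR; index_copy_forward and index_copy_backward are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def index_copy_forward(old: Matrix, index: List[int], new: Matrix) -> Matrix:
    """Return a copy of `old` whose row index[i] is replaced by row i of `new`."""
    if len(index) != len(new):
        raise ValueError("index must have one entry per row of new")
    out = [row[:] for row in old]  # the input `old` itself is left untouched
    for i, dst in enumerate(index):
        out[dst] = new[i][:]
    return out


def index_copy_backward(out_grad: Matrix, index: List[int]) -> Tuple[Matrix, Matrix]:
    """Return (grad_old, grad_new) under the assumed gradient rule; index gets no gradient."""
    # Overwritten rows of `old` do not reach the output, so their gradient is zero.
    grad_old = [row[:] for row in out_grad]
    for dst in index:
        grad_old[dst] = [0.0] * len(out_grad[dst])
    # Row i of `new` landed in output row index[i], so it receives that row's gradient.
    grad_new = [out_grad[dst][:] for dst in index]
    return grad_old, grad_new


old = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new = [[9.0, 9.0]]
print(index_copy_forward(old, [2], new))  # [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
print(index_copy_backward([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [2]))
```

The backward pass needs only two copies: a copy of the output gradient with some rows zeroed and a row gather. Both are cheap, memory-bound operations.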

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
    • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
    • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
    • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
    • For user-facing API changes, API doc string has been updated.
    • For new C++ functions in header files, their functionalities and arguments are documented.
    • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
    • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Custom CPU kernel for best performance on CPU
  • Custom GPU kernel for best performance on GPU
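MXNet's mxnet_op Kernel<OP, xpu>::Launch calls an operator's Map(i, ...) once for each index i in the launch range. A common pattern for a scatter like index_copy is to launch one work item per element of the new tensor, so each thread performs a single independent copy. The sketch below emulates that launch pattern in plain Python. It is an illustrative assumption about the kernel's shape, not the CPU or GPU kernel merged in this PR, and index_copy_map_kernel is a hypothetical name.

```python
from typing import List


def index_copy_map_kernel(out: List[float], new: List[float], index: List[int], col: int) -> None:
    """Emulate a Map-style launch over the flattened new tensor (one work item per element).

    All tensors are flattened row-major: element (r, c) lives at offset r * col + c.
    `out` is assumed to already hold a copy of the old tensor.
    """
    n = len(index) * col  # launch size: one work item per element of `new`
    for tid in range(n):  # on a GPU, each tid would be handled by a separate thread
        row, c = divmod(tid, col)  # which row of `new` and which column this item copies
        out[index[row] * col + c] = new[tid]


col = 3
out = [0.0] * (4 * col)                    # stands in for a copy of a 4-row old tensor
new = [float(v) for v in range(2 * col)]  # 2 rows to scatter
index_copy_map_kernel(out, new, [3, 1], col)
print(out)  # [0.0, 0.0, 0.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
```

Because every work item writes a distinct output element (assuming index has no duplicates), no synchronization is needed between threads.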

Comments

Benchmark results:
CPU:
  • forward only (ms): 26.154723167419434 -> 25.8462176322937 (~1x speedup)
  • forward+backward (ms): 63094.7536945343 -> 79.10973119735718 (~800x speedup)
GPU:
  • forward only (ms): 52.20915079116821 -> 4.262630462646484 (~12.25x speedup)
  • forward+backward (ms): 36445.19102573395 -> 14.284788131713867 (~2551x speedup)
import random

import mxnet as mx
from mxnet.test_utils import check_speed, rand_ndarray

# Benchmark on CPU by default; comment this out and uncomment the next line to use GPU 0
ctx = mx.cpu()
# ctx = mx.gpu(0)

orig_row = 40000  # rows in the old tensor
new_row = 20000   # rows copied in from the new tensor
col = 512         # columns per row

# Pick new_row distinct row indices of the old tensor, in random order
indices = list(range(orig_row))
random.shuffle(indices)
indices = mx.nd.array(indices[:new_row], ctx=ctx, dtype='int32')

mx_orig = rand_ndarray((orig_row, col)).as_in_context(ctx)
mx_new = rand_ndarray((new_row, col)).as_in_context(ctx)

orig = mx.sym.Variable("orig")
idx = mx.sym.Variable("idx")
new = mx.sym.Variable("new")
mx_sym = mx.sym.contrib.index_copy(old_tensor=orig, index_vector=idx, new_tensor=new)

location = {"orig": mx_orig, "idx": indices, "new": mx_new}
# check_speed returns the average time per run in seconds; scale to milliseconds
print(check_speed(mx_sym, typ='forward', location=location, ctx=ctx, N=1000) * 1000)  # forward only
print(check_speed(mx_sym, typ='whole', location=location, ctx=ctx, N=10) * 1000)      # forward + backward
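Each speedup figure in the results is the old time divided by the new time. The short script below recomputes the ratios from the reported milliseconds; it adds no new measurements. The CPU forward+backward ratio comes out at roughly 798x, which the summary rounds to ~800x.

```python
# (before_ms, after_ms) pairs copied from the benchmark results of this PR.
results = {
    "CPU forward": (26.154723167419434, 25.8462176322937),
    "CPU forward+backward": (63094.7536945343, 79.10973119735718),
    "GPU forward": (52.20915079116821, 4.262630462646484),
    "GPU forward+backward": (36445.19102573395, 14.284788131713867),
}

# Speedup is the old time divided by the new time.
speedups = {name: before / after for name, (before, after) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```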

@haojin2
Contributor Author

haojin2 commented Mar 7, 2019

@zheng-da @szha @eric-haibin-lin Please review

@vandanavk
Contributor

@mxnet-label-bot add [Operator, pr-awaiting-review]

marcoabreu added the Operator and pr-awaiting-review (PR is waiting for code review) labels Mar 7, 2019
haojin2 force-pushed the indexed_copy branch 5 times, most recently from befeddb to 19e00c6, March 9, 2019 10:43
@wkcn
Member

wkcn commented Mar 11, 2019

Great! Thanks for your contribution!
I have a few comments; the rest LGTM :)

src/operator/contrib/index_copy.cc (resolved)
src/operator/contrib/index_copy.cc (outdated, resolved)
src/operator/contrib/index_copy.cu (outdated, resolved)
@wkcn
Member

wkcn commented Mar 11, 2019

The old code supports OpReq, but this PR drops that support.

src/operator/contrib/index_copy.cc (outdated, resolved)
@haojin2
Contributor Author

haojin2 commented Mar 11, 2019

@wkcn I don't think ReqType really matters here, because kAddTo for the gradient would hardly make any sense...

@wkcn
Member

wkcn commented Mar 11, 2019

So I think it is better to check whether OpReq is kWriteTo or kWriteInplace.

@szha
Member

szha commented Mar 11, 2019

What if this op is used during training where gradient accumulation is used?

@haojin2
Contributor Author

haojin2 commented Mar 12, 2019

@wkcn kWriteTo and kWriteInplace actually share the same code in KERNEL_ASSIGN (/~https://github.com/apache/incubator-mxnet/blob/master/src/operator/mxnet_op.h#L304-L319), so there is no need to distinguish between them. I'll add kAddTo support as per @szha's request.
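The point about KERNEL_ASSIGN can be illustrated with a small Python mirror of that macro's switch in mxnet_op.h: kNullOp writes nothing, kWriteTo and kWriteInplace fall through to the same plain assignment, and kAddTo accumulates into the existing value. Accumulation is what a backward pass needs when gradients are summed across steps, as in @szha's question. OpReq and kernel_assign are hypothetical Python names used only for illustration.

```python
from enum import Enum


class OpReq(Enum):
    """Python stand-in for MXNet's OpReqType values (illustrative only)."""
    NULL_OP = 0        # kNullOp: output not needed, write nothing
    WRITE_TO = 1       # kWriteTo: overwrite the output buffer
    WRITE_INPLACE = 2  # kWriteInplace: output aliases an input, still a plain overwrite
    ADD_TO = 3         # kAddTo: accumulate into the existing output (gradient accumulation)


def kernel_assign(out: list, i: int, req: OpReq, val: float) -> None:
    """Mirror of the KERNEL_ASSIGN switch: WRITE_TO and WRITE_INPLACE share one branch."""
    if req is OpReq.NULL_OP:
        return
    if req in (OpReq.WRITE_TO, OpReq.WRITE_INPLACE):
        out[i] = val
    elif req is OpReq.ADD_TO:
        out[i] += val


grad = [1.0, 1.0]                            # gradient already accumulated from an earlier step
kernel_assign(grad, 0, OpReq.ADD_TO, 0.5)    # accumulates: grad[0] becomes 1.5
kernel_assign(grad, 1, OpReq.WRITE_TO, 0.5)  # overwrites: grad[1] becomes 0.5
print(grad)  # [1.5, 0.5]
```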

@wkcn
Member

wkcn commented Mar 12, 2019

@haojin2 Sorry, I forgot about that. Once kAddTo is added, LGTM :)

haojin2 force-pushed the indexed_copy branch 2 times, most recently from 0036fee to c4b4cf4, March 13, 2019 22:38
@wkcn
Member

wkcn commented Mar 14, 2019

I think we should add CHECK(req[0] == kWriteTo || req[0] == kWriteInplace); to the forward function.
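The suggested check amounts to failing fast when the forward output is requested with a mode the forward kernel does not implement (only the overwrite modes, MXNet's kWriteTo and kWriteInplace, are accepted). In plain Python the guard might look like the sketch below. OpReq and index_copy_forward_checked are hypothetical names, and raising ValueError stands in for MXNet's CHECK macro, which aborts with an error message.

```python
from enum import Enum


class OpReq(Enum):
    """Python stand-in for MXNet's OpReqType values (illustrative only)."""
    NULL_OP = 0
    WRITE_TO = 1
    WRITE_INPLACE = 2
    ADD_TO = 3


def index_copy_forward_checked(req: OpReq, old, index, new):
    """Forward pass that only accepts the two overwrite request types."""
    # Same idea as a CHECK that req is kWriteTo or kWriteInplace.
    if req not in (OpReq.WRITE_TO, OpReq.WRITE_INPLACE):
        raise ValueError(f"index_copy forward does not support req={req.name}")
    out = [row[:] for row in old]
    for i, dst in enumerate(index):
        out[dst] = new[i][:]  # row i of `new` overwrites row index[i]
    return out


print(index_copy_forward_checked(OpReq.WRITE_TO, [[0.0], [0.0]], [1], [[7.0]]))  # [[0.0], [7.0]]
try:
    index_copy_forward_checked(OpReq.ADD_TO, [[0.0]], [0], [[1.0]])
except ValueError as err:
    print(err)  # index_copy forward does not support req=ADD_TO
```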

Member

wkcn left a comment


Thanks for your contribution!
I hope you can add CHECK(req[0] == kWriteTo || req[0] == kWriteInplace); to the forward function.

src/operator/contrib/index_copy.cc (resolved)
haojin2 force-pushed the indexed_copy branch 3 times, most recently from 4e181d9 to 0cc9d04, March 15, 2019 23:20
@haojin2
Copy link
Contributor Author

haojin2 commented Mar 17, 2019

@szha @zheng-da I think this is ready for merge.

Member

wkcn left a comment


Great! I think it is ready to merge.
Thank you for your contribution!

szha merged commit 020e832 into apache:master Mar 17, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
* speedup _contrib_index_copy

* use copy in backward

* add support for kAddTo req type for grad

* change template to argument for req types
nswamy pushed a commit that referenced this pull request Apr 5, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
haojin2 deleted the indexed_copy branch August 19, 2019 00:38
Labels
Operator, pr-awaiting-review (PR is waiting for code review)

5 participants