Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Adding sparse support to MXTensor for custom operators #17569

Merged
merged 28 commits into from
Mar 22, 2020

Conversation

guanxinq
Copy link
Contributor

@guanxinq guanxinq commented Feb 11, 2020

Description

Add support for sparse custom operators. It will support row sparse and CSR formats.
This is a continuation of custom operators project, initial CPU support is implemented here: #15921 and GPU support is implemented here: #17270 .

  • Added the MXSparse structure. The data_ptr of MXTensor points to an object of this structure for sparse tensors and point to data for dense tensors.
  • Added data storage type enum for MXTensor. It supports dense, row sparse and CSR formats.
  • Added alloc_sparse() function within OpResource which fixed the sparse output size issue at run time.
  • Added inferSType() function which enables customized storage type inference implementation.
  • Created new example libraries in "example/extensions/lib_custom_op" for CSR and row sparse tensor transpose operators.

Design

The function alloc_sparse() in lower level call function CheckAndAlloc(). To call this member function of NDArray, we added lambda functions just as what we did for alloc_cpu().

auto sparse_alloc = [&](int index, int indices_len, int idxptr_len,
                      void** data, int64_t** indices, int64_t** indptr) {
  // Row Sparse
  if(idxptr_len == 0) {
    outputs[index].CheckAndAlloc({mshadow::Shape1(indices_len)});
    *data = outputs[index].data().dptr_;
    *indices = (int64_t*)outputs[index].aux_data(rowsparse::kIdx).dptr_;
  }
  // CSR
  else {
    outputs[index].CheckAndAlloc({mshadow::Shape1(idxptr_len), mshadow::Shape1(indices_len)});
    *data = outputs[index].data().dptr_;
    *indices = (int64_t*)outputs[index].aux_data(csr::kIdx).dptr_;
    *indptr = (int64_t*)outputs[index].aux_data(csr::kIndPtr).dptr_;
  }
};
typedef decltype(sparse_alloc) alloc_type_sparse;
auto sparse_malloc = [](void* _sparse_alloc, int index, int indices_len, int idxptr_len,
                       void** data, int64_t** indices, int64_t** indptr) {
  alloc_type_sparse* sparsealloc = static_cast<alloc_type_sparse*>(_sparse_alloc);
  (*sparsealloc)(index, indices_len, idxptr_len, data, indices, indptr);
};

The lambda function could be called by alloc_sparse() within OpResource.

void alloc_sparse(MXSparse* sparse, int index, int indices_len, int indptr_len = 0) {
  sparse_malloc(sparse_alloc, index, indices_len, indptr_len,
               &(sparse->data), &(sparse->indices), &(sparse->indptr));
}

In the customized implementation, users are able to set output tensor size by

MXReturnValue forward(std::map<std::string, std::string> attrs,
                      std::vector<MXTensor> inputs,
                      std::vector<MXTensor> outputs,
                      OpResource res) {
  // implementation of operators.
  MXSparse*  ptr = outputs[index].data<MXSparse>();
  res.alloc_sparse(ptr, index , indices_len, indptr_len);
  // implementation of operators reply on output size.
  return MX_SUCCESS;
}

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@guanxinq guanxinq changed the title [WIP] Support sparse custom operator [WIP] Adding sparse support to MXTensor for custom operators Feb 13, 2020
Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look forward to the complete API with example and documentation :)

include/mxnet/lib_api.h Outdated Show resolved Hide resolved
src/c_api/c_api.cc Outdated Show resolved Hide resolved
include/mxnet/lib_api.h Outdated Show resolved Hide resolved
Copy link
Contributor

@mseth10 mseth10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @guanxinq . Left a few comments.

example/extensions/lib_custom_op/transposecsr_lib.cc Outdated Show resolved Hide resolved
example/extensions/lib_custom_op/transposecsr_lib.cc Outdated Show resolved Hide resolved
include/mxnet/lib_api.h Show resolved Hide resolved
include/mxnet/lib_api.h Show resolved Hide resolved
Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u resolve conflicts ?

@guanxinq guanxinq force-pushed the sparseCustomOps branch 2 times, most recently from fc65f7d to 79d7d64 Compare March 19, 2020 03:34
@guanxinq guanxinq force-pushed the sparseCustomOps branch 2 times, most recently from efd3dbb to cafd3a3 Compare March 19, 2020 23:17
@samskalicky
Copy link
Contributor

can you update MX_LIBRARY_VERSION to 5?

Copy link
Contributor

@rondogency rondogency left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the contribution!

@guanxinq
Copy link
Contributor Author

can you update MX_LIBRARY_VERSION to 5?

Updated.

@rondogency
Copy link
Contributor

@wkcn @eric-haibin-lin this PR is ready to merge!

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for the contribution!

Copy link
Contributor

@mseth10 mseth10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wkcn wkcn added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Mar 21, 2020
@wkcn wkcn merged commit f01dc80 into apache:master Mar 22, 2020
anirudh2290 added a commit to anirudh2290/mxnet that referenced this pull request Mar 27, 2020
* 'master' of /~https://github.com/apache/incubator-mxnet: (192 commits)
  * impl - FFI for np einsum (apache#17869)
  [Numpy] FFI for diag/diagonal/diag_indices_from (apache#17789)
  [Numpy] Kron operator (apache#17323)
  cmake: Set DMLC_LOG_FATAL_THROW only for building mxnet and not for tvm (apache#17878)
  Add simplified HybridBlock.forward without F (apache#17530)
  Use FP32 copy of weights for norm (multitensor LAMB optimizer) (apache#17700)
  Use multi-tensor sumSQ in clip_global_norm (apache#17652)
  [Numpy] Add op fmax, fmin, fmod (apache#17567)
  Adding sparse support to MXTensor for custom operators (apache#17569)
  Update 3rdparty/mkldnn to v1.2.2 (apache#17313)
  Dynamic subgraph compile support (apache#17623)
  Refactor cpp-package CMakeLists.txt & add missing inference/imagenet_inference (apache#17835)
  staticbuild: Fix potential user-assisted execution of arbitrary code  (apache#17860)
  * FFI for np.argmax and np.argmin (apache#17843)
  ffi for roll/rot90 (apache#17861)
  Skip test_multi_worker_dataloader_release_pool on OS X (apache#17797)
  add ffi for full_like, binary (apache#17811)
  HybridBlock.export() to return created filenames (apache#17758)
  Fix SoftReLU fused operator numerical stability (apache#17849)
  CI: Test clang10 cpu & gpu builds with -WError (apache#17830)
  ...
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5
samskalicky pushed a commit to samskalicky/incubator-mxnet that referenced this pull request Apr 15, 2020
* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5
pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
…18069)

* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes

    Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
    Modifies the subgraph library example to optionally require args to be provided
    Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
    Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests

    Adds a new test to partition operators that directly consume params
    add a new model to test where ops to be partitioned have args/params

Bug Fixes

    fixes bug in passing ids vector by value instead of by reference
    fixes bug in passing copies of attributes instead of by reference
    fixes bug where _cached_graph was not updated after partitioning
    fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
    fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs

    Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to custom library. So user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not let custom library generate exactly the same sequence of random numbers comparing to MXNet

This is a continuation of the custom operator project #15921 and #17270

Co-authored-by: guanxinq <58794120+guanxinq@users.noreply.github.com>
Co-authored-by: Ziyi Mu <ziyi.mu@columbia.edu>
pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes

    Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
    Modifies the subgraph library example to optionally require args to be provided
    Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
    Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests

    Adds a new test to partition operators that directly consume params
    add a new model to test where ops to be partitioned have args/params

Bug Fixes

    fixes bug in passing ids vector by value instead of by reference
    fixes bug in passing copies of attributes instead of by reference
    fixes bug where _cached_graph was not updated after partitioning
    fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
    fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs

    Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to custom library. So user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not let custom library generate exactly the same sequence of random numbers comparing to MXNet

This is a continuation of the custom operator project #15921 and #17270

Co-authored-by: guanxinq <58794120+guanxinq@users.noreply.github.com>
Co-authored-by: Ziyi Mu <ziyi.mu@columbia.edu>
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants