Support multi-threading for Custom Operator #14363
Conversation
I will check whether it is correct to use the lock.
Nice enhancement. Two questions:
@mxnet-label-bot add [Operator, pr-awaiting-review]
Thanks for the PR! Nice work.
@@ -139,38 +143,51 @@ class CustomOperator {
      destructing_ = true;
      cv_.notify_all();
    }
    worker_.join();
    for (auto &worker : workers_)
      worker.join();
Have you evaluated this when the custom op is part of a bigger graph, and whether there is any performance impact? Since CustomOperator is static, its lifetime extends to the end of the program and its destructor is only called then. This means one thread is waiting on the condition variable while the other 11 threads are trying to obtain the lock and sit in a blocked state. Since these are idle threads, I am not sure the impact will be significant, but it would be good to verify. It would also help us come up with a good default for MXNET_CUSTOM_OP_NUM_THREADS.
Thank you!
Sorry, I do not have a server with multiple GPUs to evaluate a big graph.
In this PR, the number of threads grows only when the existing threads are not enough, up to a maximum of MXNET_CUSTOM_OP_NUM_THREADS (a rough sketch of this spawning logic is below).
There is always a thread that acquires the lock and executes the operator function, so I do not think the idle threads hurt performance.
I think a good upper bound for MXNET_CUSTOM_OP_NUM_THREADS is the number of CPU cores.
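To make that behaviour concrete, here is a minimal sketch, not the actual CustomOperator code in this PR: names such as q_, workers_, num_free_threads_ and destructing_ mirror the ones visible in the diff, while the class name TaskQueue and everything else is illustrative. A worker is spawned only when no free thread exists and the cap has not been reached.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative sketch of the dispatch logic described above.
class TaskQueue {
 public:
  explicit TaskQueue(int max_threads) : max_threads_(max_threads) {}

  void Push(std::function<void()> fn) {
    std::unique_lock<std::mutex> lock(mutex_);
    q_.push(std::move(fn));
    // Spawn an extra worker only when every existing worker is busy and the
    // cap (e.g. the value of MXNET_CUSTOM_OP_NUM_THREADS) is not reached.
    if (num_free_threads_ == 0 &&
        static_cast<int>(workers_.size()) < max_threads_) {
      ++num_free_threads_;
      workers_.emplace_back(&TaskQueue::Worker, this);
    }
    cv_.notify_one();
  }

  ~TaskQueue() {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      destructing_ = true;
      cv_.notify_all();
    }
    for (auto& worker : workers_) worker.join();
  }

 private:
  void Worker() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!destructing_) {
      while (!q_.empty()) {
        --num_free_threads_;
        auto fn = q_.front();
        q_.pop();       // pop while holding the lock so no task runs twice
        lock.unlock();
        fn();           // execute the custom-op callback without the lock
        lock.lock();
        ++num_free_threads_;
      }
      cv_.wait(lock, [this] { return !q_.empty() || destructing_; });
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> q_;
  std::vector<std::thread> workers_;
  int num_free_threads_ = 0;
  int max_threads_;
  bool destructing_ = false;
};
```

Idle workers in this sketch block inside cv_.wait rather than spinning, which is why they should not cost measurable CPU time while no custom op is running.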
Yes, it's fine during op execution, but even after the custom op finishes there is one thread waiting on the condition variable and the other 11 trying to acquire the lock.
I see.
I did consider decreasing the number of threads when they are idle; however, it is difficult to estimate how many to keep.
Can you run example/reinforcement-learning/dqn, which includes a custom op, on CPU to check the performance?
It may not be possible to measure the improvement on CPU only, since there is usually just one computation stream. I will find a server with multiple GPUs and check on it.
Yes, this change is mainly a performance improvement for GPU machines, but we should be careful not to impact CPU performance for inference. Otherwise we should keep the default small.
Sorry, I hit the following problem when running the RL demo:
(gdb) bt
#0 0x00007ffff7e1ff73 in free () at /usr/lib/libc.so.6
#1 0x00007fffbf2c9ad1 in ALEInterface::welcomeMessage[abi:cxx11]() () at /usr/lib/python2.7/site-packages/ale_python_interface/libale_c.so
#2 0x00007fffbf2c9d0a in ALEInterface::ALEInterface() () at /usr/lib/python2.7/site-packages/ale_python_interface/libale_c.so
#3 0x00007fffbf2c7fbb in ALE_new () at /usr/lib/python2.7/site-packages/ale_python_interface/libale_c.so
#4 0x00007ffff6d236d0 in ffi_call_unix64 () at /usr/lib/libffi.so.6
#5 0x00007ffff6d230a0 in ffi_call () at /usr/lib/libffi.so.6
#6 0x00007ffff6cc9d77 in _ctypes_callproc () at /usr/lib/python2.7/lib-dynload/_ctypes.so
#7 0x00007ffff6cc37d0 in () at /usr/lib/python2.7/lib-dynload/_ctypes.so
#8 0x00007ffff7c8cb43 in PyObject_Call () at /usr/lib/libpython2.7.so.1.0
#9 0x00007ffff7c1c10e in PyEval_EvalFrameEx () at /usr/lib/libpython2.7.so.1.0
#10 0x00007ffff7c156ba in PyEval_EvalCodeEx () at /usr/lib/libpython2.7.so.1.0
I'm checking it. The problem may be related to GPERFTOOLS and JEMALLOC. I have disabled them and plan to rebuild MXNet.
In my experiment, the performance does not drop in the DQN example. The FPS stays at 1500 on my laptop with only a CPU, an i7-7500U (2c4t).
while (!q_.empty()) {
  --num_free_threads;
  auto fn = q_.front();
  q_.pop();
Nice! Popping early will prevent the race condition.
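For readers skimming the diff, the point is roughly this (an illustrative, self-contained example rather than the PR's code): the task must be removed from the queue while the mutex is still held, otherwise two workers could both see it at the front and execute it twice.

```cpp
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Two workers drain one queue. Each worker pops the task while it still
// holds the mutex, so the same task can never be picked up twice once the
// lock is released for execution.
int main() {
  std::mutex mutex;
  std::queue<std::function<void()>> q;
  for (int i = 0; i < 8; ++i) {
    q.push([i] { std::printf("task %d\n", i); });
  }

  auto worker = [&] {
    std::unique_lock<std::mutex> lock(mutex);
    while (!q.empty()) {
      auto fn = q.front();
      q.pop();          // pop before unlocking: this worker now owns the task
      lock.unlock();
      fn();             // run the task without holding the lock
      lock.lock();
    }
  };

  std::vector<std::thread> workers;
  for (int i = 0; i < 2; ++i) workers.emplace_back(worker);
  for (auto& w : workers) w.join();
  return 0;
}
```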
LGTM!
@eric-haibin-lin @szha Hi! Is this PR ready to be merged?
* Support multi-threading for Custom Operator
* update
* Update custom-inl.h
Great work!
Description
Hi! I found that custom operators were executed by a single thread, which hurts performance.
Would it be better to support executing custom operators with multiple threads?
Test Case:
Cost Time:
Before: 15s
After: 5s
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Added the MXNET_CUSTOM_OP_NUM_THREADS environment variable to cap the number of custom-operator worker threads (see the sketch below).
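A minimal sketch of how such a cap is typically consumed, assuming the dmlc::GetEnv helper used elsewhere in MXNet; the default value of 16 here is only an illustrative assumption, not necessarily the value chosen in this PR.

```cpp
#include <dmlc/parameter.h>  // provides dmlc::GetEnv

// Read the thread cap once at startup; 16 is an assumed fallback default.
static const int kMaxCustomOpThreads =
    dmlc::GetEnv("MXNET_CUSTOM_OP_NUM_THREADS", 16);
```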
Comments