Low CPU usage of MXNet in subprocesses #13593
Comments
@TaoLv to help look at this issue.
@YutingZhang Thanks for reporting this issue! @anirudh2290 @apeforest @azai91 @samskalicky please take a look here.
Hi @YutingZhang, please try:
Please let me know if it works for you. Thanks.
Related issue: #12255
The limit of 1 thread per worker is set deliberately, to avoid thread contention. Per offline discussion, I think a good solution is to use an environment variable to control the number of threads each worker can use (which currently defaults to 1).
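One possible shape for that control, sketched in Python. The helper function is hypothetical; `MXNET_MP_WORKER_NTHREADS` is the variable name mentioned later in this thread, and the fallback behavior here is an assumption:

```python
import os

def worker_nthreads(default=1):
    """Hypothetical helper: read the per-worker thread limit from the
    environment, falling back to the conservative default of 1."""
    raw = os.environ.get("MXNET_MP_WORKER_NTHREADS", str(default))
    try:
        # Clamp to at least 1 thread so a bad value cannot disable the worker.
        return max(1, int(raw))
    except ValueError:
        return default
```

Each data-loader worker would read this limit once at startup and size its thread pool accordingly.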
@zhreshold this would also require a rebuild with modified
@anirudh2290 Yes, I mean a PR is required to address this issue.
Thanks everyone for discussing and solving the issue!
@zhreshold I tried the latest version of mxnet, and do
@YutingZhang MXNET_MP_WORKER_NTHREADS can only control how many MXNet operators run in parallel; in the case of some transformations, it might not be able to parallelize as many ops as possible. Due to an OpenMP bug, OpenMP is disabled in the workers, so unfortunately that is the case. You might want to enable OpenCV multithreading in each worker, since image decoding and transformation are likely the most time-consuming parts of the worker process.
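Re-enabling OpenCV threading inside each worker could look roughly like the sketch below. The initializer name is hypothetical and the thread count is illustrative; `cv2.setNumThreads` is OpenCV's real API for sizing its internal thread pool:

```python
def opencv_worker_init(num_threads=4):
    """Hypothetical initializer to run at the start of each worker process:
    re-enable OpenCV's internal thread pool, which may be limited to 1."""
    try:
        import cv2
        cv2.setNumThreads(num_threads)
        return True
    except ImportError:
        # OpenCV is not installed; nothing to configure.
        return False
```

A worker pool could then be created with `multiprocessing.Pool(initializer=opencv_worker_init)` so every worker runs it before processing batches.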
@pengzhao-intel @TaoLv @anirudh2290 @zhreshold Thank you everyone for your help, and happy new year! This problem seems more complicated than it first appeared (it may have been multiple problems from the beginning). @zhreshold's fix solved the problem in most cases. Code (one-line difference):
Launch 10 workers ( ). But running it only in the main process is fine. By the way, another issue I found with
@YutingZhang thanks for the case, we will look into the issue.
@YutingZhang If you just want to utilize 100% CPU for each process, please try . If you want to enable OpenMP multi-threading to utilize >100% CPU for each process, you need to make the change below to MXNet: Then you can use . If you don't want to change MXNet and just want to increase the efficiency of the MKL dot, you can try
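For the environment-variable route, the key detail is that thread-count variables must be set before the library that reads them is imported. A minimal sketch, assuming the standard OpenMP and MKL variables apply (the values here are illustrative, not recommendations):

```python
import os

# These must be set BEFORE `import mxnet` (or numpy linked against MKL),
# because OpenMP and MKL read them once at library load time.
os.environ["OMP_NUM_THREADS"] = "8"   # OpenMP threads per process
os.environ["MKL_NUM_THREADS"] = "8"   # MKL threads for BLAS calls such as dot()

# Only after this point should the compute library be imported:
# import mxnet as mx
```

Setting these in the shell (`OMP_NUM_THREADS=8 python3 script.py`) achieves the same thing without touching the script.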
@zhreshold do you know the background on why the thread number is fixed to 1 in the worker process, as the line below shows?
Got some info from @YutingZhang: #13449 #12380. Thanks a lot.
@pengzhao-intel The thread limit is set to 1 according to this comment: #13606 (comment). If you have a better understanding of the problem, please let me know.
@YutingZhang For example,
MXNet has low CPU usage when running CPU operations in multi-process scenarios. Specifically, for MXNet computation in a subprocess, MXNet can use only 1 or 2 CPUs to do its job. This issue shows different behavior for different variants of MXNet (see below) and on different machines ...
This issue is critical because it slows down the multiprocess object-detection data-loading in gluoncv very significantly, making Faster-RCNN training in gluoncv unusable.
This is tested on the 20181207 version, and other versions (e.g., 1.3.1) show similar problems.
Code to reproduce the issue
Filename:
mxnet_cpu_test.py
Detailed experiments:
Run in the main process:
python3 mxnet_cpu_test.py --num-workers=0
Working fine for all mxnet variants (GPU or CPU-only).
Run in two subprocesses
--
mxnet-cu90
on p3.16x: python3 mxnet_cpu_test.py --num-workers=2
It uses only 2 CPUs per subprocess.
--
mxnet-mkl
on p3.16x: python3 mxnet_cpu_test.py --num-workers=2
Same here. It uses only 2 CPUs per subprocess.
--
mxnet-mkl
on CPU-only machine c5.18x: python3 mxnet_cpu_test.py --num-workers=2
Even worse. It uses only 1.5 CPUs per subprocess.
-- However, for vanilla CPU-version
mxnet
on c5.18x: python3 mxnet_cpu_test.py --num-workers=2
It is working better. At least, it uses 5 CPUs per subprocess.
-- Weirdly, still vanilla CPU-version
mxnet
but on GPU machine p3.16x: python3 mxnet_cpu_test.py --num-workers=2
It works worse, i.e., 2 CPUs per subprocess.
This problem seems related to how MXNet manages threads per subprocess. If I do not import mxnet in the main process and instead import mxnet in each subprocess: python3 mxnet_cpu_test.py --num-workers=2 --late-import
Then everything is working fine.
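The original mxnet_cpu_test.py is not reproduced above. A hypothetical reconstruction, based only on the flags described (--num-workers, --late-import), might look like the sketch below; the workload (a repeated large dot product) is an assumption chosen purely to keep the CPU busy so thread usage is visible in `top`:

```python
import argparse
import multiprocessing as mp

def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Hypothetical repro for MXNet #13593")
    p.add_argument("--num-workers", type=int, default=0,
                   help="0 runs in the main process; N spawns N subprocesses")
    p.add_argument("--late-import", action="store_true",
                   help="import mxnet inside each subprocess instead of the parent")
    return p.parse_args(argv)

def heavy_compute():
    # Large repeated matrix multiply; an assumed stand-in for the real workload.
    import mxnet as mx  # imported here so --late-import defers it to the child
    a = mx.nd.random.uniform(shape=(2000, 2000))
    for _ in range(20):
        a = mx.nd.dot(a, a.transpose())
        a.wait_to_read()

def main():
    args = parse_args()
    if not args.late_import:
        import mxnet  # noqa: F401  -- import in the parent, as in the failing cases
    if args.num_workers == 0:
        heavy_compute()
    else:
        procs = [mp.Process(target=heavy_compute) for _ in range(args.num_workers)]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()

if __name__ == "__main__":
    main()
```

Watching per-process CPU usage in `top` or `htop` while this runs would show the symptom described above: full usage with --num-workers=0 or --late-import, but only 1-2 CPUs per subprocess otherwise.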