Pretty high cpu load when import mxnet #12255

fighting-liu · 2018-08-20T18:16:32Z

When i import mxnet in 8 processes simultaneously, all cpu resources will be used and the program stagnates for almost 5 minutes.

It works fine for mxnet1.1 but failed for mxnet1.2 and mxnet1.3
Following is the sample code

import multiprocessing

def mxnet_worker():
    print 'before import'
    import mxnet 
    print 'after import'

read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
for p in read_process:
    p.daemon = True
    p.start()

Any solution to this for mxnet1.2 and mxnet1.3? Thanks.

The text was updated successfully, but these errors were encountered:

vdantu · 2018-08-20T18:28:47Z

@fighting-liu : Just to understand your problem a little further, could you give details of your environment? This is given in the issue template. Also, what was the issue? Was the "after import" being printed after 5 minutes? How long did it take for 1.1 and also, how long does it take for one process import'ing mxnet with .12? System/environment information would be useful to help you further.

For reference of the issue template: /~https://github.com/apache/incubator-mxnet/blob/master/.github/ISSUE_TEMPLATE.md

@mxnet-label-bot : [Python, Question]

fighting-liu · 2018-08-21T06:55:22Z

@vdantu Thanks for your attention.

1. Environment info

----------System Info----------
('Platform     :', 'Linux-3.10.0-693.17.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core')
('system       :', 'Linux')
('node         :', '*****************')
('release      :', '3.10.0-693.17.1.el7.x86_64')
('version      :', '#1 SMP Thu Jan 25 20:13:58 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping:              1
CPU MHz:               2401.000
CPU max MHz:           2401.0000
CPU min MHz:           1200.0000
BogoMIPS:              4800.04
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: /~https://github.com/apache/incubator-mxnet, DNS: 0.3104 sec, LOAD: 5.1832 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.5159 sec, LOAD: 6.2714 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.5885 sec, LOAD: 2.0516 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.5425 sec, LOAD: 1.0500 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 1.4049 sec, LOAD: 4.5601 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 2.1899 sec, LOAD: 2.9143 sec.

2. Sample code

import multiprocessing
import time

def mxnet_worker():
    b_time = time.time()
    import mxnet 
    print 'time consumes: {}'.format(time.time()-b_time)

read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
for p in read_process:
    p.daemon = True
    p.start()

3. Test results
3.1 pip install mxnet-cu90==1.1

3.2 pip install mxnet-cu90==1.2

3.3 pip install --pre mxnet-cu90 (mxnet_cu90-1.3.0b20180820)

4. Cpu status from htop monitor

It takes too much time to import mxnet with mxnet 1.2 and mxnet 1.3

vdantu · 2018-08-23T06:14:42Z

Thanks for sharing this info. This is a good observation. I think it would be good to have a larger audience for this, could you also start a thread on discuss.mxnet.io?

fighting-liu · 2018-08-24T17:09:11Z

@vdantu Can you solve this problem?? I've start a thread on discuss.mxnet.io, but nobody answers.

kardoszc · 2018-08-31T10:12:35Z

import ctypes
import time

start = time.time()
ctypes.CDLL('libmxnet_1.1.so', ctypes.RTLD_LOCAL)
print 'load libmxnet1.1.so', time.time() - start

start = time.time()
ctypes.CDLL('libmxnet_1.3.so', ctypes.RTLD_LOCAL)
print 'load libmxnet1.3.so', time.time() - start

load libmxnet1.1.so 0.0203261375427
load libmxnet1.3.so 1.06584215164

FYI，this might the reason why it costs too much time.

kardoszc · 2018-09-02T14:05:57Z

@vdantu

vrakesh · 2018-10-02T23:10:48Z

@kardoszc using the example provided by you , i was able to get the same result on run 1 on a MAC, on consecutive runs of the above, the speed was better, or similar

$python mxnet11.py 
load libmxnet1.1.so  0.019122838973999023
load libmxnet1.3.so  0.035729169845581055
$python mxnet11.py 
load libmxnet1.1.so  0.01851797103881836
load libmxnet1.3.so  0.034262895584106445
$ python mxnet11.py 
load libmxnet1.1.so  0.019358158111572266
load libmxnet1.3.so  0.03675413131713867
$ python mxnet11.py 
load libmxnet1.1.so  0.01902318000793457
load libmxnet1.3.so  0.03872489929199219

The files are about 10MB different is size, so the loading time itself is slightly higher on 1.3

ls -l libmxnet1.*.so

27494556 Oct  2 15:50 libmxnet1.1.so
37263368 Oct  2 15:51 libmxnet1.3.so

vrakesh · 2018-10-03T01:26:48Z

@fighting-liu
similar results to above were observed running your example

>>> def mxnet_worker():
...     b_time = time.time()
...     import mxnet 
...     print 'time consumes: {}'.format(time.time()-b_time)
... 
>>> read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
>>> for p in read_process:
...     p.daemon = True
...     p.start()
... 
>>> time consumes: 15.8834888935
time consumes: 15.884565115
time consumes: 15.8791670799
time consumes: 15.8853030205
time consumes: 15.8832161427
time consumes: 15.882764101
time consumes: 15.8819229603
time consumes: 15.8869299889

>>> read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
>>> for p in read_process:
...     p.daemon = True
...     p.start()
... 
>>> time consumes: 1.01575899124
time consumes: 1.02250099182
time consumes: 1.03319501877
time consumes: 1.03118515015
time consumes: 1.03451776505
time consumes: 1.03348302841
time consumes: 1.03426003456
time consumes: 1.03685307503

The second run gets the speed up again, I observed a similar trend in 1.1 as well, although the gap is smaller.

I am investigating this issue further

vrakesh · 2018-10-05T22:58:10Z

More updates this issue seems to specific to linux, reproduced Original posters issue on a linux box
mxnet 1.1

>>> import multiprocessing
>>> import time
>>> 
>>> def mxnet_worker():
...     b_time = time.time()
...     import mxnet 
...     print 'time consumes: {}'.format(time.time()-b_time)
... 
>>> read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
>>> for p in read_process:
...     p.daemon = True
...     p.start()
... 
>>> time consumes: 0.513573884964
time consumes: 0.518635988235
time consumes: 0.553323984146
time consumes: 0.549813985825
time consumes: 0.558361053467
time consumes: 0.556171894073
time consumes: 0.566649913788
time consumes: 0.569785118103

mxnet 1.3

python
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> import time
>>> 
>>> def mxnet_worker():
...     b_time = time.time()
...     import mxnet 
...     print 'time consumes: {}'.format(time.time()-b_time)
... 
>>> read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
>>> for p in read_process:
...     p.daemon = True
...     p.start()
... 
>>> time consumes: 8.75463604927
time consumes: 17.4193239212
time consumes: 17.656072855
time consumes: 18.1875190735
time consumes: 18.4937279224
time consumes: 18.5608999729
time consumes: 18.5980598927
time consumes: 18.6172778606

lupesko · 2018-10-07T08:38:13Z

@fighting-liu is your MXNet built from source, or installed via pip? if installed via pip - which version of MXNet? (mxnet/mxnet-mkl/mxnet-cu90mkl/...), if built from source can you provide the build flags?

@vrakesh the original report is that program stagnates for 5 mins... what is the total time you are seeing?

vrakesh · 2018-10-08T06:48:52Z

@lupesko I do not see times as high was 5 minutes, overall, the who run is under 30s. But it should take take more than a second or so to complete a C library import

When a single instance is imported it takes about half a second to import , as the number of parallel import increases , the time taken drastically increases.

samskalicky · 2018-11-30T00:28:52Z

Hey Guys, we need to start getting consistent in what and how were testing this.

Here was I tried:

spin up a C5.large instance with DLAMI v19.0
pip install mxnet-cu90 --user
time python -c "import mxnet"

Here was the output:

real 0m24.955s
user 0m0.827s
sys 0m0.189s

Doing it a second time resulted in the following (and for every subsequent time too).

real 0m0.855s
user 0m0.798s
sys 0m0.123s

Spinning up a new instance, reinstalling the pip and running the script above for running 8 processes in parallel resulted in:

$ time python load_time.py
time consumes: 24.740694046020508
time consumes: 24.74588418006897
time consumes: 24.746330499649048
time consumes: 24.759278059005737
time consumes: 24.75693655014038
time consumes: 24.76559042930603
time consumes: 24.780791997909546
time consumes: 24.783067226409912

real 0m24.829s
user 0m8.753s
sys 0m0.742s

And running it a 2nd (and every subsequent) time:

$ time python load_time.py
time consumes: 4.612242221832275
time consumes: 4.625608682632446
time consumes: 4.641973257064819
time consumes: 4.690966844558716
time consumes: 4.7061262130737305
time consumes: 4.699116945266724
time consumes: 4.703948259353638
time consumes: 4.718777418136597

real 0m4.770s
user 0m8.823s
sys 0m0.650s

I terminated that instance and spun up a new one and after installing the pip wheel I modified the file in

/home/ec2-user/.local/lib/python3.6/site-packages/mxnet/__init__.py

(where the packages are installed with --user) to get an idea which imports where causing the most delay. Heres what I found were the biggest offenders:

from .context import Context, current_context, cpu, gpu, cpu_pinned
from . import engine
from .base import MXNetError
from . import base
from . import contrib
from . import ndarray
from . import ndarray as nd
from . import name

time = 20.388737678527832

from . import random as rnd
from . import random
from . import optimizer
from . import model
from . import metric
from . import notebook
from . import initializer

time = 0.5453829765319824

from . import image
from . import image as img

from . import test_utils

from . import rnn

from . import gluon

time = 0.4957273006439209

Running the 2nd time resulted in 0.6350719928741455 for the first block and 0.01 for the other two.

Now that we know the biggest delay is coming from that first group, i terminated and spun up a new instance, reinstalled and timed around those imports and found these two were causing the most delay:

from .context import Context, current_context, cpu, gpu, cpu_pinned
from . import contrib

9.019979238510132
14.68500018119812

So those two make up the majority of the 24 seconds we're seeing. If someone else could help out and jump in and follow a similar approach for the context and contrib, we can zoom in on the culprit.

samskalicky · 2018-11-30T17:34:04Z

I took the script from above and added another loop to try from 1 to 36 processes testing a local build from source (not pip package):

import multiprocessing
import time

def mxnet_worker():
    #t1 = time.time()
    import mxnet
    #t2 = time.time()
    #elapsed = t2-t1
    #print(times)

times = []
for i in range(37):
    t1 = time.time()
    read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(i)]
    for p in read_process:
        p.daemon = True
        p.start()

    for p in read_process:
        p.join()
    t2 = time.time()
    times.append(t2-t1)

for i in times:
    print(i)

Here are the results when compiling with the following cmake flags:

cmake -DUSE_CUDA=OFF -DUSE_CUDNN=OFF -DUSE_MKLDNN=OFF -DBLAS=Open -DCMAKE_BUILD_TYPE=Debug  ..

1: 5.77136611938
2: 7.65716195107
3: 13.9892320633
4: 16.6815569401
5: 22.9886288643
6: 27.6006569862
7: 30.7331540585
8: 33.8466141224
9: 34.18151021
10: 37.1062369347
11: 43.6272640228
12: 44.1143600941
13: 45.8406460285
14: 46.6692020893
15: 47.8332960606
16: 52.4621579647
17: 56.1070458889
18: 56.8046569824
19: 54.1124491692
20: 65.2930281162
21: 62.0744900703
22: 60.4670469761
23: 69.6229948997
24: 71.4172370434
25: 70.9572968483
26: 74.8509230614
27: 77.0419559479
28: 78.2489409447
29: 80.1934709549
30: 74.9342000484
31: 84.4639661312
32: 83.6565339565
33: 91.3137798309
34: 88.20520401
35: 96.2017951012
36: 96.4477438927

Then I tested against the pip wheel:

1: 1.86075401306
2: 2.52445602417
3: 27.84821105
4: 272.775645971
5: 532.317739964
6: 785.189717054

and i killed it after 6 processes. I think we get the picture.

Heres another set of results when compiling without openmp:

cmake -DUSE_CUDA=OFF -DUSE_CUDNN=OFF -DUSE_MKLDNN=OFF -DBLAS=Open -DCMAKE_BUILD_TYPE=Debug -DUSE_OPENMP=OFF ..

1: 0.827432
2: 0.859651
3: 0.858839
4: 0.833471
5: 0.884956
6: 0.883090
7: 0.862174
8: 0.888009
9: 0.891180
10: 0.917642
11: 0.894244
12: 0.947771
13: 0.944967
14: 0.956380
15: 0.932657
16: 0.991420
17: 0.956935
18: 0.924413
19: 0.935913
20: 0.944736
21: 0.996702
22: 0.934430
23: 0.966333
24: 1.022540
25: 1.038306
26: 1.175906
27: 1.056674
28: 1.022513
29: 1.083556
30: 1.151226
31: 1.078056
32: 1.046550
33: 1.220279
34: 1.256747
35: 1.334894
36: 1.377328

Clearly theres a problem with OpenMP seeing as the results are very reasonable to load when OpenMP is not used.

@szha, can you take a look at this? Theres a huge discrepancy between building from source and the pip wheel. Is there something different that is done when building the pip wheel related to OpenMP?

Vikas-kum · 2018-11-30T23:30:40Z

Looks like the processes are stuck at gomp_team_start if I use multiprocessing

#0  0x00007f5a797a774a in do_spin (val=22256, addr=addr@entry=0x55f45a6451c4)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/config/linux/x86/futex.h:130
#1  do_wait (addr=addr@entry=0x55f45a6451c4, val=val@entry=22256) at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/config/linux/wait.h:66
#2  0x00007f5a797a7813 in gomp_barrier_wait_end (bar=0x55f45a6451c0, state=22256)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/config/linux/bar.c:48
#3  0x00007f5a797a6a1d in gomp_simple_barrier_wait (bar=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/config/posix/simple-bar.h:60
#4  gomp_team_start (fn=<optimized out>, data=<optimized out>, nthreads=7, flags=<optimized out>, team=0x55f45a646790)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/team.c:829
#5  0x00007f5a4f6cd8a8 in ?? () from /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
#6  0x00007f5a4f6dee3c in ?? () from /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
#7  0x00007f5a4f6df9fd in ?? () from /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
#8  0x00007f5a4f6dfb53 in ?? () from /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
#9  0x00007f5a4cb5f794 in ?? () from /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
#10 0x00007f5a7f5336ba in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7ffd800cd128, env=env@entry=0x55f45a0422b0) at dl-init.c:72
#11 0x00007f5a7f5337cb in call_init (env=0x55f45a0422b0, argv=0x7ffd800cd128, argc=2, l=<optimized out>) at dl-init.c:30
#12 _dl_init (main_map=main_map@entry=0x55f45a485c90, argc=2, argv=0x7ffd800cd128, env=0x55f45a0422b0) at dl-init.c:120
#13 0x00007f5a7f5388e2 in dl_open_worker (a=a@entry=0x7ffd800c8610) at dl-open.c:575

But instead if I use threads, like below, threads don't get stuck.

def mxnet_worker():
    print("before import: pid:{}".format(getpid()))
    st_time = time.time()
    import mxnet
    end_time = time.time()
    print("after import: pid:{} time:{}".format(getpid(), end_time - st_time))

#read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
from threading import Thread
i=0
while i<8:
    t1 = Thread(target=mxnet_worker)
    t1.start()
    i=i+1
#for p in read_process:
#    p.daemon = True
#    time.sleep(3)
#    p.start()
#time.sleep(100000)

Looks like there is issue with fork + openmp ,
We should check if this is related : https://bisqwit.iki.fi/story/howto/openmp/#OpenmpAndFork

samskalicky · 2018-12-01T01:07:22Z

I added a 2 second delay between launching each process in the script and found that most processes complete the import between 1.7-1.9 seconds. This 2 second delay prevents processes from competing for resources at the same time. There appears to be some bad contention going on thats causing the large delay. Adding the 2 second delay prevented large exponential increase in import delay. Heres the data for 1-36 processes with this 2 second delay using the pip wheel:

1: 2.002468
2: 4.005214
3: 6.007431
4: 8.617790
5: 10.012702
6: 12.015214
7: 14.018058
8: 16.020579
9: 18.022354
10: 20.025505
11: 22.027436
12: 24.030987
13: 26.033214
14: 28.035729
15: 30.037884
16: 32.041215
17: 34.043598
18: 36.044175
19: 38.048719
20: 40.051221
21: 42.053396
22: 44.055702
23: 46.058514
24: 48.060241
25: 50.063950
26: 52.727346
27: 54.068652
28: 56.071813
29: 58.073548
30: 60.076997
31: 62.079071
32: 64.081306
33: 66.716197
34: 68.085373
35: 70.088400
36: 72.092169

So while the 12 seconds for 6 processes isnt ideal, it much better than the 785 that was found earlier without the 2 second delay.

So short-term workaround is to measure about how long it takes to "import mxnet" in a single process and then add an appropriately longer (~2second) delay between launching each process to avoid contention. The delay length might need to be tuned for each use case to avoid contention.

We'll continue debugging and trace the OpenMP problem and try to resolve it.

cjolivier01 · 2018-12-05T00:00:27Z

if you fork and then try to use OMP in thenforked process and are using libgomp, the process will hang because libgomp does not support forking.

cjolivier01 · 2018-12-05T00:22:02Z

OMP has a very high overhead in ec2 instances with 36+ cores. this is a known problem with ec2 instances.

Vikas-kum · 2018-12-05T01:01:29Z

There is huge number of threads getting created in operator-tune-inl.h that causes huge cpu load, threads take long time to determine whether they should use openmp for operator or not.
This problem gets worse with machines with more cores.
And multiplies with more processes.
As an experiment if I change the omp parallelism to thread count 1, and launch 32 processes it just take 6 seconds to load.

Vikas-kum · 2018-12-05T01:07:50Z

This is the code in /~https://github.com/apache/incubator-mxnet/blob/master/src/operator/operator_tune-inl.h#L359

const auto max_cores = static_cast<size_t>(omp_get_num_procs()) >> 1;
    if (max_cores >= 2) {
      std::vector<duration_t> core_times;
      // Take care of any OMP lazy-init with a throwaway call
      for (size_t omp_threads = 2; omp_threads <= max_cores; ++omp_threads) {
        GetOMPLoopOverhead(omp_threads);
}

If there are 36 cores, total threads will be 2+3+4+...18 = 170 threads
if there are 32 mxnet python processes = 170*32 = 5440 threads launched to determine tuning. Then there is resource conetention + omp_badness .
We would need to reduce the num of threads to determine tuning, making it bounded if possible.

larroy · 2018-12-05T18:19:22Z

@cjolivier01 Would running with MXNET_USE_OPERATOR_TUNING=0 mitigate this issue?

samskalicky · 2018-12-06T17:54:57Z

Hi @larroy , setting that env var does seem to avoid the issue. Heres the testing with and without it:

$ python parallel_test.py 
1: 3.71833109856
2: 2.17086696625
3: 46.48721385

ubuntu@ip-172-31-83-54:~$ export MXNET_USE_OPERATOR_TUNING=0
ubuntu@ip-172-31-83-54:~$ python parallel_test.py 
1: 0.798185110092
2: 0.792401075363
3: 0.818596124649

Notice that in the 2nd run with the env var, that all runs take ~0.8 seconds

@cjolivier01 what are the performance implications for setting this env var? Does it only affect non-MKL/MKLDNN operators executing on CPU context?

doing import mxnet in multiple processes take very long time. Details : apache#12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning.

…13602) * #12255 doing import mxnet in multiple processes take very long time. Details : #12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning. * cpplint fix * Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc * fixing formatting in doc

…pache#13602) * apache#12255 doing import mxnet in multiple processes take very long time. Details : apache#12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning. * cpplint fix * Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc * fixing formatting in doc

…13602) * #12255 doing import mxnet in multiple processes take very long time. Details : #12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning. * cpplint fix * Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc * fixing formatting in doc

anirudh2290 · 2018-12-17T19:28:13Z

Closing this issue since the related PR is merged. Will track in a new issue, turning off omp tuning for the subprocess.

* Support full convention in quantized pooling (#13260) * fix quantized pooling and enable it in INT8 SqueezeNet * add test * fix test * address review comments * refine the test for quantized pooling * Add utility slave (#13383) * A few operators on graphs stored as CSR (#13290) * edge_id op csr forward on CPU (#34) * add node subgraph generator. (#35) * create DGLSubgraph. * fix. * return old eids in node_subgraph. * accelerate subgraph construction. * Add neighborhood op (#37) * add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * add graph compact operator. * fix a bug in dgl_subgraph. * fix a bug in dgl_graph_compact. * Update csr sample op (#39) * add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * Update csr_neighborhood_sample-inl.h * remove space. * move to dgl_graph to contrib. * move code. * move edge id. * fix compilation error. * add test for subgraph. * cleanup. * fix. * fix. * fix compile error. * fix compile error. * fix compile error. * fix. * add operator doc. * remove graph_compact * update doc. * address comments. * retrigger. * address comments. * retrigger * fix a bug in test. * retrigger * add check_format * Fixes #13386 - Refer Warnings (#13387) * Updated the paths for images for java tutorial (#13361) * Updated the paths for images * Empty commit * Empty commit * Nudge to CI * Fix/env disable mkldnn cache map (#13324) * add flag to disable mkldnn cache * update docs * fix typos * update var name * fix ordering * set cache size * fix log message * update docs * fix lint * fix lint * fix comparison * update method name * fix missing * fix logging * remove random item when cache exceeded * update helper name * update hash namespace * make ophash template * udpate function params * fix return * fix return * update return for helper * chagne class to typename * add typename * fix lint * update doc * pass ptr to cache * retrigger * retrigger * retrigger * change env var name to MXNET_MKLDNN_CACHE_NUM * fix log env name * retrigger * Initial website documentation for Java API (#13289) * Initial website documentation for Java API * Changing paths to be relative * Refactoring Java API website landing page * Update Java web docs based on feedback * Minor formatting fixes * Update maven repo to nightly build so that java will be available prior to 1.4.0 release * Adding java tutorial index to test_sanity_tutorials whitelist * Fix link to javadocs * Fix javadoc for infer package and minor install doc fix * Minor path fix * Replace mxnetci dockcross with public dockcross due to missing image (#13402) * Replace mxnetci dockcross with public dockcross due to missing image * Remove source lists change * Disable Jetson * Move to mxnetcipinned * Correct shapes of images in cifar10 and cifar100 (#13348) * Correct shapes of images in cifar10 and cifar100 cifar10 and cifar100 have 3 channels * Retrigger build * Updated recommenders example (#13041) * initial modification recommender * Recommender updates * fix notebooks * Update README.md * trigger build * Update README.md * Retrigger build * Improving multi-processing reliability for gluon DataLoader (#13318) * improving multi-processing reliability for gluon dataloader I found some multi-processing-related issues in the Gluon DataLoader. 1) Each time a _MultiWorkerIter shuts down, it could leave some dangling processes. The shutdown mechanism could not guarantee that all worker processes can be terminated. As a result, after running for several epochs, more and more dangling processes will stay there. This problem barely happens during training. In this case, there is a decent time interval between the last-batch data prefetching and the _MultiWorkerIter's shutting down). But the problem frequently happens 1) when I stop the iter before the end of an epoch, and 2) when I use the DataLoader for a data loading service and load the data as fast as possible. In both cases, the time interval between the most recent data prefetching and the iter shutdown are short. I guess that the _MultiWorkerIter iter is unable to shut down properly during active data prefetching. To fix this, I explicitly terminate the worker processes inside the shutdown function. 2) When loading data fast (still mostly during testing and data serving), there seems to be a risk of data racing. The data iter uses a _MultiWorkerIter to cache prefetched data, but the dict does not seem to be thread-safe for concurrent inserting and deleting elements. So occasionally, the data can be missing from the dict. To prevent this, I use a scope lock to guard the dict access. * do not wait for the workers to join, and kill any alive wokers as soon as possible * Onnx multi output (#13390) * Fix ONNX export to support multi-output graphs * Add ONNX unit-test * Added multi-output shape inference. - Removed unnecessary forward_pass() call - Modified infer_output_shape to return multiple shapes for multiple outputs as well as output names. * Fixed pylint * Change docker login (#13408) * Fixing doc links and minor edits for Java API (#13405) Update the main website links * Fix repeated typo in mxnet_op.h (#13406) * Use dynamic omp schedule for sparse dot with large matrix (#13398) * dynamic omp for dot update heuristic * add doc * Update mxnet_op.h * Update dot-inl.h * Added proper default value in cpp-package for optional<bool> (#13415) * Fix infoGan Gluon tutorial errors. (#13416) - Update notebook to avoid divide by 0 causing a warning. - Add MXBoard dependency. * :memo: Fixes #13388 Adds Clojure to MXNet installation docs (#13393) * Minor fixes to documentation (#13412) * Minor fixes to documentation * Updated the Maven Repository URL to point to staging repo * [Example] fix cpp example inception-bn and training acc issue (#13284) * fix inception-bn and training acc issue * add parameter initialization, fix lint * fix comparison * change optimizer to sgd * update sgd and update model name * add inception_bn in jenkins build * make max epoch an argument * remove inception_bn test * trigger ci * remove ci test * trigger ci * [Example]Fix mlp_csv example (#13273) * add instruction to get the data and fix typo * fix typo * update file name * trigger CI * add unit_test for unit_test_mlp_csv * add mlp_csv to jenkinsfile * revert jenkinsfile to another PR * trigger CI * trigger CI * Java doc (#13368) * Fix scaladoc and javadoc errors * Stop on errors starting on scala 1.3.x build * Adding Java to ubuntu setup install page and minor fixes to docs (#13420) * Adding Java to ubuntu setup install page and minor fixes to other java api docs * Improving javadoc for java-api predictor class Mostly documentation changes * [MXNET-1029] Feature request: randint operator (#12749) * randint operator add along with add optional tag to params * register param * lint space issue * randn issue fix * uniform_int_distribution doesn't support int8, uint8 fix * dtype ftype * ftype to dtype - invalid template arg * fix template arg issue * test with int dtype for windows * removed int8,uint8 from test * gpu implementation * gpu engine state diff * removed gpu support * empty commit * temporary fix : batchnorm flaky test skip * removed randn symbol specific code since other PR is on it * revert ndarray/randn for compatibility * added unit test for checking extremes and uniform distribution for sufficient samples * increased the high val * int32 to int64 support, indentation fix, check for optype correctly based on type of random function * gpu support, revert finfertype using template specialization, remove defaults, prints, test other low high val * fix for invalid template arg by checking for int32,int64 * gpu randint in random_generator * sample_uniform issue and param, removed old flaky test skip line * replaced discrete_uniform function by rand_int64 for consistency * formula update and removed itype * change ctx to include gpu, randint samepl_op.cu typo * trigger ci * doc fix, check fix, whitespace remove * added the without dtype testcase * Java demo file-path fix (#13358) * fix on ubuntu * add readme instruction * fix intellij Tutorials * fix intelliJ tutorial * fix the document * update demo * revert the change on intelliJ tutorial * fix make process * fix documentation * Updated README and NEWS with 1.3.1 release information (#13423) * Be more explicit about the exit status of the container (#13425) * [MKLDNN]Add quantized concat (#13297) * Add quantized concat * Fix non-mkldnn build * Add size check for MKLDNNQuantizedConcatForward * use all capital for constant * Rename constant with Google C++ style. * Address apeforest comments * Address apeforest comments * fix lint * Add frontend interface. * Retrigger CI * Add ARMv7 builds to dev_menu.py (#13432) * Add ARMv7 builds to dev_menu.py * Add Python3 CPU Intel MKLDNN unittests to dev_menu * [MXNET-1110] find the path to include header files (#13359) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * add subgraph adjacency operator. (#13396) * add adjacency. * fix lint. * add GPU impl. * retrigger * address comments. * Update dgl_graph.cc * Java added to install page (#13404) * added java install option * update maven blocks * update maven button url to snapshot search for java * add version; remove formatting on dependency * merge clojure updates * merge clojure updates - give code some breathing room * merge clojure updates - give code even more breathing room * 1.3.1 website updates (#13444) * 1.3.1 website updates * Java added to install page (#13404) * added java install option * update maven blocks * update maven button url to snapshot search for java * add version; remove formatting on dependency * merge clojure updates * merge clojure updates - give code some breathing room * merge clojure updates - give code even more breathing room * remove redundant link (#13428) * remove redundant link * retrigger * retrigger * [MXNET-886] ONNX export: HardSigmoid, Less, Greater, Equal (#12812) * ONNX export: Comparison operators * ONNX export: Hard sigmoid * Correct Inception Reference for Pertained Model (#13360) I noticed that the symbols and parameters in the model zoo are infact from /~https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/symbols/inception-bn.py, which is not inception v3. It is inception + batch normalization. In this commit, I update the documentation and link to the correct research basis. * exclude the error folder from sphinx toc (#13354) * exclude the error folder from sphinx toc * bumping commit for CI * Update MKL-DNN to fix LSTM perf regression (#13417) * update MKL-DNN CI id * fix the reorder perf issue * bumped version to v0.17.1 * bumped to MKL-DNN v0.17.1 * pin to v0.17.1 * Mitigate #13341 (#13343) - KL never succeeds so it always goes exponential - Too many weight matrices were rejected because of zero weights, simplify generation to not include 0 weight edges * parallelize NDArray::Copy<cpu, cpu> when data size is large (#12926) * parallelize NDArray::Copy<cpu, cpu> by OpenMP when data size > MXNET_CPU_PARALLEL_COPY_SIZE * code specification according to reviewer's suggestions * align with std::memcpy api * add descriptive error message * update MXNET_CPU_PARALLEL_COPY_SIZE doc * update MXNET_CPU_PARALLEL_COPY_SIZE doc again * fix property not updating bug (#13085) * [MXNET-1222] Scala Inference enable different shapes input (#13330) * init commit with Predictor Improvement * add predictor Example * change into dArr * add img config * add new line and fix code style important bug fixes * Fix deconvolution / PR 13421 (#13433) * add test case * revert refactor * use with seed decorator * retrigger * remove seed * remove iteration * remove old test * update deconvolution test to have filter length that triggers mkldnn reorder * Add DGL subgraph sampling op (#13392) * add csr sample op * fix compile error in some platform * update * update openmp * speedup sampling * update csr * update csr * update time seed * update * fix compiler error * update doc * fix ci error * fix quantize_graph pass error when there're multiple outputs from a single node (#13000) * fix quantize_graph pass error when there're multiple outputs from a single node that need to insert 'contrib_quantize', 'min' and 'max' nodes for these outputs. * fix lint * Make the single output align with multiple outputs when inserting contrib_quantize * Change op comparing from its name to itself * skip unsupported quantize_concat * retrigger ci * Get the correct include path in pip package (#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431) * Skip flaky test /~https://github.com/apache/incubator-mxnet/issues/13446 (#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (#13463) * [MXNET-1158] JVM Memory Management Documentation (#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (#13414) Update row_sparse tutorial * Add resiliency to onnx export code (#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495) * clarify ops faq regarding docs strings (#13492) * Add graph_compact operator. (#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (#13474) * update github location for sampled_block.py (#13508) Updated to /~https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499) * ONNX export: Logical operators (#12852) * Fix cmake options parsing in dev_menu (#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (#12380)" (#13501) This reverts commit 75410210e07a5fab5e044348aee276d578d5857e. * Feature/mkldnn static 2 (#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: #13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN * Docs & website sphinx errors squished 🌦 (#13488) * fix scala ndarray docs; remove interpreter style * fix docs error in kvstore * remove interpreter format in examples * remove python indicator for these non-functioning python code blocks; clears a sphinx error * remove old table that was not being used and was triggering a sphinx error * get rid of curly braces that was causing a pygments error * fix ambiguous reference causing sphinx error * nudging file for CI * [MXNET-1235] Add a test for AdaMax optimizer (#13467) * Add a test for AdaMax optimizer * Modify nested for loop with itertools.product and left tolerance to default * Trigger * Adadelta optimizer test (#13443) * adadelta test * comments * Update java setup docs for 1.4.0 (#13536) * Update java setup docs for 1.4.0 * Update Java-demo to 1.4.0 * Revert "Feature/mkldnn static 2 (#13503)" (#13540) This reverts commit 65edc9500b10a3404945d6d79acbae54a2833890. * doc fix (#13465) * [MXAPPS-1020] Clean up some Sphinx warnings. (#13539) * [MXNET-1110] Add header files required by horovod (#13062) * Add header files required by horovod * Add symbolic link and cherry picked required header * add python API to return include path * update link * fix windows CI * fix windows build * fix dlpack link * merge with master * exclude 3rd party header files from license check * exclude license check * exclude include directory * remove commented lines * Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431) * Skip flaky test /~https://github.com/apache/incubator-mxnet/issues/13446 (#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (#13463) * [MXNET-1158] JVM Memory Management Documentation (#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (#13414) Update row_sparse tutorial * Add resiliency to onnx export code (#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495) * clarify ops faq regarding docs strings (#13492) * Add graph_compact operator. (#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (#13474) * update github location for sampled_block.py (#13508) Updated to /~https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499) * ONNX export: Logical operators (#12852) * Fix cmake options parsing in dev_menu (#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (#12380)" (#13501) This reverts commit 75410210e07a5fab5e044348aee276d578d5857e. * Feature/mkldnn static 2 (#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: #13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN * Fixing a 404 in the ubuntu setup doc (#13542) * [MXNET-1249] Fix Object Detector Performance with GPU (#13522) * Reduce post processing time * fix ssd * fix the CI * add comments * [MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531) * Use MXNET_HOME in cwd in windows to prevent access denied due to concurrent data downloads Fixes #13484 * Revert "Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527)" This reverts commit 3d499cb3584919b767142c5596211a7f7fb18d50. * Add a retry to qemu_provision (#13551) Fixes #13504 * Fix #13521 (#13537) * fix pool release * fix * Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094) * Simplify mnist Gluon tutorial and add mislabelled sample plotting * Add mnist Gluon tutorial images * Gluon MNIST tutorial: Use modern Gluon constructs, fix some wordings * [Gluon] Move to data loaders and improve wording in MNIST tutorial * Fix broken links * Fix spelling of mislabeled * Final rewordings and code simplifications * Fix things according to review - Apply hybrid blocks - Move outputs outside of code blocks and mark for notebooks to ignore - Remove images, use external link - Fix a few formulations * Change activations to sigmoid in MNIST tutorial * Remove superfluous last layer activations in MNIST tutorial * Updated docs for randint operator (#13541) * updated docs for randint * added randint in __all__ and reordered acc to categorical then alphabetical * Trigger CI * minus mxnet.symbol and alphabetical for ndarray,symbol.md * alphabetical order * Chi_square_check for discrete distribution fix (#13543) * check for bucket instead of index * enumerate instead of range(len()) * count instead of sum to solve attribute error * revert to sum * seperate discrete and continuous * Trigger CI * Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file" (#13558) * Revert "Chi_square_check for discrete distribution fix (#13543)" This reverts commit cf6e8cbd035bf315b3e8280416468a629c780d03. * Revert "Updated docs for randint operator (#13541)" This reverts commit e0ff3c36ee171386fef01fb86c54c343e4b04c14. * Revert "Simplifications and some fun stuff for the MNIST Gluon tutorial (#13094)" This reverts commit 8bbac827742c21607a863137792f03bd09847419. * Revert "Fix #13521 (#13537)" This reverts commit f6b4665995f8f8ff32862a029b2074475d8467eb. * Revert "Add a retry to qemu_provision (#13551)" This reverts commit f6f840110d74111f98c20eab5b08d64a46ebf0cd. * Revert "[MXNET-769] Use MXNET_HOME in a tempdir in windows to prevent access denied due t… (#13531)" This reverts commit bd8e0f8356676749ecae16ec38a366b4cc00bf15. * Revert "[MXNET-1249] Fix Object Detector Performance with GPU (#13522)" This reverts commit 1c8972c3c8f832519364916865541f48597581c7. * Revert "Fixing a 404 in the ubuntu setup doc (#13542)" This reverts commit cb0db290adcfd0fce956d02c234f81d453e41013. * Revert "Bumped minor version from 1.4.0 to 1.5.0 on master, updated License file (#13478)" This reverts commit 40db61908000ee86d21aac847ff2225807d6c168. * #13441 [Clojure] Add Spec Validations for the Random namespace (#13523) * Adding test for softmaxoutput (#13116) * Add workspace cleaning after job finished (#13490) * Add workspace cleaning after job finished * Update Jenkinsfile_utils.groovy * Update Jenkinsfile_utils.groovy * Fix flaky test test_random:test_randint_generator (#13498) * updated seed, alpha value, comments * typo in comment fix * added nrepeat * removed unusued variable, added link for scipy alpha, rephrased the sentence for discrete distribution buckets * removed fixed seed, alpha * Update version to v1.5.0 including clojure package (#13566) * Update DESCRIPTION * update version to v1.5.0 except for clojure * update version from 1.4.0 to 1.5.0 - add utility script to help bump versions in future - fix README to correct to current maven versions * License update (#13565) * Update LICENSE * update license for Clojure, R, ONNX-TRT and location of 3rd party dependencies. * fixed typo * Fix use-before-assignment in convert_dot (#13511) * fix the situation where idx didn't align with rec (#13550) minor fix the image.py add last_batch_handle for imagedeiter remove the label type refactor the imageiter unit test fix the trailing whitespace fix coding style add new line move helper function to the top of the file * Update MXNetTutorialTemplate.ipynb (#13568) Fix typos * ONNX import/export: Size (#13112) * fix link for gluon model zoo (#13583) * Fix exception handling api doc (#13519) * Fix exception handling api doc * Update waitall api doc Co-Authored-By: anirudh2290 <anirudh2290@apache.org> * add cpp example inception to nightly test (#13534) * add inception test * fix max iter for mlp * rename and add comment * rename epoch num * Add notes about debug with libstdc++ symbols (#13533) * Add imresize and copyMakeBorder to mx.image (#13357) * Add imresize API to docs * address comments * copyMakeBorder * [MXNET-1253] fix control_flow_op (#13555) * fix control_flow_op * change type for M * add test for sparse where op * Add Intel MKL blas to Jenkins (#13607) * add mkl blas to Jenkins * add mkl install script * fix bug in mkl script * remove python2 ut and add cpu-mkl node * #13385 [Clojure] - Turn examples into integration tests (#13554) * fix the Float not showing correctly problem (#13617) Merge this PR for 1.4.x * [MXNET-1155] Add scala packageTest utility (#13046) * [MXNET-1155] Add scala packageTest utility * Clean up utility * Safe change directory in Makefile for scala * mvn install file instructions with details * [MXNET-1224]: improve scala maven jni build and packing. (#13493) Major JNI feature changes. Please find more info here: https://cwiki.apache.org/confluence/display/MXNET/Scala+maven+build+improvement * [MXNET-1225] Always use config.mk in make install instructions (#13364) * Always use config.mk in make install instructions * Specify Cuda 0 for ubuntu with mkldnn * Scala install doc avoid build_from_source Minor doc fixes * Fix build_from_source CMake usage * CPP Install Instruction with CMake * Use cmake out of source build * Fix warning in waitall doc (#13618) * Optimize C++ API (#13496) * Optimize C++ API Pass parameter with reference instead of value. Add const as well as it is not changed. * fix docs/architecture/overview.md Fix BinaryShapeFunction typedef Add a right brace for SmoothL1Shape_ * fix quantize pass error when the quantization supported Op are excluded in the model (#13596) * Scripts for building dependency libraries of MXNet (#13282) * openblas script * ps-lite dependencies * USE_S3 dependencies * image libraries * license * add batch norm test (#13625) * add batch norm test * fix formatting * use out_arr as input * fix typo * remove const * use ptr * eval ptr * Set install path for libmxnet.so dynamic lib on Mac OS (#13629) * Fix the bug of BidirectionalCell (#13575) * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * add a test for BidirectionalCell * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * fix test_bidirectional_unroll_valid_length( ) Fix the error of parameter. * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * fix test_bidirectional_unroll_valid_length( ) * Feature/mkldnn static (#13628) * Revert "Revert "Feature/mkldnn static 2 (#13503)" (#13540)" This reverts commit a3eca5f5c96eed0bc29bd4e58e470997091a1fb3. * include headers on mkldnn lib * retrigger * retrigger * build config for maven and pip (#13556) * config for pip * symbol whitelist * maven build config * Fix for import mxnet taking long time if multiple process launched (#13602) * /~https://github.com/apache/incubator-mxnet/issues/12255 doing import mxnet in multiple processes take very long time. Details : #12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning. * cpplint fix * Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc * fixing formatting in doc * Add reshape op supported by MKL-DNN (#12980) * Add reshape op supported by MKL-DNN * fix build issue * fix lint * fix lint * fix lint * fix lint * fix lint * fix lint * fix white space * add unit test * merge if blocks * Improve dev_menu usability, local build and virtualenv (#13529) * Improve dev_menu, add build command and virtualenv creation with local builds for easy testing * Update dev_menu.py Co-Authored-By: larroy <pedro.larroy.lists@gmail.com> * Cuda off by default, use ccache * address CR * [Clojure] Correct the versions in the README so they correspond to the latest maven.org release (#13507) * Correct the versions so they correspond to the latest maven.org release * trigger build * feedback from @kohr-h * Optimization of metric evaluation (#13471) * Change argsort to argpartition * Global statistics in metrics * Fix lint * Fixes from review * Trigger * Fixes from review, fix to F1, MCC and perplexity metrics, added test for global stats * Fix lint * Fix compatibility with Python 2 * Revert "Feature/mkldnn static (#13628)" (#13638) This reverts commit 5bcf2bd6e8b48fa27bfcfdafd06401ec2d28978b. * support mkl log when dtype is fp32 or fp64 (#13150) * support mkl log when dtype is fp32 or fp64 * remove macro * ensure data size less than or equal MKL_INT_MAX * code specification * fix indent * for retrigger * [MXNET-1209] Tutorial transpose reshape (#13208) * transpose tutorial * Adding Anirudhs comments * Update tutorial with some more examples * Adding links * Fixing the links, adding more examples * Update reshape_transpose.md * Fixing spelling mistakes * Updating image resolution * Adding Simon's comments * Small fixes * Update reshape_transpose.md * Update reshape_transpose.md * empty commit * empty commit * updated reference to Apache MXNet (#13645) * Complimentary gluon DataLoader improvements (#13606) * init * add tests * doc * lint * fix openmp * Improve CCache handling (#13456) * Remove gitignore entries * Modify Makefile * Modify user permissions * Add new ccache wrapper function * Change PATH rewrite to a different one to resolve CUDA issues * Add ccache to gpu cmake * Enable ccache for every build * Set permissions for arm dockerfiles * Disable ccache for ASAN * Remove g++-8 ccache redirect * Update Android Dockerfiles for user permissions * Fix ASAN compiler typo * Remove sanity for speed * Move build dir creation in android armv8 * Revert "Remove sanity for speed" This reverts commit e8386a774dafe96337930b9cac36cb24fc36585e. * Add ccache for NVCC in Makefile * [MXNET-918] Random module (#13039) * introduce random API * revert useless changes * shorter types in APIDoc gen code * fix after merge from master * Trigger CI * temp code / diag on CI * cleanup type-class code * cleanup type-class code * fix scalastyle * Fix incorrect delete in MXExecutorReshape exception handling (#13376) * Fix bad delete. Delete the pointed-to handle on cleanup, not the location of the handle itself. Also don't delete it if we didn't set it in the first place. * Remove unusued 'exec' var from MXExecutorBindEX. * [MXNET-1251] Basic configuration to do static-linking (#13621) * Basic configuration to do static-linking * update build script and place it in the install part * clean up the code further * revert maven into build-from-source * add curl to deps * [MXNET-1195] Cleanup Scala README file (#13582) * Updated the Scala-Readme with upto-date information * Updated the header * Removed redundant build status * Minor formatting changes * Addressed the PR feedback * Added section on Scala training APIs * Removed mention of deprecated Model API * scripts for building libmxnet binary and wheel (#13648) * add script for making all dependencies * tools for building pip package * build scripts for lib and wheel * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API (#13294) * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API * Updated the code to address the review comments. * Added the README file for the folder. * Addressed the review comments * Addressed the review comments to use argmax and default mean values. * Update MKLDNN_README.md (#13653) * Support Quantized Fully Connected by INT8 GEMM (#12922) * add quantized fully connect support * disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library * retrigger to ci testing * move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK * fix typo bug * retrigger the ci test * fix typo bug * retrigger ci * retrigger the ci test * retrigger the ci * retrigger the ci test * retrigger ci test * fix indent issue * retrigger the ci * retrigger the ci test * add verbose message * update log message * using range for loop * using for auto range * enable MKL BLAS ci test * fix typo issue * use TYPE_ASSIGN_CHECK * retrigger the ci * add build fix for Scala/Java build (#13655) * Fix Jetson compilation (#13532) * remove omp which can cause ssd accuracy variance (#13622) * Revert "[MXNET-43] Fix Jetson compilation" (#13665) * Revert "remove omp which can cause ssd accuracy variance (#13622)" This reverts commit 655f1c6f7a0706dd622f73db9af2e6df895ca213. * Revert "Fix Jetson compilation (#13532)" This reverts commit 48e25c4cae355753dd96ea7afe004bf78e0719e4. * Fix Jetson compilation (#13666) * turn on Sphinx warnings as errors (#13544) * turn on warnings as errors * move warnings as error logic to build_all_version * fix typo in comment * add warning as error option for docs pipeline * bump ci to test again; use this chance to add notes on this feature * fix bugs in image.py docs * Update CODEOWNERS, add Pedro Larroy. (#13579) * Revert "Revert "[MXNET-43] Fix Jetson compilation" (#13665)" (#13672) This reverts commit 3433776dac7be75928082bbc1d552fca248fb8e8. * Accelerate DGL csr neighbor sampling (#13588) * Speedup and fix bug in dgl_csr_sampling op * Update dgl_graph.cc * simplify functions. * avoid adding nodes in the last level in the queue. * remove a hashtable lookup in neigh_pos. * reduce a hashtable lookup in sub_ver_mp. * merge copying vids and layers. * reduce hashtable lookup when writing to output csr. * fix a bug. * limit the number of sampled vertices. * fix lint. * fix a compile error. * fix compile error. * fix compile. * remove one hashtable lookup per vertex and hashtable iteration. * remove queue. * use vector for neigh_pos. * fix lint * avoid init output arrays. * fix tests. * fix tests. * update docs. * retrigger * retrigger * [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TenosrRT conversion (#13659) * fix unpicklable transform_first on windows (#13686) * Move the debug output message into MXNET_MKLDNN_DEBUG (#13662) * NEWS.md backport from v1.4.x to master (#13693) * merge NEWS.md from 1.4.x to master * NEWS.md backport from v1.4.x to master * Fallback to dense version for grad(reshape), grad(expand_dims) (#13599) * fallback to dense version for grad(reshape), grad(expand_dims) * add _backward_reshape gpu version * reshape test case comments * fix gpu test * remove mkldnn support for _backward_reshape * ONNX export: Add Flatten before Gemm (#13356) * Add Flatten before Gemm * ONNX export test: Allow multiple inputs in forward pass * ONNX export: Test for fully connected * [MXNET-1164] Generate the document for cpp-package using Doxygen (#12977) * Adding cpp-package directory to the Doxyfile. Updating the index.md file in c++ api directory. * Updating the link to classes in C++ API to point to correct html file. * Updated the links to use relative paths. * Removed the extra slash character in the url * Excluded the 3rdparty folder as per the review comment. * Update git clone location to apache github (#13706) * Add timeout/retry logic to docker cache download (#13573) * Added timeout/retry (linear backoff) to docker cache download * Units changed, as time.sleep takes seconds as argument * Improved error handling * Using retry decorator * Added retry decorator to _login_dockerhub method * Fixed wrong import * Fix NDArray ToDLPack Bug (#13698) * Added javadocs and improved example instructions (#13711) * Rearrange tests written only for update_on_kvstore = True (#13514) * Update test_gluon_trainer.py * Update test_gluon_trainer.py * test * Update mshadow to support batch_dot with fp16. (#13716) * fp16 dot * update mshadow * update mshadow * update mshadow * Fix the quantization script to support Python2 (#13700) * fix the quantization script to support python2 * Fix comments, fix similiar issue in imagenet_inference.py * ONNX test code cleanup (#13553) * ONNX test code cleanup * Make tests use the common test case list * Remove import test_cases * Make Gluon backend rep common * Partially enable broadcast tests * Common function to populate tests * Make backend common * test models * Test nodes * ONNX export: Test for fully connected * Edit CI scripts mxnet export test cleanup * Further cleanup backend tests * README * Some corrections * test case format for test_models * update social media section (#13705) * script for installing gpu libraries and build tools (#13646) * Port of scala infer package to clojure (#13595) * Port of scala infer package to clojure * Add inference examples * Fix project.clj * Update code for integration tests * Address comments and add unit tests * Add specs and simplify interface * Minor nit * Update README * update code owner (#13737) * AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728) * tests * remove optimizer and move op to contrib * rename parameter * ONNX import/export: Add missing tests, ONNX export: LogSoftMax (#13654) * Logsoftmax, missing tests * Support multiple outputs in Gluon backendrep * Remove repeated unsqueeze test * Allow multiple output support * ONNX test code cleanup - part 2 (#13738) * Common test caller * Remove incorrect comment * Make corrections to CI * fix ci script * Update basic_layers.py (#13732) * ONNX import: Hardmax (#13717) * ONNX import: Hardmax * Fix lint errors * add github link for issue with reshape * gluon docfix (#13631) * Fixes for trainer with update_on_kvstore=False (#13721) * add clarification for param_dict * more tests for dist kvstore * more unittests * fix a bug * more dist exception test * revert optimizer list * fix bug and comment * fix doc rendering and lint * add invalid sched test * fix website * trigger * update doc * Reorder module import orders for dist-kvstore (#13742) * Reorder module import orders for dist-kvstore * more code comments * CMake: Enable installation of cpp-package headers (#13339) * Allow CMake based installation of cpp-package * Add installation of missing nnvm headers * Add documentation as to where public headers will be installed * disable error checking when building old versions (#13725) * Integrate MKLDNN Conv1d and support 3d layout (#13530) * add 3d layout support for MKLDNN Conv and Activation * fix lint * code refactor * add testcase for group1 conv and skip quantization for conv1d * fix lint * avoid conv1d quantization * code refactor and add activation ut * del todo * Making MKL-DNN default on MXNet master (#13681) * mkldnn is default makefile and explicitly turn off for buidls * add endif * retrigger * retrigger * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * retrigger * remove commented code * retrigger * remove mkldnn dnaymic for unitest * retrigger * retrigger * force static for mkldnn lib * turn of mkldnn on arm builds * remove dynamic mkldnn bind * update jenkins to use only mkldnn * remove last flag * turn mkldnn by default on mac * move mkldnn files for GPU MKLDNN build * copy lib mxnet in gpu build * only link windows * add mkldnn.mk * try force linking * retrigger * retrigger * remove mkldnn dynanmic check * use ifndef * remove test mkldnn install * fix spacing * fix index * remove cp of mkldnn since statically linked * add libmkldnn.a to list of files to pack * include mkl_ml * add mkldnn to pack * add libiomp to ci pack * move static libs * fix typo * pack mkldnn * retrigger * add linux artifacts * move libmkldnn in gpu cmake build * move libmkldnn and libiomp5 on gpu workspace * move linked files * fix typo * fix typo * add artifacts for tensorrt * move mkldnn lib in scala build * move mkldnn lib on cpu scala * create dir for binding * rename libmkldnn in scala * move mklml dep in scala builds * move mkl to another linked folder * move libmkl to another dir * add libmklml * move mkldnn * move mkldnn on centos * specify new dynamic path * retrigger * remove mkldnn dynamic lib * remove moving mkldnn artifact * add ld path * retrigger * Revert "remove moving mkldnn artifact" This reverts commit 16cca196e9e1ad92db74f4e8a01b3b052076d268. * Revert "remove mkldnn dynamic lib" This reverts commit d51043622d4ef7fcb95aff6a3e84d91ab71b48c9. * update makefile * Revert RPATH change and trigger CI * correcting use-mkldnn flags for two tests * mkldnn default on linux for starters * reverting naming rules of pack_lib * adding mkldnn=0 flags to centos non mkldnn builds * adding mkldnn=0 flags to ubuntu gpu non mkldnn builds * removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build * removing mkldnn binary operation for centos non-mkldnn unittest * adding explicit USE_MKLDNN=0 flags for clang builds * adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds * removing mkldnn binaries from non mkldnn builds scala gpu * adding explicit flag mkldnn=0 for tensorrt gpu build * adding explicit flag mkldnn=0 for ubuntu cmake asan * adding centos cpu mkldnn tests to CI * adding CentOS GPU MKLDNN build and unittest * not keeping mkldnn default for mac os * setting mkldnn default for x86_64 only * running docs with mkldnn=0 flag * removing CentOS CPU Scala MKLDNN test * setting mkldnn default for x86_64 only * not making mkldn default on windows * removing Centos MKLDNN tests from CI * retrigger * retrigger * retrigger * use relative links; update links (#13741) * [MXNET-1231] Allow not using Some in the Scala operators (#13619) * add initial commit * update image classifier as well * create Util class make Some conversion * add test changes * adress Comments * fix the spacing problem * fix generator base * change name to Option * fix bug in profiler tutorial when using cpu (#13695) try except approach only goes to ctx=mx.gpu() because test_utils.list_gpus() at least returns empty array and do not producing error * local docs build feature (#13682) * make ROIAlign support position-sensitive pooling (#13088) * make ROIAlign support position-sensitive pooling * add unittest for RoIAlign op * fix ccplint error * fix python3 compability for unittest * change OMP for better performance * delete blank line to trigger CI * add shape check when position_sensitive is true * fix the typo * typo: shuold -> should * remove private() clause in omp statement * add examples and fix the dependency problem (#13620) * add examples and fix the dependency problem * add Nightly run and optimized script * add explanation for the line * Update Adam optimizer documentation (#13754) * Less cudaGet/SetDevice calls in Gluon execution (#13764) * Remove unnecessary cudaGetDevice/cudaSetDevice calls * Fixes for the DeviceGuard * Retrigger CI * Fix for possible invalid device ordinal when using DeviceStore while driver is unloading * Fix for RTC when the driver API call is the first call * Added DeviceStore to pooled engine * Scope requests so it's not needed for dev_menu (#13771) * Fix USE_MKLDNN check in Makefile (#13775) * fix makefile * change make/config.mk * add comments * retrigger ci * fix c complier to clang (#13778) * Fixed mailing list addresses (#13766) * [MXNET-1255] update hybridize documentation (#13597) * update hybridize documentation * address review comments * improve doc * address comments * address comments * [MXNET-244] Work around likely compiler bug on nested inlines and temporary acces… (#13535) * Work around likely compiler bug on nested inlines and temporary access to stream * Don't compile khatri_rao tests if we don't have LAPACK * Address CR comment * Use curl to download sample data instead of wget. (#13761) * fix bipartite match memory corruption (#13727) * remove attributs clear on TRT nodes for GetOptimizedSymbol (#13703) * Add CPU test coverage and refine cmake builds (#13338) * add license (#13793) * [MXNET-862] Basic maven jenkins pipeline (#13450) * Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel * Re-organize Scala maven build (#13626) * Re-organize scala maven build 1. Automatically detect which platform to build for scala. 2. Remove platform dependend submodules 3. Fix cyclic module dependencies 4. Fix scalatype style check 5. Now mvn can be executed in submodule 6. Maven build can be executed from any directory not only in root project 7. Checkin javah header file, and use verify task to detect native API changes 8. Improve incremental build performance 9. Remove unittest and integration-test profile, use proper task instead 10. Delete generated scala file during maven clean. * Redo maven deploy related tasks. 1. Removed maven release plugin. 2. Make maven build friendly to CI, allow cli override version. 3. Moved gpg signing to deploy stage. 4. Created a separeated deploy module. 5. Updated Makefile to new maven build change. 6. Remove unused nexus-staging-plugin 7. Added nightly and staging profile for CI. * Support mkldnn for Scala. * Add extra header file to export for error checking (#13795) * add extra header file to include * fix sanity check * fix sanity * move c_api_common.h to include folder * fix build error * keep c_api_common.h internal * strip out error handling API into a separate header * consolidate comment into one paragraph per review * remove unnecessary include * fix redirection issues; set default version to master (#13796) * [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (#13500) * ONNX import/export: Sample_multinomial * ONNX export: GlobalLpPool, LpPool * Handle default p_value * Add tests for multinomial, lppool, globallppool * add a comment about shape test * whitelist symbols for using MXNet error handling externally (#13812) * fix for params with no dims in onnx (#13413) * fix for params with no dims * fix * fix * retrigger build * test added * retrigger CI * retrigger ci * Remove semicolon in libmxnet.sym file (#13822) * Remove semicolon in libmxnet.sym file * empty commit to trigger CI * Clojure example for fixed label-width captcha recognition (#13769) * Clojure example for fixed label-width captcha recognition * Update evaluation * Better training and inference (w/ cleanup) * Captcha generation for testing * Make simple test work * Add test and update README * Add missing consts file * Follow comments * Update LICENSE File with subcomponents (#13808) * Update LICENSE File with subcomponents * Fix JavaScript licenses * Dockerfiles for Publish Testing (#13707) * Add new Maven build for Scala package (#13819) * clean up build * fix minor issue and add mkldnn * fix mx_dist problem * fix clojure build * fix skip test * ONNX ops: norm exported and lpnormalization imported (#13806) * ReduceL1, l2 export, lpnormalization import added * fix * fix * fix * fix * remove useless code (#13777) * Fixing a symlink issue with R install (#13708) * fix minor indentation (#13827) * [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (#13676) * ONNX export: Random uniform, Random normal * ONNX export: MaxRoiPool * tests for maxroipool, randomnormal, randomuniform * onnx export ops (#13821) * onnx export ops * retrigger ci * retrigger ci * fix * [MXNET-1260] Float64 DType computation support in Scala/Java (#13678) * Added Float64 as a supported datatype in NDArray * Added unit tests for Float64 in NDArray * Fix for failing Clojure unit tests * Added Float and Double as MX_PRIMITIVES for computation in Scala * Trying out second approach --> Private Impl methods with generic signature, and public methods calling the Impls * Fixed errors in *= method * Added Float64 in IO.scala and DataIter.scala * Added another testcase for IO.DataDesc creation * Fixed failing CI * Added Float64 in Predictor class * Added Float64 in Classifier class * Added Double as a possible return type to : classifyWithNDArray * Added unit tests for Classifier and Predictor.scala classes for Float64/Double * Approach 3 --> Using a trait to mirror Float and Double in Scala * Added comments on MX_PRIMITIVES.scala * Added Float64/Double support for inference in ImageClassifier APIs * Added unary- and compareTo in MX_NUMBER_LIKE * Renamed MX_NUMBER_LIKE to MX_PRIMITIVE_TYPE * Fixed linting issue * Now specifying dType from the available data in copyTo and MXDataIter.scala for creating a new DataIterator * Add primitives support handling to the generator for proper conversion * Reduced code duplication in classify method in Classifier.scala * Fix infer package for new signatures and address some bugs * Removed code duplication in getPixelsArray * remove debugging * Changed classifyWithNDArray method in Classifier.scala * Removed code duplication in predictImpl * Satisfying lint god _/\_ * Fixed failing PredictorSuite test * Renamed MX_FLOAT to Camel case * Revert "Renamed MX_FLOAT to Camel case" This reverts commit 9d7c3ce6f9c4d6ed2c46041a02e23c0f1df8dfe5. * Added an implicit conversion from int--> float to support int operations in NDArrays. (These ops were already supported in the previous versions) * Added Float64 as a training option to ImClassification Suite. Also added integration tests for it * Satisfy Lint God _/\_ * Added Float64 support in Java NDArray * Added Float64 support in Java's Predictor API * Added yours truly to the Contributors list * Added method comments on Predictor.predict with Array[Double] as a possible input * Added method comments explaining what MX_PRIMITIVE_TYPE is * Fixed errors cause by rebasing with master * Added licences to the files * [MXNET-1263] Unit Tests for Java Predictor and Object Detector APIs (#13794) * Added unit tests for Predictor API in Java * Added unit tests for ObjectDetectorOutput * Added unit tests for ObjectDetector API in Java * Addressed PR comments * Added Maven SureFire plugin to run the Java UTs * Pom file clean up -- moved surefire plugin to parent pom.xml * Renamed skipTests to SkipJavaTests * Fix scala doc build break for v1.3.1 (#13820) * Fix doc build break for v1.3.1 * ignore errors on v1.3.x during scala docs gen * Remove MXNET_STORAGE_FALLBACK_LOG_VERBOSE from test_autograd.py (#13830) * Add Local test stage and option to jump directly to menu item from commandline (#13809) * Removes unneeded nvidia driver ppa installation (#13814) * Improve license_header tool by only traversing files under revision c… (#13803) * Improve license_header tool by only traversing files under revision control * use HEAD instead of master for CI * Disabled flaky test (#13758) * change to compile time (#13835) * fix Makefile for rpkg (#13590) * fix Makefile for rpkg * update R and roxygen2 requirements * add roxygen requirement * add roxygen requirement * [CI] Prevent timeouts when rebuilding containers with docker. (#13818) * Prevent timeouts when rebuilding containers with docker. Increase timeout from 120 to 180 for pipelines * Increase docker cache timeout * Increase timeout also for docs * limit parallel builds to 10 * Code modification for testcases of various network models in directory example (#12498) * example testcase modified * rcnn file add * license add * license init * CI test trigger * rcnn modify give up * trigger * modify for better user experience * change the default parameter to xpu=None * Update bdk_demo.py * Update fcn_xs.py * Update test.py * Update train.py * Update bdk_demo.py * Update bdk_demo.py * modify review comments * refine * modify Readmes according to the changed code. * finetune READMEs * re-trigger ci * re-trigger ci twice * Add copyrights for third party licenses to license file (#13851) * Fix Tree Reduction on new instance type p3dn.24xlarge (#13852) * add fallback for gpu topology detection using CUDA 9.2 * add fallback for gpu topology detection using CUDA 9.2 * add log * update 3rdparty to master * add fallback for gpu topology detection using CUDA 9.2 * add log * update 3rdparty to master * bring 3rdparty packages to upstream/master * rebase to master * Update gpu_topology.h * [Clojure] package infer tweaks (#13864) * change object detection prediction to be a map * change predictions to a map for image-classifiers * change return types of the classifiers to be a map - add tests for base classifier and with-ndarray as well * tweak return types and inputs for predict - add test for plain predict * updated infer-classify examples * adjust the infer/object detections tests * tweak predictor test * Feedback from @kedarbellare review * put scaling back in * put back predict so it can handle multiple inputs * restore original functions signatures (remove first) * Modifying clojure CNN text classification example (#13865) * Modifying clojure CNN text classification example * Small fixes * Another minor fix * adding tolerance to flaky test (#13850) * adding tolerance * retrigger ci * retrigger ci * Julia v0.7/1.0 support and drop v0.6 support (#12845) * Fix cpp examples build on Mac. (#13826) This is a regression of addning @rpath name to libmxnet.so on Mac, example executable is not able to find libmxnet.so anymore. Add @rpath search path to fix this issue. * Fix launch bounds in spatial transformer (#13188) * Fix launch bounds in spatial transformer * Adding explanation in comment. * Update example scripts classpath. (#13849) * [MXNET-1177]Adding Scala Demo to be run as a part of Nightly CI (#13823) * Adding Scala Demo to be run as a part of Nightly CI * Addressed PR feedback : making a profile to fetch nightly jars only on CI * Changed name from scalacidemo to scala_ci_demo * Synchronized the scala-demo and java-demo for nightly CI runs * Pruned the maven command to simply maven install * changed running from ./.sh to bash .sh to be consistent * Add CODEOWNERS for Julia package (#13872) * fix ssd quantization script error (#13843) * fix ssd quantization script error * update readme for ssd * move quantized SSD instructions from quantization/README.md to ssd/README.md * update ssd readme and accuracy * update readme for SSD-vGG16 * Rename to avoid merge conflict with upstream. * Update submodule versions. - update mkldnn and mshadow to version used by upstream master - update ngraph-mxnet-bridge to current master Renames nGraph README to follow MXnet conventions. * Fix merge error for nGraph support in CMakeLists.txt * Fixes CMake file error.

* fix link for gluon model zoo (#13583) * Fix exception handling api doc (#13519) * Fix exception handling api doc * Update waitall api doc Co-Authored-By: anirudh2290 <anirudh2290@apache.org> * add cpp example inception to nightly test (#13534) * add inception test * fix max iter for mlp * rename and add comment * rename epoch num * Add notes about debug with libstdc++ symbols (#13533) * Add imresize and copyMakeBorder to mx.image (#13357) * Add imresize API to docs * address comments * copyMakeBorder * [MXNET-1253] fix control_flow_op (#13555) * fix control_flow_op * change type for M * add test for sparse where op * Add Intel MKL blas to Jenkins (#13607) * add mkl blas to Jenkins * add mkl install script * fix bug in mkl script * remove python2 ut and add cpu-mkl node * #13385 [Clojure] - Turn examples into integration tests (#13554) * fix the Float not showing correctly problem (#13617) Merge this PR for 1.4.x * [MXNET-1155] Add scala packageTest utility (#13046) * [MXNET-1155] Add scala packageTest utility * Clean up utility * Safe change directory in Makefile for scala * mvn install file instructions with details * [MXNET-1224]: improve scala maven jni build and packing. (#13493) Major JNI feature changes. Please find more info here: https://cwiki.apache.org/confluence/display/MXNET/Scala+maven+build+improvement * [MXNET-1225] Always use config.mk in make install instructions (#13364) * Always use config.mk in make install instructions * Specify Cuda 0 for ubuntu with mkldnn * Scala install doc avoid build_from_source Minor doc fixes * Fix build_from_source CMake usage * CPP Install Instruction with CMake * Use cmake out of source build * Fix warning in waitall doc (#13618) * Optimize C++ API (#13496) * Optimize C++ API Pass parameter with reference instead of value. Add const as well as it is not changed. * fix docs/architecture/overview.md Fix BinaryShapeFunction typedef Add a right brace for SmoothL1Shape_ * fix quantize pass error when the quantization supported Op are excluded in the model (#13596) * Scripts for building dependency libraries of MXNet (#13282) * openblas script * ps-lite dependencies * USE_S3 dependencies * image libraries * license * add batch norm test (#13625) * add batch norm test * fix formatting * use out_arr as input * fix typo * remove const * use ptr * eval ptr * Set install path for libmxnet.so dynamic lib on Mac OS (#13629) * Fix the bug of BidirectionalCell (#13575) * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * add a test for BidirectionalCell * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * fix test_bidirectional_unroll_valid_length( ) Fix the error of parameter. * Fix the bug of BidirectionalCell I did hybridize( ) and pass "valid_length" to the unroll( ) function of BidirectionalCell, then returned AssertionError in line 79. Because symbol.split( ) return a symbol but not a symbol list. Result in the length of inputs dont equal parameter "length" when call unroll( ) to compute r_outputs and r_states. * fix test_bidirectional_unroll_valid_length( ) * Feature/mkldnn static (#13628) * Revert "Revert "Feature/mkldnn static 2 (#13503)" (#13540)" This reverts commit a3eca5f5c96eed0bc29bd4e58e470997091a1fb3. * include headers on mkldnn lib * retrigger * retrigger * build config for maven and pip (#13556) * config for pip * symbol whitelist * maven build config * Fix for import mxnet taking long time if multiple process launched (#13602) * /~https://github.com/apache/incubator-mxnet/issues/12255 doing import mxnet in multiple processes take very long time. Details : #12255 One of the reason we have OMP tuning code which iterates to find OMP tune overhead. We are reducing this iteration count to reduce the overehead of tuning code. Also, We added an environment variable where users can set the number of cores that should be used to determine tuning. * cpplint fix * Adding new environment variable: MXNET_USE_NUM_CORES_OPERATOR_TUNING to doc * fixing formatting in doc * Add reshape op supported by MKL-DNN (#12980) * Add reshape op supported by MKL-DNN * fix build issue * fix lint * fix lint * fix lint * fix lint * fix lint * fix lint * fix white space * add unit test * merge if blocks * Improve dev_menu usability, local build and virtualenv (#13529) * Improve dev_menu, add build command and virtualenv creation with local builds for easy testing * Update dev_menu.py Co-Authored-By: larroy <pedro.larroy.lists@gmail.com> * Cuda off by default, use ccache * address CR * [Clojure] Correct the versions in the README so they correspond to the latest maven.org release (#13507) * Correct the versions so they correspond to the latest maven.org release * trigger build * feedback from @kohr-h * Optimization of metric evaluation (#13471) * Change argsort to argpartition * Global statistics in metrics * Fix lint * Fixes from review * Trigger * Fixes from review, fix to F1, MCC and perplexity metrics, added test for global stats * Fix lint * Fix compatibility with Python 2 * Revert "Feature/mkldnn static (#13628)" (#13638) This reverts commit 5bcf2bd6e8b48fa27bfcfdafd06401ec2d28978b. * support mkl log when dtype is fp32 or fp64 (#13150) * support mkl log when dtype is fp32 or fp64 * remove macro * ensure data size less than or equal MKL_INT_MAX * code specification * fix indent * for retrigger * [MXNET-1209] Tutorial transpose reshape (#13208) * transpose tutorial * Adding Anirudhs comments * Update tutorial with some more examples * Adding links * Fixing the links, adding more examples * Update reshape_transpose.md * Fixing spelling mistakes * Updating image resolution * Adding Simon's comments * Small fixes * Update reshape_transpose.md * Update reshape_transpose.md * empty commit * empty commit * updated reference to Apache MXNet (#13645) * Complimentary gluon DataLoader improvements (#13606) * init * add tests * doc * lint * fix openmp * Improve CCache handling (#13456) * Remove gitignore entries * Modify Makefile * Modify user permissions * Add new ccache wrapper function * Change PATH rewrite to a different one to resolve CUDA issues * Add ccache to gpu cmake * Enable ccache for every build * Set permissions for arm dockerfiles * Disable ccache for ASAN * Remove g++-8 ccache redirect * Update Android Dockerfiles for user permissions * Fix ASAN compiler typo * Remove sanity for speed * Move build dir creation in android armv8 * Revert "Remove sanity for speed" This reverts commit e8386a774dafe96337930b9cac36cb24fc36585e. * Add ccache for NVCC in Makefile * [MXNET-918] Random module (#13039) * introduce random API * revert useless changes * shorter types in APIDoc gen code * fix after merge from master * Trigger CI * temp code / diag on CI * cleanup type-class code * cleanup type-class code * fix scalastyle * Fix incorrect delete in MXExecutorReshape exception handling (#13376) * Fix bad delete. Delete the pointed-to handle on cleanup, not the location of the handle itself. Also don't delete it if we didn't set it in the first place. * Remove unusued 'exec' var from MXExecutorBindEX. * [MXNET-1251] Basic configuration to do static-linking (#13621) * Basic configuration to do static-linking * update build script and place it in the install part * clean up the code further * revert maven into build-from-source * add curl to deps * [MXNET-1195] Cleanup Scala README file (#13582) * Updated the Scala-Readme with upto-date information * Updated the header * Removed redundant build status * Minor formatting changes * Addressed the PR feedback * Added section on Scala training APIs * Removed mention of deprecated Model API * scripts for building libmxnet binary and wheel (#13648) * add script for making all dependencies * tools for building pip package * build scripts for lib and wheel * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API (#13294) * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API * [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API * Updated the code to address the review comments. * Added the README file for the folder. * Addressed the review comments * Addressed the review comments to use argmax and default mean values. * Update MKLDNN_README.md (#13653) * Support Quantized Fully Connected by INT8 GEMM (#12922) * add quantized fully connect support * disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library * retrigger to ci testing * move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK * fix typo bug * retrigger the ci test * fix typo bug * retrigger ci * retrigger the ci test * retrigger the ci * retrigger the ci test * retrigger ci test * fix indent issue * retrigger the ci * retrigger the ci test * add verbose message * update log message * using range for loop * using for auto range * enable MKL BLAS ci test * fix typo issue * use TYPE_ASSIGN_CHECK * retrigger the ci * add build fix for Scala/Java build (#13655) * Fix Jetson compilation (#13532) * remove omp which can cause ssd accuracy variance (#13622) * Revert "[MXNET-43] Fix Jetson compilation" (#13665) * Revert "remove omp which can cause ssd accuracy variance (#13622)" This reverts commit 655f1c6f7a0706dd622f73db9af2e6df895ca213. * Revert "Fix Jetson compilation (#13532)" This reverts commit 48e25c4cae355753dd96ea7afe004bf78e0719e4. * Fix Jetson compilation (#13666) * turn on Sphinx warnings as errors (#13544) * turn on warnings as errors * move warnings as error logic to build_all_version * fix typo in comment * add warning as error option for docs pipeline * bump ci to test again; use this chance to add notes on this feature * fix bugs in image.py docs * Update CODEOWNERS, add Pedro Larroy. (#13579) * Revert "Revert "[MXNET-43] Fix Jetson compilation" (#13665)" (#13672) This reverts commit 3433776dac7be75928082bbc1d552fca248fb8e8. * Accelerate DGL csr neighbor sampling (#13588) * Speedup and fix bug in dgl_csr_sampling op * Update dgl_graph.cc * simplify functions. * avoid adding nodes in the last level in the queue. * remove a hashtable lookup in neigh_pos. * reduce a hashtable lookup in sub_ver_mp. * merge copying vids and layers. * reduce hashtable lookup when writing to output csr. * fix a bug. * limit the number of sampled vertices. * fix lint. * fix a compile error. * fix compile error. * fix compile. * remove one hashtable lookup per vertex and hashtable iteration. * remove queue. * use vector for neigh_pos. * fix lint * avoid init output arrays. * fix tests. * fix tests. * update docs. * retrigger * retrigger * [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TenosrRT conversion (#13659) * fix unpicklable transform_first on windows (#13686) * Move the debug output message into MXNET_MKLDNN_DEBUG (#13662) * NEWS.md backport from v1.4.x to master (#13693) * merge NEWS.md from 1.4.x to master * NEWS.md backport from v1.4.x to master * Fallback to dense version for grad(reshape), grad(expand_dims) (#13599) * fallback to dense version for grad(reshape), grad(expand_dims) * add _backward_reshape gpu version * reshape test case comments * fix gpu test * remove mkldnn support for _backward_reshape * ONNX export: Add Flatten before Gemm (#13356) * Add Flatten before Gemm * ONNX export test: Allow multiple inputs in forward pass * ONNX export: Test for fully connected * [MXNET-1164] Generate the document for cpp-package using Doxygen (#12977) * Adding cpp-package directory to the Doxyfile. Updating the index.md file in c++ api directory. * Updating the link to classes in C++ API to point to correct html file. * Updated the links to use relative paths. * Removed the extra slash character in the url * Excluded the 3rdparty folder as per the review comment. * Update git clone location to apache github (#13706) * Add timeout/retry logic to docker cache download (#13573) * Added timeout/retry (linear backoff) to docker cache download * Units changed, as time.sleep takes seconds as argument * Improved error handling * Using retry decorator * Added retry decorator to _login_dockerhub method * Fixed wrong import * Fix NDArray ToDLPack Bug (#13698) * Added javadocs and improved example instructions (#13711) * Rearrange tests written only for update_on_kvstore = True (#13514) * Update test_gluon_trainer.py * Update test_gluon_trainer.py * test * Update mshadow to support batch_dot with fp16. (#13716) * fp16 dot * update mshadow * update mshadow * update mshadow * Fix the quantization script to support Python2 (#13700) * fix the quantization script to support python2 * Fix comments, fix similiar issue in imagenet_inference.py * ONNX test code cleanup (#13553) * ONNX test code cleanup * Make tests use the common test case list * Remove import test_cases * Make Gluon backend rep common * Partially enable broadcast tests * Common function to populate tests * Make backend common * test models * Test nodes * ONNX export: Test for fully connected * Edit CI scripts mxnet export test cleanup * Further cleanup backend tests * README * Some corrections * test case format for test_models * update social media section (#13705) * script for installing gpu libraries and build tools (#13646) * Port of scala infer package to clojure (#13595) * Port of scala infer package to clojure * Add inference examples * Fix project.clj * Update code for integration tests * Address comments and add unit tests * Add specs and simplify interface * Minor nit * Update README * update code owner (#13737) * AdamW operator (Fixing Weight Decay Regularization in Adam) (#13728) * tests * remove optimizer and move op to contrib * rename parameter * ONNX import/export: Add missing tests, ONNX export: LogSoftMax (#13654) * Logsoftmax, missing tests * Support multiple outputs in Gluon backendrep * Remove repeated unsqueeze test * Allow multiple output support * ONNX test code cleanup - part 2 (#13738) * Common test caller * Remove incorrect comment * Make corrections to CI * fix ci script * Update basic_layers.py (#13732) * ONNX import: Hardmax (#13717) * ONNX import: Hardmax * Fix lint errors * add github link for issue with reshape * gluon docfix (#13631) * Fixes for trainer with update_on_kvstore=False (#13721) * add clarification for param_dict * more tests for dist kvstore * more unittests * fix a bug * more dist exception test * revert optimizer list * fix bug and comment * fix doc rendering and lint * add invalid sched test * fix website * trigger * update doc * Reorder module import orders for dist-kvstore (#13742) * Reorder module import orders for dist-kvstore * more code comments * CMake: Enable installation of cpp-package headers (#13339) * Allow CMake based installation of cpp-package * Add installation of missing nnvm headers * Add documentation as to where public headers will be installed * disable error checking when building old versions (#13725) * Integrate MKLDNN Conv1d and support 3d layout (#13530) * add 3d layout support for MKLDNN Conv and Activation * fix lint * code refactor * add testcase for group1 conv and skip quantization for conv1d * fix lint * avoid conv1d quantization * code refactor and add activation ut * del todo * Making MKL-DNN default on MXNet master (#13681) * mkldnn is default makefile and explicitly turn off for buidls * add endif * retrigger * retrigger * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * retrigger * remove commented code * retrigger * remove mkldnn dnaymic for unitest * retrigger * retrigger * force static for mkldnn lib * turn of mkldnn on arm builds * remove dynamic mkldnn bind * update jenkins to use only mkldnn * remove last flag * turn mkldnn by default on mac * move mkldnn files for GPU MKLDNN build * copy lib mxnet in gpu build * only link windows * add mkldnn.mk * try force linking * retrigger * retrigger * remove mkldnn dynanmic check * use ifndef * remove test mkldnn install * fix spacing * fix index * remove cp of mkldnn since statically linked * add libmkldnn.a to list of files to pack * include mkl_ml * add mkldnn to pack * add libiomp to ci pack * move static libs * fix typo * pack mkldnn * retrigger * add linux artifacts * move libmkldnn in gpu cmake build * move libmkldnn and libiomp5 on gpu workspace * move linked files * fix typo * fix typo * add artifacts for tensorrt * move mkldnn lib in scala build * move mkldnn lib on cpu scala * create dir for binding * rename libmkldnn in scala * move mklml dep in scala builds * move mkl to another linked folder * move libmkl to another dir * add libmklml * move mkldnn * move mkldnn on centos * specify new dynamic path * retrigger * remove mkldnn dynamic lib * remove moving mkldnn artifact * add ld path * retrigger * Revert "remove moving mkldnn artifact" This reverts commit 16cca196e9e1ad92db74f4e8a01b3b052076d268. * Revert "remove mkldnn dynamic lib" This reverts commit d51043622d4ef7fcb95aff6a3e84d91ab71b48c9. * update makefile * Revert RPATH change and trigger CI * correcting use-mkldnn flags for two tests * mkldnn default on linux for starters * reverting naming rules of pack_lib * adding mkldnn=0 flags to centos non mkldnn builds * adding mkldnn=0 flags to ubuntu gpu non mkldnn builds * removing mkldnn binary operation for ubuntu gpu cmake non mkldnn build * removing mkldnn binary operation for centos non-mkldnn unittest * adding explicit USE_MKLDNN=0 flags for clang builds * adding explicit USE_MKLDNN=0 flags for cpu ubuntu builds * removing mkldnn binaries from non mkldnn builds scala gpu * adding explicit flag mkldnn=0 for tensorrt gpu build * adding explicit flag mkldnn=0 for ubuntu cmake asan * adding centos cpu mkldnn tests to CI * adding CentOS GPU MKLDNN build and unittest * not keeping mkldnn default for mac os * setting mkldnn default for x86_64 only * running docs with mkldnn=0 flag * removing CentOS CPU Scala MKLDNN test * setting mkldnn default for x86_64 only * not making mkldn default on windows * removing Centos MKLDNN tests from CI * retrigger * retrigger * retrigger * use relative links; update links (#13741) * [MXNET-1231] Allow not using Some in the Scala operators (#13619) * add initial commit * update image classifier as well * create Util class make Some conversion * add test changes * adress Comments * fix the spacing problem * fix generator base * change name to Option * fix bug in profiler tutorial when using cpu (#13695) try except approach only goes to ctx=mx.gpu() because test_utils.list_gpus() at least returns empty array and do not producing error * local docs build feature (#13682) * make ROIAlign support position-sensitive pooling (#13088) * make ROIAlign support position-sensitive pooling * add unittest for RoIAlign op * fix ccplint error * fix python3 compability for unittest * change OMP for better performance * delete blank line to trigger CI * add shape check when position_sensitive is true * fix the typo * typo: shuold -> should * remove private() clause in omp statement * add examples and fix the dependency problem (#13620) * add examples and fix the dependency problem * add Nightly run and optimized script * add explanation for the line * Update Adam optimizer documentation (#13754) * Less cudaGet/SetDevice calls in Gluon execution (#13764) * Remove unnecessary cudaGetDevice/cudaSetDevice calls * Fixes for the DeviceGuard * Retrigger CI * Fix for possible invalid device ordinal when using DeviceStore while driver is unloading * Fix for RTC when the driver API call is the first call * Added DeviceStore to pooled engine * Scope requests so it's not needed for dev_menu (#13771) * Fix USE_MKLDNN check in Makefile (#13775) * fix makefile * change make/config.mk * add comments * retrigger ci * fix c complier to clang (#13778) * Fixed mailing list addresses (#13766) * [MXNET-1255] update hybridize documentation (#13597) * update hybridize documentation * address review comments * improve doc * address comments * address comments * [MXNET-244] Work around likely compiler bug on nested inlines and temporary acces… (#13535) * Work around likely compiler bug on nested inlines and temporary access to stream * Don't compile khatri_rao tests if we don't have LAPACK * Address CR comment * Use curl to download sample data instead of wget. (#13761) * fix bipartite match memory corruption (#13727) * remove attributs clear on TRT nodes for GetOptimizedSymbol (#13703) * Add CPU test coverage and refine cmake builds (#13338) * add license (#13793) * [MXNET-862] Basic maven jenkins pipeline (#13450) * Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel * Re-organize Scala maven build (#13626) * Re-organize scala maven build 1. Automatically detect which platform to build for scala. 2. Remove platform dependend submodules 3. Fix cyclic module dependencies 4. Fix scalatype style check 5. Now mvn can be executed in submodule 6. Maven build can be executed from any directory not only in root project 7. Checkin javah header file, and use verify task to detect native API changes 8. Improve incremental build performance 9. Remove unittest and integration-test profile, use proper task instead 10. Delete generated scala file during maven clean. * Redo maven deploy related tasks. 1. Removed maven release plugin. 2. Make maven build friendly to CI, allow cli override version. 3. Moved gpg signing to deploy stage. 4. Created a separeated deploy module. 5. Updated Makefile to new maven build change. 6. Remove unused nexus-staging-plugin 7. Added nightly and staging profile for CI. * Support mkldnn for Scala. * Add extra header file to export for error checking (#13795) * add extra header file to include * fix sanity check * fix sanity * move c_api_common.h to include folder * fix build error * keep c_api_common.h internal * strip out error handling API into a separate header * consolidate comment into one paragraph per review * remove unnecessary include * fix redirection issues; set default version to master (#13796) * [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (#13500) * ONNX import/export: Sample_multinomial * ONNX export: GlobalLpPool, LpPool * Handle default p_value * Add tests for multinomial, lppool, globallppool * add a comment about shape test * whitelist symbols for using MXNet error handling externally (#13812) * fix for params with no dims in onnx (#13413) * fix for params with no dims * fix * fix * retrigger build * test added * retrigger CI * retrigger ci * Remove semicolon in libmxnet.sym file (#13822) * Remove semicolon in libmxnet.sym file * empty commit to trigger CI * Clojure example for fixed label-width captcha recognition (#13769) * Clojure example for fixed label-width captcha recognition * Update evaluation * Better training and inference (w/ cleanup) * Captcha generation for testing * Make simple test work * Add test and update README * Add missing consts file * Follow comments * Update LICENSE File with subcomponents (#13808) * Update LICENSE File with subcomponents * Fix JavaScript licenses * Dockerfiles for Publish Testing (#13707) * Add new Maven build for Scala package (#13819) * clean up build * fix minor issue and add mkldnn * fix mx_dist problem * fix clojure build * fix skip test * ONNX ops: norm exported and lpnormalization imported (#13806) * ReduceL1, l2 export, lpnormalization import added * fix * fix * fix * fix * remove useless code (#13777) * Fixing a symlink issue with R install (#13708) * fix minor indentation (#13827) * [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (#13676) * ONNX export: Random uniform, Random normal * ONNX export: MaxRoiPool * tests for maxroipool, randomnormal, randomuniform * onnx export ops (#13821) * onnx export ops * retrigger ci * retrigger ci * fix * [MXNET-1260] Float64 DType computation support in Scala/Java (#13678) * Added Float64 as a supported datatype in NDArray * Added unit tests for Float64 in NDArray * Fix for failing Clojure unit tests * Added Float and Double as MX_PRIMITIVES for computation in Scala * Trying out second approach --> Private Impl methods with generic signature, and public methods calling the Impls * Fixed errors in *= method * Added Float64 in IO.scala and DataIter.scala * Added another testcase for IO.DataDesc creation * Fixed failing CI * Added Float64 in Predictor class * Added Float64 in Classifier class * Added Double as a possible return type to : classifyWithNDArray * Added unit tests for Classifier and Predictor.scala classes for Float64/Double * Approach 3 --> Using a trait to mirror Float and Double in Scala * Added comments on MX_PRIMITIVES.scala * Added Float64/Double support for inference in ImageClassifier APIs * Added unary- and compareTo in MX_NUMBER_LIKE * Renamed MX_NUMBER_LIKE to MX_PRIMITIVE_TYPE * Fixed linting issue * Now specifying dType from the available data in copyTo and MXDataIter.scala for creating a new DataIterator * Add primitives support handling to the generator for proper conversion * Reduced code duplication in classify method in Classifier.scala * Fix infer package for new signatures and address some bugs * Removed code duplication in getPixelsArray * remove debugging * Changed classifyWithNDArray method in Classifier.scala * Removed code duplication in predictImpl * Satisfying lint god _/\_ * Fixed failing PredictorSuite test * Renamed MX_FLOAT to Camel case * Revert "Renamed MX_FLOAT to Camel case" This reverts commit 9d7c3ce6f9c4d6ed2c46041a02e23c0f1df8dfe5. * Added an implicit conversion from int--> float to support int operations in NDArrays. (These ops were already supported in the previous versions) * Added Float64 as a training option to ImClassification Suite. Also added integration tests for it * Satisfy Lint God _/\_ * Added Float64 support in Java NDArray * Added Float64 support in Java's Predictor API * Added yours truly to the Contributors list * Added method comments on Predictor.predict with Array[Double] as a possible input * Added method comments explaining what MX_PRIMITIVE_TYPE is * Fixed errors cause by rebasing with master * Added licences to the files * [MXNET-1263] Unit Tests for Java Predictor and Object Detector APIs (#13794) * Added unit tests for Predictor API in Java * Added unit tests for ObjectDetectorOutput * Added unit tests for ObjectDetector API in Java * Addressed PR comments * Added Maven SureFire plugin to run the Java UTs * Pom file clean up -- moved surefire plugin to parent pom.xml * Renamed skipTests to SkipJavaTests * Fix scala doc build break for v1.3.1 (#13820) * Fix doc build break for v1.3.1 * ignore errors on v1.3.x during scala docs gen * Remove MXNET_STORAGE_FALLBACK_LOG_VERBOSE from test_autograd.py (#13830) * Add Local test stage and option to jump directly to menu item from commandline (#13809) * Removes unneeded nvidia driver ppa installation (#13814) * Improve license_header tool by only traversing files under revision c… (#13803) * Improve license_header tool by only traversing files under revision control * use HEAD instead of master for CI * Disabled flaky test (#13758) * change to compile time (#13835) * fix Makefile for rpkg (#13590) * fix Makefile for rpkg * update R and roxygen2 requirements * add roxygen requirement * add roxygen requirement * [CI] Prevent timeouts when rebuilding containers with docker. (#13818) * Prevent timeouts when rebuilding containers with docker. Increase timeout from 120 to 180 for pipelines * Increase docker cache timeout * Increase timeout also for docs * limit parallel builds to 10 * Code modification for testcases of various network models in directory example (#12498) * example testcase modified * rcnn file add * license add * license init * CI test trigger * rcnn modify give up * trigger * modify for better user experience * change the default parameter to xpu=None * Update bdk_demo.py * Update fcn_xs.py * Update test.py * Update train.py * Update bdk_demo.py * Update bdk_demo.py * modify review comments * refine * modify Readmes according to the changed code. * finetune READMEs * re-trigger ci * re-trigger ci twice * Add copyrights for third party licenses to license file (#13851) * Fix Tree Reduction on new instance type p3dn.24xlarge (#13852) * add fallback for gpu topology detection using CUDA 9.2 * add fallback for gpu topology detection using CUDA 9.2 * add log * update 3rdparty to master * add fallback for gpu topology detection using CUDA 9.2 * add log * update 3rdparty to master * bring 3rdparty packages to upstream/master * rebase to master * Update gpu_topology.h * [Clojure] package infer tweaks (#13864) * change object detection prediction to be a map * change predictions to a map for image-classifiers * change return types of the classifiers to be a map - add tests for base classifier and with-ndarray as well * tweak return types and inputs for predict - add test for plain predict * updated infer-classify examples * adjust the infer/object detections tests * tweak predictor test * Feedback from @kedarbellare review * put scaling back in * put back predict so it can handle multiple inputs * restore original functions signatures (remove first) * Modifying clojure CNN text classification example (#13865) * Modifying clojure CNN text classification example * Small fixes * Another minor fix * adding tolerance to flaky test (#13850) * adding tolerance * retrigger ci * retrigger ci * Julia v0.7/1.0 support and drop v0.6 support (#12845) * Fix cpp examples build on Mac. (#13826) This is a regression of addning @rpath name to libmxnet.so on Mac, example executable is not able to find libmxnet.so anymore. Add @rpath search path to fix this issue. * Fix launch bounds in spatial transformer (#13188) * Fix launch bounds in spatial transformer * Adding explanation in comment. * Update example scripts classpath. (#13849) * [MXNET-1177]Adding Scala Demo to be run as a part of Nightly CI (#13823) * Adding Scala Demo to be run as a part of Nightly CI * Addressed PR feedback : making a profile to fetch nightly jars only on CI * Changed name from scalacidemo to scala_ci_demo * Synchronized the scala-demo and java-demo for nightly CI runs * Pruned the maven command to simply maven install * changed running from ./.sh to bash .sh to be consistent * Add CODEOWNERS for Julia package (#13872) * fix ssd quantization script error (#13843) * fix ssd quantization script error * update readme for ssd * move quantized SSD instructions from quantization/README.md to ssd/README.md * update ssd readme and accuracy * update readme for SSD-vGG16 * Fix permissions of ci/docker/install/ubuntu_publish.sh (#13840) * Avoid adding SegfaultLogger if process already has sig handler. (#13842) In current implemenation, we override signal handler regardless if MXNET_USE_SIGNAL_HANDLER=1. This breaks caller process behavior and cause process exit unexpectedly. The example use case is libmxnet.so is loadded into java process via JNI or JNA. JVM will crash due to SegfaultLogger. In this PR, we will not register SegfaultLogger if there is a signal handler registered. * fix the fetching GPU problem (#13889) * Fix SN-GAN example doc (#13877) * fix the wrong argument * fix broken link * update Spectral Normalization Code (#13868) * update sn_code * update sn_code * Temporarily disable website testing (#13887) * Fixed java benchmark failing error by fixing the classpath (#13891) * Jenkins nightly maven with static build script and gpu (#13767) * Added logging to GitHub commit status publishing (#13615) * Add a test for SGLD optimizer with comparisons for set noise seeds. (#13762) * [MXNET-703] Update to TensorRT 5, ONNX IR 3. Fix inference bugs. (#13310) * [MXNET-703] Install CUDA 10 compatible cmake This works around a CUDA 10 cmake issue documented here: /~https://github.com/clab/dynet/issues/1457 This fix is temporary; once an updated cmake package is published to Ubuntu's package repo it may be reverted. * [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs. * [MXNET-703] Describe onnx opsets and major version * Fix the order of error term's operands (#13745) * fix the order of error term's operands * address comments * Add mkldnn OP for slice (#13730) * add mkldnn slice * fix lint * fix lint * mv SliceEx to matrix_op.cc * fix lint * optimize dispatch_mode * retrigger ci * fix indent * fix bug in nag optimizer (#13683) * fix bug in nag optimizer ``` grad += wd * weight mom[:] += grad grad[:] += self.momentum * mom weight[:] += -lr * grad ``` This will minus wd*weight twice, but in`state = momentum * state + grad + wd * weight weight = weight - (lr * (grad + momentum * state)) ` only minus once. * fix bug in nag test fix bug in nag test * rewrite nag test * rewrite nag * fix nag with in-place operations * fix nag with in-place operations * #13813 examples with opencv4/origami (#13813) * Fix BatchNorm converter for CoreML when fix_gamma=True (#13557) * beta doc fixes (#13860) * Update profiler doc (#13901) * Update c_api_profile.cc * Update c_api_profile.cc * Fix for test always returning true (#13911) * Add error checking for cpp examples. (#13828) * add ccache to docs build (#13832) * Java install info update (#13912) * updated java dependency * update to duplicated java cpu * java gpu update * Updated java dependency version information * Static build instruction for MXNet in general (#13914) * update scripts and tutorial * add the static test for scala package * kill publish test * fix build issue * address comments * julia: fix `argmax` for NDArray (#13871) - fix 0-based index output to 1-based index close #13786 * Support populating errors back to MXNet engine in callback (#13922) * add an optional error_msg in engine on_complete callbcak * use dmlc::Error struct to make error population extendable * Fix document build (#13927) * fix doc build * Revert "Temporarily disable website testing (#13887)" This reverts commit 9d4281271c871a938f1ac4ee55b218872031963d. * test_ImageRecordIter_seed_augmentation flaky test fix (#12485) * Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibilit * Update image record iterator tests to check the whole iterator not only first image * Version switching user experience improvements (#13921) * fix version switching for anchors and search * improved redirects * fix bug for dev previews; remove hardcoded protocol * Julia: fix filename quoting in docstring (#13894) Quoting filename with backticks to prevent markdown mis-rendering some of them with underscore. * disable default MKLDNN for cross compilation (#13893) * disable default MKLDNN for cross compilation * adding temporary debug logs * Julia: deprecate `mx.empty`, replace it with `UndefInitializer` (#13934) In Julia 0.7+, constructing a uninitialized array is provided via the APIs: - `Array{T,N}(undef, dims...)` - `Array{T,N}(undef, dims)` - `Array{T}(undef, dims...)` - `Array{T}(undef, dims)` There is an API `mx.empty(dims...)` serving for this purpose. This PR proposes that deprecating the original API `mx.empty` and provide the functionality with the API design similar to Julia's Base. - `NDArray{T,N}(undef, dims...)` - `NDArray{T,N}(undef, dims)` - `NDArray{T}(undef, dims...)` - `NDArray{T}(undef, dims)` - `NDArray(undef, dims...)` - `NDArray(undef, dims)` e.g. ```julia julia> NDArray{Int,2}(undef, 5, 2) 5×2 NDArray{Int64,2} @ CPU0: 94290755905104 94290752678143 94290752660544 68719476760 94290752674408 94290737734368 94290752660544 18 94290752674408 18 julia> NDArray(undef, 5, 2) # default type is `mx.MX_float` 5×2 NDArray{Float32,2} @ CPU0: -29112.406f0 5.2029858f-8 3.0763f-41 6.7375383f-10 1.7613131f19 0.0f0 4.840456f30 0.0f0 4.4262863f30 0.0f0 ``` - The original `mx.empty` APIs are still functional. If user invokes them, a deprecation warning will be popped up. * Runtime feature detection (#13549) * Prototype for runtime feature detection * Includes from diamond to quotes * Add CPU feature and BLAS flavour flags * Add BLAS flavour and CPU SSE and AVX flags * MXNET_USE_LAPACK * Fix C++ linting errors * Expose runtime feature detection in the public C API and in the Python API * Refactor Storage -> FeatureSet * Refine documentation * Add failure case * Fix pylint * Address CR comments * Reduce verbosity of container builds (wget output) (#13888) * Add back R tests and fix typo around R and perl tests (#13940) * Add back R tests and fix typo around R and perl tests * Fix permissions * Fix copy&paste mistake around roxygen and remove previous permission override * fix doc of take operator (#13947) * #13624 clojure nightly tests (#13624) * Add erfinv operator for calculating inverse error function (#13811) * add default behaviour for argmax * prototype of erfvin * add test * gpu support * Revert "add default behaviour for argmax" This reverts commit 64e9f1a9e3c9cabf312b8d80b3520b22da31c0b6. * move erfinv to contrib * edit copyright * remove atof * use std and update license * add license exclude file * fix per eric's comments * change license header * Update project.clj file to use the snapshots repo to be able to pull (#13935) nightly Scala jar - also update readme * Julia: add windows-cpu build (#13937) - Julia v0.7 - Julia v1.0 * split_v2 (#13687) * Update autoencoder example (#12933) * Fixing the autoencoder example * adding pointer to VAE * fix typos * Update README.md * Updating notebook * Update after comments * Update README.md * Update README.md * Retrigger build * Updates after review * Static build for Python (#13916) * add python unit test * address comments * switch sanity test to Gluon module test * We don't run tests (╯‵□′)╯︵┻━┻ * add variant in the environment variable * add document improvement * kill the conflict * Flaky maven binary download (#13974) * Aggregate SGD (#13346) * Aggregate SGD * Make OpWrapperGenerator understand Tuple<float> * Trigger * Add NNVM Tuple to cpp-package op.h * Trigger * Fix pylint aggregate SGD * Update info about new ENV vars and modifying 2 tests that require update_on_kvstore to be true * Fix * Aggregate SGD support for Gluon trainer * Added text to doc about aggregate update in SGD optimizer * Docs changes from review * Gradient multiplier (contrib) operator (#13632) * Added the gradient reversal contrib operator Missing test for backwards pass * Fixed linting errors * Fixed forward test * Added random forward / backward test for gradient reversal * Update test_contrib_operator.py * Fixed typo in gradient reversal op description * Replace forward code with the identitiy implementation * Fixed typos in function docs * Changed default behavior to identity * Replaced backward code with scalar_mul * Fixed backward operator and unit test * Renamed operator to gradient multiplier * Update test_contrib_operator.py Retrigger flaky test * Update gradient_multiplier_op.cc Improved the description of the scalar multiplier * Update README.md (#13973) * Fixing the doc for symbolic version of rand_zipfian (#13978) * Fixes #12779 * Gluon end to end tutorial (#13411) * initial draft gluon tutorial * add reference * add cpp inference * improve wording * address pr comments * add util functions on dataset * move util file * update link * fix typo, add test * allow download * update wording * update links * address comments * use lr scheduler with optimizer * separate into 2 tutorials * add c++ tutorial to test whitelist * [MXNET-1293] Adding Iterables instead of List to method signature for infer APIs in Java (#13977) * Added Iterables as input type instead of List in Predictor for Java * Added Iterables to ObjectDetector API * Added tests for Predictor API * Added tests for ObjectDetector * Use CPUPinned context in ImageRecordIOParser2 (#13980) * create NDArray with CPUPinned context in ImageRecordIOParser2 * update document * use -1 device_id as an option to create CPU(0) context * retrigger CI * fix cpplint error * Added optional parameters to BilinearResize2D to do relative scaling (#13985) * Added optional parameters to BilinearResize2D to do relative scaling * Removed unnecessary params in unit tests. * Fixed deprecated casting style * [MXNET-1301] Remove the unnecessary WaitAll statements from inception_inference example (#13972) * Removed the unnecessary WaitAll statements * Removed the WaitAll() calls wherever they are not necessary. * [MXNET-1000] get Ndarray real value and form it from a NDArray (#12690) * add visualize * adding Any type input to form NDArray * fix bug and add tests * add a toString method * add Visualize Util and migrate visualize structure to there * update with tests * refactor code * fix the minor issue * add multiple types support * add changes on names and tests * make code elegant and improve readability * api change (#13903) * ONNX export: Add Crop, Deconvolution and fix the default stride of Pooling to 1 (#12399) * Added Deconvolution and Crop to ONNX exporter * Added default for pool_type * Sample python bilinear initializer at integral points in y-direction (#12983) * Sample python bilinear initializer at integral points in y-direction * Add unit test for bilinear initializer * [MXNET-703] Minor refactor of TensorRT code (#13311) * Python BucketingModule bind() with grad_req = 'add' (#13984) * remember grad_req from bind and apply it to sub-modules * unit-test for gradient accumulation with bucketing modules * MXNET-1295 Adding integer index support to Sequence* family of operators. (#13880) * Adding integer index support to Sequence* family of operators. Adding ability to use int32 arrays, or any castable-to-int type, as the sequence_length array to SequenceMask, SequenceLast, and SequenceReverse. Previously these operaters all requred sequence_length to be the same data type as the input array. See MxNet Jira ticket here: https://issues.apache.org/jira/browse/MXNET-1295 See also GitHub issues here: /~https://github.com/apache/incubator-mxnet/issues/12649 /~https://github.com/dmlc/gluon-nlp/issues/346 * Adding explicit braces to an if statement to fix g++ warning * fixing sequence_mask.cu by adding IType to template * Fixing whitespace errors reported by linter * Adding unit tests * Fixing length of lines to pass linter * Disabled flaky test test_negative_binomial_generator (#13784) * Fix website error pages (#13963) * fix error redirect * add error artifacts for local build * build docs with CPP package (#13983) * Update scala-package gitignore configuration. (#13962) * [MXNET-1232] fix demo and add Eclipse support (#13979) * fix demo and add Eclipse support * fix on docs * fix typo * Update docs/install/java_setup.md Co-Authored-By: lanking520 <lanking520@live.com> * add fixes in docs * fix compile error in debug mode (#13873) the latest BufferEntry do not contain ctx function and results in compile errors. inside of BufferEntry is an object of NDArray, that is the expected data. * Image normalize operator - GPU support, 3D/4D inputs (#13802) * CPU version of normalize operator is working and unit test added * Add GPU implementation and tests * Working GPU normalize transforms * Add default values, fix imports, fix documentation * Add backward implmentation for image normalize * Add tests for backward pass * Move back operators to its original files * Add review comments * Add 4D example * Make infer type generic * Fix inline function build error * make functions as inline to avoid multiple definition conflict across cc and cu * Fix build errors * Fix failing GPU tests * remove debug; add support for v1.4.x docs; fix publish bug (#14015) * Return value docs for nd.random.* and sym.random.* (#13994) * mx.random.multinomial python documentation updated, return type details added * multinomial documentation clarified * added basic case for negative_binomial * added basic case for generalized_negative_binomial * basic case added for gamma * added basic case for exponential * basic case added for randn * remaining base cases added. * randint case added * cleaned up return types for random.py * zboldyga added to contributors * spacing typo correction * updated symbol.random return types, minor correction to ndarray.random return types * removed trailing whitespace in docs * Julia: split ndarray.jl into several snippets (#14001) - `ndarray/type.jl` - `ndarray/context.jl` - `ndarray/show.jl` - `ndarray/remap.jl` - `ndarray/array.jl` - `ndarray/arithmetic.jl` - `ndarray/comparison.jl` - `ndarray/io.jl` - `ndarray/reduction.jl` - `ndarray/statistic.jl` - `ndarray/linalg.jl` - `ndarray/trig.jl` - `ndarray/activation.jl` - `ndarray/autoimport.jl` * float32 -> float16 cast consistency across implementations (#13857) * Added test showing float32->float16 discrepancy when mshadow float2half() is used. * Temp update mshadow submodule SHA to point to PR368 (b211cb7). * Temp switch to url = /~https://github.com/DickJC123/mshadow.git * Updata mshadow submodule SHA. * Improve code style per reviewer comments. * Move back to dmlc/mshadow.git, now with float->half rounding. * Expand test_operator.py:test_cast_float32_to_float16 to test np.nan. * Improve bulking in Gluon (#13890) * Improve bulking in Gluon * Trigger CI * Fix MXNet R package build (#13952) * fix mxnet r package build * add ci * remove mkldnn-gpu test for R * add minimal test for MKLDNN-R * pick mlp as minimal R test * Fix inconsistent handling for FResourceRequestEx for imperative and symbolic executor (#14007) * Update op_attr_types.h * Update attach_op_resource_pass.cc * [MXNET-1180] Java Image API (#13807) * add java example * add test and change PredictorExample * add image change * Add minor fixes * add License * add predictor Example tests * fix the issue with JUnit test * Satisfy Lint God ʕ •ᴥ•ʔ * update the pom file config * update documentation * add simplified methods * Export resize and support batch size (#14014) * add image resize operator and unit test * refactor the resize operator and address lint issues * address comment and add doc * assert size is more than 2 * add test case of 4D input * use ndarray datatype * add inline to Shape * add 4D input example * refactor the duplicate code and separate the resize from image_random * clean up the code * add resize implementation * delete the variable not used * refactor the code with structure and enum to make code more understandable * fix the lint * address comments * address comment 1. add description 2. refactor unit test and add dtype * update data type check * lint * move the common utitlity to image_utils * add default value for keep_ratio * change the operator doc * update the image utility function * fix lint * use Hang implementation to achieve image resize operator GPU * update the check and doc * refactor the caffe_gpu_interp2_kernel * update doc and fix the cpu compile error * update the comment * fix lint * add unit test for gpu * address comments * remove the crop and centercop utility function to make the PR clear * fix the syntax error * delete the warning * add unit test with 4D * fix typo * add more unit test * fix unit test * set atol = 1 * fix missing numpy import * fix the unit test * delete test case * fix unit test missing dependency * fix error data type * unify the style and add invalid interp * update the doc * add NAG optimizer to r api (#14023) * Now passing DType of Label downstream to Label's DataDesc object (#14038) * fix test_stn (#14063) * re-enable test after issue fixed /~https://github.com/apache/incubator-mxnet/issues/10973 (#14032) * Remove all usages of makefile for scala (#14013) * Remove all usages of makefile for scala * Unify making folders for scala/java setup * Fix mxdoc path * Add batch mode to calls * fix nightly test on tutorials (#14036) * fix nightly test * fix typo * trigger ci * update the scala installation tutorial on intellij (#14033) * update the scala installation tutorial on intellij * update the so answer * update the so answer * Image ToTensor operator - GPU support, 3D/4D inputs (#13837) * Add CPU implementation of ToTensor * Add tests for cpu * Add gpu implementation and tests * Fix lint issues * Cleanup includes * Move back changes to original image operators files * Add 4D example * resolve merge conflicts * Fix failing tests * parallelize on channel in kernel launch * rewrote the concat test to avoid flaky failures (#14049) ran 10000 times with no failures * Fix website scala doc (#14065) * Fix doc building * Remove deplicate in * [Clojure] Add resource scope to clojure package (#13993) * Add resource scope to clojure package * add rat * fix integration test * feedback from @benkamphaus - move from defs to atoms to make the tests a bit better * adding alias with-do and with-let more tests * another test * Add examples in docstring * refactor example and test to use resource-scope/with-let * fix tests and problem with laziness now they work as expected! * refactor to be a bit more modular * remove comments * Update NOTICE (#14043) * modifying SyncBN doc for FP16 use case (#14041) LGTM * add new cloud providers to install page (#14039) * add new cloud providers * fix colon * CUDNN dropout (#13896) * cudnn dropout * test dropout as stateful op * add cudnn_off * refactor * fix bug when using inf forward * turn on cudnn in gluon * reuse dropout state space * dropout passthrough * address comments * fix test_depthwise_convoltuion for occasional CI failures (#14016) * keeping same contexts for comparison * enabling test * testing default context * Revert "testing default context" This reverts commit 1f95d0228178debde14680839bb6abab14c6d049. * Disabling test due to CI failure on MKL-DNN * ONNX export: broadcast_to, tile ops (#13981) * Expand,tile op export * fix * adding test cases * adding comments * [MXNET-1258]fix unittest for ROIAlign Operator (#13609) * fix roi align test * retrigger unittest * add more test detail for ROIAlign test * remove url in test_op_roi_align * remove blank line in test_op_roi_align in test_operator * merge master * Update test_operator.py * retrigger CI * Fix performance regression in normalize operator (#14055) * parallelize on channel forward pass * parallelize on channel normalize backward pass * Fix lint issues * Trying to fix CI build failure on GPU * Fix failing GPU test on CI Do not pass normalize param as is to GPU kernel * Fix to_tensor tests * Pass mean and std_dev as native types for kernel * Fix CI failure. Do not pass mean, std as vector to kernel * Add maven wraper to scala project. (#13702) * Increase perfomance of BulkAppend and BulkFlush (#14067) * Better bulkappend * Fix lint * [MXNET-1178] updating scala docs (#14070) * updating scala docs * Addressed PR feedback * update the version name (#14076) * [MXNET-1121] Example to demonstrate the inference workflow using RNN (#13680) * [MXNET-1121] Example to demonstrate the inference workflow using RNN * Addressed the review comments. Updated the ReadMe files. * Removed the unnecessary creation of NDArray * Added the unit tests to nightly tests to catch the failure. * Updated the makefiles and unit tests so that the examples are built and tested in nightly * Added the visual representation of the model and fixed the CI failure. * Added the missing pdf file. * Fixing the broken ci_test.sh * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/README.md Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Update cpp-package/example/inference/simple_rnn.cpp Co-Authored-By: leleamol <19983848+leleamol@users.noreply.github.com> * Applying unresolved changes to README file. * Fixing the CI build failure. * Updated the RNN example from sequence generation to sentiment analysis * Updated the readme files. Updated the example to use trained model and updated the unit test. * Addressed the review comment to increase the default sequence length. Added the examples with inputs of various lengths. * Updated the example to handle variable length input. Updated the readme and unit test files accordingly. * Updated the example to share the memory between executors by createing shared executors. * Updated the creation of executors from largest to smallest bucket key * Creating the executor for the highest bucket key. * Updated the unit test to check for the results in a range and modified the function name to be consistent with others. * Fixed the logic to find the right bucket. * hybridize rnn and add model graph (#13244) * hybridize rnn and add model graph * trigger CI * separate mxboard visualization * add options and she-bang * add defaults * trigger CI * rename export-model * Exclude concat layer for gpu quantization (#14060) * exclude concat for gpu quantization * remove quantized_concat test in non-subgraph flow * Remove inplace support for ToTensor operator (#14083) * Remove stale check for op req type * Do not register to tensor operator with in place option. * [MKLDNN] Enable signed int8 support for convolution. (#13697) * Enable s8s8 support for MKLDNN convolution. * Fix cpp build * Fix build. * Fix build * Remove openmp min/max reduction for windows build * Add mkldnn_OIhw4i16o4i_s8s8 support * Add all s8s8 weight format * Change ssd quantize script. * Update * Manually cast mshadow shape size to size_t * Fix merge. * Fix perl package. * Retrigger CI * Fix GPU test * Fix GPU test * Rerun CI * Rerun CI * Rerun CI * Rerun CI * Remove weight_channelwise_scale from params. * Fix * Keep API compatible. * Rerun CI * Rerun CI * Rerun CI * Rerun CI * Address comments. * fix. * Address debug build. * Add comment for next_impl * Rerun ci * Add new api MXExecutorSetMonitorCallbackEX * Add default value for monitor_all for cpp header. * Rerun CI * fix * script change for uint8. * trigger ci * trigger ci * [MXNET-1291] solve pylint errors in examples with issue no.12205 (#13815) * Unify the style here Unify the style here and remove the testing 'print' code segment. * Unify the description of comment Change the description of comment from "multi-layer perceptron" to "Get multi-layer perceptron" * Unify the style of comments Unify the style of comments suggested by @sandeep-krishnamurthy * git pull the lastest code from master of incubator-mxnet * Complete rebase * Solve PEP8 [C0304 ] Final newline missing Sovle example/deep-embedded-clustering/solver.py(150): [C0304 ] Final newline missing * fix merge issue * skip output_names unittest for mxnet-ngraph

marcoabreu added Python Question labels Aug 20, 2018

kohr-h mentioned this issue Oct 30, 2018

Import and GPU init extremely slow on Windows #13040

Open

samskalicky mentioned this issue Dec 10, 2018

Low CPU usage of MXNet in subprocesses #13593

Open

Vikas-kum mentioned this issue Dec 10, 2018

[MXNET-12255] fixing import mxnet taking long time when theere are many cores #13601

Closed

7 tasks

Vikas-kum mentioned this issue Dec 10, 2018

Fix for import mxnet taking long time if multiple process launched #13602

Merged

7 tasks

Vikas-kum mentioned this issue Dec 13, 2018

Fix for import mxnet taking long time if multiple process launched (#… #13637

Merged

7 tasks

anirudh2290 closed this as completed Dec 17, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pretty high cpu load when import mxnet #12255

Pretty high cpu load when import mxnet #12255

fighting-liu commented Aug 20, 2018

vdantu commented Aug 20, 2018

fighting-liu commented Aug 21, 2018 •

edited

Loading

vdantu commented Aug 23, 2018

fighting-liu commented Aug 24, 2018

kardoszc commented Aug 31, 2018 •

edited

Loading

kardoszc commented Sep 2, 2018

vrakesh commented Oct 2, 2018 •

edited

Loading

vrakesh commented Oct 3, 2018

vrakesh commented Oct 5, 2018

lupesko commented Oct 7, 2018

vrakesh commented Oct 8, 2018

samskalicky commented Nov 30, 2018 •

edited

Loading

samskalicky commented Nov 30, 2018 •

edited

Loading

Vikas-kum commented Nov 30, 2018

samskalicky commented Dec 1, 2018 •

edited

Loading

cjolivier01 commented Dec 5, 2018

cjolivier01 commented Dec 5, 2018

Vikas-kum commented Dec 5, 2018

Vikas-kum commented Dec 5, 2018

larroy commented Dec 5, 2018

samskalicky commented Dec 6, 2018 •

edited

Loading

anirudh2290 commented Dec 17, 2018

Pretty high cpu load when import mxnet #12255

Pretty high cpu load when import mxnet #12255

Comments

fighting-liu commented Aug 20, 2018

vdantu commented Aug 20, 2018

fighting-liu commented Aug 21, 2018 • edited Loading

vdantu commented Aug 23, 2018

fighting-liu commented Aug 24, 2018

kardoszc commented Aug 31, 2018 • edited Loading

kardoszc commented Sep 2, 2018

vrakesh commented Oct 2, 2018 • edited Loading

vrakesh commented Oct 3, 2018

vrakesh commented Oct 5, 2018

lupesko commented Oct 7, 2018

vrakesh commented Oct 8, 2018

samskalicky commented Nov 30, 2018 • edited Loading

samskalicky commented Nov 30, 2018 • edited Loading

Vikas-kum commented Nov 30, 2018

samskalicky commented Dec 1, 2018 • edited Loading

cjolivier01 commented Dec 5, 2018

cjolivier01 commented Dec 5, 2018

Vikas-kum commented Dec 5, 2018

Vikas-kum commented Dec 5, 2018

larroy commented Dec 5, 2018

samskalicky commented Dec 6, 2018 • edited Loading

anirudh2290 commented Dec 17, 2018

fighting-liu commented Aug 21, 2018 •

edited

Loading

kardoszc commented Aug 31, 2018 •

edited

Loading

vrakesh commented Oct 2, 2018 •

edited

Loading

samskalicky commented Nov 30, 2018 •

edited

Loading

samskalicky commented Nov 30, 2018 •

edited

Loading

samskalicky commented Dec 1, 2018 •

edited

Loading

samskalicky commented Dec 6, 2018 •

edited

Loading