Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLDNN header missing in recent mxnet nightly static builds #18120

Closed
eric-haibin-lin opened this issue Apr 21, 2020 · 13 comments · Fixed by #18205, #18310 or #18355
Closed

MKLDNN header missing in recent mxnet nightly static builds #18120

eric-haibin-lin opened this issue Apr 21, 2020 · 13 comments · Fixed by #18205, #18310 or #18355
Assignees

Comments

@eric-haibin-lin
Copy link
Member

root@ip-172-31-37-108:/bps-mkl# pip3 install --pre mxnet-cu100 -f https://dist.mxnet.io/python/cu100  -U
Collecting mxnet-cu100
  Downloading https://repo.mxnet.io/dist/python/cu100/mxnet_cu100-2.0.0b20200312-py2.py3-none-manylinux1_x86_64.whl (808.1MB)
ls /usr/local/lib/python3.6/dist-packages/mxnet/include/
dlpack  dmlc  mshadow  mxnet  nnvm

Since mkldnn is turned on by default in Feb, the headers should be included, too

@TaoLv

@leezu
Copy link
Contributor

leezu commented Apr 21, 2020

FYI, I intend to switch the CD to cmake build in the coming days. It's currently blocked by #17557 because I'd like the CD to pass at least once prior to changing the build system.. #17557 may be fixed by #18025

However, the issue may be present in both cmake and make based staticbuilds.

@TaoLv TaoLv self-assigned this Apr 21, 2020
@TaoLv
Copy link
Member

TaoLv commented Apr 21, 2020

I doubt it's caused by the fix in #16857. We have lost the mkldnn header folder since then. For example, in https://repo.mxnet.io/dist/python/mkl/mxnet_mkl-1.6.0b20191205-py2.py3-none-manylinux1_x86_64.whl.

@leezu
Copy link
Contributor

leezu commented Apr 29, 2020

@TaoLv agree #16857 is wrong. Now we only copy the headers on OSX platform which doesn't make any sense

@leezu
Copy link
Contributor

leezu commented Apr 29, 2020

I think @mseth10 will fix it as part of his work on fixing CD

@wms2537
Copy link

wms2537 commented May 12, 2020

The problem seems to persist on the latest nightly builds.
I have tried the followings and the mkldnn headers are not included. This caused problems when installing horovod.

mxnet_cu101-2.0.0b20200506-py2.py3-none-manylinux2014_x86_64.whl
mxnet_cu101-2.0.0b20200508-py2.py3-none-manylinux2014_x86_64.whl

@eric-haibin-lin
Copy link
Member Author

mxnet-cu100==2.0.0b20200509 seems to work:

ubuntu@ip-172-31-37-108:~/src$ pip3 list | grep mxnet
mxnet-cu100         2.0.0b20200509

ubuntu@ip-172-31-37-108:~/src$ ls /home/ubuntu/.local/lib/python3.6/site-packages/mxnet/include/
dlpack  dmlc  mkldnn  mshadow  mxnet  nnvm

@wms2537
Copy link

wms2537 commented May 13, 2020

It doesn't work for me. Is there anything to run or install other than pip3 install --pre mxnet-cu102 -f https://dist.mxnet.io/python/cu102 ?

@leezu
Copy link
Contributor

leezu commented May 13, 2020

@leezu leezu reopened this May 13, 2020
@eric-haibin-lin
Copy link
Member Author

Actually I cannot find it now after i uninstalling mxnet and reinstall https://repo.mxnet.io/dist/python/cu100/mxnet_cu100-2.0.0b20200509-py2.py3-none-manylinux2014_x86_64.whl . Perhaps I didn't remove/cleanup the previous mxnet version correctly previously.
@TaoLv

@leezu
Copy link
Contributor

leezu commented May 14, 2020

I looked at the CD pipeline and the mkldnn headers are correctly present at the time of building the wheel. The issue really is that

/~https://github.com/apache/incubator-mxnet/blob/47a38d1101beb5897fe30c9d04f4599cd7636731/tools/pip/setup.py#L152-L154

only runs if the variant is named "*mkl" but now we don't use this mkl naming scheme anymore.

I'll open a PR to fix the setup.py script.

@wms2537
Copy link

wms2537 commented May 18, 2020

Only the following files are in the mkldnn folder
dnnl_config.h dnnl_version.h
Is this normal?

@TaoLv
Copy link
Member

TaoLv commented May 18, 2020

No. It should have the following files:

$ ll 3rdparty/mkldnn/build/install/include/
total 812
-rw-r--r-- 1 lvtao lvtao   2364 May 18 23:28 dnnl_config.h
-rw-r--r-- 1 lvtao lvtao   2641 Apr 12 21:44 dnnl_debug.h
-rw-r--r-- 1 lvtao lvtao 180648 May 11 17:25 dnnl.h
-rw-r--r-- 1 lvtao lvtao 485168 May  7 22:23 dnnl.hpp
-rw-r--r-- 1 lvtao lvtao   1887 May  7 22:23 dnnl_threadpool_iface.hpp
-rw-r--r-- 1 lvtao lvtao  80051 May 11 17:25 dnnl_types.h
-rw-r--r-- 1 lvtao lvtao   1079 May 18 23:28 dnnl_version.h
-rw-r--r-- 1 lvtao lvtao    963 Apr 12 21:44 mkldnn_config.h
-rw-r--r-- 1 lvtao lvtao    959 Apr 12 21:44 mkldnn_debug.h
-rw-r--r-- 1 lvtao lvtao  34732 May  7 22:23 mkldnn_dnnl_mangling.h
-rw-r--r-- 1 lvtao lvtao    935 Apr 12 21:44 mkldnn.h
-rw-r--r-- 1 lvtao lvtao    943 Apr 12 21:44 mkldnn.hpp
-rw-r--r-- 1 lvtao lvtao    959 Apr 12 21:44 mkldnn_types.h
-rw-r--r-- 1 lvtao lvtao    967 Apr 12 21:44 mkldnn_version.h

@leezu
Copy link
Contributor

leezu commented May 18, 2020

The root cause of this issue is that there are a lot of non-standard hard-coded paths in various places of the installation script. Instead of continuing to maintain and hotpatch these, I suggest to just use the standard process and declare the files that need to be included as cmake INSTALL target.

Then cmake -DCMAKE_INSTALL_PREFIX=/tmp/mxnet ..; ninja install will place all files that need to be packaged at the install prefix path, and the python setup.py only needs to package these files.. This will also simplify the CD setup..

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.