diff --git a/MKLDNN_README.md b/MKLDNN_README.md index 214fc83985fb..34790c9c513d 100644 --- a/MKLDNN_README.md +++ b/MKLDNN_README.md @@ -15,316 +15,4 @@ -# Build/Install MXNet with MKL-DNN - -A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](/~https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. -In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. - -The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). - - -

Contents

- -* [1. Linux](#1) -* [2. MacOS](#2) -* [3. Windows](#3) -* [4. Verify MXNet with python](#4) -* [5. Enable MKL BLAS](#5) -* [6. Enable graph optimization](#6) -* [7. Quantization](#7) -* [8. Support](#8) - -

Linux

- -### Prerequisites - -``` -sudo apt-get update -sudo apt-get install -y build-essential git -sudo apt-get install -y libopenblas-dev liblapack-dev -sudo apt-get install -y libopencv-dev -sudo apt-get install -y graphviz -``` - -### Clone MXNet sources - -``` -git clone --recursive /~https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel -``` - -If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installation, you might use OpenBLAS as the blas library, by setting USE_BLAS=openblas. - -

MacOS

- -### Prerequisites - -Install the dependencies, required for MXNet, with the following commands: - -- [Homebrew](https://brew.sh/) -- llvm (clang in macOS does not support OpenMP) -- OpenCV (for computer vision operations) - -``` -# Paste this command in Mac terminal to install Homebrew -/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" - -# install dependency -brew update -brew install pkg-config -brew install graphviz -brew tap homebrew/core -brew install opencv -brew tap homebrew/versions -brew install llvm -``` - -### Clone MXNet sources - -``` -git clone --recursive /~https://github.com/apache/incubator-mxnet.git -cd incubator-mxnet -``` - -### Build MXNet with MKL-DNN - -``` -LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1 -``` - -

Windows

- -On Windows, you can use [Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN. -[Micrsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended. - -**Visual Studio 2015** - -To build and install MXNet yourself, you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. Download and Install [CMake 3](https://cmake.org/) if it is not already installed. -3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable. -6. If you have Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to ```MKL``` directory that contains the ```include``` and ```lib```. If you want to use MKL blas, you should set ```-DUSE_BLAS=mkl``` when cmake. Typically, you can find the directory in -```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```. -7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip`` along with openBLAS and add them to PATH. -8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. 
Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```. - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Download the MXNet source code from [GitHub](/~https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules: -``` -git clone --recursive /~https://github.com/apache/incubator-mxnet.git -``` - -2. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -3. Start a Visual Studio command prompt. - -4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the -[CMake 3](https://cmake.org/) command: -``` -mkdir build -cd build -cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -5. In Visual Studio, open the solution file,```.sln```, and compile it. -These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder. -Also ```libmkldnn.dll``` with be in the ```./build/3rdparty/mkldnn/src/Release/``` - -6. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. - -**Visual Studio 2017** - -To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), you need the following dependencies. Install the required dependencies: - -1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition. -2. 
Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed. -3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download). -4. Unzip the OpenCV package. -5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```). -6. If you don’t have the Intel Math Kernel Library (MKL) installed, download and install [OpenBlas](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download). -7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```). - -After you have installed all of the required dependencies, build the MXNet source code: - -1. Start ```cmd``` in windows. - -2. Download the MXNet source code from GitHub by using following command: - -```r -cd C:\ -git clone --recursive /~https://github.com/apache/incubator-mxnet.git -``` - -3. Copy file `3rdparty/mkldnn/config_template.vcxproj` to incubator-mxnet root. - -4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```, and check ```VC++ 2017 version 15.4 v14.11 toolset```, and click ```Modify```. - -5. Change the version of the Visual studio 2017 to v14.11 using the following command (by default the VS2017 is installed in the following path): - -```r -"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11 -``` - -6. Create a build dir using the following command and go to the directory, for example: - -```r -mkdir C:\build -cd C:\build -``` - -7. CMake the MXNet source code by using following command: - -```r -cmake -G "Visual Studio 15 2017 Win64" .. 
-T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -``` - -8. After the CMake successfully completed, compile the the MXNet source code by using following command: - -```r -msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount -``` - -9. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. - -

Verify MXNet with python

- -``` -cd python -sudo python setup.py install -python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());" - -Expected Output: - -[[ 2. 2. 2.] - [ 2. 2. 2.]] -``` - -### Verify whether MKL-DNN works - -After MXNet is installed, you can verify if MKL-DNN backend works well with a single Convolution layer. - -``` -import mxnet as mx -import numpy as np - -num_filter = 32 -kernel = (3, 3) -pad = (1, 1) -shape = (32, 32, 256, 256) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad) -exe = y.simple_bind(mx.cpu(), x=shape) - -exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape) -exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -More detailed debugging and profiling information can be logged by setting the environment variable 'MKLDNN_VERBOSE': -``` -export MKLDNN_VERBOSE=1 -``` -For example, by running above code snippet, the following debugging logs providing more insights on MKL-DNN primitives `convolution` and `reorder`. That includes: Memory layout, infer shape and the time cost of primitive execution. -``` -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688 -mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254 -mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819 -``` - -

Enable MKL BLAS

- -With MKL BLAS, the performace is expected to furtherly improved with variable range depending on the computation load of the models. -You can redistribute not only dynamic libraries but also headers, examples and static libraries on accepting the license [Intel® Simplified license](https://software.intel.com/en-us/license/intel-simplified-software-license). -Installing the full MKL installation enables MKL support for all operators under the linalg namespace. - - 1. Download and install the latest full MKL version following instructions on the [intel website.](https://software.intel.com/en-us/mkl) - - 2. Run `make -j ${nproc} USE_BLAS=mkl` - - 3. Navigate into the python directory - - 4. Run `sudo python setup.py install` - -### Verify whether MKL works - -After MXNet is installed, you can verify if MKL BLAS works well with a single dot layer. - -``` -import mxnet as mx -import numpy as np - -shape_x = (1, 10, 8) -shape_w = (1, 12, 8) - -x_npy = np.random.normal(0, 1, shape_x) -w_npy = np.random.normal(0, 1, shape_w) - -x = mx.sym.Variable('x') -w = mx.sym.Variable('w') -y = mx.sym.batch_dot(x, w, transpose_b=True) -exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape) - -exe.forward(is_train=False) -o = exe.outputs[0] -t = o.asnumpy() -``` - -You can open the `MKL_VERBOSE` flag by setting environment variable: -``` -export MKL_VERBOSE=1 -``` -Then by running above code snippet, you probably will get the following output message which means `SGEMM` primitive from MKL are called. Layout information and primitive execution performance are also demonstrated in the log message. 
-``` -Numpy + Intel(R) MKL: THREADING LAYER: (null) -Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime -Numpy + Intel(R) MKL: preloading libiomp5.so runtime -MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0 -MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000 -``` - -

Enable graph optimization

- -Graph optimization by subgraph feature are available in master branch. You can build from source and then use below command to enable this *experimental* feature for better performance: - -``` -export MXNET_SUBGRAPH_BACKEND=MKLDNN -``` - -This limitations of this experimental feature are: - -- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable. - -- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet. - -- [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN). - -

Quantization and Inference with INT8

- -Benefiting from Intel® MKL-DNN, MXNet built with Intel® MKL-DNN brings outstanding performance improvement on quantization and inference with INT8 Intel® CPU Platform on Intel® Xeon® Scalable Platform. - -- [CNN Quantization Examples](/~https://github.com/apache/incubator-mxnet/tree/master/example/quantization). - -

Next Steps and Support

- -- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website. - -- For questions or support specific to MKL, visit the [Intel MKLDNN](/~https://github.com/intel/mkl-dnn) website. - -- If you find bugs, please open an issue on GitHub for [MXNet with MKL](/~https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](/~https://github.com/apache/incubator-mxnet/labels/MKLDNN). +File is moved to [docs/tutorials/mkldnn/MKLDNN_README.md](docs/tutorials/mkldnn/MKLDNN_README.md). diff --git a/NEWS.md b/NEWS.md index 20804dab4d45..ad842ac84786 100644 --- a/NEWS.md +++ b/NEWS.md @@ -164,7 +164,7 @@ MKLDNN backend takes advantage of MXNet subgraph to implement the most of possib ##### Quantization Performance of reduced-precision (INT8) computation is also dramatically improved after the graph optimization feature is applied on CPU Platforms. Various models are supported and can benefit from reduced-precision computation, including symbolic models, Gluon models and even custom models. Users can run most of the pre-trained models with only a few lines of commands and a new quantization script imagenet_gen_qsym_mkldnn.py. The observed accuracy loss is less than 0.5% for popular CNN networks, like ResNet-50, Inception-BN, MobileNet, etc. 
-Please find detailed information and performance/accuracy numbers here: [MKLDNN README](/~https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md), [quantization README](/~https://github.com/apache/incubator-mxnet/tree/master/example/quantization#1) and [design proposal](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) +Please find detailed information and performance/accuracy numbers here: [MKLDNN README](/~https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/mkldnn/MKLDNN_README.md), [quantization README](/~https://github.com/apache/incubator-mxnet/tree/master/example/quantization#1) and [design proposal](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) ### New Operators diff --git a/README.md b/README.md index 307bbd432c27..dbc606dfd141 100644 --- a/README.md +++ b/README.md @@ -66,7 +66,7 @@ What's New * [Version 0.8.0 Release](/~https://github.com/dmlc/mxnet/releases/tag/v0.8.0) * [Updated Image Classification with new Pre-trained Models](./example/image-classification) * [Notebooks How to Use MXNet](/~https://github.com/zackchase/mxnet-the-straight-dope) -* [MKLDNN for Faster CPU Performance](./MKLDNN_README.md) +* [MKLDNN for Faster CPU Performance](./docs/tutorials/mkldnn/MKLDNN_README.md) * [MXNet Memory Monger, Training Deeper Nets with Sublinear Memory Cost](/~https://github.com/dmlc/mxnet-memonger) * [Tutorial for NVidia GTC 2016](/~https://github.com/dmlc/mxnet-gtc-tutorial) * [Embedding Torch layers and functions in MXNet](https://mxnet.incubator.apache.org/faq/torch.html) diff --git a/docs/faq/perf.md b/docs/faq/perf.md index 00310dfbb5bd..e1318b843a03 100644 --- a/docs/faq/perf.md +++ b/docs/faq/perf.md @@ -43,7 +43,7 @@ We also find that setting the following environment variables can help: | :-------- | :---------- | | `OMP_NUM_THREADS` | Suggested value: `vCPUs / 2` in 
which `vCPUs` is the number of virtual CPUs. For more information, please see the guide for [setting the number of threads using an OpenMP environment variable](https://software.intel.com/en-us/mkl-windows-developer-guide-setting-the-number-of-threads-using-an-openmp-environment-variable) | | `KMP_AFFINITY` | Suggested value: `granularity=fine,compact,1,0`. For more information, please see the guide for [Thread Affinity Interface (Linux* and Windows*)](https://software.intel.com/en-us/node/522691). | -| `MXNET_SUBGRAPH_BACKEND` | Set to MKLDNN to enable the [subgraph feature](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) for better performance. For more information please see [Build/Install MXNet with MKL-DNN](/~https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md)| +| `MXNET_SUBGRAPH_BACKEND` | Set to MKLDNN to enable the [subgraph feature](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) for better performance. For more information please see [Build/Install MXNet with MKL-DNN](/~https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/mkldnn/MKLDNN_README.md)| Note that _MXNet_ treats all CPUs on a single machine as a single device. So whether you specify `cpu(0)` or `cpu()`, _MXNet_ will use all CPU cores on the machine. 
diff --git a/docs/install/ubuntu_setup.md b/docs/install/ubuntu_setup.md index 24f9ab98e6e4..f225023d18d5 100644 --- a/docs/install/ubuntu_setup.md +++ b/docs/install/ubuntu_setup.md @@ -175,7 +175,7 @@ If building on CPU and using OpenBLAS: make -j $(nproc) ``` -If building on CPU and using MKL and MKL-DNN (make sure MKL is installed according to [Math Library Selection](build_from_source.html#math-library-selection) and [MKL-DNN README](/~https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md)): +If building on CPU and using MKL and MKL-DNN (make sure MKL is installed according to [Math Library Selection](build_from_source.html#math-library-selection) and [MKL-DNN README](/~https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/mkldnn/MKLDNN_README.md)): ```bash git clone --recursive /~https://github.com/apache/incubator-mxnet.git diff --git a/docs/install/windows_setup.md b/docs/install/windows_setup.md index 3c3da5349235..96929692f1ec 100644 --- a/docs/install/windows_setup.md +++ b/docs/install/windows_setup.md @@ -136,7 +136,7 @@ We provide two primary options to build and install MXNet yourself using [Micros **NOTE:** Visual Studio 2017's compiler is `vc15`. This is not to be confused with Visual Studio 2015's compiler, `vc14`. -You also have the option to install MXNet with MKL or MKL-DNN. In this case it is recommended that you refer to the [MKLDNN_README](/~https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md). +You also have the option to install MXNet with MKL or MKL-DNN. In this case it is recommended that you refer to the [MKLDNN_README](/~https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/mkldnn/MKLDNN_README.md). **Option 1: Build with Microsoft Visual Studio 2017 (VS2017)** @@ -156,7 +156,7 @@ To build and install MXNet yourself using [VS2017](https://www.visualstudio.com/ 1. 
Download and run the [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download) package. There are more recent versions of OpenCV, so please create an issue/PR to update this info if you validate one of these later versions. 1. This will unzip several files. You can place them in another directory if you wish. We will use `C:\utils`(```mkdir C:\utils```) as our default path. 1. Set the environment variable `OpenCV_DIR` to point to the OpenCV build directory that you just unzipped. Start ```cmd``` and type `set OpenCV_DIR=C:\utils\opencv\build`. -1. If you don’t have the Intel Math Kernel Library (MKL) installed, you can install it and follow the [MKLDNN_README](/~https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md) from here, or you can use OpenBLAS. These instructions will assume you're using OpenBLAS. +1. If you don’t have the Intel Math Kernel Library (MKL) installed, you can install it and follow the [MKLDNN_README](/~https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/mkldnn/MKLDNN_README.md) from here, or you can use OpenBLAS. These instructions will assume you're using OpenBLAS. 1. Download the [OpenBlas](https://sourceforge.net/projects/openblas/files/v0.2.19/OpenBLAS-v0.2.19-Win64-int32.zip/download) package. Later versions of OpenBLAS are available, but you would need to build from source. v0.2.19 is the most recent version that ships with binaries. Contributions of more recent binaries would be appreciated. 1. Unzip the file, rename it to ```OpenBLAS``` and put it under `C:\utils`. You can place the unzipped files and folders in another directory if you wish. 1. Set the environment variable `OpenBLAS_HOME` to point to the OpenBLAS directory that contains the `include` and `lib` directories and type `set OpenBLAS_HOME=C:\utils\OpenBLAS` on the command prompt(```cmd```). 
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index 7e0ffaa3f72a..d42903688e0b 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -27,6 +27,7 @@ embedded/index.md gluon/index.md java/index.md + mkldnn/index.md nlp/index.md onnx/index.md python/index.md @@ -142,6 +143,7 @@ Select API:  * [Large-Scale Multi-Host Multi-GPU Image Classification](/tutorials/vision/large_scale_classification.html) * [Importing an ONNX model into MXNet](/tutorials/onnx/super_resolution.html) * [Optimizing Deep Learning Computation Graphs with TensorRT](/tutorials/tensorrt/inference_with_trt.html) + * [How to build and install MXNet with MKL-DNN backend](/tutorials/mkldnn/MKLDNN_README.html) * API Guides * Core APIs * NDArray diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md new file mode 100644 index 000000000000..189de3f3a0d4 --- /dev/null +++ b/docs/tutorials/mkldnn/MKLDNN_README.md @@ -0,0 +1,330 @@ + + + + + + + + + + + + + + + + + +# Build/Install MXNet with MKL-DNN + +A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](/~https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. +In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. + +The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). + + +

Contents

+ +* [1. Linux](#1) +* [2. MacOS](#2) +* [3. Windows](#3) +* [4. Verify MXNet with python](#4) +* [5. Enable MKL BLAS](#5) +* [6. Enable graph optimization](#6) +* [7. Quantization](#7) +* [8. Support](#8) + +

Linux

+ +### Prerequisites + +``` +sudo apt-get update +sudo apt-get install -y build-essential git +sudo apt-get install -y libopenblas-dev liblapack-dev +sudo apt-get install -y libopencv-dev +sudo apt-get install -y graphviz +``` + +### Clone MXNet sources + +``` +git clone --recursive /~https://github.com/apache/incubator-mxnet.git +cd incubator-mxnet +``` + +### Build MXNet with MKL-DNN + +``` +make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel +``` + +If you don't have the full [MKL](https://software.intel.com/en-us/intel-mkl) library installed, you can use OpenBLAS as the BLAS library instead by setting `USE_BLAS=openblas`. + +
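The choice between the two BLAS settings above can be scripted. The sketch below is only an illustration: the helper name `mxnet_make_command` and the test for an `mkl` subdirectory under the Intel install path are assumptions, not part of the MXNet build system. The `make` variables themselves are the ones shown in the build command above.

```python
import os


def mxnet_make_command(intel_path="/opt/intel", jobs=None):
    """Return the make argument list for an MKL-DNN build (hypothetical helper).

    Assumption: a full MKL installation is detected by the presence of an
    `mkl` directory under `intel_path`; adjust this check to your layout.
    """
    jobs = jobs or os.cpu_count() or 1
    args = ["make", "-j", str(jobs), "USE_OPENCV=1", "USE_MKLDNN=1"]
    if os.path.isdir(os.path.join(intel_path, "mkl")):
        # Full MKL found: use it as the BLAS library.
        args += ["USE_BLAS=mkl", "USE_INTEL_PATH=" + intel_path]
    else:
        # No full MKL: fall back to OpenBLAS, as described above.
        args += ["USE_BLAS=openblas"]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(mxnet_make_command()))
```

Printing the command rather than executing it keeps the sketch side-effect free; pass the list to `subprocess.run` from the source root if you want to start the build directly.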

MacOS

+ +### Prerequisites + +Install the dependencies, required for MXNet, with the following commands: + +- [Homebrew](https://brew.sh/) +- llvm (clang in macOS does not support OpenMP) +- OpenCV (for computer vision operations) + +``` +# Paste this command in Mac terminal to install Homebrew +/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" + +# install dependency +brew update +brew install pkg-config +brew install graphviz +brew tap homebrew/core +brew install opencv +brew tap homebrew/versions +brew install llvm +``` + +### Clone MXNet sources + +``` +git clone --recursive /~https://github.com/apache/incubator-mxnet.git +cd incubator-mxnet +``` + +### Build MXNet with MKL-DNN + +``` +LIBRARY_PATH=$(brew --prefix llvm)/lib/ make -j $(sysctl -n hw.ncpu) CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ USE_OPENCV=1 USE_OPENMP=1 USE_MKLDNN=1 USE_BLAS=apple USE_PROFILER=1 +``` + +

Windows

+ +On Windows, you can use [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) and [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) to compile MXNet with Intel MKL-DNN. +[Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is recommended. + +**Visual Studio 2015** + +To build and install MXNet yourself, install the following required dependencies: + +1. If [Microsoft Visual Studio 2015](https://www.visualstudio.com/vs/older-downloads/) is not already installed, download and install it. You can download and install the free community edition. +2. Download and install [CMake 3](https://cmake.org/) if it is not already installed. +3. Download and install [OpenCV 3](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.0.0/opencv-3.0.0.exe/download). +4. Unzip the OpenCV package. +5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (```C:\opencv\build\x64\vc14``` for example). Also, you need to add the OpenCV bin directory (```C:\opencv\build\x64\vc14\bin``` for example) to the ``PATH`` variable. +6. If you have Intel Math Kernel Library (MKL) installed, set ```MKL_ROOT``` to point to the ```MKL``` directory that contains the ```include``` and ```lib``` directories. Typically, you can find this directory in +```C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\mkl```. If you want to use MKL BLAS, set ```-DUSE_BLAS=mkl``` when running CMake. +7. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](http://sourceforge.net/projects/openblas/files/v0.2.14/). Note that you should also download ```mingw64.dll.zip``` along with OpenBLAS and add both to the ``PATH`` variable. +8. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories. 
Typically, you can find the directory in ```C:\Program files (x86)\OpenBLAS\```. + +After you have installed all of the required dependencies, build the MXNet source code: + +1. Download the MXNet source code from [GitHub](/~https://github.com/apache/incubator-mxnet). Don't forget to pull the submodules: +``` +git clone --recursive /~https://github.com/apache/incubator-mxnet.git +``` + +2. Copy the file `3rdparty/mkldnn/config_template.vcxproj` to the incubator-mxnet root. + +3. Start a Visual Studio command prompt. + +4. Use [CMake 3](https://cmake.org/) to create a Visual Studio solution in ```./build``` or some other directory. Make sure to specify the architecture in the +[CMake 3](https://cmake.org/) command: +``` +mkdir build +cd build +cmake -G "Visual Studio 14 Win64" .. -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release +``` + +5. In Visual Studio, open the solution file (```.sln```) and compile it. +These commands produce a library called ```libmxnet.dll``` in the ```./build/Release/``` or ```./build/Debug``` folder. +Also, ```libmkldnn.dll``` will be in ```./build/3rdparty/mkldnn/src/Release/```. + +6. Make sure that all the DLL files used above (such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc.) are added to the system PATH. For convenience, you can put all of them in ```\windows\system32```. Otherwise, you will come across `Not Found Dependencies` when loading MXNet. + +**Visual Studio 2017** + +To build and install MXNet yourself using [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/), install the following required dependencies: + +1. If [Microsoft Visual Studio 2017](https://www.visualstudio.com/downloads/) is not already installed, download and install it. You can download and install the free community edition. +2. 
Download and install [CMake 3](https://cmake.org/files/v3.11/cmake-3.11.0-rc4-win64-x64.msi) if it is not already installed. +3. Download and install [OpenCV](https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.1/opencv-3.4.1-vc14_vc15.exe/download). +4. Unzip the OpenCV package. +5. Set the environment variable ```OpenCV_DIR``` to point to the ```OpenCV build directory``` (e.g., ```OpenCV_DIR = C:\utils\opencv\build```). +6. If you don't have the Intel Math Kernel Library (MKL) installed, download and install [OpenBLAS](https://sourceforge.net/projects/openblas/files/v0.2.20/OpenBLAS%200.2.20%20version.zip/download). +7. Set the environment variable ```OpenBLAS_HOME``` to point to the ```OpenBLAS``` directory that contains the ```include``` and ```lib``` directories (e.g., ```OpenBLAS_HOME = C:\utils\OpenBLAS```). + +After you have installed all of the required dependencies, build the MXNet source code: + +1. Start ```cmd``` in Windows. + +2. Download the MXNet source code from GitHub by using the following command: + +```r +cd C:\ +git clone --recursive /~https://github.com/apache/incubator-mxnet.git +``` + +3. Copy the file `3rdparty/mkldnn/config_template.vcxproj` to the incubator-mxnet root. + +4. Follow [this link](https://docs.microsoft.com/en-us/visualstudio/install/modify-visual-studio) to modify ```Individual components```, check ```VC++ 2017 version 15.4 v14.11 toolset```, and click ```Modify```. + +5. Switch the Visual Studio 2017 toolset to v14.11 using the following command (by default, VS2017 is installed in the following path): + +```r +"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.11 +``` + +6. Create a build directory inside the cloned source tree and change into it, so that `..` in the next step refers to the MXNet root, for example: + +```r +mkdir C:\incubator-mxnet\build +cd C:\incubator-mxnet\build +``` + +7. 
Run CMake on the MXNet source code by using the following command: + +```r +cmake -G "Visual Studio 15 2017 Win64" .. 
-T host=x64 -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_PROFILER=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_NAME=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release +``` + +8. After the CMake successfully completed, compile the the MXNet source code by using following command: + +```r +msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount +``` + +9. Make sure that all the dll files used above(such as `libmkldnn.dll`, `libmklml.dll`, `libiomp5.dll`, `libopenblas.dll`, etc) are added to the system PATH. For convinence, you can put all of them to ```\windows\system32```. Or you will come across `Not Found Dependencies` when loading MXNet. + +
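The PATH check in the last step can be automated. The following is a minimal sketch, not part of MXNet: the helper name `find_missing_dlls` is hypothetical, and the DLL list is only the set of names mentioned in the step above, so adjust it to match what your build actually produced.

```python
import os


def find_missing_dlls(dll_names, search_dirs):
    """Return the names in dll_names that are not present in any of search_dirs."""
    missing = []
    for name in dll_names:
        # Skip empty entries, which appear when PATH contains ";;".
        found = any(
            os.path.isfile(os.path.join(directory, name))
            for directory in search_dirs
            if directory
        )
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names taken from the build step above (assumption: your build uses the same ones).
    dlls = ["libmkldnn.dll", "libmklml.dll", "libiomp5.dll", "libopenblas.dll"]
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    print("Missing from PATH:", find_missing_dlls(dlls, path_dirs))
```

An empty list suggests that every named DLL is reachable through PATH. It does not prove that MXNet will load, since other dependencies may still be missing.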

Verify MXNet with python

+
+```
+cd python
+sudo python setup.py install
+python -c "import mxnet as mx;print((mx.nd.ones((2, 3))*2).asnumpy());"
+
+Expected Output:
+
+[[ 2. 2. 2.]
+ [ 2. 2. 2.]]
+```
+
+### Verify whether MKL-DNN works
+
+After MXNet is installed, you can verify that the MKL-DNN backend works by running a single Convolution layer.
+
+```
+import mxnet as mx
+import numpy as np
+
+num_filter = 32
+kernel = (3, 3)
+pad = (1, 1)
+shape = (32, 32, 256, 256)
+
+# Build a symbolic graph with one convolution and bind it on the CPU.
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, kernel=kernel, no_bias=True, pad=pad)
+exe = y.simple_bind(mx.cpu(), x=shape)
+
+# Fill the input (x) and the weight (w) with random values.
+exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
+exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
+
+# Run inference and copy the result to NumPy, which waits for the computation to finish.
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+More detailed debugging and profiling information can be logged by setting the environment variable `MKLDNN_VERBOSE`:
+```
+export MKLDNN_VERBOSE=1
+```
+For example, running the code snippet above produces the following debugging logs, which give more insight into the MKL-DNN primitives `convolution` and `reorder`. Each line includes the memory layout, the inferred shape, and the execution time of the primitive.
+```
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0429688
+mkldnn_verbose,exec,convolution,jit:avx512_common,forward_inference,fsrc:nChw16c fwei:OIhw16i16o fbia:undef fdst:nChw16c,alg:convolution_direct,mb32_g1ic32oc32_ih256oh256kh3sh1dh0ph1_iw256ow256kw3sw1dw0pw1,9.98193
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_oihw out:f32_OIhw16i16o,num:1,32x32x3x3,0.0510254
+mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819
+```
+
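To see where the time goes in a longer `MKLDNN_VERBOSE` log, the lines can be aggregated per primitive. This is a rough sketch that assumes every `exec` line follows the comma-separated layout shown above, with the primitive name in the third field and the time in the last field. The function name `summarize_mkldnn_verbose` is hypothetical and not part of MXNet or MKL-DNN.

```python
from collections import defaultdict


def summarize_mkldnn_verbose(log_text):
    """Sum the last (time) field of each mkldnn_verbose exec line, keyed by primitive."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        # Keep only lines shaped like: mkldnn_verbose,exec,<primitive>,...,<time>
        if len(fields) < 4 or fields[0] != "mkldnn_verbose" or fields[1] != "exec":
            continue
        try:
            elapsed = float(fields[-1])
        except ValueError:
            continue  # Last field is not a number; ignore the line.
        totals[fields[2]] += elapsed
    return dict(totals)


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = "\n".join([
        "mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nchw out:f32_nChw16c,num:1,32x32x256x256,6.47681",
        "mkldnn_verbose,exec,reorder,jit:uni,undef,in:f32_nChw16c out:f32_nchw,num:1,32x32x256x256,20.4819",
    ])
    print(summarize_mkldnn_verbose(sample))
```

In the log above, the `reorder` lines together take more time than the `convolution` itself, which is the kind of layout-conversion overhead this summary makes visible.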

Enable MKL BLAS

+
+With MKL BLAS, performance is expected to improve further. The size of the gain depends on the computation load of the model.
+By accepting the [Intel Simplified license](https://software.intel.com/en-us/license/intel-simplified-software-license), you can redistribute not only the dynamic libraries but also the headers, examples and static libraries.
+Installing the full MKL package enables MKL support for all operators under the linalg namespace.
+
+ 1. Download and install the latest full MKL version following the instructions on the [Intel website](https://software.intel.com/en-us/mkl).
+
+ 2. Run `make -j $(nproc) USE_BLAS=mkl`
+
+ 3. Navigate into the python directory
+
+ 4. Run `sudo python setup.py install`
+
+### Verify whether MKL works
+
+After MXNet is installed, you can verify that MKL BLAS works by running a single dot layer.
+
+```
+import mxnet as mx
+import numpy as np
+
+shape_x = (1, 10, 8)
+shape_w = (1, 12, 8)
+
+x_npy = np.random.normal(0, 1, shape_x)
+w_npy = np.random.normal(0, 1, shape_w)
+
+# Build a symbolic graph with one batched matrix product and bind it on the CPU.
+x = mx.sym.Variable('x')
+w = mx.sym.Variable('w')
+y = mx.sym.batch_dot(x, w, transpose_b=True)
+exe = y.simple_bind(mx.cpu(), x=x_npy.shape, w=w_npy.shape)
+
+# Run inference and copy the result to NumPy, which waits for the computation to finish.
+exe.forward(is_train=False)
+o = exe.outputs[0]
+t = o.asnumpy()
+```
+
+You can enable the `MKL_VERBOSE` flag by setting the environment variable:
+```
+export MKL_VERBOSE=1
+```
+Then, running the code snippet above should produce output like the following, which shows that the `SGEMM` primitive from MKL is called. The log message also shows layout information and the primitive execution time.
+```
+Numpy + Intel(R) MKL: THREADING LAYER: (null)
+Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
+Numpy + Intel(R) MKL: preloading libiomp5.so runtime
+MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0
+MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:40 WDiv:HOST:+0.000
+```
+
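To check programmatically which BLAS routine was called, the call lines can be picked out of the `MKL_VERBOSE` output. The sketch below assumes call lines look like the `SGEMM` line above: a function name, a parenthesized argument list, then a time with a unit suffix. Only the `ms` suffix appears in the log above, so any other unit is an assumption. The helper name `parse_mkl_verbose_call` is hypothetical.

```python
import re

# MKL_VERBOSE <FUNCTION>(<args>) <number><unit> ...
_CALL = re.compile(r"^MKL_VERBOSE (\w+)\((.*?)\) ([0-9.]+)([a-z]+)")


def parse_mkl_verbose_call(line):
    """Return function name, argument list, time and unit for a call line, else None."""
    match = _CALL.match(line.strip())
    if match is None:
        return None  # Banner lines and non-MKL lines end up here.
    name, args, value, unit = match.groups()
    return {
        "function": name,
        "args": args.split(","),
        "time": float(value),
        "unit": unit,
    }


if __name__ == "__main__":
    line = ("MKL_VERBOSE SGEMM(T,N,12,10,8,0x7f7f927b1378,0x1bc2140,8,0x1ba8040,8,"
            "0x7f7f927b1380,0x7f7f7400a280,12) 8.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 "
            "NThr:40 WDiv:HOST:+0.000")
    print(parse_mkl_verbose_call(line))
```

The values 12, 10 and 8 in the argument list line up with the dimensions used in the snippet (`shape_w = (1, 12, 8)` and `shape_x = (1, 10, 8)`), which is a quick way to confirm that the logged call came from the `batch_dot`.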

Enable graph optimization

+
+Graph optimization through the subgraph feature is available in the master branch. You can build from source and then use the command below to enable this *experimental* feature for better performance:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
+
+The limitations of this experimental feature are:
+
+- Use this feature only for inference. When training, be sure to turn the feature off by unsetting the `MXNET_SUBGRAPH_BACKEND` environment variable.
+
+- This feature will only run on the CPU, even if you're using a GPU-enabled build of MXNet.
+
+For more information, see [MXNet Graph Optimization and Quantization Technical Information and Performance Details](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).
+
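Because the feature should be on for inference and off for training, it can help to scope the environment variable to the inference code only. The context manager below is a minimal sketch: the name `subgraph_backend` is hypothetical, and it assumes MXNet reads `MXNET_SUBGRAPH_BACKEND` when the model is bound inside the `with` block, which this document does not state.

```python
import contextlib
import os


@contextlib.contextmanager
def subgraph_backend(name="MKLDNN"):
    """Temporarily set MXNET_SUBGRAPH_BACKEND and restore the previous state on exit."""
    key = "MXNET_SUBGRAPH_BACKEND"
    previous = os.environ.get(key)
    os.environ[key] = name
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before, so unset it again (the training case).
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


if __name__ == "__main__":
    with subgraph_backend():
        # Bind and run the inference model here (placeholder).
        print(os.environ["MXNET_SUBGRAPH_BACKEND"])
    print("MXNET_SUBGRAPH_BACKEND" in os.environ)
```

If MXNet turns out to read the variable at a different point, such as import time, set it with `export` before starting Python instead.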

Quantization and Inference with INT8

+
+MXNet built with Intel MKL-DNN delivers a significant performance improvement for quantization and INT8 inference on the Intel Xeon Scalable Platform.
+
+- [CNN Quantization Examples](https://github.com/apache/incubator-mxnet/tree/master/example/quantization).
+

Next Steps and Support

+
+- For questions or support specific to MKL, visit the [Intel MKL](https://software.intel.com/en-us/mkl) website.
+
+- For questions or support specific to MKL-DNN, visit the [Intel MKLDNN](https://github.com/intel/mkl-dnn) website.
+
+- If you find bugs, please open an issue on GitHub for [MXNet with MKL](https://github.com/apache/incubator-mxnet/labels/MKL) or [MXNet with MKLDNN](https://github.com/apache/incubator-mxnet/labels/MKLDNN).
diff --git a/docs/tutorials/mkldnn/index.md b/docs/tutorials/mkldnn/index.md
new file mode 100644
index 000000000000..faf6526fb824
--- /dev/null
+++ b/docs/tutorials/mkldnn/index.md
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Tutorials
+
+```eval_rst
+.. toctree::
+ :glob:
+
+ *
+```
diff --git a/tests/tutorials/test_sanity_tutorials.py b/tests/tutorials/test_sanity_tutorials.py
index 429527db2000..7865000c7608 100644
--- a/tests/tutorials/test_sanity_tutorials.py
+++ b/tests/tutorials/test_sanity_tutorials.py
@@ -33,6 +33,8 @@
  'embedded/index.md',
  'embedded/wine_detector.md',
  'gluon/index.md',
+ 'mkldnn/index.md',
+ 'mkldnn/MKLDNN_README.md',
  'nlp/index.md',
  'onnx/index.md',
  'python/index.md',