diff --git a/docs/faq/env_var.md b/docs/faq/env_var.md index c1c23ba969d2..ffde628d83a3 100644 --- a/docs/faq/env_var.md +++ b/docs/faq/env_var.md @@ -280,6 +280,11 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca - Values: Int ```(default=4)``` - This variable controls how many CuDNN dropout state resources to create for each GPU context for use in operator. +* MXNET_SUBGRAPH_BACKEND + - Values: String ```(default="")``` + - This variable controls the subgraph partitioning in MXNet. + - This variable is used to perform MKL-DNN FP32 operator fusion and quantization. Please refer to the [MKL-DNN operator list](../tutorials/mkldnn/operator_list.md) for how this variable is used and the list of fusion passes. + * MXNET_SAFE_ACCUMULATION - Values: Values: 0(false) or 1(true) ```(default=0)``` - If this variable is set, the accumulation will enter the safe mode, meaning accumulation is done in a data type of higher precision than diff --git a/docs/faq/perf.md b/docs/faq/perf.md index e1318b843a03..62b40247081c 100644 --- a/docs/faq/perf.md +++ b/docs/faq/perf.md @@ -34,8 +34,13 @@ Performance is mainly affected by the following 4 factors: ## Intel CPU -For using Intel Xeon CPUs for training and inference, we suggest enabling -`USE_MKLDNN = 1` in `config.mk`. +When using Intel Xeon CPUs for training and inference, the `mxnet-mkl` package is recommended. Adding `--pre` installs a nightly build from master. Without it you will install the latest patched release of MXNet: + +``` +$ pip install mxnet-mkl [--pre] +``` + +Or build MXNet from source code with `USE_MKLDNN=1`. For Linux users, `USE_MKLDNN=1` will be turned on by default. We also find that setting the following environment variables can help: diff --git a/docs/install/index.md b/docs/install/index.md index 10db8d95b44a..ea93d40e0f8c 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -124,6 +124,12 @@ Indicate your preferred configuration. 
Then, follow the customized commands to i $ pip install mxnet ``` +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. + +``` +$ pip install mxnet-mkl==1.4.0 +``` +
@@ -131,6 +137,12 @@ $ pip install mxnet $ pip install mxnet==1.3.1 ``` +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. + +``` +$ pip install mxnet-mkl==1.3.1 +``` +
@@ -138,6 +150,12 @@ $ pip install mxnet==1.3.1 $ pip install mxnet==1.2.1 ``` +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. + +``` +$ pip install mxnet-mkl==1.2.1 +``` +
@@ -185,9 +203,15 @@ $ pip install mxnet==0.11.0 $ pip install mxnet --pre ``` +MKL-DNN enabled pip packages are optimized for Intel hardware. You can find performance numbers in the MXNet tuning guide. + +``` +$ pip install mxnet-mkl --pre +``` +

-MXNet offers MKL pip packages that will be much faster when running on Intel hardware. + Check the chart below for other options, refer to PyPI for other MXNet pip packages, or validate your MXNet installation. pip packages diff --git a/docs/tutorials/mkldnn/MKLDNN_README.md b/docs/tutorials/mkldnn/MKLDNN_README.md index c5779670cd87..2a7cd40ac291 100644 --- a/docs/tutorials/mkldnn/MKLDNN_README.md +++ b/docs/tutorials/mkldnn/MKLDNN_README.md @@ -1,25 +1,27 @@ - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + # Build/Install MXNet with MKL-DNN A better training and inference performance is expected to be achieved on Intel-Architecture CPUs with MXNet built with [Intel MKL-DNN](/~https://github.com/intel/mkl-dnn) on multiple operating system, including Linux, Windows and MacOS. In the following sections, you will find build instructions for MXNet with Intel MKL-DNN on Linux, MacOS and Windows. +Please find MKL-DNN optimized operators and other features in the [MKL-DNN operator list](../mkldnn/operator_list.md). + The detailed performance data collected on Intel Xeon CPU with MXNet built with Intel MKL-DNN can be found [here](https://mxnet.incubator.apache.org/faq/perf.html#intel-cpu). @@ -306,14 +308,14 @@ Graph optimization by subgraph feature are available in master branch. 
You can b ``` export MXNET_SUBGRAPH_BACKEND=MKLDNN ``` - -When `MKLDNN` backend is enabled, advanced control options are avaliable: - -``` -export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass -export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass -``` - + +When `MKLDNN` backend is enabled, advanced control options are avaliable: + +``` +export MXNET_DISABLE_MKLDNN_CONV_OPT=1 # disable MKLDNN convolution optimization pass +export MXNET_DISABLE_MKLDNN_FC_OPT=1 # disable MKLDNN FullyConnected optimization pass +``` + This limitations of this experimental feature are: diff --git a/docs/tutorials/mkldnn/operator_list.md b/docs/tutorials/mkldnn/operator_list.md new file mode 100644 index 000000000000..4958f8d9b602 --- /dev/null +++ b/docs/tutorials/mkldnn/operator_list.md @@ -0,0 +1,88 @@ + + + + + + + + + + + + + + + + + +# MKL-DNN Operator list + +The MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications, including image classification, object detection, and natural language processing. + +To help users understand the MKL-DNN backend better, the following table summarizes the supported operators, data types and functionalities. A subset of operators support faster training and inference by using a lower precision version. Refer to the following table's `INT8 Inference` column to see which operators are supported. 
+
+| Operator | Function | FP32 Training (backward) | FP32 Inference | INT8 Inference |
+| --- | --- | --- | --- | --- |
+| **Convolution** | 1D Convolution | Y | Y | N |
+| | 2D Convolution | Y | Y | Y |
+| | 3D Convolution | Y | Y | N |
+| **Deconvolution** | 2D Deconvolution | Y | Y | N |
+| | 3D Deconvolution | Y | Y | N |
+| **FullyConnected** | 1D-4D input, flatten=True | N | Y | Y |
+| | 1D-4D input, flatten=False | N | Y | Y |
+| **Pooling** | 2D max Pooling | Y | Y | Y |
+| | 2D avg pooling | Y | Y | Y |
+| **BatchNorm** | 2D BatchNorm | Y | Y | N |
+| **LRN** | 2D LRN | Y | Y | N |
+| **Activation** | ReLU | Y | Y | Y |
+| | Tanh | Y | Y | N |
+| | SoftReLU | Y | Y | N |
+| | Sigmoid | Y | Y | N |
+| **softmax** | 1D-4D input | Y | Y | N |
+| **Softmax_output** | 1D-4D input | N | Y | N |
+| **Transpose** | 1D-4D input | N | Y | N |
+| **elemwise_add** | 1D-4D input | Y | Y | Y |
+| **Concat** | 1D-4D input | Y | Y | Y |
+| **slice** | 1D-4D input | N | Y | N |
+| **Quantization** | 1D-4D input | N | N | Y |
+| **Dequantization** | 1D-4D input | N | N | Y |
+| **Requantization** | 1D-4D input | N | N | Y |
+
+Besides direct operator optimizations, we also provide graph fusion passes listed in the table below. Users can choose to enable or disable these fusion patterns through environment variables.
+
+For example, you can enable all FP32 fusion passes in the following table by setting:
+
+```
+export MXNET_SUBGRAPH_BACKEND=MKLDNN
+```
+
+And disable `Convolution + Activation` fusion by setting:
+
+```
+export MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU=1
+```
+
+When generating the corresponding INT8 symbol, users can enable INT8 operator fusion passes as follows:
+
+```
+# get qsym after model quantization
+qsym = qsym.get_backend_symbol('MKLDNN_QUANTIZE')
+qsym.save(symbol_name) # fused INT8 operators will be saved into the symbol JSON file
+```
+
+| Fusion pattern | Disable |
+| --- | --- |
+| Convolution + Activation | MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU |
+| Convolution + elemwise_add | MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM |
+| Convolution + BatchNorm | MXNET_DISABLE_MKLDNN_FUSE_CONV_BN |
+| Convolution + Activation + elemwise_add | |
+| Convolution + BatchNorm + Activation + elemwise_add | |
+| FullyConnected + Activation(ReLU) | MXNET_DISABLE_MKLDNN_FUSE_FC_RELU |
+| Convolution (INT8) + re-quantization | |
+| FullyConnected (INT8) + re-quantization | |
+| FullyConnected (INT8) + re-quantization + de-quantization | |
+
+
+To install the MXNet MKL-DNN backend, please refer to the [MKL-DNN backend readme](MKLDNN_README.md).
+
+For performance numbers, please refer to [performance on Intel CPU](../../faq/perf.md#intel-cpu).
diff --git a/tests/tutorials/test_sanity_tutorials.py b/tests/tutorials/test_sanity_tutorials.py
index 7865000c7608..f89c23484568 100644
--- a/tests/tutorials/test_sanity_tutorials.py
+++ b/tests/tutorials/test_sanity_tutorials.py
@@ -35,6 +35,7 @@
     'gluon/index.md',
     'mkldnn/index.md',
     'mkldnn/MKLDNN_README.md',
+    'mkldnn/operator_list.md',
     'nlp/index.md',
     'onnx/index.md',
     'python/index.md',
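
Note on the new `operator_list.md`: the environment variables it documents could also be set from Python rather than with `export`. The sketch below is only an illustration and is not part of this patch. The helper names `mkldnn_fusion_env` and `apply_env` are hypothetical, the mapping is copied from the fusion-pattern table (patterns with no listed disable variable are left out), and it assumes the variables should be set before MXNet is imported, which seems the conservative choice.

```python
import os

# Disable variables copied from the fusion-pattern table in operator_list.md.
# Patterns whose "Disable" column is empty in that table are omitted here.
FUSION_DISABLE_VARS = {
    "Convolution + Activation": "MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU",
    "Convolution + elemwise_add": "MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM",
    "Convolution + BatchNorm": "MXNET_DISABLE_MKLDNN_FUSE_CONV_BN",
    "FullyConnected + Activation(ReLU)": "MXNET_DISABLE_MKLDNN_FUSE_FC_RELU",
}


def mkldnn_fusion_env(disabled_patterns=()):
    """Hypothetical helper: build the environment settings as a dict.

    Always selects the MKLDNN subgraph backend, then adds one
    "<disable variable>=1" entry per pattern in `disabled_patterns`.
    """
    env = {"MXNET_SUBGRAPH_BACKEND": "MKLDNN"}
    for pattern in disabled_patterns:
        if pattern not in FUSION_DISABLE_VARS:
            raise KeyError("no disable variable listed for: " + pattern)
        env[FUSION_DISABLE_VARS[pattern]] = "1"
    return env


def apply_env(env):
    """Hypothetical helper: copy the settings into this process's environment."""
    os.environ.update(env)


if __name__ == "__main__":
    # Enable the backend but turn off Convolution + Activation fusion.
    settings = mkldnn_fusion_env(["Convolution + Activation"])
    apply_env(settings)  # assumed to run before `import mxnet`
    for name, value in sorted(settings.items()):
        print(name + "=" + value)
```

Returning a plain dict keeps the helper free of side effects, so the same settings can instead be passed as the `env` argument of `subprocess.run` when launching a separate inference script.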