Integrate MKLDNN Conv1d and support 3d layout #13530
Conversation
@mxnet-label-bot add [MKLDNN, Operator, pr-awaiting-review]
Please point out where the unit tests for this feature are. Is it covered by the existing CI? Note that activation is also changed in this PR.
@TaoLv Addressed your comments. These changes are already covered by the existing CI.
} else if (arr.shape().ndim() == 3) {
  tz = num_groups > 1
       ? mkldnn::memory::dims{num_groups,
                              static_cast<int>(arr.shape()[0] /
Use O, I, H, W instead of 0, 1, 2 ...
  dilates[0] = param.conv_param.dilate[0] - 1;
  dilates[1] = param.conv_param.dilate[1] - 1;
} else {
  LOG(FATAL) << "MKL-DNN currently supports 1d and 2d convolution";
It would be better if we could mention the given dimension in the error message. The same applies to the other LOG(FATAL) calls.
Do we have a unit test for 1D Convolution without grouping? Please also make sure that this change works well for quantized Convolution (which I think doesn't support 1D yet).
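A minimal sketch of what such a group-1 check could look like, comparing a 1D convolution against the equivalent 2D convolution with a dummy height of 1 (hypothetical shapes; this is not the actual test added in the PR):

```python
import mxnet as mx
import numpy as np

# hypothetical NCW input, num_group = 1
data = mx.nd.random.uniform(shape=(2, 8, 32))
weight = mx.nd.random.uniform(shape=(16, 8, 3))
bias = mx.nd.zeros((16,))

out_1d = mx.nd.Convolution(data=data, weight=weight, bias=bias,
                           kernel=(3,), num_filter=16, num_group=1)

# the same computation expressed as a 2D convolution on NC1W data
out_2d = mx.nd.Convolution(data=data.reshape((2, 8, 1, 32)),
                           weight=weight.reshape((16, 8, 1, 3)),
                           bias=bias, kernel=(1, 3), num_filter=16, num_group=1)

np.testing.assert_allclose(out_1d.asnumpy(),
                           out_2d.reshape(out_1d.shape).asnumpy(),
                           rtol=1e-5, atol=1e-6)
```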
Addressed Tao's comments. I've skipped quantization of conv1d at both the C++ level and the Python level. @ZhennanQin Please help take a look. Maybe skipping all non-4d data layouts in quantization.py is not a good choice.
python/mxnet/contrib/quantization.py (outdated)
@@ -488,6 +488,9 @@ def quantize_model(sym, arg_params, aux_params,
    A tuple of quantized symbol, quantized arg_params, and aux_params.
    -------
    """
    if ctx == cpu(0) and len(calib_data.provide_data[0].shape) != 3:
        raise ValueError('MKL-DNN quantized OPs temporary support 4d layout.')
Please don't check calib_data, as the quantization flow also supports a non-calib mode.
done
@zheng-da @ZhennanQin Please help review and approve if there are no further concerns.
LGTM.
Seems there was a problem with CI. Do you mind re-triggering it with an empty commit? @xinyu-intel
Could you add tests for 1D activation and quantized conv (to check whether the error messages are printed as expected)?
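A hedged sketch of what the activation part of such a test could look like, running ReLU over 1d, 3d, and 4d inputs and checking against a NumPy reference (shapes are illustrative):

```python
import mxnet as mx
import numpy as np

# illustrative shapes covering 1d, 3d, and 4d data layouts
for shape in [(10,), (2, 8, 32), (2, 8, 16, 16)]:
    data = mx.nd.random.uniform(-1, 1, shape=shape)
    out = mx.nd.Activation(data, act_type='relu')
    np.testing.assert_allclose(out.asnumpy(),
                               np.maximum(data.asnumpy(), 0), rtol=1e-6)
```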
dims.resize(shape.ndim() + 1);
dims[0] = 1;
for (size_t i = 0; i < shape.ndim(); i++)
  dims[i + 1] = shape[i];
Is there a performance difference between the 3D and 4D implementations?
  num_groups, static_cast<int>(arr.shape()[N] / num_groups),
  static_cast<int>(arr.shape()[C]), static_cast<int>(arr.shape()[H]),
  static_cast<int>(arr.shape()[W])};
return mkldnn::memory::desc{tz, get_mkldnn_type(arr.dtype()),
This line is the common part of dim3 and dim4, right?
  static_cast<int>(arr.shape()[3])};
return mkldnn::memory::desc{tz, get_mkldnn_type(arr.dtype()),
                            mkldnn::memory::format::any};
CHECK((arr.shape().ndim() == 3) || (arr.shape().ndim() == 4))
Use a variable to save the value of arr.shape().ndim() to avoid calling it multiple times.
@pengzhao-intel Conv1d fell back to the native CPU implementation before this optimization. 100 iterations total time (ms) on Xeon Skylake 8180, 1 socket:
I've added 1d, 3d, and 4d data shapes to the activation test. Regarding quantized conv, it will now return an error when the data shape is 3d, and users should exclude this layer:
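For reference, a hedged sketch of how a user could exclude such a layer through the excluded_sym_names argument of quantize_model; the tiny model and the layer name 'conv1d0' are hypothetical:

```python
import mxnet as mx
from mxnet.contrib import quantization

# hypothetical FP32 model containing a 1D convolution named 'conv1d0'
data = mx.sym.Variable('data')
conv1d = mx.sym.Convolution(data, kernel=(3,), num_filter=8, name='conv1d0')
sym = mx.sym.Convolution(conv1d.reshape((0, 0, 1, -1)), kernel=(1, 1),
                         num_filter=8, name='conv2d0')

# random parameters, just to have something to feed quantize_model
arg_shapes, _, _ = sym.infer_shape(data=(1, 4, 64))
arg_params = {name: mx.nd.random.uniform(shape=shape)
              for name, shape in zip(sym.list_arguments(), arg_shapes)
              if name != 'data'}

# exclude the 1D convolution so only 4d-layout ops are quantized
qsym, qarg_params, aux_params = quantization.quantize_model(
    sym, arg_params, {}, ctx=mx.cpu(),
    excluded_sym_names=['conv1d0'], calib_mode='none')
```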
LGTM.
Thanks for the contributions :)
@zheng-da Please take a look. Thanks!
Please rebase the code; then I think it's good to merge.
@xinyu-intel Thank you for the contribution. Now merging.
I noticed a very nice improvement with this change. Still, for my use case (conv1d, small batch size, small channel dimension, long sequence length) it does not scale well with more threads. The simplified snippet to reproduce (conv1d.py):
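The original conv1d.py is not preserved here; a minimal sketch of such a benchmark, with hypothetical shapes matching the description (small batch, few channels, long sequence), could look like the following, varying the thread count via the OMP_NUM_THREADS environment variable:

```python
import time
import mxnet as mx

# hypothetical shapes: small batch, small channel dimension, long sequence (NCW)
data = mx.nd.random.uniform(shape=(1, 8, 500000))
weight = mx.nd.random.uniform(shape=(8, 8, 5))
bias = mx.nd.zeros((8,))

def conv1d():
    return mx.nd.Convolution(data=data, weight=weight, bias=bias,
                             kernel=(5,), num_filter=8)

conv1d().wait_to_read()  # warm-up
start = time.time()
for _ in range(100):
    out = conv1d()
out.wait_to_read()
print('100 iterations: %.1f ms' % ((time.time() - start) * 1000))
```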
When run on a host with a lot of cores (AWS c4.8xlarge), it results in:
There is no improvement when the number of threads is increased to 4 or more. Playing more with this example, with a higher 'channels' value it starts to parallelize a little better. Bartosz
@bputrycz Many thanks for trying the new MKLDNN API and giving us this very useful feedback. In the short term, you can try launching multiple instances (processes), each bound to 2 or 4 cores, to get the maximum throughput if this is an inference use case.
@TaoLv is following this issue and will get back to you with more details soon.
@bputrycz Cannot reproduce the problem. The code snippet you provided scales well on my machine:
Have you tried setting the CPU affinity before running the multi-threaded case?
Ummm, I just noticed that you're using c4.8xlarge, which I think has no AVX512. I will have another try and come back to you later.
Confirmed: this problem exists on machines without AVX512, as the threading optimization for 1D convolution is implemented for AVX512 only. Would you mind giving it a try on AWS c5? @bputrycz
Yes. I confirm. Thank you very much.
It's good to see the problem can be resolved on AWS c5.
@bputrycz Also feel free to let me know if the performance on c4 is critical for you.
@TaoLv Performance on c4 is not critical for me for now.
* add 3d layout support for MKLDNN Conv and Activation
* fix lint
* code refactor
* add testcase for group1 conv and skip quantization for conv1d
* fix lint
* avoid conv1d quantization
* code refactor and add activation ut
* del todo
Description
This PR aims to integrate MKLDNN Conv1d and enable 3d layout for Conv and Activation.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
@pengzhao-intel @TaoLv