Performance improvement for MKL-DNN Quantized FullyConnected #14528
Conversation
It's good to use an enum instead of hardcoded numbers in the code.
LGTM.
@@ -48,6 +48,12 @@ enum FullyConnectedOpResource {kTempSpace};
 enum FullyConnectedOpOutputs {kOut};
 } // fullc

 namespace quantized_fullc {
quantized_fc?
just to align with fullc :)
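For context, here is a sketch of what the new namespace plausibly contains. Only kDataMin appears verbatim in the diff below, and kOutMin/kOutMax are mentioned later in this conversation; the remaining enumerator names and their order are assumptions, not the verbatim source:

```cpp
// Assumed shape of the enums (illustrative): named indices replace
// hardcoded offsets for the min/max scalars.
namespace quantized_fullc {
// Offsets of the min/max scalar inputs, applied after the regular inputs.
enum QuantizedFCInputMinMax {kDataMin, kDataMax, kWeightMin, kWeightMax, kBiasMin, kBiasMax};
// Outputs: the quantized result plus its quantization range.
enum QuantizedFCOutputs {kOut, kOutMin, kOutMax};
}  // namespace quantized_fullc
```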
@@ -195,6 +195,13 @@ void SgMKLDNNFCOp::Forward(const OpContext &ctx,
 }

 MKLDNNFCForwardFullFeature(full_param_, ctx, fwd_.get(), new_inputs, new_req, out_data);

 if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
add comments on why
I think it's straightforward here: OutMin and OutMax are only valid when the op is quantized and not generating fp32 output.
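To make that explicit in the source, a comment along these lines could be attached to the guard. This is an illustrative fragment, not the PR's actual code: the cached_min_output_/cached_max_output_ member names and the use of the output enum are assumptions.

```cpp
// The extra kOutMin/kOutMax outputs exist only on the int8-output path:
// they carry the output quantization range to the next quantized op, and
// are meaningless when enable_float_output produces fp32 directly.
if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
  *out_data[quantized_fullc::kOutMin].data().dptr<float>() = cached_min_output_;  // assumed member
  *out_data[quantized_fullc::kOutMax].data().dptr<float>() = cached_max_output_;  // assumed member
}
```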
@mxnet-label-bot update [MKLDNN, Performance, pr-awaiting-testing]
@anirudh2290 @ZhennanQin @xinyu-intel to review
@@ -52,15 +47,15 @@ void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs &attrs,
   NDArray weight = in_data[fullc::kWeight];

   const float min_data =
-      in_data[num_inputs + quantized_fc_enum::kDataMin].data().dptr<float>()[0];
+      in_data[num_inputs + quantized_fullc::kDataMin].data().dptr<float>()[0];
Quite strange usage. Why not define a whole input set together with the original inputs?
The original inputs might not include bias, which would result in different indices for all these min/max inputs. This is just to simplify the ordering for the quantized op.
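In other words, the min/max block is addressed relative to num_inputs, so the same enum values stay valid with or without bias. A hedged sketch of the assumed layout follows (illustrative fragment; kDataMax and the no_bias field mirror kDataMin and common MXNet parameter conventions, but are assumptions here):

```cpp
// Assumed input layout (illustrative):
//   in_data[0 .. num_inputs-1]  -> data, weight, and optionally bias
//   in_data[num_inputs + k*]    -> min/max scalars in a fixed order
// num_inputs is 2 without bias and 3 with it, so offsetting the enum by
// num_inputs keeps kDataMin/kDataMax correct in both cases.
const size_t num_inputs = param.no_bias ? 2 : 3;
const float min_data =
    in_data[num_inputs + quantized_fullc::kDataMin].data().dptr<float>()[0];
const float max_data =
    in_data[num_inputs + quantized_fullc::kDataMax].data().dptr<float>()[0];
```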
LGTM.
Thanks for your contribution. Merging now.
* Cached bias for Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI
Description
This patch improves the performance of MKL-DNN quantized FullyConnected, mainly by caching the bias in the subgraph-based op.
@pengzhao-intel @TaoLv
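For illustration, here is a minimal standalone sketch of the caching idea. It is not the actual MXNet implementation; the class and all names are hypothetical. The point is that the fp32 bias must be rescaled to int32 using the current quantization scales before the quantized kernel can use it, and recomputing that on every forward call is wasteful when the scales rarely change.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: cache the rescaled bias and reuse it across
// forward calls while the quantization scale stays unchanged.
class CachedBias {
 public:
  const std::vector<int32_t>& Get(const std::vector<float>& bias,
                                  float bias_scale) {
    if (!initialized_ || bias_scale != cached_scale_) {
      // Scale changed (or first call): rescale the fp32 bias to int32.
      rescaled_.resize(bias.size());
      for (std::size_t i = 0; i < bias.size(); ++i)
        rescaled_[i] = static_cast<int32_t>(std::round(bias[i] * bias_scale));
      cached_scale_ = bias_scale;
      initialized_ = true;
    }
    return rescaled_;  // cache hit: no per-iteration rescaling cost
  }

 private:
  bool initialized_ = false;
  float cached_scale_ = 0.0f;
  std::vector<int32_t> rescaled_;
};
```

A subgraph op can hold such a cache as a member and invalidate it only when the min/max inputs change, which is the design direction the commit message above describes.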