Conversation
@anirudh2290 @ZhennanQin @TaoLv @ciyongch to review :)
@mxnet-label-bot Add [Quantization, MKLDNN]
}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(Adds arguments element-wise.
Please change the document.
Done
// dataA && dataB are uint8
if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
  output_data_range = kInt8Range;
  output_data_type = mkldnn::memory::s8;
indent.
OK
float B_scale = GetScale(in_data[quantized_sum_enum::kDataB], dataB_min, dataB_max);
// rescaled_mem holds the reordered mkldnn memory
std::shared_ptr<mkldnn::memory> rescaled_mem;
// output is set to int32 by default
Int32 by default. Do we have any other choice?
When fused with the requantize op, the output is int8/uint8.
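For context, a minimal sketch of how that output type/range selection could look (names follow the snippets in this thread; kInt32Range and the exact branch structure are assumptions, not the merged code):

// Output stays int32 by default; when a following requantize op is fused,
// the graph pass rewrites the output dtype to int8 or uint8.
mkldnn::memory::data_type output_data_type;
float output_data_range;
if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kInt8) {
  output_data_range = kInt8Range;
  output_data_type = mkldnn::memory::s8;
} else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) {
  output_data_range = kUint8Range;
  output_data_type = mkldnn::memory::u8;
} else {
  output_data_range = kInt32Range;  // assumed int32 range constant
  output_data_type = mkldnn::memory::s32;
}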
auto s8_pd = (dataA_int8 == true)
             ? dataA_mem->get_primitive_desc()
             : dataB_mem->get_primitive_desc();
rescaled_mem = std::make_shared<mkldnn::memory>(s8_pd);
Will allocate memory here?
The reorder (line 134) is done inside this if() block, so the memory needs to be allocated first.
Conventionally, we don't want to allocate memory implicitly inside the MKL-DNN API. Besides, it seems this allocation will happen on every iteration, which is a performance problem.
mkldnn sum doesn't support int8 + uint8, so we need to reorder the inputs to the same data type first.
Changed them to TmpMemMgr::Get()->Alloc.
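A rough sketch of that change, assuming MXNet's TmpMemMgr (whose Alloc returns a scratch mkldnn::memory* valid for the current forward call), rather than the exact code that was merged:

// Take scratch memory from the per-call temp pool instead of heap-allocating
// a fresh mkldnn::memory on every iteration.
auto s8_pd = is_dataA_int8 ? dataA_mem->get_primitive_desc()
                           : dataB_mem->get_primitive_desc();
mkldnn::memory *rescaled_mem = TmpMemMgr::Get()->Alloc(s8_pd);
// The u8 input is then reordered into this buffer so both inputs share a dtype.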
  output_min = 0 - output_max;
}

std::vector<float> scales;
How many scales do we have? Is it possible to reserve space for them?
Two: scale 0 is for dataA, scale 1 is for dataB. OK, will reserve them first.
Suggest:
// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);
auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) ? true : false;
const?
OK. Added const for the const variable.
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
Please add doc info in the header of the new files, including Copyright/brief/author...
OK
auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
is_dataA_int8 could be better for understanding.
Done
DMLC_REGISTER_PARAMETER(RequantizeSumParam);

static float GetScale(const NDArray& data, float min, float max) {
inline func?
Changed
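For reference, a plausible shape of the inlined helper (a sketch; kInt8Range, kUint8Range, and MaxAbs come from MXNet's quantization utilities, and the exact body is in the PR):

// Scale that maps the calibrated float range [min, max] onto the integer
// range of data's dtype.
static inline float GetScale(const NDArray& data, float min, float max) {
  const float data_range =
      (data.dtype() == mshadow::kInt8) ? kInt8Range : kUint8Range;
  return data_range / MaxAbs(min, max);
}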
if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
  output_data_range = kInt8Range;
  output_data_type = mkldnn::memory::s8;
} else if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kUint8) {
L74 & L77: should it be kOut rather than kDataA?
Done
  out_data_scale = output_data_range/MaxAbs(output_min, output_max);
} else {
  output_max = dataA_absmax + dataB_absmax;
  output_min = 0 - output_max;
output_min = -output_max;
OK
if (in_type->at(i) == mshadow::kInt8) {
  TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);
} else {
  TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kUint8);
CHECK(in_type->at(i) == mshadow::kInt8 || in_type->at(i) == mshadow::kUint8);
@@ -21,7 +21,7 @@

 #include "mkldnn_conv_property.h"
 #include "mkldnn_fc_property.h"
-#include "mkldnn_conv_post_quantize_property.h"
+#include "mkldnn_post_quantize_property.h"
did you remove the conv part?
Merged them into one file.
@with_seed()
def test_quantized_sum():
    def check_quantized_sum(data_shape, qtype):
Please also add a test case in test_subgraph.py.
Please retrigger the CI.
@TaoLv @ciyongch @ZhennanQin please help review again :)
Why is the operator not called quantized_elemwise_add? sum is another operator, which is used to accumulate the elements of an array.
#include <utility>
#include <vector>
#include <string>
#include "../../tensor/elemwise_unary_op.h"
Make sure these headers are used.
Removed the unnecessary header files.
struct RequantizeSumParam : public dmlc::Parameter<RequantizeSumParam> {
  dmlc::optional<float> min_calib_range;  // min float value calculated from calibration dataset
  dmlc::optional<float> max_calib_range;  // max float value calculated from calibration dataset
Remove comments. I think these two parameters are already described in L43 and L48.
done
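For context, a sketch of the parameter struct once the duplicate comments are dropped; the describe() strings here are assumed stand-ins for the descriptions the reviewer cites at L43 and L48:

struct RequantizeSumParam : public dmlc::Parameter<RequantizeSumParam> {
  dmlc::optional<float> min_calib_range;
  dmlc::optional<float> max_calib_range;
  DMLC_DECLARE_PARAMETER(RequantizeSumParam) {
    DMLC_DECLARE_FIELD(min_calib_range)
      .set_default(dmlc::optional<float>())
      .describe("The minimum float value calculated from the calibration dataset.");
    DMLC_DECLARE_FIELD(max_calib_range)
      .set_default(dmlc::optional<float>())
      .describe("The maximum float value calculated from the calibration dataset.");
  }
};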
auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
    ? true : false;
const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8);
OK
} else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) {
  output_data_range = kUint8Range;
  output_data_type = mkldnn::memory::u8;
}
Add an else clause.
OK
  output_min = 0 - output_max;
}

std::vector<float> scales;
Suggest:
// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);
if (is_dataA_int8 == true) {
  u8_reorder_scale = out_data_scale/B_scale;
  scales.push_back(out_data_scale/A_scale);
  scales.push_back(1);
scales[0] = out_data_scale / A_scale;
scales[1] = 1.0f;
done
}
mkldnn::memory::format i_fmt = static_cast<mkldnn::memory::format>(
    in_pds[quantized_sum_enum::kDataA].desc().data.format);
auto output_desc = memory::desc(i_dims, output_data_type, i_fmt);
mkldnn::memory::desc
done
NNVM_REGISTER_OP(_contrib_quantized_sum)
.set_attr<FInferStorageType>("FInferStorageType", SumStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedSumForward)
.set_attr<FResourceRequest>("FResourceRequest", [](const NodeAttrs& n) {
Need resource?
removed
}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(elem_add operator for input dataA and input dataB data type of int8,
elem_add?
Changed to elemwise_add.
Has been changed from quantized_sum to quantized_elemwise_add.
LGTM
@TaoLv @ciyongch @ZhennanQin please help review the change again.
Some minor comments. Please fix the parameter indents after the function names are changed.
}

static void MKLDNNQuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs, const OpContext& ctx,
                                              const std::vector<NDArray>& in_data,
please fix indent.
// A, B, A_min, A_max, B_min, B_max
CHECK_EQ(in_data.size(), 6U);
// C, C_min, C_max
CHECK_EQ(out_data.size(), 3U);
Please add some descriptive message for these two checks.
I meant the error message shown if the check fails~
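Something along these lines would satisfy the request (a sketch; the exact messages in the merged code may differ):

CHECK_EQ(in_data.size(), 6U)
    << "Expected 6 inputs: A, B, A_min, A_max, B_min, B_max";
CHECK_EQ(out_data.size(), 3U)
    << "Expected 3 outputs: C, C_min, C_max";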
if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
  output_min = params.min_calib_range.value();
  output_max = params.max_calib_range.value();
  out_data_scale = output_data_range/MaxAbs(output_min, output_max);
Add spaces before and after /.
float u8_reorder_scale = 0;
if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
  if (is_dataA_int8 == true) {
    u8_reorder_scale = out_data_scale/B_scale;
ditto
namespace op {

static bool ElemwiseAddShape(const nnvm::NodeAttrs& attrs, mxnet::ShapeVector* in_shape,
                             mxnet::ShapeVector* out_shape) {
Please fix indent.
fp32_rslt = output.asnumpy()
int8_rslt = qoutput.asnumpy() * max_val / 0x7fffffff
assert_almost_equal(fp32_rslt, int8_rslt, atol = 1)
why choose atol=1?
scales[0] = out_data_scale / A_scale;
scales[1] = out_data_scale / B_scale;
} else {
  scales[0] = dataA_absmax*output_data_range / ((dataA_absmax + dataB_absmax)*dataA_range);
nit: please also add spaces around the * operator.
.set_attr<FCompute>("FCompute<cpu>", QuantizedElemwiseAddForward)
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "4th input")
Does 4th input mean the order of the parameter list when users call this operator? It seems it does not align with the order of FListInputNames.
The CI has not passed yet~ Please take a look. Thank you. @rongzha1
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "second input")
.add_argument("lhs_min", "NDArray-or-Symbol", "second input")
Should be third?
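A sketch of the argument list in a consistent order; the names after lhs_min are assumptions inferred from the input list A, B, A_min, A_max, B_min, B_max above:

.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "second input")
.add_argument("lhs_min", "NDArray-or-Symbol", "third input")
.add_argument("lhs_max", "NDArray-or-Symbol", "fourth input")
.add_argument("rhs_min", "NDArray-or-Symbol", "fifth input")
.add_argument("rhs_max", "NDArray-or-Symbol", "sixth input")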
.set_attr<FInferStorageType>("FInferStorageType", ElemwiseAddStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedElemwiseAddForward)
.set_attr<bool>("TIsMKLDNN", true)
.set_attr_parser(ParamParser<RequantizeElemwiseAddParam>)
It's quantize in the operator name but requantize in the param name. Is it intentional?
Yes, this is for fusion with the requantize op.
Overall LGTM. Just minor comments.
}
// C
int dtype = mshadow::kInt32;
#if MXNET_USE_MKLDNN == 1
This isn't a feature of mkldnn. Consider removing this macro.
OK. Done
Thanks for addressing the comments. Now it's approved.
Finally, the CI passed. Thanks for the contribution :) Merging now.
* add quantized sum
* fix gpu compiler error and cpu testcase fail
* add default forward function for quantized_sum
* skip quantized_sum for gpu ctx
* fix comments
* fix indentation and comments
* retrigger CI
* alloc memory through TmpMemMgr
* fix comments Apr.12
* change sum to elemwise_add
* change Sum to ElemwiseAdd
* fix indents
* retrigger CI
* trigger CI
* fix indentation and typo
* trigger CI
* fix typo
* fix typo
* remove USE_MKLDNN macro for requantize params
* rename param same as its op
* trigger CI
* trigger CI
* trigger CI
Description
Add a quantized sum implementation for MKL-DNN, supporting int8 and uint8.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
A test case is added in tests/python/quantization/test_quantization.py: test_quantized_sum().