Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MKLDNN] add quantized sum #14614

Merged
merged 32 commits into from
Apr 30, 2019
Merged

[MKLDNN] add quantized sum #14614

merged 32 commits into from
Apr 30, 2019

Conversation

rongzha1
Copy link
Contributor

@rongzha1 rongzha1 commented Apr 4, 2019

Description

add quantized sum impl for mkldnn, support int8&&uint8

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [done ] Changes are complete (i.e. I finished coding on this PR)
  • [done ] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [done ] To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [done ] Feature1, tests, (and when applicable, API doc)
    testcase is added in : tests/python/quantization/test_quantization.py test_quantized_sum()

@pengzhao-intel
Copy link
Contributor

@anirudh2290 @ZhennanQin @TaoLv @ciyongch to review :)

@pengzhao-intel pengzhao-intel changed the title add quantized sum [MKLDNN] add quantized sum Apr 4, 2019
@pengzhao-intel
Copy link
Contributor

@mxnet-label-bot Add [Quantization, MKLDNN]

@marcoabreu marcoabreu added MKLDNN Quantization Issues/Feature Requests related to Quantization labels Apr 4, 2019
}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(Adds arguments element-wise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/operator/quantization/mkldnn/mkldnn_quantized_sum.cc Outdated Show resolved Hide resolved
// dataA && dataB are uint8
if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
output_data_range = kInt8Range;
output_data_type = mkldnn::memory::s8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

float B_scale = GetScale(in_data[quantized_sum_enum::kDataB], dataB_min, dataB_max);
// rescaled_mem is for reorder mkldnn memory
std::shared_ptr<mkldnn::memory> rescaled_mem;
// output default set as int32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Int32 by default. Do we have any other choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when fusion with requantize op, out put is int8/uint8

auto s8_pd = (dataA_int8 == true)
? dataA_mem->get_primitive_desc()
: dataB_mem->get_primitive_desc();
rescaled_mem = std::make_shared<mkldnn::memory>(s8_pd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will allocate memory here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reorder ( line 134 ) is done in this if() field, so need allocate memory first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally, we don't want to allocate memory implicitly inside MKL-DNN API. Besides, seems this allocation will happen every iteration which is performance problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkldnn sum doesn't support int8 + uint8, so need to reorder them to the same data type first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change them to TmpMemMgr::Get()->Alloc

output_min = 0 - output_max;
}

std::vector<float> scales;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many scales do we have? Is it possible to reserve space for them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two, scale 0 for dataA, scale 1 for dataB. OK will reserve first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest:

// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);


auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) ? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK。 add const for const variable

src/operator/quantization/mkldnn/mkldnn_quantized_sum.cc Outdated Show resolved Hide resolved
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add doc info in the header of the new files, including Copyright/brief/author...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_dataA_int8 could be better for understanding..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


DMLC_REGISTER_PARAMETER(RequantizeSumParam);

static float GetScale(const NDArray& data, float min, float max) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
output_data_range = kInt8Range;
output_data_type = mkldnn::memory::s8;
} else if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kUint8) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L74 & L77, is it kOut but not kDataA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

out_data_scale = output_data_range/MaxAbs(output_min, output_max);
} else {
output_max = dataA_absmax + dataB_absmax;
output_min = 0 - output_max;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_min = -output_max;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

if (in_type->at(i) == mshadow::kInt8) {
TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);
} else {
TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kUint8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECK(in_type->at(i) == mshadow::kInt8 || in_type->at(i) == mshadow::kUint8);

@@ -21,7 +21,7 @@

#include "mkldnn_conv_property.h"
#include "mkldnn_fc_property.h"
#include "mkldnn_conv_post_quantize_property.h"
#include "mkldnn_post_quantize_property.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you remove the conv part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge them in one file


@with_seed()
def test_quantized_sum():
def check_quantized_sum(data_shape, qtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add test case in test_subgraph.py

@pengzhao-intel
Copy link
Contributor

Please retrigger the CI

@pengzhao-intel
Copy link
Contributor

@TaoLv @ciyongch @ZhennanQin please help review again :)

Copy link
Member

@TaoLv TaoLv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the operator is not called quantized_elemwise_add? sum is another operator which is used to accumulate elements of an array.

#include <utility>
#include <vector>
#include <string>
#include "../../tensor/elemwise_unary_op.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure these headers are used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unnecessary head file


struct RequantizeSumParam : public dmlc::Parameter<RequantizeSumParam> {
dmlc::optional<float> min_calib_range; // min float value calculated from calibration dataset
dmlc::optional<float> max_calib_range; // max float value calculated from calibration dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comments. I think these two parameters are already described in L43 and L48.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

} else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) {
output_data_range = kUint8Range;
output_data_type = mkldnn::memory::u8;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add else clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

output_min = 0 - output_max;
}

std::vector<float> scales;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest:

// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);

if (is_dataA_int8 == true) {
u8_reorder_scale = out_data_scale/B_scale;
scales.push_back(out_data_scale/A_scale);
scales.push_back(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scales[0] = out_data_scale / A_scale;
scales[1] = 1.0f;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
mkldnn::memory::format i_fmt = static_cast<mkldnn::memory::format>(
in_pds[quantized_sum_enum::kDataA].desc().data.format);
auto output_desc = memory::desc(i_dims, output_data_type, i_fmt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkldnn::memory::desc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

NNVM_REGISTER_OP(_contrib_quantized_sum)
.set_attr<FInferStorageType>("FInferStorageType", SumStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedSumForward)
.set_attr<FResourceRequest>("FResourceRequest", [](const NodeAttrs& n) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(elem_add operator for input dataA and input dataB data type of int8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elem_add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to elemwise_add

@rongzha1
Copy link
Contributor Author

Why the operator is not called quantized_elemwise_add? sum is another operator which is used to accumulate elements of an array.

has changed from quantized_sum to quantized_elemwise_add

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pengzhao-intel
Copy link
Contributor

@TaoLv @ciyongch @ZhennanQin please help review the change again.

Copy link
Member

@TaoLv TaoLv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Please fix parameter indents after function names are changed,

}

static void MKLDNNQuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs, const OpContext& ctx,
const std::vector<NDArray>& in_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix indent.

// A, B, A_min, A_max, B_min, B_max
CHECK_EQ(in_data.size(), 6U);
// C, C_min, C_max
CHECK_EQ(out_data.size(), 3U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some descriptive message for these two checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the error message if the check is failed~

if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
output_min = params.min_calib_range.value();
output_max = params.max_calib_range.value();
out_data_scale = output_data_range/MaxAbs(output_min, output_max);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces before and after /.

float u8_reorder_scale = 0;
if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
if (is_dataA_int8 == true) {
u8_reorder_scale = out_data_scale/B_scale;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

namespace op {

static bool ElemwiseAddShape(const nnvm::NodeAttrs& attrs, mxnet::ShapeVector* in_shape,
mxnet::ShapeVector* out_shape) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix indent.


fp32_rslt = output.asnumpy()
int8_rslt = qoutput.asnumpy()*max_val/0x7fffffff
assert_almost_equal(int8_rslt, int8_rslt, atol = 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why choose atol=1?

// A, B, A_min, A_max, B_min, B_max
CHECK_EQ(in_data.size(), 6U);
// C, C_min, C_max
CHECK_EQ(out_data.size(), 3U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the error message if the check is failed~

scales[0] = out_data_scale / A_scale;
scales[1] = out_data_scale / B_scale;
} else {
scales[0] = dataA_absmax*output_data_range / ((dataA_absmax + dataB_absmax)*dataA_range);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please also add spaces around operation *.

.set_attr<FCompute>("FCompute<cpu>", QuantizedElemwiseAddForward)
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "4th input")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 4th input mean the order of parameter list when users call this operator? Seems it does not align with the order of FListInputNames.

@TaoLv
Copy link
Member

TaoLv commented Apr 24, 2019

CI is not passed yet~ Please take a look. Thank you. @rongzha1

.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "second input")
.add_argument("lhs_min", "NDArray-or-Symbol", "second input")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be third?

.set_attr<FInferStorageType>("FInferStorageType", ElemwiseAddStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedElemwiseAddForward)
.set_attr<bool>("TIsMKLDNN", true)
.set_attr_parser(ParamParser<RequantizeElemwiseAddParam>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quantize in the operator name but requantize in the param name. Is it intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. this is for fusion with requantized

Copy link
Contributor

@ZhennanQin ZhennanQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Just minor comment.

}
// C
int dtype = mshadow::kInt32;
#if MXNET_USE_MKLDNN == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a feature of mkldnn. Consider to remove this macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Done

Copy link
Member

@TaoLv TaoLv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments. Now it's approved.

@pengzhao-intel
Copy link
Contributor

Finally, the CI pass. Thanks for the contribution :)

Merging now.

@pengzhao-intel pengzhao-intel merged commit 84c1635 into apache:master Apr 30, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
* add quantized sum

* fix gpu compiler error and cpu testcase fail

* add default forward function for quantized_sum

* skip quantized_sum for gpu ctx

* fix comments

* fix indetation and comments

* retrigger CI

* alloc memeory through TmpMemMgr

*  fix comments Apr.12

* change sum to elemwise_add

* change Sum to ElemwiseAdd

* fix indents

* retrigger CI

* trigger CI

* fix indentation and typo

* trigger CI

* fix typo

* fix typo

* remove USE_MKLDNN macro for requantize params

* rename param same as its op

* trigger CI

* trigger CI

* trigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* add quantized sum

* fix gpu compiler error and cpu testcase fail

* add default forward function for quantized_sum

* skip quantized_sum for gpu ctx

* fix comments

* fix indetation and comments

* retrigger CI

* alloc memeory through TmpMemMgr

*  fix comments Apr.12

* change sum to elemwise_add

* change Sum to ElemwiseAdd

* fix indents

* retrigger CI

* trigger CI

* fix indentation and typo

* trigger CI

* fix typo

* fix typo

* remove USE_MKLDNN macro for requantize params

* rename param same as its op

* trigger CI

* trigger CI

* trigger CI
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
MKLDNN Quantization Issues/Feature Requests related to Quantization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants