
Why does sharing the data type in ShareBufferWith fix unit tests for mkldnn ops using non-float data types? #37306

Closed
iclementine opened this issue Nov 17, 2021 · 5 comments

iclementine commented Nov 17, 2021

@jczaja Hi, I have a question about the inplace mechanism in mkldnn ops.

Description

I was implementing an operator that can view real tensors (dtype: float32 or float64) of shape (d1, d2, ..., 2) as complex tensors (dtype: complex64 or complex128, respectively) of shape (d1, d2, ...). This requires sharing a buffer (or storage) between two tensors with different data types. (PR: #37240)
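
For illustration, here is a minimal standalone sketch of the "view" idea in plain C++ rather than Paddle code; the same buffer is read either as 2*N floats or as N complex64 values, so no copy is needed:

#include <complex>
#include <cstdio>
#include <vector>

int main() {
  // A real tensor of shape (3, 2): three (real, imag) pairs stored contiguously.
  std::vector<float> real_buf = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};

  // View the same storage as a complex64 tensor of shape (3,).
  // std::complex<float> is layout-compatible with float[2] (guaranteed since C++11),
  // so reinterpreting the buffer is enough.
  auto* complex_view = reinterpret_cast<std::complex<float>*>(real_buf.data());

  for (std::size_t i = 0; i < real_buf.size() / 2; ++i) {
    std::printf("(%g, %g)\n", complex_view[i].real(), complex_view[i].imag());
  }
  return 0;
}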

Paddle shares storage with ShareBufferWith.

void ShareBufferWith(const Tensor& tensor) {
  holder_ = tensor.holder_;  // share the underlying allocation
  offset_ = tensor.offset_;  // share the offset into that allocation
  type_ = tensor.type_;      // also copies the data type (the line in question)
}

I tried removing the line type_ = tensor.type_; and with that change I could indeed share storage between real and complex tensors. But I noticed that there was a related PR where this line was added.

PR #24853 Fix bug in ShareBufferWith causing non-fp32 inplace uts to fail

During a call to ShareBufferWith, the type was not assigned when the holders were assigned. This caused inplace UTs that use non-fp32 tensors (for example int8, uint8) to fail on FetchOp.

Also, removing this line in PR #37247 (disable copying of datatype when sharing buffer between two tensors) causes a failure of test_elementwise_add_bf16_mkldnn_op. The build log (see the attachment) shows a mismatch in memory size, possibly caused by a wrong data type.

But I think there could be a better way to solve this problem. By definition, the data type is metadata associated with a Tensor, not with the storage object.

Question

So I wonder why sharing the data type in ShareBufferWith fixes unit tests for mkldnn ops using non-float data types when the inplace strategy is enabled. If we figure this out, we could fix those tests while avoiding sharing the data type in ShareBufferWith, and then sharing storage between Tensors with different data types would be possible.

Thank you!

Attachment: build.log

jczaja commented Nov 17, 2021

@iclementine I started to analyze the situation and will come back to you soon.

jczaja commented Nov 18, 2021

@iclementine
I'm not sure if I can answer the question, as a former colleague of ours implemented that fix.

So, the data type in the situation you are describing is bf16, which is 2 bytes per element, as opposed to the float data type, which is 4 bytes per element. The unit test for the inplace scenario uses a "buffer_share" op that calls ShareBufferWith(<src bf16 tensor>). In this unit test the src bf16 tensor holds 100 elements, each of which is 2 bytes in size.
If ShareBufferWith() does not set the type to bf16 in the target tensor, then the default type (float) stays set. Later on, when fetch_handle_op is working and e.g. TensorCopy is called, check_memory_size() compares the tensor's memory size (200 bytes) with numel() * sizeof(float) (400 bytes), and since 200 is not equal to 400 the assertion fails.
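
To make that concrete, here is a simplified standalone model (hypothetical MiniTensor/DType names, not Paddle's actual classes) of why the size check fails when the type is not shared along with the holder:

#include <cstddef>
#include <cstdio>
#include <memory>

enum class DType { FP32, BF16 };
std::size_t SizeOf(DType t) { return t == DType::FP32 ? 4 : 2; }

struct MiniTensor {
  std::shared_ptr<char> holder;  // shared raw allocation
  std::size_t bytes = 0;         // size of that allocation
  std::size_t numel = 0;         // number of elements
  DType type = DType::FP32;      // defaults to float, as described above

  void ShareBufferWith(const MiniTensor& src, bool share_type) {
    holder = src.holder;
    bytes = src.bytes;
    numel = src.numel;
    if (share_type) type = src.type;  // the line added by PR #24853
  }

  // Mimics check_memory_size(): the allocation must be able to hold
  // numel elements of the recorded data type.
  bool CheckMemorySize() const { return bytes >= numel * SizeOf(type); }
};

int main() {
  MiniTensor src;
  src.numel = 100;
  src.type = DType::BF16;
  src.bytes = src.numel * SizeOf(src.type);  // 200 bytes
  src.holder.reset(new char[src.bytes], std::default_delete<char[]>());

  MiniTensor dst_no_type, dst_with_type;
  dst_no_type.ShareBufferWith(src, /*share_type=*/false);
  dst_with_type.ShareBufferWith(src, /*share_type=*/true);

  // Without type sharing: 200 < 100 * sizeof(float) = 400, so the check fails,
  // just like the bf16 unit test does on fetch.
  std::printf("without type sharing: %s\n", dst_no_type.CheckMemorySize() ? "ok" : "FAIL");
  std::printf("with type sharing:    %s\n", dst_with_type.CheckMemorySize() ? "ok" : "FAIL");
  return 0;
}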

The thing is that there was no bigger design analysis behind this solution. My colleague was just debugging, noticed that ShareBufferWith() was not transferring the type, which caused problems, and by his (and my) understanding ShareBufferWith() should transfer the data type.

If you have an idea how to fix this differently, we would be happy to read about it.

iclementine commented Nov 19, 2021

@jczaja
I think I have found the reason why not sharing the data type causes the inplace elementwise mkldnn op with non-float data types to fail.

For an operator to enable the inplace strategy, the output reuses the memory of the input in some way. But the kernel still has to take care of the data type of the output tensor.

When the kernel allocates the memory of the output (via Tensor.mutable_data<T> or platform::BinaryMKLDNNHandler<T>.AcquireDstMemory<T>(Tensor*), etc.), the data type is also set to the desired type.

But when the memory is not allocated by an allocator but shared from another allocation, there should be a separate step to set the data type. (Though I think it should be the responsibility of the operator to infer the correct data type of the returned Variable.)

So I added a step to set the data type. (When the memory of a tensor is already allocated and does not need to be re-allocated, mutable_data<T>(...) only sets the data type.)

See PR #37247 for details.
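
As a standalone sketch only (hypothetical MiniTensor/MutableData names, not Paddle's real classes), the pattern is: share only the storage, then set the data type in a separate step, the way mutable_data<T>(...) can when the existing allocation is already large enough:

#include <cstddef>
#include <cstdio>
#include <memory>

enum class DType { FP32, BF16, COMPLEX64 };
std::size_t SizeOf(DType t) {
  switch (t) {
    case DType::FP32: return 4;
    case DType::BF16: return 2;
    case DType::COMPLEX64: return 8;
  }
  return 0;
}

struct MiniTensor {
  std::shared_ptr<char> holder;
  std::size_t bytes = 0;
  std::size_t numel = 0;
  DType type = DType::FP32;

  // Share only the storage; the data type stays whatever the destination declares.
  void ShareBufferWith(const MiniTensor& src) {
    holder = src.holder;
    bytes = src.bytes;
  }

  // mutable_data-like helper: records the dtype and re-allocates only if the
  // existing buffer is too small for numel elements of that dtype.
  void* MutableData(std::size_t n, DType t) {
    numel = n;
    type = t;
    std::size_t need = n * SizeOf(t);
    if (!holder || bytes < need) {
      holder.reset(new char[need], std::default_delete<char[]>());
      bytes = need;
    }
    return holder.get();
  }
};

int main() {
  MiniTensor real;                        // a (3, 2) float32 tensor
  real.MutableData(6, DType::FP32);       // allocates 24 bytes

  MiniTensor cplx;                        // viewed as a (3,) complex64 tensor
  cplx.ShareBufferWith(real);             // storage only, no dtype copied
  cplx.MutableData(3, DType::COMPLEX64);  // separate step sets dtype, no re-allocation

  std::printf("same storage: %s\n", real.holder == cplx.holder ? "yes" : "no");
  return 0;
}

This keeps the data type as per-Tensor metadata, while the holder stays a type-agnostic allocation.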

jczaja commented Nov 19, 2021

@iclementine Ok. If there is no Tensor::set_type(), then mutable_data needs to be called. We have analogous constructions in some other ops supporting inplace, e.g. softmax and activations.

iclementine commented

@jczaja Thank you. I'm going to do the same for other operators.
