Why does sharing the data type in `ShareBufferWith` fix unit tests for mkldnn ops using non-float data types? #37306
Comments
@iclementine I started to analyze the situation and will come back to you soon.
@iclementine So, the data type in the situation you are describing is bf16, which is 2 bytes per element, as opposed to the float data type, which is 4 bytes per element. The unit test for the inplace scenario uses the "buffer_share" op, which calls `ShareBufferWith(<src bf16_tensor>)`. In this unit test, the src bf16 tensor holds 100 elements, each of which is 2 bytes in size. The thing is that there was no bigger design analysis behind this solution. My colleague was simply debugging and noticed that `ShareBufferWith()` was not transferring the type, which caused problems. By his (and my) understanding, `ShareBufferWith()` should transfer the data type. If you have an idea how to fix this otherwise, we would be happy to read about it.
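To make the size mismatch concrete, here is a minimal, self-contained C++ sketch of the failure mode; `SimpleTensor`, `DType`, and `ShareBufferWithNoType` are hypothetical names for illustration, not Paddle's actual classes. When the buffer is shared but the dtype is not, a 100-element bf16 source owning 200 bytes is seen by the destination as a 400-byte fp32 tensor:

```cpp
// Minimal model (NOT Paddle's real classes) of the bug described above.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>

enum class DType { FP32, BF16 };
std::size_t SizeOf(DType t) { return t == DType::FP32 ? 4 : 2; }

struct SimpleTensor {
  std::shared_ptr<std::uint8_t[]> holder;  // raw, type-erased storage
  std::size_t numel = 0;
  DType type = DType::FP32;  // dtype lives on the tensor, not the storage

  std::size_t memory_size() const { return numel * SizeOf(type); }

  // Models the buggy variant: shares storage but keeps dst's old dtype.
  void ShareBufferWithNoType(const SimpleTensor& src) {
    holder = src.holder;
    numel = src.numel;
    // type is deliberately NOT copied here.
  }
};

int main() {
  SimpleTensor src;
  src.numel = 100;
  src.type = DType::BF16;  // 100 elements x 2 bytes = 200 bytes
  src.holder.reset(new std::uint8_t[src.memory_size()]);

  SimpleTensor dst;                // still claims FP32
  dst.ShareBufferWithNoType(src);  // shares the 200-byte buffer

  std::cout << "src bytes: " << src.memory_size() << '\n';  // 200
  std::cout << "dst bytes: " << dst.memory_size() << '\n';  // 400 (!)
  // dst now reports more bytes than the shared buffer actually holds,
  // which is the memory-size mismatch the bf16 unit test hits.
}
```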
@jczaja For an operator to enable the inplace strategy, the output reuses the memory of the input in some way. But the kernel still has to take care of the data type of the output tensor. When the kernel allocates the memory of the output (via `mutable_data`, for example), the data type is set along the way. But when the memory is not allocated by an allocator, but shared with another allocation, there should be a separate step to set the data type. (Though I think it should be the responsibility of the operator to infer the correct data type of the returned Variable.) So I added a step to set the data type. (See PR #37247 for details.)
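A sketch of that separate set-type step, continuing the hypothetical `SimpleTensor` model from the snippet above (Paddle's real kernel code differs in the details):

```cpp
// When storage is shared instead of allocated, no allocator runs, so
// nothing stamps the output's dtype. The fix described above is an
// explicit, separate step (sketched on the SimpleTensor model):
void InplaceKernelSketch(const SimpleTensor& x, SimpleTensor* out) {
  out->holder = x.holder;  // share storage: no allocation, no dtype set
  out->numel = x.numel;
  out->type = x.type;      // the separate step: set the dtype explicitly
}
```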
@iclementine Ok. If there is no `Tensor::set_type()`, then `mutable_data` needs to be called. We have analogous constructions in some other ops supporting inplace, e.g. softmax, activations.
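The `mutable_data` route, again on the hypothetical `SimpleTensor` model; Paddle's real `mutable_data<T>(place)` also handles placement and capacity-based reallocation, which is elided here:

```cpp
// mutable_data-style alternative: requesting typed data stamps the
// dtype as a side effect; allocation happens only if the (possibly
// shared) buffer is missing. A sketch, not Paddle's actual signature.
template <typename T>
T* MutableDataSketch(SimpleTensor* t, DType as_type) {
  t->type = as_type;  // dtype is set as a side effect of the call
  if (!t->holder) {   // real code would also check buffer capacity
    t->holder.reset(new std::uint8_t[t->numel * SizeOf(as_type)]);
  }
  return reinterpret_cast<T*>(t->holder.get());
}
```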
@jczaja Thank you. I'm going to do the same for other operators.
@jczaja Hi, I have a question about the inplace mechanism in mkldnn ops.
Description
I was implementing an operator that can view real tensors (dtype: float32 or float64) of shape `(d1, d2, ..., 2)` as complex tensors (dtype: complex64 or complex128, respectively) of shape `(d1, d2, ...)`. This requires sharing a buffer (or storage) between two tensors with different data types. (PR: #37240)

Paddle shares storage with `ShareBufferWith`. I tried removing `type_ = tensor.type_` from it. I could actually share storage between real and complex tensors by removing this line. But I noticed that there was a related PR where this line was added: PR #24853 "Fix bug in ShareBufferWith causing non-fp32 inplace uts to fail".

Also, removing this line (PR #37247 "disable copying of datatype when sharing buffer between two tensors") causes a failure of `test_elementwise_add_bf16_mkldnn_op`. The build log (see the attachment) shows that there is a mismatch in memory size, possibly caused by an error in the data type.

But I think there could be a better way to solve this problem. By definition, the data type is a metadatum associated with a Tensor rather than with a storage object.
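To make the intended view concrete, here is a standalone C++ analogue of the real-to-complex reinterpretation, independent of Paddle: the same bytes back a `(3, 2)` float32 tensor and a `(3,)` complex64 tensor, and only the metadata (dtype and shape) differs.

```cpp
// Standalone illustration: one buffer, two dtypes/shapes.
#include <complex>
#include <cstdio>

int main() {
  // Shape (3, 2), dtype float32: three (real, imag) pairs.
  float real_view[3][2] = {{1.f, 2.f}, {3.f, 4.f}, {5.f, 6.f}};

  // Same storage viewed as shape (3,), dtype complex64.
  // std::complex<float> is specified to be layout-compatible with
  // float[2]; the reverse cast below is the usual (if informal) idiom.
  auto* complex_view = reinterpret_cast<std::complex<float>*>(real_view);

  for (int i = 0; i < 3; ++i)
    std::printf("(%g, %g)\n", complex_view[i].real(), complex_view[i].imag());
  // Prints (1, 2), (3, 4), (5, 6): no copy, only reinterpreted metadata.
}
```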
Question
So I wonder why sharing the data type in `ShareBufferWith` fixes unit tests for mkldnn ops using non-float data types when the inplace strategy is enabled. If this is figured out, we could fix this while avoiding sharing data types in `ShareBufferWith`, so that sharing storage between Tensors with different data types remains possible.
Thank you!
build.log