Make to_tensor and normalize to accept 3D or 4D tensor inputs #13614
Conversation
Thanks @sandeep-krishnamurthy
for (int i = 0; i < channel; ++i) {
  DType mean = param.mean[param.mean.ndim() > 1 ? i : 0];
  DType std_dev = param.std[param.std.ndim() > 1 ? i : 0];
mean and std_dev should be of float type, as defined at lines 115 and 116.
MSHADOW_TYPE_SWITCH(inputs[0].type_flag_, DType, {
  float* output = outputs[0].dptr<float>();
  DType* input = inputs[0].dptr<DType>();
#pragma omp parallel for collapse(2)
Would this improve performance?
It cannot be used inside the macro.
How about checking the DType ourselves, without using the macro?
DType mean = param.mean[param.mean.ndim() > 1 ? i : 0];
DType std_dev = param.std[param.std.ndim() > 1 ? i : 0];
for (int j = 0; j < length; ++j) {
  output[step + i*length + j] = (input[step + i*length + j] - mean) / std_dev;
If the input is int, should the output be int or float after normalization? I prefer float here.
Good point, it should be float. Making the change.
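The int-vs-float point can be sketched with NumPy (illustrative only, not MXNet's actual kernel):

```python
import numpy as np

# A small uint8 "channel" with a mean/std applied; if the output stayed
# integer, the fractional results would be truncated.
x = np.array([10, 20, 30], dtype=np.uint8)
mean, std = 15.0, 4.0

# Computing in float preserves precision.
out = (x.astype(np.float32) - mean) / std
print(out)  # [-1.25  1.25  3.75]
```

Keeping the output buffer as `float*` while switching only the input type gives the same behavior in the operator.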
@mxnet-label-bot add[Gluon, Data-loading, Operator]
Examples
--------
>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> image = mx.nd.random.uniform(0, 1, (3, 4, 2))
Can you give an example of 4D (N x C x H x W) here?
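As a rough sketch of what a 4D (N x C x H x W) example could look like, here is the equivalent per-channel arithmetic in NumPy (shapes and values are hypothetical, not taken from the PR's docstring):

```python
import numpy as np

# Per-channel mean/std broadcast over a (N, C, H, W) batch, emulating
# what Normalize(mean=(0, 1, 2), std=(3, 2, 1)) would do on 4D input.
mean = np.array([0.0, 1.0, 2.0], dtype=np.float32).reshape(1, 3, 1, 1)
std = np.array([3.0, 2.0, 1.0], dtype=np.float32).reshape(1, 3, 1, 1)

batch = np.random.uniform(0, 1, (2, 3, 4, 2)).astype(np.float32)  # N=2, C=3, H=4, W=2
out = (batch - mean) / std
assert out.shape == (2, 3, 4, 2)  # shape is preserved
```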
<< "Input image must have shape (height, width, channels), or "
<< "(N, height, width, channels) but got " << shp;
if (shp.ndim() == 3) {
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[2], shp[0], shp[1]}));
It would be clearer to define enum constants N, W, C, H instead of using 0, 1, 2, 3.
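The actual code is C++, where an enum would serve; as a Python sketch of the idea (names and the example shape are hypothetical):

```python
# Named indices for an (N, H, W, C) input layout, instead of bare 0..3.
N, H, W, C = 0, 1, 2, 3

shp = (5, 300, 300, 3)  # hypothetical 4D input shape
# The NHWC -> NCHW output shape, written with names rather than magic numbers.
out_shape = (shp[N], shp[C], shp[H], shp[W])
print(out_shape)  # (5, 3, 300, 300)
```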
for (int l = 0; l < length; ++l) {
  for (int c = 0; c < channel; ++c) {
    output[step + c*length + l] = static_cast<float>(input[step + l*channel + c]) / 255.0f;
255.0f is already a float, so this cast may not be needed here.
Another question: why 255.0f? A constant variable with a clear name would be more readable.
Agreed. Making the change.
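A NumPy sketch of the suggested named constant, assuming 255.0f comes from the maximum uint8 pixel value (illustrative, not the operator's implementation; the constant name is made up):

```python
import numpy as np

# Named scale constant, assumed to be the max value of uint8 pixel data.
MAX_PIXEL_VALUE = 255.0

img = np.random.randint(0, 256, (4, 5, 3)).astype(np.uint8)  # (H, W, C)
# HWC -> CHW, scaled to [0, 1], mirroring what ToTensor computes.
out = img.transpose(2, 0, 1).astype(np.float32) / MAX_PIXEL_VALUE
assert out.shape == (3, 4, 5)
assert 0.0 <= out.min() and out.max() <= 1.0
```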
#pragma omp parallel for
for (auto n = 0; n < batch_size; ++n) {
  ToTensorImpl(inputs, outputs, length, channel, n*step);
Good change!
<< "Input tensor must have shape (channels, height, width), or "
<< "(N, channels, height, width), but got " << dshape;
uint32_t nchannels;
Use int or int32_t. See: https://google.github.io/styleguide/cppguide.html#Integer_Types
Thanks. Good learning!
data_in = np.random.uniform(0, 255, (300, 300, 3)).astype(dtype=np.uint8)
out_nd = transforms.ToTensor()(nd.array(data_in, dtype='uint8'))
out_nd = transforms.ToTensor()(nd.array(data_in))
Why was the dtype removed from the original 3D input test?
assert_almost_equal(data_expected_4d, out_nd_4d.asnumpy())

# Invalid Input - Neither 3D nor 4D input
invalid_data_in = nd.random.uniform(0, 1, (5, 5, 3, 300, 300)).astype(dtype=np.float32)
👍
const int batch_size = inputs[0].shape_[0];
const int length = inputs[0].shape_[1] * inputs[0].shape_[2];
const int channel = inputs[0].shape_[3];
const int step = channel * length;
How large can this value be in practice?
Good point. IMO, making step an unsigned long long int would be enough?
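A quick arithmetic sketch of how channel * length can overflow a 32-bit int for a plausibly large input (the dimensions are hypothetical):

```python
# channel * height * width for a large image already exceeds int32's
# maximum (2**31 - 1 = 2147483647), so a wider type for step is safer.
channel, height, width = 3, 30000, 30000  # hypothetical large input
step = channel * height * width
print(step)              # 2700000000
print(step > 2**31 - 1)  # True
```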
ToTensorImpl(inputs, outputs, length, channel);
} else if (inputs[0].ndim() == 4) {
  // 4D input batch of images
  const int batch_size = inputs[0].shape_[0];
How large can this value be in practice?
if (shp.ndim() == 3) {
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[2], shp[0], shp[1]}));
} else if (shp.ndim() == 4) {
  SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[0], shp[3], shp[1], shp[2]}));
Can shp[0] be zero?
@stu1130 @apeforest - I have addressed your comments. I have raised a separate PR - #13802 only for Normalize. It also includes GPU support. Closing this PR. I request you to review the other PR. Thanks.
Description
With this change:
- ToTensor transformation operator can take 3D (h, w, c) or 4D (n, h, w, c) input at once.
- Normalize transformation operator can take 3D (c, h, w) or 4D (n, c, h, w) input at once.

In another PR, I will work on supporting a list of 3D tensors (a list of images) as input, for the use case of image tensors with different shapes.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
TODO
@stu1130 @zhreshold @apeforest