This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Make to_tensor and normalize to accept 3D or 4D tensor inputs #13614

Conversation

@sandeep-krishnamurthy (Contributor) commented Dec 11, 2018

Description

With this change:

  • The ToTensor transformation operator can now take either a 3D (h, w, c) input or a 4D (n, h, w, c) batch at once.
  • The Normalize transformation operator can now take either a 3D (c, h, w) input or a 4D (n, c, h, w) batch at once (see the usage sketch below).
  • The operators are also parallelized with OpenMP.
  • This change is required to fuse the transformation pipeline with the network graph during inference, where inputs are usually 4D (n, c, h, w).
  • This is a backward-compatible change; it does not change any existing behavior.

In a follow-up PR, I will add support for a list of 3D tensors (a list of images) as input, to cover the use case of image tensors with different shapes.
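A minimal usage sketch of the intended behavior described above (shapes, sizes, and the mean/std values are illustrative and not taken from this PR):

```python
import mxnet as mx
from mxnet.gluon.data.vision import transforms

# 3D input: a single (h, w, c) image
img = mx.nd.random.uniform(0, 255, shape=(300, 300, 3)).astype('uint8')
chw = transforms.ToTensor()(img)          # -> (c, h, w), float32 scaled to [0, 1]

# 4D input: a batch of (n, h, w, c) images
batch = mx.nd.random.uniform(0, 255, shape=(8, 300, 300, 3)).astype('uint8')
nchw = transforms.ToTensor()(batch)       # -> (n, c, h, w)

# Normalize accepts (c, h, w) or (n, c, h, w) inputs
normalize = transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
out = normalize(nchw)                     # same shape as input, per-channel normalized
```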

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Support 3D or 4D input tensors in the to_tensor and normalize transformation operators.

TODO

  • API documentation updates. (WIP)

@stu1130 @zhreshold @apeforest

@sandeep-krishnamurthy changed the title from "Make to_tensor to accept 1 or batch of images" to "Make to_tensor to accept 3D or 4D tensor inputs" on Dec 11, 2018
@sandeep-krishnamurthy changed the title from "Make to_tensor to accept 3D or 4D tensor inputs" to "Make to_tensor and normalize to accept 3D or 4D tensor inputs" on Dec 11, 2018
@stu1130 (Contributor) left a review.

for (int i = 0; i < channel; ++i) {
DType mean = param.mean[param.mean.ndim() > 1 ? i : 0];
DType std_dev = param.std[param.std.ndim() > 1 ? i : 0];

Contributor: mean and std_dev should be of float type, as defined by lines 115 and 116:

MSHADOW_TYPE_SWITCH(inputs[0].type_flag_, DType, {
float* output = outputs[0].dptr<float>();
DType* input = inputs[0].dptr<DType>();

Contributor:

Suggested change:

#pragma omp parallel for collapse(2)

Would this give better performance?

Contributor (Author): It cannot be used inside the macro.

Contributor: How about checking the DType ourselves without using the macro?

DType mean = param.mean[param.mean.ndim() > 1 ? i : 0];
DType std_dev = param.std[param.std.ndim() > 1 ? i : 0];
for (int j = 0; j < length; ++j) {
output[step + i*length + j] = (input[step + i*length + j] - mean) / std_dev;

Contributor: If the input is int, should it be int or float after normalization? I prefer float here.

Contributor (Author): Good point, it should be float. Making the change.
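A NumPy sketch of the behavior agreed on here, assuming the output is float32 regardless of the input dtype (illustrative only, not the operator code):

```python
import numpy as np

def normalize_chw(data, mean, std):
    # Per-channel normalization of a (c, h, w) array; the result is float32
    # even when the input is an integer type, as preferred in this thread.
    data = data.astype(np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (data - mean) / std

img = np.random.randint(0, 256, size=(3, 4, 4), dtype=np.uint8)
out = normalize_chw(img, mean=(0.5, 0.5, 0.5), std=(0.25, 0.25, 0.25))
assert out.dtype == np.float32
```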

@roywei (Member) commented Dec 12, 2018:

@mxnet-label-bot add[Gluon, Data-loading, Operator]

Examples
--------
>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> image = mx.nd.random.uniform(0, 1, (3, 4, 2))

Contributor: Can you give an example of 4D (N x C x H x W) here?
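One possible form such a 4D doc example could take (illustrative only, not text from the PR; it assumes the 4D support this PR proposes):

```python
>>> transformer = transforms.Normalize(mean=(0, 1, 2), std=(3, 2, 1))
>>> batch = mx.nd.random.uniform(0, 1, (2, 3, 4, 2))   # (N, C, H, W)
>>> transformer(batch).shape
(2, 3, 4, 2)
```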

<< "Input image must have shape (height, width, channels), or "
<< "(N, height, width, channels) but got " << shp;
if (shp.ndim() == 3) {
SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[2], shp[0], shp[1]}));

Contributor: It would be clearer to define enum constants N, W, C, H instead of using 0, 1, 2, 3.


for (int l = 0; l < length; ++l) {
for (int c = 0; c < channel; ++c) {
output[step + c*length + l] = static_cast<float>(input[step + l*channel + c]) / 255.0f;

Contributor: 255.0f is already a float, so this cast may not be needed here.

Contributor: Another question: why 255.0f? Using a constant variable with a clear name would be more readable.

Contributor (Author): Agreed. Making the change.
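For reference, a NumPy sketch of what the loop above computes: an (h, w, c) to (c, h, w) layout change plus scaling by 255 (illustrative, not the operator's implementation):

```python
import numpy as np

def to_tensor_hwc(img):
    # Convert an (h, w, c) uint8 image to a (c, h, w) float32 array in [0, 1].
    return img.astype(np.float32).transpose(2, 0, 1) / 255.0

img = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)
out = to_tensor_hwc(img)
assert out.shape == (3, 4, 5) and out.dtype == np.float32
```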


#pragma omp parallel for
for (auto n = 0; n < batch_size; ++n) {
ToTensorImpl(inputs, outputs, length, channel, n*step);

Contributor: Good change!

<< "Input tensor must have shape (channels, height, width), or "
<< "(N, channels, height, width), but got " << dshape;

uint32_t nchannels;

Contributor (Author): Thanks. Good learning!

data_in = np.random.uniform(0, 255, (300, 300, 3)).astype(dtype=np.uint8)
out_nd = transforms.ToTensor()(nd.array(data_in, dtype='uint8'))
out_nd = transforms.ToTensor()(nd.array(data_in))

Contributor: Why remove the dtype in the original 3D input test?

assert_almost_equal(data_expected_4d, out_nd_4d.asnumpy())

# Invalid Input - Neither 3D or 4D input
invalid_data_in = nd.random.uniform(0, 1, (5, 5, 3, 300, 300)).astype(dtype=np.float32)

Contributor: 👍

const int batch_size = inputs[0].shape_[0];
const int length = inputs[0].shape_[1] * inputs[0].shape_[2];
const int channel = inputs[0].shape_[3];
const int step = channel * length;

Contributor: How large can this value be in practice?

Contributor: Good point. IMO, making step an unsigned long long int would be enough?

ToTensorImpl(inputs, outputs, length, channel);
} else if (inputs[0].ndim() == 4) {
// 4D input batch of images
const int batch_size = inputs[0].shape_[0];

Contributor: How large can this value be in practice?

if (shp.ndim() == 3) {
SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[2], shp[0], shp[1]}));
} else if (shp.ndim() == 4) {
SHAPE_ASSIGN_CHECK(*out_attrs, 0, TShape({shp[0], shp[3], shp[1], shp[2]}));

Contributor: Can shp[0] be zero?

@sandeep-krishnamurthy (Contributor, Author) commented:

@stu1130 @apeforest - I have addressed your comments. I have raised a separate PR, #13802, for Normalize only; it also includes GPU support.

Closing this PR. Please review the other PR. Thanks.
