Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1258]fix unittest for ROIAlign Operator #13609

Merged
merged 12 commits into from
Feb 5, 2019

Conversation

wkcn
Copy link
Member

@wkcn wkcn commented Dec 11, 2018

Description

Fixes #11064
It passes the test whose MXNET_TEST_SEED=35650200.

The old unittest for ROIAlign computes in float64, so there is some float precision problem.
Fixes #11064

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • use float32 in the unittest for ROIAlign OP
  • set atol to 1e-5 in the unittest for ROIAlign OP

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@wkcn wkcn changed the title fix unittest for ROIAlign Operator [MXNET-1258]fix unittest for ROIAlign Operator Dec 11, 2018
@wkcn
Copy link
Member Author

wkcn commented Dec 11, 2018

It seems there is a problem in Windows-CPU CI.

WindowsError: [Error 5] Access is denied: 'c:\\Anaconda3\\envs\\py2\\Lib\\site-packages\\six-1.12.0.dist-info\\RECORD.pip'
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
Usage: python.exe -m nose [options]
python.exe -m nose: error: no such option: --with-timer
Error running unittest

@roywei
Copy link
Member

roywei commented Dec 12, 2018

@mxnet-label-bot add[CI, Flaky, pr-awaiting-review]

@anirudhacharya
Copy link
Member

can you run the test >500 times and paste the results here to ensure that the test is no longer flaky.

@wkcn
Copy link
Member Author

wkcn commented Dec 17, 2018

@anirudhacharya Thank you.
I wrote the test code and tested it on CPU and GPU.
CPU: 1000+ times
GPU: 1000+ times

It works fine.
I hope it will be no longer flaky :)

Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a few unnecessary newlines.

tests/python/unittest/test_operator.py Outdated Show resolved Hide resolved
y = max(0.0, y)
return T(0.0), []
x = T(max(0.0, x))
y = T(max(0.0, y))
x_low = int(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not typecast x_low to np.float32 here itself and remove the many typecasting statements below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of x_low is int in the C implementation.

In Python, the default type of float is float64, however the float type in the C implementation is float32.
If the types are not consistent, there will be calculation error.

So I use T to typecast some variables to np.float32 to keep the type consistency between C and Python unittest.


out[r, c, ph, pw] = val * 1.0 / count
out[r, c, ph, pw] = val / count
assert out.dtype == T, out.dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a more descriptive error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I will add more detail.

@Roshrini
Copy link
Member

Roshrini commented Jan 2, 2019

@anirudhacharya Can you take a look again?

@anirudhacharya
Copy link
Member

LGTM

@anirudhacharya
Copy link
Member

@wkcn please rebase and resolve conflicts

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed CI Flaky pr-awaiting-review PR is waiting for code review labels Jan 14, 2019
@wkcn
Copy link
Member Author

wkcn commented Jan 15, 2019

@anirudhacharya In my own project, there is also a ROIAlignOP unittest, and I found that nvcc will change the order of operators because of compiler optimization, causing the problem of float precision. So please not merge the PR temporarily, and I will check it. [link]

@stu1130
Copy link
Contributor

stu1130 commented Jan 15, 2019

@mxnet-label-bot update [pr-work-in-progress]
let us know once you've done the check

@marcoabreu marcoabreu added pr-work-in-progress PR is still work in progress and removed pr-awaiting-merge Review and CI is complete. Ready to Merge labels Jan 15, 2019
@wkcn
Copy link
Member Author

wkcn commented Jan 16, 2019

@stu1130 Thanks!

@wkcn
Copy link
Member Author

wkcn commented Jan 17, 2019

test fail in test activation
#13915

@wkcn
Copy link
Member Author

wkcn commented Jan 17, 2019

@anirudhacharya @stu1130
Hi! I think the PR is finished.
Since nvcc will change the order of operator, I increase the tolerance to 1e-3.

@sandeep-krishnamurthy sandeep-krishnamurthy added Test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress labels Jan 28, 2019
@sandeep-krishnamurthy
Copy link
Contributor

@anirudhacharya - Can you please a look at this PR?

@anirudhacharya
Copy link
Member

@sandeep-krishnamurthy have already approved this PR, above.

Copy link
Contributor

@vandanavk vandanavk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Could you add "Fixes #11064" in the PR description? This will automatically close the issue when the PR is merged

@vandanavk
Copy link
Contributor

@mxnet-label-bot update [Test, pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Feb 5, 2019
@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 7c7af3a into apache:master Feb 5, 2019
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* fix roi align test

* retrigger unittest

* add more test detail for ROIAlign test

* remove url in test_op_roi_align

* remove blank line in test_op_roi_align in test_operator

* merge master

* Update test_operator.py

* retrigger CI
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
* fix roi align test

* retrigger unittest

* add more test detail for ROIAlign test

* remove url in test_op_roi_align

* remove blank line in test_op_roi_align in test_operator

* merge master

* Update test_operator.py

* retrigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* fix roi align test

* retrigger unittest

* add more test detail for ROIAlign test

* remove url in test_op_roi_align

* remove blank line in test_op_roi_align in test_operator

* merge master

* Update test_operator.py

* retrigger CI
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge Test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: test_operator.test_op_roi_align
8 participants