-
Notifications
You must be signed in to change notification settings - Fork 6.8k
FIX: flaky test exponential generator #14287
Conversation
@anirudh2290 @eric-haibin-lin @apeforest Can you please review ? |
python/mxnet/test_utils.py
Outdated
@@ -1933,7 +1933,7 @@ def chi_square_check(generator, buckets, probs, nsamples=1000000): | |||
_, p = ss.chisquare(f_obs=obs_freq, f_exp=expected_freq) | |||
return p, obs_freq, expected_freq | |||
|
|||
def verify_generator(generator, buckets, probs, nsamples=1000000, nrepeat=5, success_rate=0.25, alpha=0.05): | |||
def verify_generator(generator, buckets, probs, nsamples=1000000, nrepeat=5, success_rate=0.15, alpha=0.05): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we change the value from 0.15 to 0.25?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's changed from 0.25 to 0.15, but I have the same question. Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question. The comment in this method says "25%" success rate. Why do we change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuxihu @mseth10 @apeforest
It was changed in this PR: /~https://github.com/apache/incubator-mxnet/pull/13498/files
Even though It is mentioned in the comments that success rate should be 25%. There isn't any data to support why it should be kep at 25%. It could have been a typo too back then. SO, reverting it to original value seemed more logical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is, due to increasing the strictness of this check for all generators it might cause them to become flaky. I can also explicitly relax the criteria for only this generator and we can observe the behavior of others and make changes accordingly. Setting it to a fixed value for all the generators doesn't seem like a good idea since all the random generators are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuxihu : Actually No. He in fact made the check more strict. I am simply relaxing it. Therefore it shouldn't make another test flaky. But to make our tests better what I am suggesting is that i let it stay at 0.25 as default and explicitly pass 0.15 for exponential generator check. If others don't fail maybe only this one needed adjustment. Do you think this approach is better ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its fine to revert it to 0.25 to reduce flakiness. it wont make the other test flaky since this change is decreasing the failure chances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@access2rohit I like your idea of passing 0.15 for exponential generator check and let 0.25 be the default for others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuxihu @apeforest @mseth10 @anirudh2290 : I have updated successrate to 20% which is less stricter but not too loose a check. Currently running it 10k times and will update the PR once they succeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests passed with success_rate=0.20
@mxnet-label-bot add [pr-awaiting-review] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (after going through discussion and realizing that probably exponential generator needs relaxed checks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@access2rohit Please make the change as you commented: make 0.25 as default and pass in 0.15 for exponential generator check.
…nerator to 20% to fix its flakiness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@anirudh2290 @apeforest can you review as well ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more question: has this test been disabled? If so, do you also need to enable it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* changing success_percentage test correctness of random exponential generator to 20% to fix its flakiness * Re-Trigger build
* changing success_percentage test correctness of random exponential generator to 20% to fix its flakiness * Re-Trigger build
Description
change success_rate to 0.20 for test_exponential_generator since 0.25 much stricter and fails random exponential generator tests.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Testing
Command:
ubuntu@ip-172-31-82-110 ~/Workspace/mxnet/tests/python/unittest (master) $ MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s test_random.py:test_exponential_generator
Output:
[DEBUG] 10000 of 10000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=291062812 to reproduce.
ok
Ran 1 test in 46911.498s
OK