Very nice feature, thanks a lot for putting so much effort into this!
""" | ||
|
||
try: | ||
next_seed = np.random.randint(0, np.iinfo(np.int32).max) |
Could you elaborate on how this restores the pre-block state? To me it looks like it will always pick a new random seed after the block finishes. I rather expected the old sequence generator to be restored. Imagine two sequences:
1: 100 -> 101 -> 102 -> ...
2: 200 -> 201 -> 202 -> ...
What I would expect is something along the lines of:
100
101
START: [random_seed(seq_2)]
200
201
202
END: [random_seed(seq_2)]
102
103
...
You're asking about the "with random_seed(NNN)" construct (as opposed to the with_seed() decorator). One issue faced here is that the mxnet RNG state is not save-and-restorable, so we can't achieve exactly what you were expecting. What's promised is this:
Upon conclusion of the block, the rng's are returned to
a state that is a function of their pre-block state, so
any prior non-determinism is preserved.
So 'with random_seed(NNN)' forces determinism on the random values seen inside its block, without changing the determinism/non-determinism of the values seen outside the block. Likewise, 'with random_seed()' (so no numeric arg) forces non-determinism on the random values seen inside its block, without changing the determinism/non-determinism of the values seen outside the block.
Let's take a use-case that explores this feature: a test that initially gets both random shapes and random data to populate tensors of those shapes. Let's say the test is flaky and it's determined that it's best to work with a known set of shapes. The test could then be coded as:
@with_seed()
def test_stuff():
    # seed 1234 produces a variety of shapes known to work with this test
    with random_seed(1234):
        shapes = <get 'random' shapes>
    # fill shapes with random data
    tensors = populate_tensors(shapes)
    <perform test>
Let's say now that the test is seen to fail sporadically, and test seed 987654321 exposes the problem. The methodology of this PR should still work:
MXNET_TEST_SEED=987654321 nosetests --verbose -s test_stuff_module.py:test_stuff
will repro the failure.
Thanks for elaborating, this sounds very reasonable.
I just imagined a few nested blocks with random and hardcoded seeds and, in my mind, your process seems to add up. Nonetheless, could you please create a few unit tests to ensure that this behaviour works as expected and preserves a consistent sequence of numbers even if multiple blocks are nested, some seeds are hardcoded, and so on? I think it's reasonable to validate our assumptions in case future changes are introduced.
Please also make sure to test the GPU context intensively, as we've experienced (known) issues on multi-GPU systems.
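For illustration only, here is a minimal sketch of the kind of check being asked for, assuming random_seed() is importable from the unittest common module and using numpy draws as a stand-in for all of the RNGs (the names and structure are hypothetical, not test code from this PR):

import random
import numpy as np
import mxnet as mx
from common import random_seed  # assumed location of the context manager

def _values_around_nested_blocks(outer_seed):
    # seed everything, draw a value, run nested seeded/unseeded blocks, then draw again
    np.random.seed(outer_seed)
    mx.random.seed(outer_seed)
    random.seed(outer_seed)
    before = np.random.uniform()
    with random_seed(1234):      # hard-coded inner seed
        _ = np.random.uniform()
        with random_seed():      # nested non-deterministic block
            _ = np.random.uniform()
    after = np.random.uniform()
    return before, after

def test_nested_seed_blocks_preserve_outer_determinism():
    # the post-block state should be a function of the pre-block state only,
    # so two identical runs must produce identical values outside the blocks
    assert _values_around_nested_blocks(42) == _values_around_nested_blocks(42)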
def test_sequential_warning():
    with warnings.catch_warnings(record=True) as w:
        # The following line permits the test to pass if run multiple times
        warnings.simplefilter('always')
In what case would a test run multiple times, and why would it fail in that case?
This PR provides an easy way to run a test multiple times under different seeds. I found that this particular test would fail under those circumstances without this fix, when invoked as:
MXNET_TEST_COUNT=100 nosetests --verbose tests/python/unittest/test_gluon.py
I think this sounds like a valid failure that should be resolved. Imagine a user calling the same method repeatedly in a server and having it fail after some time because of a memory leak or something else that causes this error.
The default behavior of 'warnings' is to collapse identical messages. This test purposefully does something that it thinks should generate a message; for the 'MXNET_TEST_COUNT=100' example above, repeat runs would not see one. Thus, the test should ensure, independent of the global policy, that identical messages are not squashed. Try the following:
python
Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> import warnings
>>> from mxnet import gluon
>>> warnings.simplefilter('always')
>>> b = gluon.nn.Sequential()
>>> b.add(gluon.nn.Dense(20))
>>> b.hybridize()
/home/dcarter/mxnet_dev/dgx/mxnet/python/mxnet/gluon/nn/basic_layers.py:83: UserWarning: All children of this Sequential layer are HybridBlocks. Consider using HybridSequential for the best performance.
warnings.warn('All children of this Sequential layer are HybridBlocks. Consider ' \
>>> b.hybridize()
/home/dcarter/mxnet_dev/dgx/mxnet/python/mxnet/gluon/nn/basic_layers.py:83: UserWarning: All children of this Sequential layer are HybridBlocks. Consider using HybridSequential for the best performance.
warnings.warn('All children of this Sequential layer are HybridBlocks. Consider ' \
>>>
If you start over and repeat this sequence, but leave out the 'warnings.simplefilter('always')' line, then the final 'b.hybridize()' line will output nothing.
I invite you to pursue making 'warnings.simplefilter('always')' part of all gluon usage, but independent of that, I think this test should leave this line in, since that's what the test needs to pass.
I understand, thanks for the elaboration.
@@ -360,8 +373,12 @@ def backward(self, dm, dn):
assert_almost_equal(x.grad.asnumpy(), dx1)
assert_almost_equal(y.grad.asnumpy(), dy1)
# atol = 1e-6
# assert_almost_equal(x.grad.asnumpy(), dx1, atol=atol)
# assert_almost_equal(y.grad.asnumpy(), dy1, atol=atol)
TODO: Restore?
Will do. My error to leave this in.
data = mx.symbol.Variable('data')
out = mx.symbol.L2Normalization(data=data, mode=mode, eps=norm_eps)
# TODO(szha): Seeding this masks failures. We need to do a deep dive for failures without this seed. |
Don't remove the TODO
I moved both the TODO and the seed-setting to the beginning of the test
Ah I see, thanks
Awesome!
Would it be possible to write a test for this feature? Especially around the
I'd like to see this feature and the stated assumptions being tested, but otherwise it LGTM.
LGTM
# To repro, comment-out skip line, then execute:
# MXNET_TEST_SEED=990952066 nosetests --verbose tests/python/unittest/test_operator.py:test_dropout
@unittest.skip("test fails with seed 990952066: 0 output seen with dropout ratio=0. Not yet tracked.") |
Could you fix the test? Dropout is quite important and we shouldn't disable it.
I think the problem is better solved by someone closer to the modeling community. I'll explain what I know as an issue and put the test on a fixed seed.
Created issue #9816.
Approved for merge. Please resolve the test failure if possible.
@DickJC123 would you mind creating another PR which adds documentation for these features? Good places could be the readme in the tests dir or maybe even somewhere on the website. I'll leave it up to you.
Sure.
* Adds with_seed() decorator to unittests. Will fail CI due to demo of known failing seeds.
* Fix failing tests, remove hard-coded bad seeds.
* Adds with_seed() decorator to gpu tests. Will fail CI due to demo of known failing seeds.
* Fix failing test, add test of 'with random_seed()'
* Add with_seed() to test_gluon_model_zoo_gpy.py
* Increase atol for test_adam
* Added more 'with random_seed()' testing. Hardcode bad test_training seed.
* test_training put back on random seeding
* Switched test_dropout to fixed seed, created issue
* Hardcoding seed that caused a test_rsp_push_pull CI core-dump
* test_rsp_push_pull CI core-dump not reproduced, so unsetting seed
Description
This is a rebasing and partial squashing of the stale "ci test randomness2" PR #8526. It's based on the premise that the CI should test the framework the way users do: with lots of different data. Having tests run with random data has been problematic, though, with flaky tests being a constant annoyance. This PR supplies the tools to track down the flakiness by explicitly seeding all tests that use random data and reporting those seeds if a test fails. The first commit of the PR rolls out the with_seed() decorator to tests under the tests/python/unittest directory with some hard-coded seeds designed to expose existing flakiness. Follow-up commits will supply known fixes. Further commits will touch the tests/python/gpu directory tests.
From PR #8526:
This PR introduces a new, simple level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. This PR is an improved replacement for the pending PR #8313. The improvements over that PR are:
To provide this functionality, a custom decorator "@with_seed()" was created. This was considered more powerful than the nosetests "@with_setup()" facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" test file of three tests. Assuming that the second test needs a set seed for robustness, the file might currently appear as:
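The code example that belongs here did not survive into this text; the following is a plausible reconstruction based on the surrounding description (the test names come from the text, while the bodies and the fixed seed value are hypothetical):

import mxnet as mx
import numpy as np

def test_op1():
    data = mx.nd.random.uniform(shape=(10, 10))  # random data, seed not controlled
    ...  # exercise op1 with 'data'

def test_op2():
    # this test is only robust with a fixed seed
    np.random.seed(1234)  # hypothetical seed value
    mx.random.seed(1234)
    data = mx.nd.random.uniform(shape=(10, 10))
    ...  # exercise op2 with 'data'

def test_op3():
    data = mx.nd.random.uniform(shape=(10, 10))  # runs after test_op2, so always sees the same data
    ...  # exercise op3 with 'data'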
Even though test_op3() is OK with nondeterministic data, it will have only a single dataset because it is run after test_op2, which sets the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, except for running the test individually to produce a new and hopefully similar failure.
With the proposed approach, the test file becomes:
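Again, the original example is missing; a plausible reconstruction of the same file under the proposed approach (the import location and test bodies are assumptions):

import mxnet as mx
from common import setup_module, with_seed  # assumed: common.py under tests/python/unittest

@with_seed()
def test_op1():
    data = mx.nd.random.uniform(shape=(10, 10))  # per-test seed, reported if the test fails
    ...  # exercise op1 with 'data'

@with_seed(1234)  # hypothetical hard-coded seed, kept until the test is made robust
def test_op2():
    data = mx.nd.random.uniform(shape=(10, 10))
    ...  # exercise op2 with 'data'

@with_seed()
def test_op3():
    data = mx.nd.random.uniform(shape=(10, 10))  # fresh data on every run, reproducible on failure
    ...  # exercise op3 with 'data'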
By importing unittests/common.py, the seeds of the numpy and mxnet RNGs are set initially to a random "module seed" that is output to the log file. The initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the string of "test seeds" that determine the behavior of each test's RNGs. The "@with_seed()" test function decorator requests a test seed from the module RNG, sets the numpy, python and mxnet seeds appropriately and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests. Debugging a failing test_op3 in the example would proceed as follows:
Because test_op3 failed, its seed appeared in the log file. Also, the test_op2 seed was displayed as a reminder that that test needs more work before it is robust enough for random data. The command to reproduce the problem is produced simply by cutting and pasting from the log:
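For the example above, the cut-and-paste would look roughly like this, taking the test seed from the MXNET_TEST_SEED value quoted further below (the module path is illustrative):
MXNET_TEST_SEED=2096230603 nosetests --verbose test_module.py:test_op3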
If test_op3 instead dumped core, the test seed would not be initially apparent. Assuming the core-dump is repeatable based on the data, the module would first be re-run with the command:
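One plausible form of that command, assuming the log reported a module seed of 11111111 (hypothetical value), that MXNET_MODULE_SEED pins the module seed, and that debug-level logging makes each test seed visible before the test runs (assumptions about the supporting common.py):
MXNET_MODULE_SEED=11111111 nosetests --logging-level=DEBUG --verbose test_module.py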
The log would now include the test seeds for all tests before they are run, so the test could then be run in isolation as before with MXNET_TEST_SEED=2096230603.
Let's assume that test_op3 was altered by increasing a tolerance. How robust is the test now? This can be explored by repeating the test many times as in:
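For instance, using the repeat-count variable shown earlier in this conversation (10000 is an arbitrary illustrative count):
MXNET_TEST_COUNT=10000 nosetests --verbose test_module.py:test_op3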
Finally, this PR adds the @with_seed() decorator for all tests in modules that use random numbers. Also, it includes many specific test robustness fixes that were exposed once this new methodology was adopted internally by the author.
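For concreteness, here is a rough sketch of how the with_seed() mechanism described above could work. It is a simplified illustration rather than the PR's actual code; the MXNET_TEST_SEED handling mirrors the usage shown in this description, and everything else (names, logging) is assumed:

import functools
import logging
import os
import random
import numpy as np
import mxnet as mx

def with_seed(seed=None):
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # precedence: explicit env override, then a hard-coded seed, then the module RNG
            env_seed = os.getenv('MXNET_TEST_SEED')
            if env_seed is not None:
                test_seed = int(env_seed)
            elif seed is not None:
                test_seed = seed
            else:
                test_seed = np.random.randint(0, np.iinfo(np.int32).max)
            # remember the module RNG state so each test stays isolated from the others
            module_state = np.random.get_state()
            np.random.seed(test_seed)
            mx.random.seed(test_seed)
            random.seed(test_seed)
            try:
                test_fn(*args, **kwargs)
            except Exception:
                # on failure, report how to reproduce this exact run
                logging.info('Test failed: use MXNET_TEST_SEED=%d to reproduce.', test_seed)
                raise
            finally:
                # reinstate the module RNG state before the next test's decorator runs
                np.random.set_state(module_state)
        return wrapper
    return decorator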
Checklist
Essentials
Passed code style checking (make lint)
Changes
Comments