-
Notifications
You must be signed in to change notification settings - Fork 6.8k
test_ImageRecordIter_seed_augmentation flaky test fix #12485
test_ImageRecordIter_seed_augmentation flaky test fix #12485
Conversation
0deffe9
to
6aa8d75
Compare
Thanks for your contribution @perdasilva @mxnet-label-bot[pr-work-in-progress] |
@perdasilva - Did you run the tests multiple times to verify it is not flaky anymore? |
6aa8d75
to
97ae4a9
Compare
@sandeep-krishnamurthy sorry for the delay in the response. I've been absent lately because life got in the way. The issue was the multi-threading, I've just done a rebase from master and it seems the number of threads has been reduced to 1 (which fixes the flakyness). I need to look at the history of the file to see what's changed since I was last looking at this problem. EDIT: The problem is still there. I think something might have changed in the build system in the mean time and MP wasn't enabled. I've rebuilt from scratch with MP and it's still flaky. |
I have sketched out a fix to the flakiness here, although its a bit of a hack at the moment. One random number generator is created per preprocessing thread (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L162). While the image records are retrieved in the same order (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L508) and always stored in the same index in the output array (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L567) - they are not guaranteed to be processed by the same thread every time. This matter because the default augmenter seeds the thread's random number generator (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L255) the first time it gets called (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L563). This means that for different runs, the same record can be processed in a different iteration of the usage of the random number generator by the augmenters, thus leading to different random numbers being generated, which lead to different changes to the image -> flakyness. To remedy the issue, I've changed to code such that the augmenter seed is a parameter to the image parser. I then use this seed and the image record index to seed the random number generator (outside of the default augmenter) before any changes are made to the image. Since the records are always retrieved in the same order, the same record will always have the same generator seed, independent of the number of threads used -> same random numbers being generated -> reproducible. It's a bit of a hack, so I would appreciate some input from the code owner (@anirudh2290 ) and the community. |
c6152f9
to
1550da3
Compare
@haojin2 - Can you provide some inputs here? Thanks. |
@haojin2 @apeforest Can you please review this PR? Thanks |
@haojin2 @apeforest ping for reviewing the PR! |
src/io/iter_image_recordio_2.cc
Outdated
LOG(INFO) << "tid: " << tid << " idx: " << idx << " index: " << rec.image_index(); | ||
} | ||
if (param_.seed_aug.has_value()) { | ||
prnds_[tid]->seed(idx + param_.seed_aug.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why seeds need to be reset all the time. The logic is a bit obscure - the random generators are in the augmentor and get passed to the iterator.
Anyway, what you could try to do (if not refactoring) is to seed all generators in DefaultImageAugmenter::Init
here or in a separate method (since the random generator is not passed in to Init
). I guess the interface ImageAugmenter
would require a separate seed
method then that could be called sequentially (instead of openmp parallelized).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two requirements here: that setting the seed will yield reproducible results and that parallelization should be used to speed up image augmentation. We need to reset the seed for each image because there is no guarantee that the same image will be processed by the same thread, or that even that it will be the ith image processed by that thread, across independent runs or even different hardware (the code figures out the number of threads to use for processing). Meaning the order of draws from the random number generator to augment the same image is not guaranteed. Therefore, resetting the random number generator at the start of processing an image it the only way (at least that I could think of) to guarantee reproducibility when setting a fixed seed. That is, to guarantee that the same random distortions will be applied to the same image across independent runs and different hardware configurations. I hope this make it a little clearer. It's not the easiest topic to discuss in written form lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your assessment - it's pretty obscure and it took me a while to figure out exactly what was going on.
1550da3
to
12ffd99
Compare
@@ -165,6 +167,8 @@ struct ImageRecParserParam : public dmlc::Parameter<ImageRecParserParam> { | |||
.describe("The data shuffle buffer size in MB. Only valid if shuffle is true."); | |||
DMLC_DECLARE_FIELD(shuffle_chunk_seed).set_default(0) | |||
.describe("The random seed for shuffling"); | |||
DMLC_DECLARE_FIELD(seed_aug).set_default(dmlc::optional<int>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I am missing something here, but the PR says it fixes a flaky test.
Are we adding this new field just to fix the flakiness of a particular test? It does not seem right. we can't change the functionality, just so that it will be easier for us to run tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair call. However, the problem, I think, is architectural.
The seed_aug parameter is passed in to the default image augmenter (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L101). But, the actual random number generator is external, but seeded internally (/~https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L254).
As I've explained above, I can't see how to meet the requirements of processing the images in parallel and ensuring that the output is reproducible (by setting the augmentation seed).
My fix for the flakiness (without violating the aforementioned requirements), is to ensure that the RNG used to process the image is seeded before processing the image within that thread it is processed in.
To be clear, none of the changes made were to facilitate running the tests, but to actually correct the underlying problem. I agree that the PR is a little rough around the edges atm, but what I'm trying to do is reach out to the community to understand what would be the best way to actually fix the underlying problem.
tests/python/unittest/test_io.py
Outdated
@@ -427,7 +427,7 @@ def check_CSVIter_synthetic(dtype='float32'): | |||
for dtype in ['int32', 'int64', 'float32']: | |||
check_CSVIter_synthetic(dtype=dtype) | |||
|
|||
@unittest.skip("Flaky test: /~https://github.com/apache/incubator-mxnet/issues/11359") | |||
# @unittest.skip("Flaky test: /~https://github.com/apache/incubator-mxnet/issues/11359") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove rather than comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Although again, I need a review of my current approach to see if its the right one. If I get validation from that, I can then prepare the PR for proper review...
Many of nits I'll fix before removing [WIP]
@perdasilva thanks for your contributions. We are looking forward to merging your PR. Could you please address comments above to proceed further? |
728586b
to
6984ba6
Compare
ac885eb
to
5dcea9d
Compare
@perdasilva is it ready for review? |
5dcea9d
to
aa3d167
Compare
@stu1130 I will polish it tomorrow and submit it for review. I'm sorry for the delay. |
aa3d167
to
c92ccba
Compare
@stu1130 ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good
tests/python/unittest/test_io.py
Outdated
def assert_dataiter_equals(dataiter1, dataiter2): | ||
for batch1, batch2 in zip_longest(dataiter1, dataiter2): | ||
# ensure iterators are of same length | ||
assert(batch1 and batch2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this assert doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a function for asserting that two data iterators are producing the same output. The data iterator iterates over batches (lists of arrays). So, basically it runs through the iterators to get the batches, then iterates through the batches and ensures that they are equal.
tests/python/unittest/test_io.py
Outdated
for batch1, batch2 in zip_longest(dataiter1, dataiter2): | ||
|
||
# try to ensure iterators are of same length | ||
assert(batch1 and batch2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this ensuring same length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that (as far as I could tell), there was no easy way of getting the number of batches in an iterator. It doesn't have a size or length method. So, I'm using the zip_longest (https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest) to iterate through the iterators together. If they are don't have the same length, a fill value (None by default) will be returned in place of an item (or batch in this case). Therefore, if one of batch1 or batch2 is none, the assertion will be triggered.
|
||
# check whether seed_aug changes the iterator behavior | ||
dataiter = mx.io.ImageRecordIter( | ||
dataiter1 = mx.io.ImageRecordIter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is getting tested here that is not being tested in the first test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was already here when I got here =) I think it was meant as a sanity check to ensure that the seed aug parameter is not influencing the iterator (other than in the case where the iterator has augmentations applied to its images).
if sys.version_info >= (3,0): | ||
from itertools import zip_longest | ||
else: | ||
from itertools import izip_longest as zip_longest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion for handling different python versions
try:
from itertools import izip_longest as zip_longest
except:
from itertools import zip_longest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it with your suggestion. Much nicer.
c92ccba
to
451291b
Compare
@anirudhacharya all good now? |
39117c5
to
88d2d6d
Compare
LGTM. |
@ChaiBapchya @stu1130 Could you please review this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Have you run the test_ImageRecordIter_seed_augmentation at least 10000 times to see if it's still flaky?
@stu1130 - not 10000x. I will do that tomorrow and confirm ^^ |
@stu1130 - ok each test takes around 10 seconds - might take a tad bit longer than expected... |
@stu1130 - I can confirm, all good and flake free, please merge at your convenience |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@anirudh2290 could you take a look at this and merge it if it looks good |
@anirudh2290 any chance you could merge this? =) |
@marcoabreu @sandeep-krishnamurthy Can someone merge this PR? |
…re each augmentation to guarantee reproducibilit
88d2d6d
to
1861e85
Compare
@sandeep-krishnamurthy @nswamy could someone merge this PR? |
* Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibilit * Update image record iterator tests to check the whole iterator not only first image
* Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibilit * Update image record iterator tests to check the whole iterator not only first image
* Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibilit * Update image record iterator tests to check the whole iterator not only first image
Description
Fixes flakiness with ImageRecordIter augmentation seed tests (#11359) by moving the seed_aug parameter and RNG seeding out of the default augmenter. We now seed the RNG before each image augmentation (if seed_aug parameter is set). This guarantees reproducibility while still enabling multithreaded augmentation (thus fixing the flakiness).
Additionally, it makes the test more robust by testing the whole iterator and not just the first image.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments