Replaced rand_r with std:: random generation #13574
Conversation
good change, some comments.
@@ -176,13 +176,17 @@ void LstmForwardTraining(DType* ws,
   if (dropout > 0.0f) {
 #pragma omp parallel for num_threads(omp_threads)
     for (int j = 0; j < T * N * H * D; j++) {
-      int rand_data = rand_r(&seed_);
+      static thread_local std::random_device device;
Nice. Is static needed when you use thread_local? https://stackoverflow.com/questions/22794382/are-c11-thread-local-variables-automatically-static
is it correct that we are seeding inside the loop?
Interesting link, thanks. I would still prefer the explicit way for clarity.
is it correct that we are seeding inside the loop?
Since it's static, we're doing it only once.
For every thread.
I see.
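To illustrate the point being discussed (seeding happens once per thread, not once per loop iteration, because the `thread_local` objects are initialized on each thread's first pass), here is a minimal standalone sketch of the pattern. The helper names and the dropout-mask wrapper are hypothetical, not the PR's actual code:

```cpp
#include <random>
#include <vector>
#include <cstddef>

// One generator per thread: the thread_local statics are constructed the
// first time each thread reaches them, so the std::random_device seed is
// drawn once per thread even though the call sits inside a parallel loop.
float ThreadLocalUniform() {
  static thread_local std::random_device device;                          // seeded once per thread
  static thread_local std::mt19937 generator(device());                   // one engine per thread
  static thread_local std::uniform_real_distribution<float> distribution(0.0f, 1.0f);
  return distribution(generator);
}

// Hypothetical wrapper mirroring the dropout-mask loops in the diff.
std::vector<float> FillDropoutMask(std::size_t n, float dropout) {
  std::vector<float> mask(n);
  #pragma omp parallel for  // silently ignored if OpenMP is not enabled
  for (long i = 0; i < static_cast<long>(n); ++i) {
    mask[i] = (ThreadLocalUniform() < dropout) ? 0.0f : 1.0f;
  }
  return mask;
}
```

No locking is needed because no generator state is shared between threads.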
@@ -1890,7 +1896,11 @@ void VanillaRNNForwardTraining(DType* ws,
   if (dropout > 0.0f && l > 0) {
 #pragma omp parallel for num_threads(omp_threads)
     for (int i = 0; i < T * N * I; i++) {
-      int rand_data = rand_r(&seed_);
+      static thread_local std::random_device device;
Could we wrap this up in a function? Or would that interfere with thread_local? Seems we are duplicating code.
I'm not sure where to put it, do you have any suggestions?
LGTM
static thread_local std::default_random_engine generator(device());
static thread_local std::uniform_real_distribution<float> distribution;
static thread_local auto dice = std::bind(distribution, generator);
return dice();
}
Could you add a RandInt method and use this in rnn_impl?
Would be interested to understand what happens memory-wise with these thread_locals, and also the OpenMP parallelization, when I see you next =)
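Picking up the RandInt request above, a hedged sketch of what such a helper pair could look like, built on the same thread_local pattern as the snippet under review. The names `ThreadLocalEngine`, `RandDouble`, and `RandInt` are assumptions for illustration, not the PR's final API:

```cpp
#include <random>

// Shared per-thread engine so both helpers draw from one stream per thread.
inline std::mt19937& ThreadLocalEngine() {
  static thread_local std::random_device device;
  static thread_local std::mt19937 engine(device());
  return engine;
}

// Uniform double in [0, 1), analogous to the dice() helper in the diff.
inline double RandDouble() {
  static thread_local std::uniform_real_distribution<double> dist(0.0, 1.0);
  return dist(ThreadLocalEngine());
}

// Uniform int in [lo, hi], inclusive on both ends. Constructing the
// distribution per call is cheap for uniform_int_distribution and lets
// callers pass different ranges.
inline int RandInt(int lo, int hi) {
  std::uniform_int_distribution<int> dist(lo, hi);
  return dist(ThreadLocalEngine());
}
```

Where to put such helpers is exactly the open question in this thread, since the existing version lives under tests/.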
The problem is that the file is in tests/
There's no facepalm emoticon, so I went with thumbs-up.
@perdasilva @larroy Is your concern answered? Is this PR good to go?
@mxnet-label-bot Update [pr-awaiting-response]
@lebeg @larroy @perdasilva ping for update
I think this PR is good to merge, can somebody do this?
@mxnet-label-bot Update [pr-awaiting-merge]
@apeforest - Can you please take a look at this PR?
@anirudhacharya @lebeg all good from my side
Can anybody merge this?
@apeforest Could you please take a look?
Given that we have a random number generator as an operator resource, the thread-local random number generator inside the CPU RNN operator looks like a hack to me.
@@ -176,13 +176,17 @@ void LstmForwardTraining(DType* ws,
   if (dropout > 0.0f) {
 #pragma omp parallel for num_threads(omp_threads)
     for (int j = 0; j < T * N * H * D; j++) {
-      int rand_data = rand_r(&seed_);
+      static thread_local std::random_device device;
This should probably be addressed at the framework level by providing APIs to get random numbers. We should not expect developers who implement operators to handle thread-local variables.
@eric-haibin-lin In general, I don't see why a thread-local variable is an issue - there is a separate generator for every thread, and nothing additional needs to be done to ensure thread safety. Otherwise, locking would need to be in place.
I could add a random number generation function to the framework API, but it wouldn't change the implementation. What do you think?
And in the case of one function for all threads, there will also be complications with setting the seed if needed.
I suggest referring to RandGenerator<xpu>, as I mentioned below. It handles the thread-safety problem without locking. The implementation is somewhat tricky though.
@szha can you add a bit more information on this? Do you mean there is a random number generator operator?
@lebeg I recommend looking at the dropout implementation (in …)
Looks like the change is extracted from #11148. I reviewed the code before, and the related comment and response are here: /~https://github.com/apache/incubator-mxnet/pull/11148/files#r210174235
@szha @TaoLv thanks, quoting here again for reference:
@lebeg Can we port it to /~https://github.com/apache/incubator-mxnet/blob/master/include/mxnet/random_generator.h ? It is designed exactly for non-thread-safe random number generators (the CUDA device random API, for instance, is not thread-safe). I guess this is what @szha meant. Ref:
Usually it works like this: say you use 4 threads to fill an array A[4]; each Gaussian random generator fills one entry with its head element.

Also it will be great to let users set their global seed, as some (maybe not-well-designed) algorithms are sensitive to the seed number. This means all random generators within the system should take the same input seed as their constructor argument. (I know currently this rule is not well followed in mxnet though.) This is why …
Thank you @yzhliu for your input! Now it's indeed much clearer to me. I will take a look at the RandGenerator implementation and modify the change accordingly.
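For reference, the seeding design @yzhliu describes can be sketched in plain C++: one global seed deterministically derives a per-thread sub-generator, so a parallel fill is reproducible regardless of thread scheduling. This is an illustrative sketch of the idea, not MXNet's actual RandGenerator; `ParallelUniform` and the additive sub-seeding scheme are assumptions for demonstration:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Fill nthreads * per_thread uniform floats. Each thread owns a generator
// derived from the single global seed, so the output is fully determined
// by (global_seed, nthreads, per_thread) even under parallel execution.
std::vector<float> ParallelUniform(std::uint32_t global_seed,
                                   int nthreads, int per_thread) {
  std::vector<float> out(static_cast<std::size_t>(nthreads) * per_thread);
  #pragma omp parallel for  // ignored without OpenMP; result is identical
  for (int t = 0; t < nthreads; ++t) {
    // Sub-seed per thread: same global seed always yields the same streams.
    std::mt19937 gen(global_seed + static_cast<std::uint32_t>(t));
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (int i = 0; i < per_thread; ++i)
      out[static_cast<std::size_t>(t) * per_thread + i] = dist(gen);
  }
  return out;
}
```

Contrast this with the thread_local std::random_device approach in the diff, which is thread-safe but not reproducible from a user-set seed.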
@mxnet-label-bot update [pr-awaiting-response, Operator]
@lebeg Did you get a chance to look at the RandGenerator implementation? Can you please make the relevant code changes for that? Thank you!
Closing this for now, no capacity to work on this right now.
Description
rand_r is not thread-safe and is not available on Android. Part of the bigger #11148.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments