Benchmark/mnist #5680
Conversation
LGTM
Do not change the default random seed to 1.
It would be better to add a test set to the demo.
What's the purpose of this PR? Is it meant to compare accuracy with other frameworks, like TensorFlow/PyTorch?
I compared our refactored implementation with Paddle V1 / TensorFlow.
I will merge the TensorFlow code into this PR and make it clearer. Sorry for the confusion.
And what's the result of this benchmark? Does this PR compare the computation batch by batch? If we initialize the parameters identically, is each batch's result the same as TensorFlow's? We may need to keep the error against TensorFlow under 1e-6 with double precision.
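For illustration, a minimal sketch of the kind of batch-by-batch check described here, assuming the per-batch loss values from both runs have already been dumped to numpy files (the file names and shapes are hypothetical):

```python
import numpy as np

# Hypothetical dumps of per-batch loss values from the two runs.
fluid_loss = np.load("fluid_loss.npy")  # shape: (num_batches,)
tf_loss = np.load("tf_loss.npy")        # shape: (num_batches,)

# With identical parameter initialization and double precision,
# the per-batch error should stay under the 1e-6 threshold.
max_err = np.max(np.abs(fluid_loss - tf_loss))
print("max per-batch error: %g" % max_err)
assert np.allclose(fluid_loss, tf_loss, atol=1e-6)
```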
python/paddle/v2/fluid/layers.py
attr=param_attr,
shape=param_shape,
dtype=dtype,
initializer=NormalInitializer(0.0, std, 1))
How much effect does this initialization have?
I know this change makes it match the v2 initialization. But perhaps the default XavierInitializer is the more common choice. We should allow the user to supply an initializer; in fact, it can be supplied in param_attr. Adding an initializer here will ignore the initializer setting in param_attr and cause confusion.
To keep the sample mnist code clean, it is better to change the example to set the initializer through param_attr, as sketched below. And if the different std does not have much effect, we don't even need to change the mnist example at all.
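As a rough sketch of this suggestion, assuming param_attr accepts a dict carrying the initializer (the dict form of param_attr and the layer setup here are assumptions, not the confirmed fluid API of this revision):

```python
import paddle.v2.fluid as fluid
from paddle.v2.fluid.initializer import NormalInitializer

# Assumed usage: the caller supplies the initializer through param_attr,
# so the fc implementation does not hardcode one and silently override it.
image = fluid.layers.data(name='pixel', shape=[784], dtype='float32')
hidden = fluid.layers.fc(
    input=image,
    size=128,
    act='relu',
    param_attr={'initializer': NormalInitializer(0.0, 0.01, 0)})
```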
Sorry for such a late reply; I was away on a trip. I compared the forward cost, accuracy, and loss with v2/TensorFlow batch by batch. Because our default initializer is UniformInitializer, there is a roughly -0.02 gap against the benchmark in the first several batches. With the NormalInitializer we reach the same result as Paddle V2, down to the last decimal place. I think we can conclude that our implementation achieves the same accuracy as Paddle V2.
TensorFlow uses a special random number generator algorithm, Philox, which we don't have. We can reach approximately the same result when we fill the Variable with the same random values, but there is a difference at the 6th to 8th decimal place, caused by differences in the conv2d operator implementation.
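A sketch of that shared-initialization trick (all names and shapes here are illustrative): draw the weights once in numpy and load the same array into both frameworks, so any remaining divergence can be attributed to the conv2d operators rather than the random number generators.

```python
import numpy as np

# Draw the conv filter once with a fixed numpy seed.
np.random.seed(42)
conv_w = np.random.normal(0.0, 0.01, size=(20, 1, 5, 5)).astype('float32')
np.save("conv_w.npy", conv_w)

# Both the Fluid and the TensorFlow programs then load conv_w.npy into
# their conv2d filters before the first batch; with identical inputs, a
# difference at the 6th-8th decimal place points at the operator
# implementations, not at Philox vs. our generator.
```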
> We should allow the user to supply an initializer; in fact, it can be supplied in param_attr. Adding an initializer here will ignore the initializer setting in param_attr and cause confusion.
We are working on making the Initializer configurable. Sure, I will only change the mnist example configuration in the benchmark test and set the fc layer to use XavierInitializer by default.
#5819
#5805
#5760
Force-pushed from 3d808a9 to 689b551
Since this PR contains TensorFlow code, this benchmark will not be merged; the small fix has been included in our develop branch.
The comparison result is shown in #5862