Enhance the unit test framework to explicitly test whether the operator correctly handles gradients for multiple inputs. #3857
Conversation
```python
# input_vars, check_names, place)
# In fact, the above two lines can be used to replace following
# codes. But most of the gradient operators need to handle the case
# where one of more of the gradient of the input is not needed.
```
`one of more of the gradient of the input is` --> `some input gradients are`
Awesome work making `mul`/`rowwise_add` able to ignore some IGs.
However, I think the gradient unit-test framework should not be changed, for the following reasons.
If a user wants to check a `mul` operator's gradient without some IGs, they can invoke `check_grad` several times, like `check_grad("mul", input_to_check=["X"], ignore=["Y"])` and `check_grad("mul", input_to_check=["Y"], ignore=["X"])`.
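For illustration, that two-call pattern might look like the sketch below; the `input_to_check`/`ignore` arguments follow the hypothetical signature in this comment, not the merged API, and `GradientChecker` is the framework's test base class:

```python
# Hypothetical usage of the signature suggested above (the merged
# framework ended up exposing no_grad_set instead of ignore).
class TestMulGrad(GradientChecker):
    def test_mul(self):
        # Check dOut/dX while treating Y's gradient as unneeded ...
        self.check_grad("mul", input_to_check=["X"], ignore=["Y"])
        # ... then check dOut/dY while X's gradient is unneeded.
        self.check_grad("mul", input_to_check=["Y"], ignore=["X"])
```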
- The disadvantage of invoking `check_grad` many times is that the numeric gradient will be calculated many times (see the sketch after this list). It could be slow, but unit tests are not run by end users, so that is OK.
- Making users invoke `check_grad` themselves handles the case where some IGs cannot be ignored. For example, OpA has two inputs, `X` and `Y`, where the gradient of `Y` should always be calculated and its `YG` should never be nullptr.
- Making users invoke `check_grad` themselves makes the error log clearer. The error could be `check mul gradient error when calculating X and ignore Y`.
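For context, the expensive numeric side of such a check is typically a central-difference estimate. A minimal self-contained sketch, not the framework's actual implementation:

```python
import numpy as np

def numeric_gradient(f, x, delta=1e-4):
    # Central-difference estimate of d f(x) / d x for scalar-valued f.
    # This is the costly step referred to above: f runs twice per element
    # of x, so recomputing it for every check_grad call is slow.
    x = x.astype("float64")      # work on a double-precision copy
    grad = np.zeros_like(x)
    flat_x = x.ravel()           # views: writes go through to x / grad
    flat_g = grad.ravel()
    for i in range(flat_x.size):
        orig = flat_x[i]
        flat_x[i] = orig + delta
        f_plus = f(x)
        flat_x[i] = orig - delta
        f_minus = f(x)
        flat_x[i] = orig         # restore the perturbed element
        flat_g[i] = (f_plus - f_minus) / (2.0 * delta)
    return grad

# e.g. for f(x) = sum(x**2), numeric_gradient(f, x) is approximately 2 * x
```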
Please review qingqing01#1
@reyoung Thanks! I also thought at first that it was not good to change the `check_grad` code. I merged your patch.
Invoke check_grad many times for no_grad_set
LGTM.
```python
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
```
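For context, the surrounding test might look like the sketch below. Only the `check_grad` call itself comes from the diff; the class name, the import path, `create_op`, and the input shapes are assumptions:

```python
import numpy as np
# Module path as of this PR's era; an assumption.
from gradient_checker import GradientChecker, create_op

class TestMulGradOp(GradientChecker):
    def setUp(self):
        self.op = create_op("mul")
        # Illustrative shapes only.
        self.inputs = {
            "X": np.random.random((32, 84)).astype("float32"),
            "Y": np.random.random((84, 100)).astype("float32"),
        }

    def test_ignore_y(self):
        # Check dOut/dX only; putting Y in no_grad_set tells the
        # framework that Y's input gradient need not be produced.
        self.check_grad(
            self.op,
            self.inputs, ["X"],
            "Out",
            max_relative_error=0.5,
            no_grad_set={"Y"})
```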
@qingqing01 The framework should automatically test cases with certain input gradients ignored, without explicitly writing code here. We should change `GradientChecker.check_grad()` to do this.
@emailweixu Thanks for your comments. OK, in the first commit, 4470332, `check_grad` tested this automatically. I'll change it again :).
> The framework should automatically test cases with certain input gradients ignored, without explicitly writing code here. We should change `GradientChecker.check_grad()` to do this.
- Making users invoke `check_grad` themselves handles the case where some IGs cannot be ignored. For example, OpA has two inputs, `X` and `Y`, where the gradient of `Y` should always be calculated and its `YG` should never be nullptr.
- Also, `check_grad` is not only used to check a single operator's gradient but also the gradients inside a whole neural network. It is hard to test a whole network automatically.
- Making users invoke `check_grad` themselves makes the error log clearer. The error would be `test_mul_grad_ignore_x error`.
- To keep the user interface easy, there could be a parameter in `check_grad`, like `test_no_input_grad`. If `test_no_input_grad=True`, multiple `check_grad`s will be invoked (see the sketch after this list).
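A minimal sketch of what that switch might do. Only the name `test_no_input_grad` comes from this comment; `_check_grad_once` is a hypothetical helper standing in for the existing single-pass check:

```python
def check_grad(self, op, inputs, inputs_to_check, output_name,
               max_relative_error=0.005, test_no_input_grad=False):
    # Base pass: every requested input gradient is computed and compared.
    self._check_grad_once(op, inputs, inputs_to_check, output_name,
                          max_relative_error, no_grad_set=set())
    if test_no_input_grad:
        # Extra passes: re-check each input with all *other* inputs'
        # gradients declared unneeded, exercising ops that skip null IGs.
        for name in inputs_to_check:
            others = set(inputs_to_check) - {name}
            self._check_grad_once(op, inputs, [name], output_name,
                                  max_relative_error, no_grad_set=others)
```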
Fix #3797
If none of the forward operator's input gradients need to be calculated, the backward operator becomes a NOP operator: /~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/backward.cc#L75
So there is no need to handle this case for a backward op with only one output.
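A Python sketch of that rule (the real logic is the C++ linked above):

```python
# If every input gradient a grad op would produce is in no_grad_set,
# the whole backward op degenerates to a no-op.
def backward_op_kind(grad_output_names, no_grad_set):
    if all(name in no_grad_set for name in grad_output_names):
        return "NOP"       # nothing to compute for this op
    return "grad_op"       # at least one input gradient is needed

# "X@GRAD" / "Y@GRAD" follow Paddle's gradient-variable naming convention.
assert backward_op_kind(["X@GRAD", "Y@GRAD"], {"X@GRAD", "Y@GRAD"}) == "NOP"
assert backward_op_kind(["X@GRAD", "Y@GRAD"], {"Y@GRAD"}) == "grad_op"
```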