[bug fix] Fix problem where dp grad merge is not compatible with the ClipGradientByGlobalNorm function #36334

Merged
8 changes: 7 additions & 1 deletion python/paddle/fluid/clip.py
@@ -28,6 +28,7 @@
 from .data_feeder import check_variable_and_dtype
 from .framework import in_dygraph_mode
 from .layer_helper import LayerHelper
+from .framework import default_main_program

 __all__ = [
     'set_gradient_clip', 'ErrorClipByValue', 'ClipGradByValue',
@@ -547,7 +548,12 @@ def _static_clip(self, params_grads):
                 scale_input = (scale_var.astype('float16')
                                if g.dtype == core.VarDesc.VarType.FP16 else
                                scale_var)
-                p.block.append_op(
+                # NOTE(Yuang Liu): For pure dp with gradient merge, p and g
+                # will be in different blocks from the gradient-clip ops.
+                # We need to pick the correct block, otherwise we will hit
+                # a 'NotFoundError' during compile time.
+                block = default_main_program().current_block()
+                block.append_op(

Contributor:
Take a look at whether g.block would work here.

Contributor Author:

It does work, but since g.block may change in the future, using current_block is the safest choice.

                     type='elementwise_mul',
                     inputs={'X': g,
                             'Y': scale_input},
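
For readers outside the PR, here is a hedged sketch of the block mismatch the NOTE describes. In Paddle's static mode a Program is a list of blocks: parameters always live in the root block, while gradient merge builds its update step (and with it the clip-related ops) inside a conditional sub-block. The snippet below reproduces that divergence; Program._create_block() and _rollback() are internal helpers, used here purely for illustration.

import paddle
from paddle.fluid.framework import default_main_program

paddle.enable_static()
prog = default_main_program()

# Parameters are always created in the root block (block 0).
w = paddle.static.create_parameter(shape=[4, 4], dtype='float32')

# Gradient merge wraps its update step in a conditional sub-block;
# mimic that with the Program's internal block API.
prog._create_block()
assert w.block.idx == 0               # p.block still points at the root block
assert prog.current_block().idx == 1  # new ops are appended to the sub-block

# Appending the elementwise_mul via p.block would resolve its gradient
# input in block 0, where gradient merge never created it, hence the
# compile-time 'NotFoundError'. current_block() places the op next to
# its inputs instead.
prog._rollback()

As the thread above notes, g.block would also resolve correctly today, because the merged gradient lives in the same sub-block as the clip ops; current_block() simply stays valid even if where g is created changes later.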