[PaddlePaddle Hackathon 4] [103] Add tie_weights capability: submit RFC document #5098
Thanks for your contribution!
Codecov Report

| | develop | #5098 | +/- |
|---|---|---|---|
| Coverage | 46.36% | 51.44% | +5.07% |
| Files | 448 | 465 | +17 |
| Lines | 64619 | 66479 | +1860 |
| Hits | 29958 | 34197 | +4239 |
| Misses | 34661 | 32282 | -2379 |

See 177 files with indirect coverage changes.
Thanks for your RFC. To make it easier to read, please replace the screenshots with markdown code blocks and attach links to the code, for example:

    def tie_weights(self):
        """
        Tie the weights between the input embeddings and the output embeddings.
        """
        if hasattr(self, "get_output_embeddings") and hasattr(self, "get_input_embeddings"):
            output_embeddings = self.get_output_embeddings()
            if output_embeddings is not None:
                self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

@sijunhe Hi, I have resubmitted a new version of the RFC document.
The images can now be removed from this PR.
@gongel please take a look and review.
(1) [Code link 1](/~https://github.com/qiuwenbogdut/PaddleNLP/blob/develop/examples/language_model/transformer-xl/mem_transformer.py#L811)
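In fact, most tie_weights implementations in PaddleNLP are done directly at the model layer definition level (see the example), rather than uniformly outside the model as in transformers. Of course, the goal of this project is to see whether it can be implemented uniformly outside the model, so that each model does not have to implement it on its own.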
To implement it uniformly outside the model as transformers does, should I add the tie weight code here, at model_utils.py#L897 in the PaddleNLP directory?
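I took a look. There are two kinds of tie_weights implementations in Paddle:
- One defines a tie_weights function in modeling.py, and the model also implements get_input_embeddings() and get_output_embeddings() to fetch the input and output embedding layers.
- The other assigns the input embedding's weight directly to the output layer's weight when the model layers are defined.
We will implement tie_weights in model_utils.py and handle both cases:
- Tie the weights of the input and output embedding layers.
- Get the input embedding layer and the output weight, then assign the input embedding layer's weight to the output embedding weight.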
Directly assigning the weight does not work in Paddle. An operation like the following cannot modify the output layer's weight:

    output_embeddings.weight = input_embeddings.weight

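@gongel I wrote a small script locally to test this. After running

    output_embeddings.weight = input_embeddings.weight

- the id of input_embeddings.weight is the same as the id of output_embeddings.weight.
- changing the weight values in input_embeddings also changes the values in output_embeddings.

The test code is as follows: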

    import numpy as np
    from paddle.nn import Embedding

    # Step 1: define two different Embedding objects, AA and BB.
    print('------------step1')
    AA = Embedding(1, 2)
    BB = Embedding(1, 2)
    AA.weight = BB.weight  # tie the weights

    # Step 2: check the result of the tying.
    print('------------step2')
    print('AA and BB have the same id:', AA is BB, id(AA), id(BB))  # AA and BB have different ids
    print('AA.weight and BB.weight have the same id:', AA.weight is BB.weight, id(AA.weight), id(BB.weight))  # but AA.weight and BB.weight share one id
    print("AA.weight: ", AA.weight)
    print("BB.weight: ", BB.weight)

    # Step 3: modify AA's weight values and check whether BB's weight changes with them.
    print('------------step3')
    AA.weight.set_value(np.array([[4.0, 6.0]], dtype=np.float32))
    print('AA.weight and BB.weight have the same id after the change:', AA.weight is BB.weight, id(AA.weight), id(BB.weight))  # the ids still match
    print("AA.weight after the change: ", AA.weight)
    print("BB.weight:", BB.weight)

1. Get the model's input embedding weight object A
2. Get the model's output embedding weight object B
3. Make A and B both point to the same weight value
For Paddle, step 3 will be the hard part, which is why PaddleNLP previously had implementations like this one.
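
The three steps in the RFC excerpt can be sketched without any framework. This is only an illustration under stated assumptions: `Parameter`, `Layer`, and `ToyModel` are hypothetical stand-ins rather than Paddle or PaddleNLP classes, and the sketch does not settle whether plain attribute assignment behaves the same way on a real Paddle layer, which is the point under discussion in this thread.

```python
class Parameter:
    """Hypothetical stand-in for a framework weight tensor (not a Paddle class)."""

    def __init__(self, values):
        self.values = values


class Layer:
    """Hypothetical layer that owns exactly one weight."""

    def __init__(self, values):
        self.weight = Parameter(values)


class ToyModel:
    """Toy model exposing the two accessors named in the discussion."""

    def __init__(self):
        self.embeddings = Layer([[0.1, 0.2]])
        self.lm_head = Layer([[0.9, 0.8]])

    def get_input_embeddings(self):
        return self.embeddings

    def get_output_embeddings(self):
        return self.lm_head

    def tie_weights(self):
        # Step 1: fetch the input embedding layer (owner of weight A).
        input_embeddings = self.get_input_embeddings()
        # Step 2: fetch the output embedding layer (owner of weight B).
        output_embeddings = self.get_output_embeddings()
        # Step 3: make both layers reference one weight object.
        if input_embeddings is not None and output_embeddings is not None:
            output_embeddings.weight = input_embeddings.weight


model = ToyModel()
model.tie_weights()
# An in-place update through the input side is visible on the output side.
model.get_input_embeddings().weight.values[0][0] = 4.0
print(model.get_output_embeddings().weight is model.get_input_embeddings().weight)
print(model.get_output_embeddings().weight.values)
```

In this toy setting the tie holds because both attributes name the same Python object. A real implementation would still have to confirm that the framework registers the shared parameter correctly.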
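Do you mean that the pretrained model may be followed by different heads, for example:
- a classification head (a linear layer),
- a language modeling head,
- and so on?
Some tasks may have no output embedding at all, such as ERNIE-based classification.
There are also cases where the pretrained model's input embedding is passed to the language modeling head to initialize the output embedding. Is that case the hard part of the implementation?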
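Keeping a single object is feasible: pass the embedding's weight directly to the head to build the linear output layer. The expectation is to obtain the weight through get_input_embeddings() and then pass it to the head layer. Be sure to consider the order of model instantiation and tie_weights.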
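
A rough, framework-free sketch of that suggestion follows. All class names here (`SharedWeight`, `ToyEmbedding`, `ToyLMHead`, `ToyLMModel`) are hypothetical and are not PaddleNLP APIs. The sketch assumes the embedding weight is stored as `[vocab_size, hidden_size]`, so the head multiplies the hidden state by the transpose of that weight, and it builds the head only after the embedding exists, which is the ordering concern raised above.

```python
class SharedWeight:
    """Hypothetical weight holder; each row is one vocabulary entry."""

    def __init__(self, rows):
        self.rows = rows  # shape: [vocab_size, hidden_size]


class ToyEmbedding:
    """Hypothetical input embedding that looks up rows of the weight."""

    def __init__(self, weight):
        self.weight = weight

    def lookup(self, token_id):
        return self.weight.rows[token_id]


class ToyLMHead:
    """Output layer that reuses the embedding weight instead of owning a copy."""

    def __init__(self, embedding_weight):
        # Keep a reference, never a copy, so there is a single weight object.
        self.weight = embedding_weight

    def logits(self, hidden):
        # hidden times the transposed weight: one dot product per vocabulary row.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight.rows]


class ToyLMModel:
    def __init__(self):
        # Order matters: the embedding must exist before the head is built.
        self.embeddings = ToyEmbedding(SharedWeight([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
        self.lm_head = ToyLMHead(self.get_input_embeddings().weight)

    def get_input_embeddings(self):
        return self.embeddings


lm = ToyLMModel()
scores = lm.lm_head.logits([2.0, 3.0])
print(scores)  # [2.0, 3.0, 5.0]
```

Because the head never allocates its own weight, there is nothing to re-tie later. The trade-off is that the head cannot be constructed before the embedding.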
## API implementation plan

# 6. Testing and acceptance considerations
Reference: [New API testing and acceptance specification](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/api_accpetance_criteria_cn.html)
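For this part, think about what unit tests are needed to prove that tie weights succeeded. Take a look at the existing tests under tests/transformers and consider how to add a generic test in /~https://github.com/PaddlePaddle/PaddleNLP/blob/develop/tests/transformers/test_modeling_common.py#L61 for models that implement tie_weights.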
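When writing the tests, consider two cases:
- One defines a tie_weights function in modeling.py, and the model also implements get_input_embeddings() and get_output_embeddings() to fetch the input and output embedding layers.
- The other assigns the input embedding layer's weight directly to the output layer's weight when the model layers are defined.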
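
One possible shape for such a generic check is sketched below. It is only a sketch: `Weight`, `Layer`, `TiedModel`, `CopiedModel`, and `weights_are_tied` are hypothetical names, not part of PaddleNLP's test suite. The idea is to require both object identity and propagation of an in-place change, and to include a negative control whose weights merely start out equal. The second case from the list above could reuse the same helper, assuming the model still exposes both accessors.

```python
import copy


class Weight:
    """Hypothetical stand-in for a framework parameter."""

    def __init__(self, values):
        self.values = list(values)


class Layer:
    """Hypothetical layer that owns one weight."""

    def __init__(self, values):
        self.weight = Weight(values)


class TiedModel:
    """Case 1: the model defines tie_weights() plus both accessors."""

    def __init__(self):
        self.embeddings = Layer([0.1, 0.2])
        self.lm_head = Layer([0.5, 0.6])
        self.tie_weights()

    def get_input_embeddings(self):
        return self.embeddings

    def get_output_embeddings(self):
        return self.lm_head

    def tie_weights(self):
        self.lm_head.weight = self.embeddings.weight


class CopiedModel(TiedModel):
    """Negative control: copies the values, so the layers only look tied."""

    def tie_weights(self):
        self.lm_head.weight = copy.deepcopy(self.embeddings.weight)


def weights_are_tied(model):
    """Return True only if an in-place input update reaches the output layer."""
    source = model.get_input_embeddings().weight
    target = model.get_output_embeddings().weight
    same_object = source is target
    original = list(source.values)
    source.values[0] = original[0] + 1.0  # perturb the input side in place
    propagated = target.values[0] == source.values[0]
    source.values[:] = original  # restore so the check has no side effects
    return same_object and propagated


print(weights_are_tied(TiedModel()))    # True
print(weights_are_tied(CopiedModel()))  # False
```

Checking propagation as well as identity guards against an implementation that copies values at tie time and then drifts apart during training.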
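As noted above, this approach is not feasible.
@qiuwenbogdut Most of the tie_weights functions that currently exist in PaddleNLP are implemented incorrectly. How can this be verified? (Unit tests will also be needed later.) My suggestion: there are two approaches.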
Got it. Your demo does behave as expected. Looking forward to your implementation!
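@qiuwenbogdut Hello. Please do not implement the code in this PR for now. You can first integrate what we have discussed into the RFC, and we will merge that. The code implementation can go in a new PR.
@sijunhe OK, tomorrow I will first integrate the discussion into the RFC, and then create a separate PR for the code.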
6ef980d to f21a999
f21a999 to a52af67
LGTM. Pay attention to the transpose when assigning the weight.
LGTM. Looking forward to your implementation!
PR types
Others
PR changes
Docs
Description
[103] Add tie_weights capability: submit RFC document