
【PaddlePaddle Hackathon 4】[103] Add tie_weights capability: submit RFC document #5098

Merged
merged 3 commits into PaddlePaddle:develop on Mar 10, 2023

Conversation

qiuwenbogdut
Contributor

PR types

Others

PR changes

Docs

Description

[103] Add tie_weights capability: submit RFC document

@CLAassistant

CLAassistant commented Mar 4, 2023

CLA assistant check
All committers have signed the CLA.

@paddle-bot

paddle-bot bot commented Mar 4, 2023

Thanks for your contribution!

@codecov

codecov bot commented Mar 4, 2023

Codecov Report

Merging #5098 (a52af67) into develop (9497c54) will increase coverage by 5.07%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #5098      +/-   ##
===========================================
+ Coverage    46.36%   51.44%   +5.07%     
===========================================
  Files          448      465      +17     
  Lines        64619    66479    +1860     
===========================================
+ Hits         29958    34197    +4239     
+ Misses       34661    32282    -2379     

see 177 files with indirect coverage changes


Collaborator

@sijunhe sijunhe left a comment

Thanks for your RFC. To make it easier to browse, please change the screenshots to Markdown code blocks and attach code links, for example:

def tie_weights(self):
    """
    Tie the weights between the input embeddings and the output embeddings.
    """
    if hasattr(self, "get_output_embeddings") and hasattr(self, "get_input_embeddings"):
        output_embeddings = self.get_output_embeddings()
        if output_embeddings is not None:
            self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())

Code link

@qiuwenbogdut
Contributor Author

@sijunhe Hi, I have resubmitted a new version of the RFC document.

Collaborator

@sijunhe sijunhe left a comment

The images can be deleted from this PR now.
@gongel please take a look and review.


(1) [Code link 1](/~https://github.com/qiuwenbogdut/PaddleNLP/blob/develop/examples/language_model/transformer-xl/mem_transformer.py#L811)

Collaborator

Actually, most tie_weights implementations within PaddleNLP are done directly at the model layer definition level (see the example), rather than implemented uniformly outside the model the way transformers does it. That said, the goal of this project is exactly to see whether it can be implemented uniformly outside the models, so that each model does not have to implement it on its own.
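
For illustration, layer-level tying of this kind might look roughly like the following sketch (hedged: the class name and dimensions are invented for illustration and are not taken from the linked example):

```python
import paddle

class TinyTiedLM(paddle.nn.Layer):
    """Illustrative sketch: the output projection reuses the embedding Parameter,
    so tying happens where the layers are defined rather than in a shared utility."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embedding = paddle.nn.Embedding(vocab_size, hidden_size)
        # Tied at definition time: no separate output-projection parameter is created.
        self.decoder_weight = self.embedding.weight

    def forward(self, token_ids):
        hidden = self.embedding(token_ids)  # stand-in for the full transformer body
        # The embedding weight is [vocab_size, hidden_size]; transpose it for the vocab projection.
        return paddle.matmul(hidden, self.decoder_weight, transpose_y=True)
```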

Contributor Author

For a unified implementation outside the model like transformers, should I add the tie_weights code here in model_utils.py#L897 under the PaddleNLP directory?

Contributor Author

Looking into it, there are two kinds of tie_weights implementations in Paddle:

  1. One defines a tie_weights function in modeling.py, and the corresponding model also implements get_input_embeddings() and get_output_embeddings() to fetch the input and output embedding layers.
  2. The other directly assigns the input embedding's weight to the output layer's weight when the model layers are defined.

When we implement tie_weights in model_utils.py, both of the above cases are considered (a rough sketch follows after this list):

  1. Bind the weights of the input and output embedding layers.
  2. Fetch the input embedding layer and the output weight, and assign the input embedding layer's weight to the output layer's embedding weight.
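
A rough sketch of what the unified, transformers-style method could look like (hedged: the accessor names follow the convention quoted earlier in this thread and this is not the final PaddleNLP API; whether the direct assignment on the last line is sufficient in Paddle is exactly what the following comments discuss):

```python
def tie_weights(self):
    """Tie the input and output embedding weights when both accessors exist."""
    if hasattr(self, "get_output_embeddings") and hasattr(self, "get_input_embeddings"):
        output_embeddings = self.get_output_embeddings()
        input_embeddings = self.get_input_embeddings()
        if output_embeddings is not None and input_embeddings is not None:
            # Point both layers at the same Parameter object so that gradients
            # and optimizer updates are shared.
            output_embeddings.weight = input_embeddings.weight
```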

Member

@gongel gongel Mar 7, 2023

Directly copying the weight is not feasible in Paddle; an operation like the following cannot modify the output layer's weight:

output_embeddings.weight = input_embeddings.weight

Contributor Author

@gongel I wrote a small script locally to test this. After the operation
output_embeddings.weight = input_embeddings.weight

  • input_embedding.weight has the same id as output_embedding.weight.
  • Modifying the value of input_embedding's weight also changes the value in output_embedding.

The test code is as follows:

import numpy as np
from paddle.nn import Embedding

"""step1: define two different Embedding objects, AA and BB"""
print('------------step1')
AA = Embedding(1, 2)
BB = Embedding(1, 2)

AA.weight = BB.weight  # tie the weights

"""step2: check the result of the tying"""
print('------------step2')
print('Do AA and BB have the same id:', AA is BB, id(AA), id(BB))                               # AA and BB have different ids
print('Do AA.weight and BB.weight have the same id:', AA.weight is BB.weight, id(AA.weight), id(BB.weight))   # but AA.weight and BB.weight have the same id

print("AA.weight: ", AA.weight)
print("BB.weight: ", BB.weight)


"""step3: modify the value of AA's weight and see whether BB's weight changes as well"""
print('------------step3')
AA.weight.set_value(np.array([[4.0, 6.0]], dtype=np.float32))

print('Do the modified AA.weight and BB.weight still have the same id:', AA.weight is BB.weight, id(AA.weight), id(BB.weight))  # AA.weight and BB.weight still have the same id
print("AA.weight after modification: ", AA.weight)
print("BB.weight:", BB.weight)

Contributor Author


1. Get the model's input embedding weight object A.
2. Get the model's output embedding weight object B.
3. Make A and B both point to the same weight value.

Collaborator

For Paddle, step 3 will be the difficult point, which is why PaddleNLP previously had implementations like this one.

Contributor Author

Do you mean that the pretrained model may be followed by different heads, for example:

  • a classification head (a linear layer),
  • a language modeling head,
  • etc.

Some tasks may have no output embedding at all, for example an ERNIE-based classification task.

There are also cases where the pretrained model's input embedding is fed into the language modeling head to initialize the output embedding. Is this the implementation difficulty you mean?

Member

Maintaining a single object is feasible: pass the embedding's weight directly to the head to build the linear output layer. The expectation is to obtain the weight from get_input_embeddings() and then pass it to the head layer; be careful about the order of model instantiation and tie_weights.
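
A minimal sketch of this idea (hedged: the class and usage below are illustrative, not the final implementation): the head is built from the already-instantiated model's input embedding weight and keeps a shared reference to it.

```python
import paddle

class TiedLMHead(paddle.nn.Layer):
    """Illustrative head constructed from an existing embedding Parameter."""

    def __init__(self, embedding_weight):
        super().__init__()
        self.weight = embedding_weight  # shared reference, no copy is made

    def forward(self, hidden_states):
        # The embedding weight is [vocab_size, hidden_size]; transpose it for the projection.
        return paddle.matmul(hidden_states, self.weight, transpose_y=True)

# Order matters: instantiate the base model first, then fetch its input embeddings,
# and only then build (or tie) the output head.
# base_model = SomePretrainedModel(...)  # hypothetical model class
# lm_head = TiedLMHead(base_model.get_input_embeddings().weight)
```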

## API implementation plan

# VI. Testing and acceptance considerations
Reference: [New API testing and acceptance criteria](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/api_accpetance_criteria_cn.html)
Collaborator

For this part, think about what unit tests need to be added to prove that tie_weights succeeded. You can look at the current tests under tests/transformers and consider how to add a generic test in /~https://github.com/PaddlePaddle/PaddleNLP/blob/develop/tests/transformers/test_modeling_common.py#L61 for models that implement tie_weights.

Contributor Author

The models found so far that come with their own tie_weights implementation are:

Should the unit tests first check whether the input_embedding and output_embedding of these existing models point to the same weight object?

Contributor Author

Two cases are considered when writing the tests:

  1. One where a tie_weights function is defined in modeling.py, and the corresponding model also implements get_input_embeddings() and get_output_embeddings() to fetch the input and output embedding layers.
  2. One where the input embedding layer's weight is directly assigned to the output layer's weight when the model layers are defined.

Member

As mentioned above, that approach is not feasible.

@gongel
Member

gongel commented Mar 7, 2023

@qiuwenbogdut Most of the existing tie_weights functions in PaddleNLP are implemented incorrectly. How do we verify it? (This will also be needed later when unit tests are added.) My suggestion is that there are two ways (a rough sketch of both checks follows after this list):

  1. Directly compare the ids of the output-layer weight and the input-layer weight; pass if they are identical, otherwise fail.
  2. Train for a few steps; after several backward passes, check whether the output-layer weight and the input-layer weight are still identical; pass if they are, otherwise fail.
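
A hedged sketch of what these two checks could look like as test helpers (assumptions: the model exposes get_input_embeddings() / get_output_embeddings(), and loss_fn / data_loader are placeholders for whatever the test fixture provides):

```python
import numpy as np
import paddle

def check_tied_by_id(model):
    """Approach 1: both layers must hold the very same Parameter object."""
    inp = model.get_input_embeddings()
    out = model.get_output_embeddings()
    return out is not None and inp.weight is out.weight

def check_tied_after_training(model, loss_fn, data_loader, steps=3):
    """Approach 2: after a few optimizer steps, the two weights must still match."""
    optimizer = paddle.optimizer.SGD(learning_rate=1e-3, parameters=model.parameters())
    for step, batch in enumerate(data_loader):
        loss = loss_fn(model, batch)  # placeholder loss computation
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        if step + 1 >= steps:
            break
    inp = model.get_input_embeddings().weight.numpy()
    out = model.get_output_embeddings().weight.numpy()
    return np.array_equal(inp, out)
```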

@qiuwenbogdut qiuwenbogdut requested a review from gongel March 8, 2023 10:11
@gongel
Member

gongel commented Mar 8, 2023

Got it. Your demo does work as expected; looking forward to your implementation!

Collaborator

@sijunhe sijunhe left a comment

@qiuwenbogdut Hello. Please do not implement the code in this PR yet; first integrate the content we discussed into the RFC, and we will merge that. The code implementation can go into a new PR.

@qiuwenbogdut
Contributor Author

@sijunhe OK, tomorrow I will first integrate the discussed content into the RFC, and open a new PR when submitting the code.

@qiuwenbogdut
Contributor Author

@sijunhe @gongel The RFC document has been integrated and updated; please review it, thanks.

Member

@gongel gongel left a comment

LGTM. Pay attention to the transpose when assigning the weight.

Collaborator

@sijunhe sijunhe left a comment

LGTM. Looking forward to your implementation!

@sijunhe sijunhe merged commit 20f1edd into PaddlePaddle:develop Mar 10, 2023