[LLM] Support gpt3 fine grained dybatch v1 #7080
Conversation
Thanks for your contribution!
Codecov Report
@@ Coverage Diff @@
## develop #7080 +/- ##
===========================================
- Coverage 59.91% 59.78% -0.13%
===========================================
Files 556 558 +2
Lines 82037 82217 +180
===========================================
+ Hits 49149 49152 +3
- Misses 32888 33065 +177
def set_state_dict(self, state_dict):
    dtype = paddle.get_default_dtype()

    for k, v in state_dict.items():
There are quite a lot of if statements here. Could this be written the same way as in the llama implementation?
I suggest keeping it this way for now. GPT models come from fairly varied sources, so the parameter names are messy, and this if-based approach already covers as many naming conventions as possible.
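The if-based compatibility scheme described above can be sketched as follows. This is a hypothetical illustration only: the key names and helper below are invented for the example and are not the actual PaddleNLP parameter names.

```python
# Sketch of if-based state-dict key normalization for checkpoints whose
# parameter names vary by source. All key names here are illustrative.

def normalize_key(k: str) -> str:
    # Strip a leading module prefix that some exports add.
    if k.startswith("transformer."):
        k = k[len("transformer."):]
    # Some checkpoints name the embeddings differently.
    if k == "wte.weight":
        k = "embeddings.word_embeddings.weight"
    if k == "wpe.weight":
        k = "embeddings.position_embeddings.weight"
    # Further branches (fused vs. split attention projections, etc.)
    # would follow the same pattern.
    return k

def set_state_dict(model_state: dict, state_dict: dict) -> dict:
    # Map every incoming key onto the model's canonical name.
    for k, v in state_dict.items():
        model_state[normalize_key(k)] = v
    return model_state
```

Each if handles one known naming variant, which is why the chain grows with every checkpoint source the loader has to support.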
The code quality is good. Besides the two comments above, one small suggestion: add unit tests. After #7056 is merged, how about writing a test_predictor unit test?
    cls, pretrained_model_name_or_path, from_hf_hub: bool = False, subfolder: str | None = None, *args, **kwargs
):
    # TODO: Support safetensors loading.
    kwargs["use_safetensors"] = False
- kwargs["use_safetensors"] = False
+ kwargs["use_safetensors"] = kwargs.get("use_safetensors", False)
I suggest using this instead, because single-shard safetensors checkpoints can be loaded by the inference model.
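The point of the suggested change is that `kwargs.get` only supplies a default when the caller did not pass a value, rather than unconditionally overriding it. A minimal standalone sketch (the `load` function is hypothetical, standing in for the real classmethod):

```python
# Illustrates why kwargs.get(...) is preferable to a hard assignment:
# a caller-supplied use_safetensors survives, and only an absent key
# falls back to False.

def load(**kwargs):
    # Default to False only when the caller did not choose.
    kwargs["use_safetensors"] = kwargs.get("use_safetensors", False)
    return kwargs["use_safetensors"]
```

With the original hard assignment, `load(use_safetensors=True)` would silently return `False`; with the suggested form the caller's choice is respected.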
    position_ids = tgt_pos
    attention_mask = (tgt_generation_mask - 1) * 1e4
else:
    attention_mask = (attention_mask - 1) * 1e4
Here I suggest using paddle.finfo(attention_mask.dtype).min to convert the attention_mask values.
The value ranges under bf16 and fp16 are different, so use this to obtain the minimum value for the given dtype.
The tgt_attention_mask above needs the same adjustment.
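The concern here is that a hard-coded -1e4 is not the most negative representable value: fp16 bottoms out at -65504 and bf16 near -3.4e38, so the dtype-aware minimum masks more reliably. A sketch of the idea, using NumPy's `finfo` for illustration (the PR itself would use `paddle.finfo`, which exposes the same `.min` attribute):

```python
import numpy as np

def build_additive_mask(attention_mask: np.ndarray) -> np.ndarray:
    # Allowed positions (mask == 1) map to 0; masked positions (mask == 0)
    # map to the most negative finite value of this dtype, instead of a
    # hard-coded -1e4.
    min_val = np.finfo(attention_mask.dtype).min
    return (1 - attention_mask) * min_val
```

Adding this mask to attention logits before softmax drives masked positions to effectively zero probability regardless of whether the model runs in fp16 or bf16.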
Sure. I'd like to handle the unit tests and the issues raised in the comments together in the next PR; shall we merge a first version of this PR?
PR types
Others
PR changes
Others
Description
Support gpt3 fine grained dybatch v1.