[LLM Inference] Support Qwen2_Moe Inference Model #8892
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #8892      +/-   ##
===========================================
- Coverage    54.05%   53.88%   -0.18%
===========================================
  Files          650      652       +2
  Lines       103884   104356     +472
===========================================
+ Hits         56155    56230      +75
- Misses       47729    48126     +397

☔ View full report in Codecov by Sentry.
@@ -1,4 +1,4 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Revert this to 2023.
@@ -24,6 +24,7 @@
fused_rms_norm,
masked_multihead_attention,
variable_length_memory_efficient_attention,
fused_moe,
`fused_moe` is already imported above.
shared_expert_ffn1_weight_attrs=None,
shared_expert_ffn1_weight_scale_attrs=None,
shared_expert_ffn2_weight_attrs=None,
shared_expert_ffn2_weight_scale_attrs=None,
shared_expert_gate_weight_attrs=None,
Move these, along with `shared_expert_intermediate_size` below, into `MoeConfig`.
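To illustrate the suggestion, here is a minimal sketch of grouping the shared-expert arguments into a single config object. It assumes plain dataclasses; `MoeConfigSketch`, `TransformerConfigSketch`, `num_layers`, and `has_shared_expert` are hypothetical names for illustration only, and the real `MoeConfig` in the repository may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MoeConfigSketch:
    """Hypothetical stand-in for MoeConfig; not the repository's actual class."""

    # Field names are taken from the diff under review.
    shared_expert_intermediate_size: int = 0
    shared_expert_ffn1_weight_attrs: Optional[Any] = None
    shared_expert_ffn1_weight_scale_attrs: Optional[Any] = None
    shared_expert_ffn2_weight_attrs: Optional[Any] = None
    shared_expert_ffn2_weight_scale_attrs: Optional[Any] = None
    shared_expert_gate_weight_attrs: Optional[Any] = None

    def has_shared_expert(self) -> bool:
        # Assumption: a size of 0 means the model has no shared expert.
        return self.shared_expert_intermediate_size > 0


@dataclass
class TransformerConfigSketch:
    """Hypothetical outer config that takes one moe_config instead of six keyword arguments."""

    num_layers: int
    moe_config: Optional[MoeConfigSketch] = None


config = TransformerConfigSketch(
    num_layers=2,
    moe_config=MoeConfigSketch(shared_expert_intermediate_size=1408),
)
```

Keeping the MoE-specific fields together means the outer constructor signature does not grow each time a new MoE variant adds parameters.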
@@ -0,0 +1,15 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
2024
if (
token != self.unk_token
if (self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token)
Don't change this; revert it.
# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Remove this.
config=config,
dtype=predictor_args.dtype,
)
model.eval()
Could the code here be reorganized? As it stands, every newly added model needs its own model-initialization branch.
Agreed. This work is already planned, and we expect to reach a conclusion in September.
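One possible direction for that cleanup, sketched below, is a small registry that maps a model type to its builder so the predictor no longer needs a per-model branch. This is only an illustration of the idea raised in the thread, not the design the maintainers settled on; `register_model`, `build_model`, and the placeholder builder are all hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: model type string -> function that builds the model.
_MODEL_BUILDERS: Dict[str, Callable[..., Any]] = {}


def register_model(model_type: str):
    """Decorator that records a builder under the given model type."""

    def decorator(builder: Callable[..., Any]) -> Callable[..., Any]:
        if model_type in _MODEL_BUILDERS:
            raise ValueError(f"model type already registered: {model_type!r}")
        _MODEL_BUILDERS[model_type] = builder
        return builder

    return decorator


def build_model(model_type: str, config: Any, dtype: str) -> Any:
    """Look up the builder for model_type and call it with the shared arguments."""
    try:
        builder = _MODEL_BUILDERS[model_type]
    except KeyError:
        raise ValueError(f"unsupported model type: {model_type!r}") from None
    return builder(config=config, dtype=dtype)


@register_model("qwen2_moe")
def _build_qwen2_moe(config: Any, dtype: str) -> Any:
    # Placeholder body: a real builder would construct the inference model
    # from config and dtype, then call model.eval() before returning it.
    return {"model_type": "qwen2_moe", "config": config, "dtype": dtype}


model = build_model("qwen2_moe", config={}, dtype="float16")
```

With this shape, adding a model means adding one decorated builder next to the model code, and the predictor's call site stays unchanged.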
Co-authored-by: yuanlehome <yuanlehome@163.com>
PR types
New features
PR changes
Models
Description
Support Qwen-Moe Inference Model
TODO: