[LLM Inference] Support Qwen2_Moe Inference Model #8892

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

CJ77Qi (Contributor) commented on Aug 7, 2024

PR types

New features

PR changes

Models

Description

Support Qwen-Moe Inference Model

  • Currently supports bf16/wint8, single-card inference
  • Verified on Qwen/Qwen1.5-MoE-A2.7B

TODO:

  • Support multi-card inference for Qwen/Qwen2-57B-A14B, as well as wint4

paddle-bot bot commented Aug 7, 2024

Thanks for your contribution!

CLAassistant commented Aug 7, 2024

CLA assistant check
All committers have signed the CLA.

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 476 lines in your changes missing coverage. Please review.

Project coverage is 53.88%. Comparing base (f6fc7ff) to head (674b24d).
Report is 227 commits behind head on develop.

Files with missing lines | Patch % | Lines
...lp/experimental/transformers/qwen2_moe/modeling.py | 0.00% | 397 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py | 0.00% | 77 Missing ⚠️
paddlenlp/experimental/transformers/__init__.py | 0.00% | 1 Missing ⚠️
...lp/experimental/transformers/qwen2_moe/__init__.py | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8892      +/-   ##
===========================================
- Coverage    54.05%   53.88%   -0.18%     
===========================================
  Files          650      652       +2     
  Lines       103884   104356     +472     
===========================================
+ Hits         56155    56230      +75     
- Misses       47729    48126     +397     


@CJ77Qi CJ77Qi changed the title [Inference LLM] Support Qwen2_Moe Inference Model [LLM Inference] Support Qwen2_Moe Inference Model Aug 26, 2024
@@ -1,4 +1,4 @@
- # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ # Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Collaborator commented:

Restore this to 2023.

@@ -24,6 +24,7 @@
fused_rms_norm,
masked_multihead_attention,
variable_length_memory_efficient_attention,
fused_moe,
yuanlehome (Collaborator) commented on Aug 27, 2024:

This is already imported above.

Comment on lines 198 to 202
shared_expert_ffn1_weight_attrs=None,
shared_expert_ffn1_weight_scale_attrs=None,
shared_expert_ffn2_weight_attrs=None,
shared_expert_ffn2_weight_scale_attrs=None,
shared_expert_gate_weight_attrs=None,
Collaborator commented:

Move these, along with shared_expert_intermediate_size below, into MoeConfig.
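
For illustration, the grouping suggested above could look roughly like the following. This is a hypothetical sketch, not the code merged in this PR: the class name `MoeConfigSketch`, the `num_experts` and `top_k` fields, and the `has_shared_expert` helper are assumptions. Only the `shared_expert_*` names come from the diff.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MoeConfigSketch:
    """Hypothetical container for MoE options (not the actual PaddleNLP class)."""

    # Assumed routing options; these names do not appear in the diff under review.
    num_experts: int = 0
    top_k: int = 0

    # Shared-expert options named in the review comment.
    shared_expert_intermediate_size: int = 0
    shared_expert_ffn1_weight_attrs: Optional[Any] = None
    shared_expert_ffn1_weight_scale_attrs: Optional[Any] = None
    shared_expert_ffn2_weight_attrs: Optional[Any] = None
    shared_expert_ffn2_weight_scale_attrs: Optional[Any] = None
    shared_expert_gate_weight_attrs: Optional[Any] = None

    def has_shared_expert(self) -> bool:
        # Assumed convention: a size of 0 means no shared expert is configured.
        return self.shared_expert_intermediate_size > 0


# Illustrative values only; they are not taken from any real model config.
cfg = MoeConfigSketch(num_experts=8, top_k=2, shared_expert_intermediate_size=1024)
```

With this shape, the layer constructor would take a single config object instead of six separate keyword arguments, so adding another shared-expert option later would not change its signature.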

@@ -0,0 +1,15 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Collaborator commented:

This should be 2024.

if (
token != self.unk_token
if (self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token)
Collaborator commented:

Don't modify this; revert the change.

Comment on lines 2 to 3
# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
Collaborator commented:

Delete these lines.

config=config,
dtype=predictor_args.dtype,
)
model.eval()
Collaborator commented:

Could the code here be reorganized and redesigned? As it stands, every newly added model requires its own model-initialization code to be added here.

Collaborator replied:

Agreed. This work is planned, and we expect to reach a conclusion in September.
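
One possible direction for that cleanup, sketched here only as an assumption about what such a design might look like, is a small registry that maps an architecture name to a builder function. Every name below (`register_model`, `build_model`, `_MODEL_BUILDERS`, and the placeholder builder) is hypothetical and is not part of PaddleNLP.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: architecture name -> function that builds the model.
_MODEL_BUILDERS: Dict[str, Callable[..., Any]] = {}


def register_model(architecture: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a builder under the given architecture name."""

    def decorator(builder: Callable[..., Any]) -> Callable[..., Any]:
        if architecture in _MODEL_BUILDERS:
            raise ValueError(f"builder already registered for {architecture!r}")
        _MODEL_BUILDERS[architecture] = builder
        return builder

    return decorator


def build_model(architecture: str, **kwargs: Any) -> Any:
    """Look up the builder for an architecture and call it with the given arguments."""
    try:
        builder = _MODEL_BUILDERS[architecture]
    except KeyError:
        raise ValueError(f"no builder registered for {architecture!r}") from None
    return builder(**kwargs)


@register_model("Qwen2MoeForCausalLM")
def _build_qwen2_moe(config: Any, dtype: str) -> Dict[str, Any]:
    # Placeholder: a real builder would construct the inference model and call .eval().
    return {"arch": "qwen2_moe", "config": config, "dtype": dtype}


model = build_model("Qwen2MoeForCausalLM", config={"hidden_size": 16}, dtype="bfloat16")
```

Under this scheme, supporting a new model means registering one builder next to the model code rather than extending a shared if/elif chain in the predictor.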

@wawltor wawltor merged commit 34a71c8 into PaddlePaddle:develop Aug 28, 2024
9 of 12 checks passed
Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024
Co-authored-by: yuanlehome <yuanlehome@163.com>