[LLM Inference] Support Qwen2_Moe Inference Model #8892

CJ77Qi · 2024-08-07T10:50:15Z

PR types

New features

PR changes

Models

Description

Support Qwen-Moe Inference Model

Currently supports bf16/wint8, single-card inference.
Verified on Qwen/Qwen1.5-MoE-A2.7B.
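To show what a Qwen-MoE layer computes, here is a minimal NumPy sketch modeled on the Hugging Face transformers reference block (`Qwen2MoeSparseMoeBlock`). It uses softmax routing over experts, top-k selection, and a shared expert whose output is scaled by a sigmoid gate. This is a readable reference loop, not the PaddleNLP `fused_moe` kernel added in this PR. All function names are hypothetical, and the defaults `top_k=4` / `norm_topk_prob=False` are only illustrative.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Qwen-style expert MLP: down(silu(gate(x)) * up(x))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def qwen2_moe_block(x, router_w, experts, shared, shared_gate_w,
                    top_k=4, norm_topk_prob=False):
    """x: [tokens, hidden]; router_w: [hidden, num_experts];
    experts: list of (w_gate, w_up, w_down); shared: (w_gate, w_up, w_down);
    shared_gate_w: [hidden, 1]."""
    logits = x @ router_w
    # Numerically stable softmax over the expert dimension.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Pick the top_k experts per token and their routing weights.
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    if norm_topk_prob:
        topk_w = topk_w / topk_w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(top_k):
            e = topk_idx[t, j]
            out[t] += topk_w[t, j] * swiglu_ffn(x[t:t + 1], *experts[e])[0]
    # Shared expert runs on every token, scaled by a per-token sigmoid gate.
    out += sigmoid(x @ shared_gate_w) * swiglu_ffn(x, *shared)
    return out
```

A fused kernel produces the same result, but it groups tokens by expert and avoids the per-token Python loop.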

TODO:

Support multi-card inference for Qwen/Qwen2-57B-A14B, and wint4.
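Here, wint8/wint4 are assumed to mean weight-only int8/int4 quantization. The PR does not describe its exact scheme. The NumPy sketch below illustrates the idea with symmetric per-output-channel scales: weights are stored as integers, while activations stay in floating point and are multiplied by dequantized weights. The function names are hypothetical. Real int4 kernels also pack two values per byte, which this sketch does not do.

```python
import numpy as np

def quantize_weight_only(w, bits=8):
    """Symmetric per-output-channel quantization of a [in, out] weight."""
    qmax = 2 ** (bits - 1) - 1               # 127 for int8, 7 for int4
    scale = np.abs(w).max(axis=0) / qmax     # one scale per output column
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero columns
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)

def weight_only_matmul(x, q, scale):
    # Activations stay in floating point; weights are dequantized on the fly.
    return x @ (q.astype(np.float32) * scale)
```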

paddle-bot · 2024-08-07T10:50:21Z

Thanks for your contribution!

CLAassistant · 2024-08-07T10:56:56Z

All committers have signed the CLA.

codecov · 2024-08-07T11:26:17Z

Codecov Report

Attention: Patch coverage is 0% with 476 lines in your changes missing coverage. Please review.

Project coverage is 53.88%. Comparing base (f6fc7ff) to head (674b24d).
Report is 227 commits behind head on develop.

Files with missing lines	Patch %	Lines
...lp/experimental/transformers/qwen2_moe/modeling.py	0.00%	397 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py	0.00%	77 Missing ⚠️
paddlenlp/experimental/transformers/__init__.py	0.00%	1 Missing ⚠️
...lp/experimental/transformers/qwen2_moe/__init__.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8892      +/-   ##
===========================================
- Coverage    54.05%   53.88%   -0.18%     
===========================================
  Files          650      652       +2     
  Lines       103884   104356     +472     
===========================================
+ Hits         56155    56230      +75     
- Misses       47729    48126     +397


yuanlehome · 2024-08-27T03:31:46Z

paddlenlp/experimental/transformers/__init__.py

@@ -1,4 +1,4 @@
-# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.


Revert this to 2023.

yuanlehome · 2024-08-27T03:32:11Z

paddlenlp/experimental/transformers/fused_transformer_layers.py

@@ -24,6 +24,7 @@
    fused_rms_norm,
    masked_multihead_attention,
    variable_length_memory_efficient_attention,
+    fused_moe,


This is already imported above.

yuanlehome · 2024-08-27T03:33:40Z

paddlenlp/experimental/transformers/fused_transformer_layers.py

+        shared_expert_ffn1_weight_attrs=None,
+        shared_expert_ffn1_weight_scale_attrs=None,
+        shared_expert_ffn2_weight_attrs=None,
+        shared_expert_ffn2_weight_scale_attrs=None,
+        shared_expert_gate_weight_attrs=None,


Move these, together with shared_expert_intermediate_size below, into MoeConfig.
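The suggestion above groups every shared-expert argument into one config object, so the layer constructor no longer grows with each new MoE variant. The sketch below illustrates the idea with a hypothetical `MoeConfigSketch` dataclass and `FusedLayersSketch` class. These are not PaddleNLP's actual `MoeConfig` or fused layer classes. Only the shared-expert field names come from the diff.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class MoeConfigSketch:
    # Hypothetical stand-in for MoeConfig; shared-expert field names mirror the diff.
    num_experts: int = 0
    top_k: int = 0
    norm_topk_prob: bool = True
    shared_expert_intermediate_size: int = 0
    shared_expert_ffn1_weight_attrs: Optional[List[Any]] = None
    shared_expert_ffn1_weight_scale_attrs: Optional[List[Any]] = None
    shared_expert_ffn2_weight_attrs: Optional[List[Any]] = None
    shared_expert_ffn2_weight_scale_attrs: Optional[List[Any]] = None
    shared_expert_gate_weight_attrs: Optional[List[Any]] = None

    def use_moe(self) -> bool:
        return self.num_experts > 1

    def has_shared_expert(self) -> bool:
        return self.shared_expert_intermediate_size > 0

class FusedLayersSketch:
    # Hypothetical layer: takes one moe_config instead of many loose keyword arguments.
    def __init__(self, embed_dim: int, moe_config: Optional[MoeConfigSketch] = None):
        self.embed_dim = embed_dim
        self.moe_config = moe_config or MoeConfigSketch()
```

Dense models then pass nothing, and MoE models pass a single object whose helper methods decide which code paths to build.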

yuanlehome · 2024-08-27T03:34:31Z

paddlenlp/experimental/transformers/qwen2_moe/__init__.py

@@ -0,0 +1,15 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.


yuanlehome · 2024-08-27T03:34:57Z

paddlenlp/transformers/tokenizer_utils.py

-            if (
-                token != self.unk_token
+            if (self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token)


Don't modify this; revert it.

yuanlehome · 2024-08-27T03:35:54Z

paddlenlp/experimental/transformers/qwen2_moe/modeling.py

+# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.
+# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.


wawltor · 2024-08-28T03:43:54Z

llm/predict/predictor.py

+                        config=config,
+                        dtype=predictor_args.dtype,
+                    )
+                model.eval()


Could this code be restructured? Right now every newly added model requires adding its own model-initialization branch here.

Yes, this work is planned; we expect to reach a conclusion in September.
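One common way to remove per-model branches like the one discussed above is a registry keyed by model type. The sketch below is a hypothetical illustration, not the design the PaddleNLP team later adopted. `register_inference_model`, `build_inference_model`, and the `"qwen2_moe"` builder are invented names, and the builder returns a placeholder dict instead of a real model.

```python
from typing import Callable, Dict

_MODEL_BUILDERS: Dict[str, Callable[..., object]] = {}

def register_inference_model(model_type: str):
    """Decorator that maps a model_type string to a builder function."""
    def decorator(fn):
        if model_type in _MODEL_BUILDERS:
            raise ValueError(f"duplicate builder for {model_type!r}")
        _MODEL_BUILDERS[model_type] = fn
        return fn
    return decorator

def build_inference_model(model_type: str, **kwargs):
    try:
        builder = _MODEL_BUILDERS[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type {model_type!r}") from None
    return builder(**kwargs)

@register_inference_model("qwen2_moe")
def _build_qwen2_moe(config, dtype):
    # Placeholder: a real builder would construct and load the inference model here.
    return {"arch": "qwen2_moe", "config": config, "dtype": dtype}
```

With this layout, adding a model means registering one builder next to its modeling code, and the predictor calls only `build_inference_model(config.model_type, config=config, dtype=dtype)`.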

Co-authored-by: yuanlehome <yuanlehome@163.com>

CJ77Qi changed the title ~~[Inference LLM] Support Qwen2_Moe Inference Model~~ [LLM Inference] Support Qwen2_Moe Inference Model Aug 26, 2024

yuanlehome reviewed Aug 27, 2024


yuanlehome force-pushed the qwen2_moe branch from 90ee61e to 43ecf04 Compare August 27, 2024 12:48

supprot qwen-moe

674b24d

yuanlehome force-pushed the qwen2_moe branch from 43ecf04 to 674b24d Compare August 27, 2024 12:57

yuanlehome approved these changes Aug 27, 2024


wawltor reviewed Aug 28, 2024


wawltor merged commit 34a71c8 into PaddlePaddle:develop Aug 28, 2024
9 of 12 checks passed

Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024

supprot qwen-moe (PaddlePaddle#8892)

a2bf616

Co-authored-by: yuanlehome <yuanlehome@163.com>
