Support minicpm3-4b (#2465)
* Support minicpm3-4b

* update doc and fix typo

* comments

* sum -> max

* add TODO

* use pytorch engine's rotary embedding
AllentDan authored Sep 23, 2024
1 parent 8b9f6ab commit f3bef7b
Showing 9 changed files with 563 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -140,6 +140,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>Phi-3-mini (3.8B)</li>
<li>Phi-3.5-mini (3.8B)</li>
<li>Phi-3.5-MoE (16x3.8B)</li>
<li>MiniCPM3 (4B)</li>
</ul>
</td>
<td>
1 change: 1 addition & 0 deletions README_ja.md
@@ -139,6 +139,7 @@ The LMDeploy TurboMind engine has outstanding inference capability and, across vari
<li>Phi-3-mini (3.8B)</li>
<li>Phi-3.5-mini (3.8B)</li>
<li>Phi-3.5-MoE (16x3.8B)</li>
<li>MiniCPM3 (4B)</li>
</ul>
</td>
<td>
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -141,6 +141,7 @@ The LMDeploy TurboMind engine has outstanding inference capability; on models of all scales
<li>Phi-3-mini (3.8B)</li>
<li>Phi-3.5-mini (3.8B)</li>
<li>Phi-3.5-MoE (16x3.8B)</li>
<li>MiniCPM3 (4B)</li>
</ul>
</td>
<td>
1 change: 1 addition & 0 deletions docs/en/supported_models/supported_models.md
@@ -64,6 +64,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| QWen2 | 0.5B - 72B | LLM | Yes | No | No | Yes |
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | No | No | No |
| Gemma | 2B-7B | LLM | Yes | No | No | No |
| Dbrx | 132B | LLM | Yes | No | No | No |
| StarCoder2 | 3B-15B | LLM | Yes | No | No | No |
1 change: 1 addition & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -64,6 +64,7 @@ The turbomind engine does not support window attention. Therefore, for models that apply window att
| QWen2 | 0.5B - 72B | LLM | Yes | No | No | Yes |
| DeepSeek-MoE | 16B | LLM | Yes | No | No | No |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No |
| MiniCPM3 | 4B | LLM | Yes | No | No | No |
| Gemma | 2B-7B | LLM | Yes | No | No | No |
| Dbrx | 132B | LLM | Yes | No | No | No |
| StarCoder2 | 3B-15B | LLM | Yes | No | No | No |
3 changes: 3 additions & 0 deletions lmdeploy/model.py
@@ -887,6 +887,7 @@ def match(cls, model_path: str) -> Optional[str]:


@MODELS.register_module(name='minicpmv-2d6')
@MODELS.register_module(name='minicpm3')
@MODELS.register_module(name='qwen')
class Qwen7BChat(BaseChatTemplate):
"""Chat template for Qwen-7B-Chat."""
@@ -924,6 +925,8 @@ def match(cls, model_path: str) -> Optional[str]:
return 'qwen'
if 'minicpm-v-2_6' in model_path.lower():
return 'minicpmv-2d6'
if 'minicpm3-' in model_path.lower():
return 'minicpm3'


@MODELS.register_module(name='codellama')
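The `match` additions above route any model path containing `minicpm3-` to the Qwen-style chat template registered under the name `minicpm3`. A minimal standalone sketch of that path-matching logic (the helper name here is hypothetical, not part of lmdeploy):

```python
def match_chat_template(model_path: str):
    """Pick a chat-template name from a model path, mirroring the diff above.

    Checks are case-insensitive; the more specific MiniCPM-V pattern is
    tested before the MiniCPM3 pattern, as in the original match() method.
    """
    lowered = model_path.lower()
    if 'minicpm-v-2_6' in lowered:
        return 'minicpmv-2d6'
    if 'minicpm3-' in lowered:
        return 'minicpm3'
    return None


print(match_chat_template('openbmb/MiniCPM3-4B'))  # 'minicpm3'
```

Because registration maps `minicpm3` onto `Qwen7BChat`, MiniCPM3 reuses the Qwen chat template verbatim rather than defining a new one.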
32 changes: 32 additions & 0 deletions lmdeploy/pytorch/configurations/minicpm3.py
@@ -0,0 +1,32 @@
# Copyright (c) OpenMMLab. All rights reserved.
from lmdeploy.pytorch.config import ModelConfig

from .builder import AutoModelConfigBuilder


class MiniCPM3ModelConfigBuilder(AutoModelConfigBuilder):

@classmethod
def condition(cls, hf_config):
"""config."""
return hf_config.architectures[0] in ['MiniCPM3ForCausalLM']

@classmethod
def build(cls, hf_config, model_path: str = None):
"""build."""
head_dim = (hf_config.qk_nope_head_dim + hf_config.qk_rope_head_dim)
k_head_dim = head_dim
v_head_dim = head_dim
num_attention_heads = hf_config.num_attention_heads
num_key_value_heads = hf_config.num_key_value_heads
return ModelConfig(hidden_size=hf_config.hidden_size,
num_layers=hf_config.num_hidden_layers,
num_attention_heads=num_attention_heads,
num_key_value_heads=num_key_value_heads,
bos_token_id=hf_config.bos_token_id,
eos_token_id=hf_config.eos_token_id,
head_dim=head_dim,
k_head_dim=k_head_dim,
v_head_dim=v_head_dim,
vocab_size=hf_config.vocab_size,
multi_query_attention=False)
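The builder's one non-trivial step is the head-dimension arithmetic: MiniCPM3 splits each query/key head into a non-positional part (`qk_nope_head_dim`) and a rotary part (`qk_rope_head_dim`), so the cache head dim is their sum, shared by keys and values. A sketch with illustrative values (the numbers below are assumptions for demonstration; the real ones come from the model's Hugging Face config):

```python
from types import SimpleNamespace

# Stand-in for hf_config; 64/32 are illustrative values, not MiniCPM3-4B's
# confirmed config. Swap in the fields from the actual config.json.
hf_config = SimpleNamespace(qk_nope_head_dim=64, qk_rope_head_dim=32)

# As in the builder above: combined head dim, reused for both K and V caches.
head_dim = hf_config.qk_nope_head_dim + hf_config.qk_rope_head_dim
k_head_dim = v_head_dim = head_dim

print(head_dim)  # 96 with these illustrative values
```

Note the builder also sets `multi_query_attention=False`, so the KV cache is sized for `num_key_value_heads` full heads of this combined dimension.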
