Support llama3.2 LLM models in turbomind engine (#2596)
* update

* update doc

* fix typo

* update
lvhan028 authored Oct 24, 2024
1 parent d00e470 commit cd3e791
Showing 9 changed files with 13 additions and 8 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -115,6 +115,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>Llama2 (7B - 70B)</li>
<li>Llama3 (8B, 70B)</li>
<li>Llama3.1 (8B, 70B)</li>
+ <li>Llama3.2 (1B, 3B)</li>
<li>InternLM (7B - 20B)</li>
<li>InternLM2 (7B - 20B)</li>
<li>InternLM2.5 (7B)</li>
1 change: 1 addition & 0 deletions README_ja.md
@@ -114,6 +114,7 @@ The LMDeploy TurboMind engine has excellent inference capabilities and, across various
<li>Llama2 (7B - 70B)</li>
<li>Llama3 (8B, 70B)</li>
<li>Llama3.1 (8B, 70B)</li>
+ <li>Llama3.2 (1B, 3B)</li>
<li>InternLM (7B - 20B)</li>
<li>InternLM2 (7B - 20B)</li>
<li>InternLM2.5 (7B)</li>
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -116,6 +116,7 @@ The LMDeploy TurboMind engine has outstanding inference capability and, on models of various scales
<li>Llama2 (7B - 70B)</li>
<li>Llama3 (8B, 70B)</li>
<li>Llama3.1 (8B, 70B)</li>
+ <li>Llama3.2 (1B, 3B)</li>
<li>InternLM (7B - 20B)</li>
<li>InternLM2 (7B - 20B)</li>
<li>InternLM2.5 (7B)</li>
2 changes: 1 addition & 1 deletion docs/en/quantization/w4a16.md
@@ -69,7 +69,7 @@ lmdeploy serve gradio ./internlm2_5-7b-chat-4bit --server_name {ip_addr} --serve

## Evaluation

- Please refer to [OpenCompass](https://opencompass.readthedocs.io/en/latest/index.html) about model evaluation with LMDeploy.
+ Please refer to [OpenCompass](https://opencompass.readthedocs.io/en/latest/index.html) for model evaluation with LMDeploy. Here is the [guide](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_lmdeploy.html).

## Inference

3 changes: 2 additions & 1 deletion docs/en/supported_models/supported_models.md
@@ -10,6 +10,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+ | Llama3.2 | 3B | LLM | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -20,7 +21,6 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| Qwen2 | 1.5B - 72B | LLM | Yes | Yes | Yes | Yes |
| Mistral | 7B | LLM | Yes | Yes | Yes | - |
| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
- | Qwen2-VL | 2B, 7B, 72B | MLLM | Yes | Yes | Yes | - |
| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -49,6 +49,7 @@ The TurboMind engine doesn't support window attention. Therefore, for models tha
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | No | - |
+ | Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | No | - |
| Llama3.2-VL | 8B, 90B | MLLM | Yes | Yes | Yes | No | - |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | - |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
2 changes: 1 addition & 1 deletion docs/zh_cn/quantization/w4a16.md
@@ -72,7 +72,7 @@ lmdeploy serve gradio ./internlm2_5-7b-chat-4bit --server-name {ip_addr} --serve

## Model Evaluation

- We use [OpenCompass](https://opencompass.readthedocs.io/zh-cn/latest/index.html) to evaluate the quantized model's capabilities across various dimensions
+ We use [OpenCompass](https://opencompass.readthedocs.io/zh-cn/latest/index.html) to evaluate the quantized model's capabilities across various dimensions. Please refer to [this guide](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/evaluation_lmdeploy.html) for the method.

## Model Inference

3 changes: 2 additions & 1 deletion docs/zh_cn/supported_models/supported_models.md
@@ -10,6 +10,7 @@
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+ | Llama3.2 | 3B | LLM | Yes | Yes | Yes | Yes |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -20,7 +21,6 @@
| Qwen2 | 1.5B - 72B | LLM | Yes | Yes | Yes | Yes |
| Mistral | 7B | LLM | Yes | Yes | Yes | - |
| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
- | Qwen2-VL | 2B, 7B, 72B | MLLM | Yes | Yes | Yes | - |
| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
@@ -49,6 +49,7 @@ The turbomind engine does not support window attention, so for models that use window att
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | No | - |
+ | Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | No | - |
| Llama3.2-VL | 8B, 90B | MLLM | Yes | Yes | Yes | No | - |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | - |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
2 changes: 1 addition & 1 deletion lmdeploy/model.py
@@ -772,7 +772,7 @@ def match(cls, model_path: str) -> Optional[str]:
return 'llama3'


- @MODELS.register_module(name='llama3_1')
+ @MODELS.register_module(name=['llama3_1', 'llama3_2'])
class Llama3_1(Llama3):
"""Chat template of LLaMA3.1 model."""

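The change in `lmdeploy/model.py` registers the existing Llama3.1 chat template under a second alias, so llama3.2 models reuse the same template class. A minimal sketch of this alias pattern (a stand-in registry for illustration, not LMDeploy's actual `Registry` implementation):

```python
# Sketch of a decorator-based registry that maps several alias names to one
# chat-template class, mirroring @MODELS.register_module(name=['llama3_1',
# 'llama3_2']) above. This is a simplified stand-in, not LMDeploy's code.

class Registry:
    def __init__(self):
        self._modules = {}

    def register_module(self, name):
        # Accept either a single name or a list of aliases.
        names = name if isinstance(name, list) else [name]

        def _register(cls):
            for n in names:
                self._modules[n] = cls  # every alias resolves to the same class
            return cls

        return _register

    def get(self, name):
        return self._modules[name]


MODELS = Registry()


@MODELS.register_module(name=['llama3_1', 'llama3_2'])
class Llama3_1:
    """Stand-in for the Llama3.1 chat template."""


# Both model names resolve to the same template class.
assert MODELS.get('llama3_2') is MODELS.get('llama3_1')
```

Registering an alias instead of subclassing keeps a single source of truth for the template: llama3.2 uses the llama3.1 chat format unchanged.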
6 changes: 3 additions & 3 deletions lmdeploy/turbomind/supported_models.py
@@ -84,9 +84,9 @@ def _is_head_dim_128(cfg):
if num_attn_head == 40:
# baichuan-13B, baichuan2-13B not supported by turbomind
support_by_turbomind = False
- elif arch == 'Qwen2ForCausalLM':
-     # qwen2 0.5b size_per_head is 64, which hasn't been supported
-     # by turbomind yet
+ elif arch in ['Qwen2ForCausalLM', 'LlamaForCausalLM']:
+     # the head_dim of qwen2 0.5b and llama3.2-1b is 64, which
+     # hasn't been supported by turbomind yet
support_by_turbomind = _is_head_dim_128(cfg)
elif arch in ('ChatGLMModel', 'ChatGLMForConditionalGeneration'):
# chatglm1/2/3 is not working yet
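The gate above explains why the supported-models tables list only the 3B variant of Llama3.2 for TurboMind: the engine requires a head dimension of 128, and the 1B variant (like Qwen2-0.5B) has a head dimension of 64. A rough sketch of the check under assumed config values (`head_dim` and `is_head_dim_128` are hypothetical helpers for illustration, not LMDeploy's API; the hidden sizes and head counts below are taken from the models' Hugging Face configs to the best of my knowledge):

```python
# Head dimension is hidden_size / num_attention_heads; TurboMind only
# supports attention with size_per_head == 128, so models with head_dim 64
# fall back to the pytorch engine. Config numbers are assumptions drawn
# from the models' published HF configs.

def head_dim(hidden_size: int, num_attention_heads: int) -> int:
    return hidden_size // num_attention_heads

def is_head_dim_128(hidden_size: int, num_attention_heads: int) -> bool:
    return head_dim(hidden_size, num_attention_heads) == 128

# Llama3.2-3B: 3072 / 24 == 128 -> supported by turbomind
assert is_head_dim_128(3072, 24)
# Llama3.2-1B: 2048 / 32 == 64 -> routed to the pytorch engine
assert not is_head_dim_128(2048, 32)
# Qwen2-0.5B: 896 / 14 == 64 -> likewise unsupported
assert not is_head_dim_128(896, 14)
```

Checking the architecture name plus head dimension, rather than maintaining a per-model allowlist, lets every future `LlamaForCausalLM` checkpoint with head_dim 128 work without further changes.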
