[hybrid] Support tensor parallel and cache structure for fused attention op. #40101
Conversation
Thanks for your contribution!
cache structure support for fuse attention
LGTM
LGTM for
set_tests_properties(test_static_model_parallel_fused_attention PROPERTIES TIMEOUT 120)
LGTM
LGTM.
LGTM
PR types
Others
PR changes
Others
Describe
fused_attention_op is modified as follows:
- Added an optional CacheKV input, which carries the previous step's cache values inside the generation model's while loop.
- Added a CacheKVOut output, which carries the cache values updated in the current step inside the generation model's while loop.
- Modified the attributes: added an optional ring_id attribute (default -1), used as the communication-group identifier for tensor-parallel distributed training.
This updates the fused_attention op to support tensor model parallelism and the cache structure.
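The caching pattern described above can be sketched in plain numpy. This is an illustrative model only, not the op's actual interface: the `step` helper, the `[2, seq_len, num_heads, head_dim]` cache layout (K stacked over V), and all shapes are assumptions for the sketch. At each decoding step the current token's key/value is appended to the cache from the previous step (playing the role of CacheKV in, CacheKVOut out).

```python
import numpy as np

num_heads, head_dim = 2, 4

def step(cache_kv, new_k, new_v):
    """Append this step's K/V to the running cache (hypothetical helper).

    cache_kv: [2, seq_len, num_heads, head_dim] with K stacked over V,
    or None on the first decoding step.
    """
    new_kv = np.stack([new_k, new_v])[:, None]  # -> [2, 1, num_heads, head_dim]
    if cache_kv is None:
        return new_kv
    # Grow the cache along the sequence axis; this grown cache is what
    # the next iteration of the while loop receives as its input cache.
    return np.concatenate([cache_kv, new_kv], axis=1)

cache = None
for t in range(3):  # three decoding steps
    k = np.full((num_heads, head_dim), float(t))  # this step's key
    v = np.full((num_heads, head_dim), float(t))  # this step's value
    cache = step(cache, k, v)

# After 3 steps the cache holds 3 positions of K and V.
assert cache.shape == (2, 3, num_heads, head_dim)
```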
For tensor model parallelism, the first linear is column-parallel and the second is row-parallel; after the row-parallel linear each rank holds only a partial output, and an allreduce over the ranks sums these partials into the final output.
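The column-parallel/row-parallel split can be checked numerically with a small numpy sketch. This simulates the ranks in a single process (the allreduce becomes a plain sum over rank-local partials); the shapes and the two-rank setup are assumptions for illustration, not Paddle's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
world_size = 2
d_model, d_hidden = 4, 8

x = rng.standard_normal((3, d_model))          # replicated input activations
w1 = rng.standard_normal((d_model, d_hidden))  # first linear (column-parallel)
w2 = rng.standard_normal((d_hidden, d_model))  # second linear (row-parallel)

# Serial reference: two back-to-back linears on one device.
ref = x @ w1 @ w2

# Shard w1 by columns and w2 by rows across the ranks.
w1_shards = np.split(w1, world_size, axis=1)
w2_shards = np.split(w2, world_size, axis=0)

# Each rank computes its partial output locally; no communication is
# needed between the column-parallel and row-parallel linears, because
# the column shard's output lines up with the matching row shard.
partials = [x @ w1_shards[r] @ w2_shards[r] for r in range(world_size)]

# allreduce(sum) over ranks yields the final output.
out = np.sum(partials, axis=0)

assert np.allclose(out, ref)
```

This is why only a single allreduce per pair of linears is required: the per-rank products `x @ w1_r @ w2_r` sum exactly to `x @ w1 @ w2` by block matrix multiplication.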