
[hybrid] Support tensor parallel and cache structure for fused attention op. #40101

Merged

Conversation

@FeixLiu (Contributor) commented Mar 3, 2022

PR types

Others

PR changes

Others

Describe

Changes to fuse_attention_op:
- Added an optional CacheKV input, which carries the cache values from the previous iteration of a generation model's while loop.
- Added a CacheKVOut output, which holds the cache values updated in the current iteration of the generation model's while loop (a sketch of this cache contract follows this list).
- Added an optional ring_id attribute (default -1), which identifies the communication group for tensor parallel distributed training.
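A minimal sketch of how such a CacheKV/CacheKVOut pair typically behaves in a decode loop, assuming the cache stacks K and V along a leading axis; this is an illustration of the contract only, not the actual fused_attention kernel or its API:

```python
import numpy as np

def attention_step_with_cache(q, k, v, cache_kv=None):
    """One decode step. q/k/v: [batch, num_head, 1, head_dim].
    cache_kv: stacked K and V, [2, batch, num_head, past_len, head_dim]."""
    if cache_kv is not None:
        # Prepend cached keys/values from earlier steps (the CacheKV input).
        k = np.concatenate([cache_kv[0], k], axis=2)
        v = np.concatenate([cache_kv[1], v], axis=2)
    # Cache updated with this step's K/V, returned for the next iteration
    # of the while loop (the CacheKVOut output).
    cache_kv_out = np.stack([k, v], axis=0)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, cache_kv_out
```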

Update the fused_attention op to support tensor model parallelism and a cache structure.
For tensor model parallelism, the first (QKV) linear is column parallel and the second (output projection) linear is row parallel; each rank then holds a partial output, and an allreduce produces the final result.
[Figure: fused attention with tensor parallelism: column parallel linear, then row parallel linear, then allreduce of the partial outputs]
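A minimal numpy sketch of that column-parallel, then row-parallel, then allreduce pattern, simulated on a single process with two hypothetical ranks; the real op would run one slice per device and perform the allreduce over the communication group selected by ring_id:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 8)      # [tokens, hidden]
w1 = np.random.randn(8, 16)    # first linear (e.g. QKV projection)
w2 = np.random.randn(16, 8)    # second linear (output projection)

# Column parallel: each rank keeps a column slice of w1, so its activation
# is a slice of the full intermediate and needs no communication yet.
h0 = x @ w1[:, :8]
h1 = x @ w1[:, 8:]

# Row parallel: each rank keeps the matching row slice of w2 and produces
# a partial sum of the final output.
partial0 = h0 @ w2[:8, :]
partial1 = h1 @ w2[8:, :]

# Allreduce (here simply a sum) combines the partials into the final output.
out = partial0 + partial1
assert np.allclose(out, x @ w1 @ w2)
```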

@paddle-bot-old (bot) commented Mar 3, 2022

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@FeixLiu force-pushed the update_fused_attention_op_with_dist branch from e89ce26 to ab370ad on March 4, 2022 02:39
@FeixLiu force-pushed the update_fused_attention_op_with_dist branch from bf2cb02 to a8db839 on March 7, 2022 03:12
@FeixLiu force-pushed the update_fused_attention_op_with_dist branch 2 times, most recently from 81ca93e to c750011 on March 8, 2022 07:29
@FeixLiu force-pushed the update_fused_attention_op_with_dist branch from ce9d167 to 5b878bf on March 9, 2022 08:34
@FeixLiu changed the title from "[WIP] Update fused attention op with dist" to "[WIP] Update fused attention op with dist and support cache structure" on March 9, 2022
cache structure support for fuse attention
@FeixLiu force-pushed the update_fused_attention_op_with_dist branch from 5b878bf to e088d36 on March 9, 2022 09:09
@FeixLiu changed the title from "[WIP] Update fused attention op with dist and support cache structure" to "[hybrid] Support tensor parallel and cache structure for fused attention op." on March 10, 2022
@wangxicoding (Contributor) left a comment

LGTM

@FeixLiu requested review from XieYunshen and limin2021 on March 11, 2022 06:17
@XieYunshen (Contributor) left a comment

LGTM for
`set_tests_properties(test_static_model_parallel_fused_attention PROPERTIES TIMEOUT 120)`

@FeixLiu requested review from Superjomn and XieYunshen on March 11, 2022 06:18
@Superjomn (Contributor) left a comment

LGTM

@limin2021 (Contributor) left a comment

LGTM.

@TCChenlong (Contributor) left a comment

LGTM

@wangxicoding merged commit 1882c49 into PaddlePaddle:develop on Mar 11, 2022
@FeixLiu deleted the update_fused_attention_op_with_dist branch on March 11, 2022 09:21