[Paddle Inference] Implement conv2d_fusion NHWC format using cutlass #47989

Merged
merged 89 commits into from
Jan 3, 2023

Contributor

@zhoutianzi666 zhoutianzi666 commented Nov 15, 2022

PR types

Performance optimization

PR changes

Others

Describe

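  • This PR lets users run native GPU fp16 inference with cutlass.
  • For native fp16 inference, in addition to selecting low-precision inference through enable_use_gpu, users call the Python API config.exp_enable_use_cutlass() or the C++ API config.Exp_EnableUseCutlass() to run fp16 inference with cutlass.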
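The two steps in the description could be wired together roughly as follows. This is only a sketch: the helper name `build_cutlass_fp16_config` and its parameters are hypothetical, and passing `PrecisionType.Half` as the third argument of `enable_use_gpu` is an assumption about how low precision is selected. Only `exp_enable_use_cutlass()` is taken directly from the description above.

```python
def build_cutlass_fp16_config(model_file, params_file, gpu_mem_mb=256, device_id=0):
    """Return a paddle.inference.Config for native fp16 inference with cutlass.

    Hypothetical helper for illustration; it is not part of this PR.
    """
    # Imported lazily so the module can be loaded without Paddle installed.
    from paddle import inference

    config = inference.Config(model_file, params_file)
    # Assumption: the third argument selects fp16 for native GPU inference.
    config.enable_use_gpu(gpu_mem_mb, device_id, inference.PrecisionType.Half)
    # Experimental switch named in the PR description.
    config.exp_enable_use_cutlass()
    return config
```

A predictor would then presumably be created with `paddle.inference.create_predictor(config)`, as for any other inference config.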

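Paddle-Inference Demo resnet50 test results, including device-to-host (DTH) and host-to-device (HTD) copies:

| T4 | trt/fp16 | paddle/fp16 |
|----|----------|-------------|
| 1 | 1.78526 | 1.55 |
| 16 | 9.69 | 11.2 |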
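  • yolov5s PaddleTest test results. The gap to trt is large because too many layout-conversion kernels are inserted. That kernel has already been optimized, and vectorization is being considered as a further optimization. Current performance:

| T4 | trt/fp16 | native fp16 |
|----|----------|-------------|
| 1 | 8.7 | 12.7 |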
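To make the cost of those layout-conversion kernels concrete, here is a plain-Python sketch of the index mapping between NCHW and NHWC on a flat buffer. It is an illustration only, not the kernel from this PR; the function names are hypothetical. Every element is read and written once per conversion, which is why inserting many such conversions adds up.

```python
def nchw_to_nhwc(data, n, c, h, w):
    """Reorder a flat NCHW buffer into a flat NHWC buffer."""
    assert len(data) == n * c * h * w
    out = [None] * len(data)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    dst = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    out[dst] = data[src]
    return out


def nhwc_to_nchw(data, n, c, h, w):
    """Inverse of nchw_to_nhwc: reorder a flat NHWC buffer into NCHW."""
    assert len(data) == n * c * h * w
    out = [None] * len(data)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    src = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    out[dst] = data[src]
    return out
```

For a 1x2x2x2 tensor holding 0..7 in NCHW order, the NHWC order interleaves the two channels: `[0, 4, 1, 5, 2, 6, 3, 7]`.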

paddle-bot bot commented Nov 15, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Nov 15, 2022
zhangjun previously approved these changes Dec 23, 2022
Contributor

@zhangjun zhangjun left a comment

LGTM

zhangjun previously approved these changes Dec 26, 2022
YuanRisheng previously approved these changes Dec 26, 2022
zhangjun previously approved these changes Dec 27, 2022
Contributor

@zhangjun zhangjun left a comment

LGTM

Contributor

@qingqing01 qingqing01 left a comment

Unit tests need to be added.

@zhoutianzi666 zhoutianzi666 dismissed stale reviews from zhangjun and YuanRisheng via ceacbf9 December 29, 2022 05:13
@zhoutianzi666
Contributor Author

> Unit tests need to be added.

done!

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

Member

@zhhsplendid zhhsplendid left a comment

LGTM for type registration. We approve conv for not having int kernel

Contributor

@zyfncg zyfncg left a comment

LGTM for including fluid header in phi

@qingqing01 qingqing01 merged commit c123dd1 into PaddlePaddle:develop Jan 3, 2023
Labels
contributor External developers

10 participants