[Paddle Inference] Implement conv2d_fusion NHWC format using cutlass #47989

zhoutianzi666 · 2022-11-15T04:02:08Z

PR types

Performance optimization

PR changes

Others

Describe

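This PR lets users run native GPU fp16 inference with cutlass.
When running native fp16 inference, in addition to specifying low-precision inference through enable_use_gpu, users must also call the Python API config.exp_enable_use_cutlass() or the C++ API config.Exp_EnableUseCutlass() to run fp16 inference with cutlass.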
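The two calls described above can be sketched in Python as follows. This is a minimal sketch, not the PR's own test code: the helper name `enable_cutlass_fp16`, its default arguments, and the model file names in the commented usage are hypothetical, and the three-argument form of `enable_use_gpu` (memory pool size, device id, precision) is an assumption about the Paddle Inference version in use.

```python
def enable_cutlass_fp16(config, half_precision, gpu_mem_mb=256, device_id=0):
    """Turn on GPU fp16 inference and the experimental cutlass path.

    `config` is expected to behave like paddle.inference.Config;
    `half_precision` is the fp16 precision value for that API.
    """
    # Low-precision GPU inference is requested through enable_use_gpu
    # (assumed signature: memory pool size in MB, device id, precision).
    config.enable_use_gpu(gpu_mem_mb, device_id, half_precision)
    # Experimental switch named in this PR description.
    config.exp_enable_use_cutlass()
    return config


# Hypothetical usage with a real Paddle install (file names are placeholders):
#   from paddle.inference import Config, PrecisionType, create_predictor
#   config = Config("model.pdmodel", "model.pdiparams")
#   enable_cutlass_fp16(config, PrecisionType.Half)
#   predictor = create_predictor(config)
```

Keeping the two calls in one helper makes it harder to enable cutlass without also requesting fp16, which is the combination this PR describes.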

Paddle-Inference Demo resnet50 test data, including DTH and HTD (device-to-host and host-to-device) transfers

| T4 | trt/fp16 | paddle/fp16 |
|----|----------|-------------|
| 1 | 1.78526 | 1.55 |
| 16 | 9.69 | 11.2 |
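As a quick reading of the resnet50 figures above, the sketch below computes the trt/fp16 to paddle/fp16 ratio for each row. It assumes, without confirmation from the PR, that the first column is the batch size and that the values are latencies where lower is better; the variable and function names are illustrative only.

```python
# Rows copied from the resnet50 table: (first column, trt/fp16, paddle/fp16).
resnet50_rows = [
    (1, 1.78526, 1.55),
    (16, 9.69, 11.2),
]


def trt_over_paddle(rows):
    """Return {first_column: trt / paddle} for each table row."""
    return {key: trt / paddle for key, trt, paddle in rows}


ratios = trt_over_paddle(resnet50_rows)
for key, ratio in ratios.items():
    # Assumption: values are latencies, so a ratio above 1 favors paddle/fp16.
    print(f"{key}: trt/paddle = {ratio:.3f}")
```

Under that assumption, paddle/fp16 is ahead in the first row and behind in the second.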

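yolov5s PaddleTest test data. The large performance gap relative to trt is because too many layout-conversion kernels are inserted. This kernel already has an optimized implementation, and vectorization will be considered later for further optimization. Current performance is as follows.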

| T4 | trt/fp16 | native fp16 |
|----|----------|-------------|
| 1 | 8.7 | 12.7 |
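For readers unfamiliar with the layout conversion mentioned above, the following is a plain-Python reference of what an NCHW to NHWC conversion does to a flat buffer. It is only an illustration of the index mapping, not the optimized CUDA kernel from this PR, and the function name `nchw_to_nhwc` is hypothetical.

```python
def nchw_to_nhwc(src, n, c, h, w):
    """Reorder a flat NCHW buffer into a flat NHWC buffer."""
    dst = [None] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # Offset of element (ni, ci, hi, wi) in NCHW order.
                    src_off = ((ni * c + ci) * h + hi) * w + wi
                    # Offset of the same element in NHWC order.
                    dst_off = ((ni * h + hi) * w + wi) * c + ci
                    dst[dst_off] = src[src_off]
    return dst


# One image, two channels, one row, three columns.
converted = nchw_to_nhwc([0, 1, 2, 3, 4, 5], n=1, c=2, h=1, w=3)
```

Every element is read and written exactly once, so the conversion is pure memory traffic. That is why inserting many such kernels around NHWC convolutions can add noticeable overhead.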

paddle-bot · 2022-11-15T04:02:11Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_all.h

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias.cu

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias_add_relu.cu

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_util.cu

paddle/phi/kernels/fusion/cutlass/conv2d_fusion.cu

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias_add_relu.cu

zhangjun

LGTM

zhangjun

LGTM

qingqing01

Unit tests need to be added.

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias_relu_few_channels.cu

zhoutianzi666 · 2023-01-03T03:09:22Z

Unit tests need to be added.

done!

XiaoguangHu01

LGTM

zhhsplendid

LGTM for type registration. We approve conv for not having int kernel

zyfncg

LGTM for including fluid header in phi

zhoutianzi666 added 2 commits November 13, 2022 15:38

precision ok for resnet50

1e92e18

tests ok for resnet50

935b709

paddle-bot bot added contributor External developers status: proposed labels Nov 15, 2022

zhoutianzi666 added 19 commits November 18, 2022 08:55

commit all files

4591a5a

delete useless files

5280865

delete gpudnn changes

7d47f0d

merge develop

c1bd00e

delete CMake

4d41d28

delete debug statements in dnn/conv_kernel.cu

a712c32

make code clean

4c7f7a2

clean code

f6eff0d

clean code

634ae7a

clean code

5331d4d

clean code

09de122

clean code

05fc551

clean code

6b9a860

clean code

599a7b0

clean code

df491a4

clean code

bc5fb29

not edit transfer_layout_kernel.cc

dad7d43

clean code

0b318b4

clean code

8543467

MARD1NO reviewed Nov 22, 2022

zhoutianzi666 added 5 commits November 23, 2022 13:04

REPEAT

7a29554

merge develop

8a10204

merge develop

e9b21a7

commit

e302169

commit

53f32dd

zhoutianzi666 added 2 commits December 22, 2022 13:05

merge develop

3a9e4d7

remove moe in CMakeLists.txt

3b820b7

zhangjun reviewed Dec 23, 2022

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias_add_relu.cu

zhangjun previously approved these changes Dec 23, 2022

add CHECK_EQ(groups == 1, true);

69c722f

zhoutianzi666 dismissed zhangjun’s stale review via 69c722f December 26, 2022 02:22

zhoutianzi666 added 3 commits December 26, 2022 02:45

add new_desc.SetAttr(beta, 1.f); in silu_fuse

bdd277e

remove some fluid header from phi

1dfcc80

add cutlass_enable = false;

9ff49ef

zhangjun previously approved these changes Dec 26, 2022

remove WARMUP and REPEAT from .h , put them in ProfileToGetBestConfig…

52c8a8f

… function

zhoutianzi666 dismissed zhangjun’s stale review via 52c8a8f December 26, 2022 11:58

YuanRisheng previously approved these changes Dec 26, 2022

zhangjun previously approved these changes Dec 27, 2022

qingqing01 reviewed Dec 28, 2022

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_bias_relu_few_channels.cu

zhoutianzi666 dismissed stale reviews from zhangjun and YuanRisheng via ceacbf9 December 29, 2022 05:13

add check padding_algorithm && add test_cutlass_conv2d_fusion_op.py

9dd83bf

zhoutianzi666 force-pushed the paddle_cutlass branch from ceacbf9 to 9dd83bf Compare December 29, 2022 05:17

zhoutianzi666 added 5 commits December 29, 2022 07:00

add cuda version >=11000 when exp_enable_use_cutlass

0466eae

modify unittests/ir/inference/CMakeLists.txt

93cd43f

remove cuda version check in test_cutlass_conv2d_fusion_op.py

63581c1

Merge branch 'develop' into paddle_cutlass

c55cc8f

add float in conv2d_fusion.cu if not will report error, this is a bug…

901b27c

… brought by others

qingqing01 approved these changes Jan 3, 2023

XiaoguangHu01 approved these changes Jan 3, 2023

zhhsplendid approved these changes Jan 3, 2023

zyfncg approved these changes Jan 3, 2023

qingqing01 merged commit c123dd1 into PaddlePaddle:develop Jan 3, 2023
