
optimize group_norm op backward #39944

Merged
merged 13 commits into PaddlePaddle:develop on Mar 14, 2022

Conversation

Zjq9409
Contributor

@Zjq9409 Zjq9409 commented Feb 25, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Optimized the group_norm backward computation formula and added vectorized computation code. Backward performance is as follows (all cases use format NCHW and num_groups = 32):

| Shape | Competitor time | Before opt | Before vs competitor | After opt | After vs competitor | Speedup |
| --- | --- | --- | --- | --- | --- | --- |
| [2,256,38,26] | 0.02885 | 0.04573 | worse (58.51%) | 0.02180 | better (24.70%) | 2.10 |
| [2,256,152,104] | 0.21768 | 0.22975 | worse (5.54%) | 0.19293 | better (12.56%) | 1.19 |
| [2,256,76,52] | 0.07168 | 0.07838 | worse (9.35%) | 0.06992 | on par (3.97%) | 1.12 |
| [2,256,25,33] | 0.02506 | 0.03627 | worse (44.73%) | 0.01939 | better (23.21%) | 1.87 |
| [2,256,100,132] | 0.18664 | 0.19681 | worse (5.45%) | 0.16327 | better (10.37%) | 1.21 |
| [2,256,50,66] | 0.06226 | 0.07368 | worse (18.34%) | 0.06251 | on par (0.63%) | 1.18 |
| [2,256,10,11] | 0.01426 | 0.01283 | better (10.03%) | 0.01133 | better (21.16%) | 1.13 |
| [2,256,13,13] | 0.01529 | 0.01422 | better (7.00%) | 0.01121 | better (25.66%) | 1.27 |
| [2,256,144,128] | 0.25624 | 0.26543 | on par (3.59%) | 0.21781 | better (13.19%) | 1.22 |
| [2,256,74,68] | 0.07861 | 0.09157 | worse (16.49%) | 0.07826 | on par (1.62%) | 1.17 |
| [2,128,25,42] | 0.01751 | 0.02577 | worse (47.17%) | 0.01512 | better (10.85%) | 1.70 |
| [2,128,70,76] | 0.05347 | 0.05619 | worse (5.09%) | 0.04923 | better (7.86%) | 1.14 |
| [2,256,280,304] | 1.10346 | 1.17337 | worse (6.34%) | 0.94262 | better (14.38%) | 1.24 |
| [2,256,304,232] | 0.90551 | 0.97509 | worse (7.68%) | 0.77200 | better (15.32%) | 1.26 |
| [2,128,35,38] | 0.01892 | 0.02555 | worse (35.04%) | 0.01650 | better (9.89%) | 1.55 |
| [2,256,272,312] | 1.10848 | 1.17801 | worse (6.27%) | 0.93751 | better (15.32%) | 1.26 |
| [2,256,320,312] | 1.28528 | 1.37059 | worse (6.64%) | 1.07318 | better (17.14%) | 1.28 |
| [2,256,280,200] | 0.75756 | 0.80689 | worse (6.51%) | 0.64160 | better (14.67%) | 1.26 |
| [2,128,140,152] | 0.15343 | 0.15773 | on par (2.80%) | 0.13413 | better (12.52%) | 1.18 |
| [2,128,136,156] | 0.16582 | 0.16450 | on par (0.80%) | 0.13413 | better (19.37%) | 1.23 |
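For context, the backward pass being optimized can be sketched as a NumPy reference (a minimal sketch, not Paddle's CUDA kernel; NCHW layout, population variance, and the function names here are assumptions for illustration):

```python
import numpy as np

def group_norm_forward(x, gamma, beta, G, eps=1e-5):
    """GroupNorm forward for NCHW input; also returns values cached for backward."""
    N, C, H, W = x.shape
    xg = x.reshape(N, G, -1)                      # each group holds C//G * H * W values
    mean = xg.mean(axis=2, keepdims=True)
    var = xg.var(axis=2, keepdims=True)           # population variance (ddof=0)
    xhat = ((xg - mean) / np.sqrt(var + eps)).reshape(N, C, H, W)
    y = gamma[None, :, None, None] * xhat + beta[None, :, None, None]
    return y, xhat, var

def group_norm_backward(dy, xhat, var, gamma, G, eps=1e-5):
    """Gradients w.r.t. x, gamma, beta given the upstream gradient dy."""
    N, C, H, W = dy.shape
    dbeta = dy.sum(axis=(0, 2, 3))                # per-channel reduction
    dgamma = (dy * xhat).sum(axis=(0, 2, 3))
    dxhat = (dy * gamma[None, :, None, None]).reshape(N, G, -1)
    xh = xhat.reshape(N, G, -1)
    inv_std = 1.0 / np.sqrt(var + eps)            # shape (N, G, 1)
    # Backward through the per-group normalization: only two reductions per group,
    # after which dx is a pointwise (vectorizable) expression.
    m1 = dxhat.mean(axis=2, keepdims=True)
    m2 = (dxhat * xh).mean(axis=2, keepdims=True)
    dx = (inv_std * (dxhat - m1 - xh * m2)).reshape(N, C, H, W)
    return dx, dgamma, dbeta
```

Per group, `dx` reduces to `inv_std * (dxhat - mean(dxhat) - xhat * mean(dxhat * xhat))`; fusing the group reductions and vectorizing the final pointwise pass is presumably the kind of restructuring behind the speedups in the table above.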

Forward + backward performance is as follows (same configurations):

| Shape | Competitor time | Before opt | Before vs competitor | After opt | After vs competitor | Speedup |
| --- | --- | --- | --- | --- | --- | --- |
| [2,256,38,26] | 0.05002 | 0.08798 | worse (75.89%) | 0.03881 | better (22.61%) | 2.27 |
| [2,256,152,104] | 0.44987 | 0.39136 | better (13.01%) | 0.34259 | better (24.31%) | 1.14 |
| [2,256,76,52] | 0.14665 | 0.13921 | better (5.07%) | 0.11571 | better (21.81%) | 1.20 |
| [2,256,25,33] | 0.04453 | 0.07385 | worse (65.84%) | 0.03541 | better (20.84%) | 2.09 |
| [2,256,100,132] | 0.38314 | 0.33415 | better (12.79%) | 0.28951 | better (24.59%) | 1.15 |
| [2,256,50,66] | 0.12724 | 0.13091 | on par (2.88%) | 0.10457 | better (17.75%) | 1.25 |
| [2,256,10,11] | 0.02663 | 0.02523 | better (5.26%) | 0.02228 | better (16.68%) | 1.13 |
| [2,256,13,13] | 0.02831 | 0.02853 | on par (0.78%) | 0.02237 | better (20.42%) | 1.28 |
| [2,256,144,128] | 0.51986 | 0.45103 | better (13.24%) | 0.38964 | better (24.38%) | 1.16 |
| [2,256,74,68] | 0.17181 | 0.16131 | better (6.11%) | 0.13441 | better (22.09%) | 1.20 |
| [2,128,25,42] | 0.03422 | 0.05010 | worse (46.41%) | 0.02949 | better (12.54%) | 1.70 |
| [2,128,70,76] | 0.10730 | 0.09786 | better (8.80%) | 0.08188 | better (23.53%) | 1.20 |
| [2,256,280,304] | 2.25950 | 1.95465 | better (13.49%) | 1.69268 | better (24.68%) | 1.15 |
| [2,256,304,232] | 1.86614 | 1.62497 | better (12.92%) | 1.39096 | better (25.40%) | 1.17 |
| [2,128,35,38] | 0.03697 | 0.05034 | worse (36.16%) | 0.03142 | better (13.56%) | 1.60 |
| [2,256,272,312] | 2.25328 | 1.97186 | better (12.49%) | 1.69253 | better (24.91%) | 1.17 |
| [2,256,320,312] | 2.63393 | 2.31147 | better (12.24%) | 1.96908 | better (25.41%) | 1.17 |
| [2,256,280,200] | 1.52122 | 1.34076 | better (11.86%) | 1.14964 | better (24.46%) | 1.17 |
| [2,128,140,152] | 0.31577 | 0.27354 | better (13.37%) | 0.24205 | better (23.43%) | 1.13 |
| [2,128,136,156] | 0.32905 | 0.27918 | better (15.16%) | 0.24215 | better (26.38%) | 1.15 |

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Zjq9409 Zjq9409 changed the title Group norm back opt group_norm backward optimize Feb 28, 2022
@Zjq9409 Zjq9409 changed the title group_norm backward optimize optimize group_norm op backward Feb 28, 2022
@Zjq9409 Zjq9409 force-pushed the group_norm_back_opt branch from f38ccc6 to b3503d6 Compare March 1, 2022 02:24
@ZzSean
Contributor

ZzSean commented Mar 2, 2022

Please briefly describe the optimization approach in the PR description.

@Zjq9409
Contributor Author

Zjq9409 commented Mar 3, 2022

> Please briefly describe the optimization approach in the PR description.

Added.

@Zjq9409 Zjq9409 force-pushed the group_norm_back_opt branch from 3949d0d to e8ba911 Compare March 9, 2022 08:54
@Zjq9409 Zjq9409 force-pushed the group_norm_back_opt branch from 22ee1f8 to 88ad343 Compare March 10, 2022 03:12
@Zjq9409 Zjq9409 force-pushed the group_norm_back_opt branch from 88ad343 to f18ab5a Compare March 10, 2022 03:14
Contributor

@ZzSean ZzSean left a comment

LGTM

@ZzSean ZzSean merged commit 5720537 into PaddlePaddle:develop Mar 14, 2022
Zjq9409 added a commit to Zjq9409/Paddle that referenced this pull request Mar 17, 2022