
Grid_sampler optimization #39751

Merged
ZzSean merged 33 commits into PaddlePaddle:develop from AshburnLee:grid_sampler_fw_bilinear on Feb 28, 2022

Conversation

Contributor

@AshburnLee AshburnLee commented Feb 20, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Functionality

  • Development testing showed that a 3D kernel could not outperform the pre-optimization 1D kernel overall.
  • Analysis found a redundant operation in the OP's implementation: every execution of the op launched an extra EigenMetaKernel whose share of the runtime is not negligible, so it was removed.
  • Further analysis showed that with a block size of 512, the grid size computed from the output img is far smaller than the number of SMs (a V100 has 80 SMs). For the same case, the competitor uses a block size of 256 (setting Paddle's block to 256 made overall performance worse than the competitor's, so 512 is kept) and gets a grid size of 74, close to the SM count. The code therefore adds, for the 512 block size, a check on the grid size and a re-setting of it; LaunchConfig1D contains similar handling. A sketch of the idea follows this list, and the measured effects are shown after it.
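Below is a minimal, hypothetical sketch of this kind of occupancy-aware launch sizing, not the code actually merged in this PR: the helper name GetGridSampleLaunchConfig and the exact fallback rule are assumptions for illustration. For the numbers quoted above, a grid of 74 blocks at 256 threads corresponds to roughly 19k work items; at 512 threads per block that is only about 37 blocks, well under the 80 SMs of a V100.

```cpp
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper (illustration only): with a fixed block size of 512,
// the grid computed from the output tensor may contain far fewer blocks than
// the GPU has SMs, leaving the device under-occupied. In that case, shrink
// the block size so the grid grows toward the SM count, similar in spirit
// to the handling in LaunchConfig1D.
struct LaunchConfig {
  int block_size;
  int64_t grid_size;
};

inline LaunchConfig GetGridSampleLaunchConfig(int64_t numel, int device_id) {
  int sm_count = 0;
  cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, device_id);

  int block = 512;                             // preferred block size
  int64_t grid = (numel + block - 1) / block;  // blocks needed at 512 threads

  // Halve the block while doing so still cannot push the grid past the SM
  // count; never go below 64 threads per block.
  while (block > 64 && grid * 2 <= sm_count) {
    block /= 2;
    grid = (numel + block - 1) / block;
  }
  return {block, grid};
}
```

With ~19k elements on an 80-SM V100 this falls back from 512 to 256 threads per block (37 -> 74 blocks); for larger outputs the preferred 512-thread block is kept. The merged change may re-set the grid by a different rule, so treat this only as an illustration of keeping the grid size close to the SM count.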

Results on the 20 model cases

Forward
[benchmark screenshot, 2022-02-28]

Backward
[benchmark screenshot, 2022-02-23]

Conclusions

  • Forward: after taking the SM count into account, performance on the model cases is better than develop. Except for case #7 (where the gap to the competitor narrowed from 10.79% slower to 8.38% slower), no case is worse than the competitor. For the 5 cases from the previous optimization round whose output img is 300*4, the gap to the competitor shrank substantially (9.11%->2.14%, 10.79%->8.38%, 12.94%->3.32%, 13.93%->4.49%, and 10.24%->1.53%, respectively).
  • Backward: performance on the model cases is better than develop. However, the backward pass contains atomic operations that mask the gains from the handling above (the forward and backward passes of the same case process the same amount of data, yet their runtimes differ greatly; the atomic operations are the bottleneck).
  • The op benchmark cases improve markedly over the pre-optimization code (see CI-op-benchmark).

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@AshburnLee AshburnLee changed the title from "Grid sampler fw bilinear" to "Grid_sampler optimization" on Feb 25, 2022
Contributor

@ZzSean ZzSean left a comment


LGTM

@ZzSean ZzSean merged commit 2c66775 into PaddlePaddle:develop Feb 28, 2022
@AshburnLee AshburnLee deleted the grid_sampler_fw_bilinear branch February 28, 2022 07:52