replace dropout_grad implementation with cuda kernel #39795

zhangting2020 · 2022-02-22T01:33:08Z

Performance optimization

OPs

replace dropout_grad implementation with cuda kernel

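Changes: remove the original Eigen kernel.

Some branches of the backward operator are equivalent to a scale, so they are optimized with ScaleFunctor and the Elementwise template. To preserve operator precision under fp16, the existing ScaleFunctor was modified to compute in fp32 and convert the result to fp16.
Other branches of the backward operator previously multiplied the output by 0; these now use cudaMemset to zero the output.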
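
The two ideas in the description (a scale functor that computes in fp32 and casts back to the storage type, and a zero-fill instead of a multiply by 0) can be illustrated with a minimal host-side C++ sketch. This is not the code from this PR: `ScaleFunctorSketch`, `ElementwiseSketch`, and `ZeroFillSketch` are hypothetical names, the loop stands in for a CUDA elementwise launch, and `std::memset` stands in for `cudaMemset` on device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the scale functor described above.
// T is the storage type (fp16 in the PR); the multiply is done in float
// and the result is cast back to T.
template <typename T>
struct ScaleFunctorSketch {
  float scale;
  explicit ScaleFunctorSketch(float s) : scale(s) {}
  T operator()(const T x) const {
    return static_cast<T>(static_cast<float>(x) * scale);
  }
};

// Applies a unary functor to every element. On the GPU, an elementwise
// kernel launch would replace this loop (assumption for illustration).
template <typename T, typename Functor>
void ElementwiseSketch(const std::vector<T>& in, std::vector<T>* out,
                       Functor f) {
  assert(out != nullptr);
  out->resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    (*out)[i] = f(in[i]);
  }
}

// Zero-fills the buffer byte-wise instead of multiplying each element by 0.
// std::memset stands in for cudaMemset; valid for trivially copyable T
// whose all-zero byte pattern represents zero.
template <typename T>
void ZeroFillSketch(std::vector<T>* out) {
  assert(out != nullptr);
  if (!out->empty()) {
    std::memset(out->data(), 0, out->size() * sizeof(T));
  }
}
```

Doing the multiply in float avoids rounding the intermediate product in half precision, and a byte-wise zero-fill avoids reading the input at all, which is presumably why those branches were changed.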

paddle-bot-old · 2022-02-22T01:33:12Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

AnnaTrainingG

LGTM

ZzSean

LGTM. I suggest adding the performance numbers before and after the optimization to the PR.

replace implementation with cuda kernel

e588f40

AnnaTrainingG approved these changes Feb 22, 2022


ZzSean approved these changes Feb 23, 2022


zhangting2020 merged commit 64f1485 into PaddlePaddle:develop Feb 25, 2022
