
Optimize where_op and abs_grad_op by the elementwise interface #39609

Merged (6 commits, merged Feb 24, 2022)


huangxu96 (Contributor) commented Feb 16, 2022

PR types

Performance optimization

PR changes

OPs

Describe

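Optimized where_op and abs_grad_op using the elementwise interface. The elementwise interface bundles a set of performance optimization techniques and gives a general speedup to ops with elementwise behavior. The per-element loops in the code were rewritten as functors, which are then invoked through the elementwise interface.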
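
The loop-to-functor rewrite described above can be sketched on the host in plain C++. This is a minimal sketch, not Paddle's implementation: `ElementwiseApply`, `WhereLoop` and `WhereElementwise` are hypothetical names, and the real elementwise interface runs on the GPU and adds optimizations (such as vectorized access and launch configuration) that this sketch leaves out. Only the functor body mirrors the `CondFunctor` in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for an elementwise launcher: it owns the
// per-element loop, so each op only supplies a functor for one element.
// None of the GPU-side optimizations of the real interface are modeled here.
template <typename OutT, typename Functor, typename... InTs>
void ElementwiseApply(std::vector<OutT>* out, Functor func,
                      const std::vector<InTs>&... ins) {
  for (std::size_t i = 0; i < out->size(); ++i) {
    (*out)[i] = func(ins[i]...);  // element i of every input goes to the functor
  }
}

// Before: the op iterates over the elements itself.
template <typename T>
void WhereLoop(const std::vector<bool>& cond, const std::vector<T>& x,
               const std::vector<T>& y, std::vector<T>* out) {
  for (std::size_t i = 0; i < out->size(); ++i) {
    (*out)[i] = cond[i] ? x[i] : y[i];
  }
}

// After: the op only describes a single element (modeled on CondFunctor).
template <typename T>
struct CondFunctor {
  T operator()(const bool cond, const T x, const T y) const {
    return cond ? x : y;
  }
};

// The op hands its inputs and functor to the shared launcher.
template <typename T>
void WhereElementwise(const std::vector<bool>& cond, const std::vector<T>& x,
                      const std::vector<T>& y, std::vector<T>* out) {
  ElementwiseApply(out, CondFunctor<T>(), cond, x, y);
}
```

The design point is that the op no longer owns the iteration, so any optimization added inside the launcher benefits every functor passed to it.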

paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

huangxu96 changed the title from "Optimize the where_op by the elementwise_op funtion" to "Optimize where_op and abs_grad_op by the elementwise interface" on Feb 23, 2022
AnnaTrainingG previously approved these changes on Feb 23, 2022
@@ -17,9 +17,33 @@
#include "paddle/fluid/platform/for_range.h"
#include "paddle/phi/kernels/abs_grad_kernel.h"
#include "paddle/phi/kernels/funcs/complex_functors.h"
#if defined(__NVCC__) || defined(__HIPCC__)
#include "paddle/fluid/operators/elementwise/elementwise_op_impl.cu.h"

Contributor commented:

Code under phi must not include files from the fluid directory; please change this following how cast does it.

AnnaTrainingG (Contributor) commented Feb 23, 2022:

Please describe the work more clearly in the PR description, e.g., which functors were added and which kernel is called.

Author replied:

Done.

auto functor = CondFunctor<T>();
std::vector<const framework::Tensor*> ins = {condition, X, Y};
std::vector<framework::Tensor*> outs = {out};
paddle::operators::LaunchSameDimsElementwiseCudaKernel<T>(dev_ctx, ins,

Contributor commented:

Suggest switching to the phi::funcs-style call.

Author replied:

Done.

T><<<config.block_per_grid.x, config.thread_per_block.x, 0, stream>>>(
numel, cond_data, x_data, y_data, out_data);
auto functor = CondFunctor<T>();
std::vector<const framework::Tensor*> ins = {condition, X, Y};

Member commented:

The related framework::Tensor usages can be changed to DenseTensor in a follow-up.

@@ -20,6 +21,15 @@ namespace platform = paddle::platform;
namespace paddle {
namespace operators {

Contributor commented:

In the next PR, please add a description of what each function does.

AnnaTrainingG merged commit c969955 into PaddlePaddle:develop on Feb 24, 2022
return cond ? x : y;
}
};

template <typename T>
__global__ void WhereCUDAKernel(const int N, const bool* cond, const T* x,

Contributor commented:

Can this function be deleted now?

@@ -20,6 +21,15 @@ namespace platform = paddle::platform;
namespace paddle {
namespace operators {

template <typename T>
struct CondFunctor {
HOSTDEVICE inline CondFunctor() {}

Contributor commented:

The default constructor doesn't need to be written explicitly.
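
Regarding the explicit empty constructor: a struct with no user-provided constructor keeps its implicitly declared one, so `CondFunctor<T>()` still works, and the type additionally stays trivially default constructible. A minimal host-side sketch, assuming a hypothetical `SKETCH_HOSTDEVICE` macro in place of Paddle's `HOSTDEVICE` and a hypothetical `CondFunctorExplicit` standing in for the version in the diff:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for HOSTDEVICE: expands to __host__ __device__ only
// when compiled by nvcc/hipcc, and to nothing in a plain C++ build.
#if defined(__NVCC__) || defined(__HIPCC__)
#define SKETCH_HOSTDEVICE __host__ __device__
#else
#define SKETCH_HOSTDEVICE
#endif

// As in the diff: an explicit, empty, user-provided default constructor.
template <typename T>
struct CondFunctorExplicit {
  SKETCH_HOSTDEVICE inline CondFunctorExplicit() {}
  SKETCH_HOSTDEVICE inline T operator()(const bool cond, const T x,
                                        const T y) const {
    return cond ? x : y;
  }
};

// Reviewer's suggestion: omit the constructor and rely on the implicit one.
template <typename T>
struct CondFunctor {
  SKETCH_HOSTDEVICE inline T operator()(const bool cond, const T x,
                                        const T y) const {
    return cond ? x : y;
  }
};

// Both types can still be created with Functor<T>(), but only the version
// without a user-provided constructor remains trivially default constructible.
static_assert(std::is_trivially_default_constructible<CondFunctor<float>>::value,
              "implicit constructor is trivial");
static_assert(
    !std::is_trivially_default_constructible<CondFunctorExplicit<float>>::value,
    "a user-provided constructor is never trivial");
```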

@@ -154,6 +154,53 @@ struct AbsFunctor<T, NoComplex<T, Real<T>>> {
int64_t numel_;
};

template <typename T>
struct AbsGradCUDAFunctor {
HOSTDEVICE inline AbsGradCUDAFunctor() {}

Contributor commented:

The default constructor doesn't need to be defined explicitly.

};

template <>
struct AbsGradCUDAFunctor<phi::dtype::complex<float>> {

Contributor commented:

The functor definition can be simplified; for reference:

template <typename T>
struct CudaAbsFunctor<T, phi::funcs::Complex<T, phi::funcs::Real<T>>> {
  __device__ __forceinline__ phi::funcs::Real<T> operator()(const T x) const {
    return abs(x);
  }
};
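
The suggested pattern replaces one full specialization per complex type with a single partial specialization selected by a type trait. The sketch below is a host-side illustration under stated assumptions: `Real`, `Complex`, and `NoComplex` are hypothetical stand-ins modeled on the `phi::funcs` helpers quoted above (assumed here to collapse to `void` only in the matching case), `std::complex` replaces `phi::dtype::complex`, and the `(x, dout)` argument order and the zero gradient at `x == 0` are assumptions rather than the PR's exact code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical stand-ins modeled on phi::funcs::Real / Complex / NoComplex.
template <typename T>
struct RealOf { using type = T; };
template <typename T>
struct RealOf<std::complex<T>> { using type = T; };
template <typename T>
using Real = typename RealOf<T>::type;

// Each alias yields `void` (matching the primary template's default argument)
// only for the case it is meant for; otherwise substitution fails (SFINAE).
template <typename T, typename RealT>
using Complex = typename std::enable_if<!std::is_same<T, RealT>::value>::type;
template <typename T, typename RealT>
using NoComplex = typename std::enable_if<std::is_same<T, RealT>::value>::type;

template <typename T, typename Enable = void>
struct AbsGradFunctor;  // primary template intentionally left undefined

// Real types: d|x|/dx = sign(x); the gradient at x == 0 is taken as 0 here.
template <typename T>
struct AbsGradFunctor<T, NoComplex<T, Real<T>>> {
  T operator()(const T x, const T dout) const {
    if (x == T(0)) return T(0);
    return x > T(0) ? dout : -dout;
  }
};

// All complex types share one definition: dx = dout * x / |x| (dout is real).
template <typename T>
struct AbsGradFunctor<T, Complex<T, Real<T>>> {
  T operator()(const T x, const Real<T> dout) const {
    if (x == T(0)) return T(0);
    return T(dout) * (x / T(std::abs(x)));
  }
};
```

With this layout, `std::complex<float>` and `std::complex<double>` are both served by the same partial specialization, so no per-type `template <>` block is needed.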


5 participants