Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support comparision binarary ops #33064

Merged

Conversation

JamesLim-sy
Copy link
Contributor

@JamesLim-sy JamesLim-sy commented May 23, 2021

PR types

Performance optimization

PR changes

OPs

Describe

  1. Supporting cuda binary comparision OPs whose return type is boolean, namely:
    equal
    not euqal
    large_than
    less_than
    large_equal
    less_equal

  2. The performance variation is below:
    图片

  3. As can be seen in the table, the time cost of most of test cases reflect the great improvment in logical ops after optimization of Elementwise and Broadcast op, while the develop branch takes the adavantage of thrust::cuda_cub::core:: dealing with the first test case.

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@JamesLim-sy JamesLim-sy force-pushed the Support_comparision_binarary_ops branch from 6379fff to 259ecb2 Compare May 24, 2021 14:07
@Xreki
Copy link
Contributor

Xreki commented May 25, 2021

验证一下#33086 里面的这个case。

@JamesLim-sy
Copy link
Contributor Author

验证一下#33086 里面的这个case。

本地验证本PR中不出现 #33086 中提及的问题,通过了正确性验证。

@wawltor
Copy link
Contributor

wawltor commented Jun 1, 2021

辛苦代码合入后也把相应的代码合入到2.1分支上,#33086 该PR的改动将不会cherry-pick分支

@Xreki
Copy link
Contributor

Xreki commented Jun 2, 2021

辛苦代码合入后也把相应的代码合入到2.1分支上,#33086 该PR的改动将不会cherry-pick分支

@wawltor 这个PR涉及到一系列相关的PR,且依赖的PR没有合到2.1分支,所以这个PR也不好cherry-pick到2.1哈。请还是继续cherry-pick #33086 吧。

Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM both for the modification of the PR and op benchmark ci.

  • op benchmark ci跳过了controlflow目录中的源码的修改,所以ci没有测试到比较类op。本pr没有修改elementwise_xxx计算的代码,所以op benchmark ci上的性能问题纯属波动。
  • 2个review建议,下个PR一起看下。

* To pack the input and output tnesors into vector for
* LaunchElementwiseCudaKernel
*/
template <typename T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

模板里面最好标明是OutT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

下个PR修改掉

ops::CompareOpKernel<plat::CUDADeviceContext, \
ops::func##Functor<double>, void>);

REGISTER_CUDA_COMPARE_KERNEL(equal, CudaEqual)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

虽然是为了减少公共代码,但是Functor后缀还是不要放到宏里面的好,宏参数还是最好传一个完整的函数:REGISTER_CUDA_COMPARE_KERNEL(equal, CudaEqualFunctor),直观便于理解。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

下个PR中一并修改掉

@Xreki Xreki merged commit 0f15496 into PaddlePaddle:develop Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants