Add OpFunctor and replace cast, scale, clip, bce_loss and abs_grad with elementwise_no_broadcast #38500
Thanks for your contribution!
LGTM for cast kernel
LGTM for cast cuda kernel.
I agree with this PR, but I think it would be better to write the constructor like this:
ScaleFunctor(InT scale_data, InT bias_data, bool is_bias_after_sacle)
    : bias(bias_data), scale(scale_data), bias_after_scale(is_bias_after_sacle) {}
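To make the suggestion concrete, here is a minimal, self-contained C++ sketch of a scale functor that uses a member initializer list and is applied elementwise without broadcasting. It is not the code from this PR: the struct layout, the `operator()` body, and the `ElementwiseNoBroadcast` helper are assumptions made for illustration, and the formula follows the usual scale semantics (`scale * x + bias` when the bias is applied after scaling, `scale * (x + bias)` otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the functor discussed in the review comment.
template <typename InT>
struct ScaleFunctor {
  InT bias;
  InT scale;
  bool bias_after_scale;

  // Members are initialized in declaration order (bias, scale,
  // bias_after_scale), so the initializer list is written in that order.
  ScaleFunctor(InT scale_data, InT bias_data, bool is_bias_after_scale)
      : bias(bias_data),
        scale(scale_data),
        bias_after_scale(is_bias_after_scale) {}

  // Assumed semantics: add the bias either after or before scaling.
  InT operator()(InT x) const {
    return bias_after_scale ? scale * x + bias : scale * (x + bias);
  }
};

// Hypothetical CPU helper: apply a unary functor to each element.
// Input and output have the same size, so no broadcasting is involved.
template <typename InT, typename Functor>
std::vector<InT> ElementwiseNoBroadcast(const std::vector<InT>& in,
                                        Functor func) {
  std::vector<InT> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = func(in[i]);
  }
  assert(out.size() == in.size());
  return out;
}
```

Keeping the per-element logic in a small functor lets the same launch helper serve cast, scale, clip and similar ops; only the functor changes.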
PR types
Others
PR changes
OPs
Describe
Add OpFunctor and replace cast, scale, full, clip, bce_loss and abs_grad with elementwise_no_broadcast
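Modifying cast in pten currently does not trigger the benchmark, so performance test results are added here:
case0 shows a performance drop, mainly because the case is small, so machine fluctuation accounts for a large share of the measured time.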