add maximum limit for grid of reduce, elementwise, gather and scatter #40813
@@ -128,6 +128,8 @@ inline GpuLaunchConfig GetGpuLaunchConfig1D(
   // Number of threads per block shall be larger than 64.
   threads = std::max(64, threads);
   int blocks = DivUp(DivUp(numel, vec_size), threads);
+  int limit_blocks = context.GetCUDAMaxGridDimSize()[0];
+  if (blocks > limit_blocks) blocks = limit_blocks;
Per the C++ style guide, it would be better to add braces around the body of the if statement.
Added the {}, thanks!
@@ -1044,7 +1056,7 @@ void ReduceKernel(const KPDevice& dev_ctx,

   auto x_dim = phi::vectorize<int>(x.dims());
   auto config = ReduceConfig<Ty>(origin_reduce_dims, x_dim);
-  config.Run();
+  config.Run(x.place());
Could this LimitGridDim be moved outside, so that dev_ctx can be used directly?
config.Run() already limits the number of blocks, so the limit on the number of threads was placed inside config.Run() as well. That way it can be reused.
LGTM
Could you provide the operator configuration that failed, along with the thread counts before and after the change?
PR types
Bug fixes
PR changes
OPs
Describe
The grid size of the reduce, elementwise and masked_select kernels was not limited, which may raise a bug like:
So this PR adds a maximum limit on the grid size of the reduce, elementwise, gather and scatter kernels.
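For readers who want to see the effect of the clamp, here is a minimal host-only sketch in plain C++ (no CUDA needed). It mirrors the `DivUp` / `limit_blocks` logic from the diff. `ClampedBlocks` and `SimulateGridStride` are hypothetical names made up for illustration, and `max_grid_x` stands in for `context.GetCUDAMaxGridDimSize()[0]`. It also assumes the kernels walk the data with a grid-stride loop, which this PR does not state; without some loop of that kind a clamped grid would leave elements unprocessed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Ceiling division for positive integers.
inline int64_t DivUp(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Hypothetical helper: block count after clamping to the device limit.
// max_grid_x is an assumed stand-in for context.GetCUDAMaxGridDimSize()[0].
inline int64_t ClampedBlocks(int64_t numel, int64_t vec_size, int64_t threads,
                             int64_t max_grid_x) {
  assert(numel > 0 && vec_size > 0 && threads > 0 && max_grid_x > 0);
  int64_t blocks = DivUp(DivUp(numel, vec_size), threads);
  if (blocks > max_grid_x) {
    blocks = max_grid_x;
  }
  return blocks;
}

// Host-side simulation of a grid-stride loop (an assumption about the kernels):
// each simulated thread starts at its global index and advances by the total
// number of launched threads. Returns how many times each element was visited.
inline std::vector<int> SimulateGridStride(int64_t numel, int64_t blocks,
                                           int64_t threads) {
  std::vector<int> visits(static_cast<size_t>(numel), 0);
  const int64_t stride = blocks * threads;
  for (int64_t tid = 0; tid < stride; ++tid) {
    for (int64_t i = tid; i < numel; i += stride) {
      ++visits[static_cast<size_t>(i)];
    }
  }
  return visits;
}
```

With `numel = 1000`, `threads = 64` and an artificially small limit of 4 blocks, the unclamped count of 16 blocks is reduced to 4, and the simulated loop still touches every element exactly once.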