webgpu: Optimize AvgPool when filter size = input size #6762
Conversation
AvgPool is very slow in the cityscapes architecture of DeepLabV3. With this change, AvgPool improves from 24.77 ms to 3.07 ms.
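For context, here is a minimal sketch of the idea behind this optimization, written against the public tfjs API rather than the PR's backend kernel code (the name globalAvgPool and the NHWC layout are illustrative assumptions, not part of this PR): when the filter spans the entire input, AvgPool degenerates into a mean over the spatial positions, so it can be dispatched to the optimized reduce-based mean kernel instead of the generic pool2d shader.

import * as tf from '@tensorflow/tfjs';

// When filter size === input size, AvgPool is global average pooling:
// a mean over all spatial positions of each (batch, channel) slice.
function globalAvgPool(x: tf.Tensor4D): tf.Tensor {
  const [batch, height, width, channels] = x.shape;
  // Collapse the spatial dims into a single axis, then let the
  // workgroup-optimized mean (reduce) kernel do the work.
  const reshaped = tf.reshape(x, [batch, height * width, channels]);
  return tf.mean(reshaped, /* axis= */ 1);
}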
Thank you Jiajia! This perf improvement looks pretty great!
I am not sure if I understand correctly: does this change gain performance because WebGPU's mean (reduce) op is optimized by workgroup? If so, I think I could not apply this idea to WebGL, because WebGL's mean op and pool op have similar implementations.
tfjs-backend-webgpu/src/kernels/AvgPool.ts
line 61 at r1 (raw file):
Could we avoid the transpose op here? Then we could do meanX on axis 0 directly, like:

const meanX = mean(
    {inputs: {x: reshapeX}, backend, attrs: {keepDims: false, axis: 0}});
Code quote:
const transposeX =
    transpose({inputs: {x: reshapeX}, backend, attrs: {perm: [1, 0]}});
const meanX = mean(
    {inputs: {x: transposeX}, backend, attrs: {keepDims: false, axis: 1}});
That's one reason. Another reason is that using reduce makes the data access contiguous in memory.
For WebGL, I remember @pyu10055 once said that WebGL's reduction ops use a parallel algorithm that reduces the array over multiple shader calls, so maybe using reduce is still faster than the current pool2d algorithm. You can give it a try. For the current AvgPool op in this model, WebGPU does behave much slower than WebGL, but after this optimization it becomes better.
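To make the multi-pass reduction idea concrete, here is a CPU-side sketch of that pattern (purely illustrative: each iteration of the outer loop stands in for one shader dispatch, and windowSize is a made-up tuning parameter, not a real tfjs constant):

// Reduce an array by summing fixed-size windows, shrinking the data by a
// factor of windowSize per pass, the way the GPU backends shrink a tensor
// once per shader call until a single value per output remains.
function multiPassSum(values: Float32Array, windowSize = 32): number {
  let current = values;
  while (current.length > 1) {
    const next = new Float32Array(Math.ceil(current.length / windowSize));
    for (let i = 0; i < next.length; i++) {
      const start = i * windowSize;
      const end = Math.min(start + windowSize, current.length);
      let sum = 0;
      for (let j = start; j < end; j++) {
        sum += current[j];  // on the GPU, one workgroup sums this window
      }
      next[i] = sum;
    }
    current = next;  // the next pass (shader call) reduces the partial sums
  }
  return current.length === 1 ? current[0] : 0;
}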
tfjs-backend-webgpu/src/kernels/AvgPool.ts
line 61 at r1 (raw file):
Previously, Linchenn wrote…

Could we avoid the transpose op here? Then we could do meanX on axis 0 directly, like:

const meanX = mean(
    {inputs: {x: reshapeX}, backend, attrs: {keepDims: false, axis: 0}});
Done. Thanks.
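For readers following the thread, the transpose-free shape of the fix looks roughly like this (a sketch inferred from the code quotes above, assuming batch size 1 so the reshaped tensor is [spatial, channels]; the convInfo field names are assumptions, while reshape and mean are the backend kernel functions already quoted from tfjs-backend-webgpu/src/kernels/AvgPool.ts):

// Before: reshape -> transpose(perm: [1, 0]) -> mean(axis: 1)
// After:  reshape -> mean(axis: 0), since reducing axis 0 of reshapeX is
// the same as reducing axis 1 of its transpose.
const reshapeX = reshape({
  inputs: {x},
  backend,
  attrs: {shape: [convInfo.inHeight * convInfo.inWidth, convInfo.inChannels]}
});
const meanX =
    mean({inputs: {x: reshapeX}, backend, attrs: {keepDims: false, axis: 0}});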
Thank you for the detailed explanation! LGTM!
LGTM
AvgPool is very slow in the cityscapes architecture of DeepLabV3.
With this change, AvgPool improves from 24.77 ms to 3.07 ms on TGL.