-
Notifications
You must be signed in to change notification settings - Fork 950
Add packed batchnormalization kernel behind an environment flag #1330
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @nsthorat, and @dsmilkov)
src/kernels/backend_webgl.ts, line 654 at r1 (raw file):
x: Tensor4D, mean: Tensor4D|Tensor1D, variance: Tensor4D|Tensor1D, varianceEpsilon: number, scale?: Tensor4D|Tensor1D, offset?: Tensor4D|Tensor1D): Tensor4D {
don't you need to clean up after yourself here?
src/kernels/packing_util.ts, line 3 at r1 (raw file):
/** * @license * Copyright 2018 Google Inc. All Rights Reserved.
LLC
src/kernels/webgl/batchnorm_packed_gpu.ts, line 3 at r1 (raw file):
/** * @license * Copyright 2017 Google Inc. All Rights Reserved.
2018 LLC
src/kernels/webgl/batchnorm_packed_gpu.ts, line 23 at r1 (raw file):
export class BatchNormPackedProgram implements GPGPUProgram { variableNames: string[]; outputShape: number[] = [];
don't initialize this
src/kernels/webgl/batchnorm_packed_gpu.ts, line 26 at r1 (raw file):
userCode: string; supportsBroadcasting = true; packedInputs = true;
what is this bit?
src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):
${scaleSnippet}; vec4 x = getX(rc.x, rc.y, rc.z, rc.w);
would this just be getXAtOutCoords(), and same for the other ones?
src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):
vec4 inv = scale * inversesqrt(variance + vec4(${varianceEpsilon})); gl_FragColor = (x - mean) * inv + offset;
why is this not setOutput?
src/kernels/webgl/shader_compiler.ts, line 643 at r1 (raw file):
return ` vec4 ${funcName}(int index) { vec2 uv = packedUVfrom1D(${packedTexShape[0]}, ${
the auto formatter for backticks is messed up, can you do this:
vec2 uv = packedUVFrom1D(
${packedTexShape[0]}, ${packedTexShape[1]}, index);
Same for getPackedSampler4D below
src/kernels/webgl/unpack_gpu.ts, line 42 at r1 (raw file):
${dtype} rc = getOutputCoords(); vec2 modCoord = mod(vec2(${ rank === 1 ? 'rc' : innerDims.join(',')}), 2.);
can you move this outside userCode
so this line reads easier?
src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):
setOutput( modCoord.x == 0. ? (modCoord.y == 0. ? packedInput.r : packedInput.g) :
just curious -- why did this change?
src/ops/batchnorm_test.ts, line 47 at r1 (raw file):
const endNumBytes = tf.memory().numBytes; expect(endNumBytes - startNumBytes).toEqual(16);
also check numTensors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Few small comments in addition to Nikhil's
Reviewed 5 of 9 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)
src/kernels/backend_webgl.ts, line 655 at r1 (raw file):
varianceEpsilon: number, scale?: Tensor4D|Tensor1D, offset?: Tensor4D|Tensor1D): Tensor4D { const packXProgram = new PackProgram(x.shape);
move the 3 lines needed to produce a packed tensor into a packedTensor = packTensor(tensor)
method to reuse code.
src/kernels/backend_webgl.ts, line 694 at r1 (raw file):
this.makePackedTensor<Tensor4D>(packedX.shape)); const unpackProgram = new UnpackProgram(batchNorm.shape);
similarly here "unpackTensor(tensor)"
src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):
const shape = inInfo.shapeInfo.logicalShape; switch (shape.length) { case 1:
while you are touching this code, can we make packaging/unpacking work for any rank (if not too complex - otherwise future PR.)
src/kernels/webgl/shader_compiler.ts, line 846 at r1 (raw file):
return ` vec4 ${funcName}(int b2, int b, int row, int col) { vec2 uv = packedUVfrom4D(${texNumR}, ${texNumC}, ${texelsInBatch2}, ${
auto-formatter messed up, split manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)
src/kernels/backend_webgl.ts, line 654 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
don't you need to clean up after yourself here?
Done
src/kernels/backend_webgl.ts, line 655 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
move the 3 lines needed to produce a packed tensor into a
packedTensor = packTensor(tensor)
method to reuse code.
Done
src/kernels/backend_webgl.ts, line 694 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
similarly here "unpackTensor(tensor)"
Done
src/kernels/packing_util.ts, line 3 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
LLC
Done
src/kernels/webgl/batchnorm_packed_gpu.ts, line 3 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
2018 LLC
Done
src/kernels/webgl/batchnorm_packed_gpu.ts, line 23 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
don't initialize this
Done
src/kernels/webgl/batchnorm_packed_gpu.ts, line 26 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
what is this bit?
Done
forgot to clean up after myself from a different branch
src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
would this just be getXAtOutCoords(), and same for the other ones?
Right now getSamplerAtOutputCoords
, which also handles broadcasting, doesn't support packed textures. I included a note in the PR description with a link to a GitHub issue I created for adding this support: tensorflow/tfjs#803
I thought it might make sense to add a packed version of getSamplerAtOutputCoords
as a separate PR. What do you think?
src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
why is this not setOutput?
Done
setOutput only sets R, but here we need to set RGBA. I added another setOutput snippet to the shaderCompiler that sets all four channels - makeShader
will now select the appropriate version of setOutput
based on whether the output is packed.
src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
while you are touching this code, can we make packaging/unpacking work for any rank (if not too complex - otherwise future PR.)
I thought it might make sense to include packing/unpacking for 3D tensors once we have ops that depend on them, just so it would be easier to test. It's a bit difficult to test packing / unpacking in isolation - we could test them as custom ops, but we have to access the texture manager within the WebGLBackend in order to create packed output tensors, which is difficult in a test since the texture manager is a private property of the WebGLBackend.
src/kernels/webgl/shader_compiler.ts, line 643 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
the auto formatter for backticks is messed up, can you do this:
vec2 uv = packedUVFrom1D( ${packedTexShape[0]}, ${packedTexShape[1]}, index);
Same for getPackedSampler4D below
Done
src/kernels/webgl/shader_compiler.ts, line 846 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
auto-formatter messed up, split manually
Done
src/kernels/webgl/unpack_gpu.ts, line 42 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
can you move this outside
userCode
so this line reads easier?
Done
src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
just curious -- why did this change?
Previously, modCoord
was computed from rc.y, rc.x
(note y and x are reversed - I was thinking in terms of width / height). Now the dimensions are in logical order, so the channels are read differently. I should have done it this way originally.
src/ops/batchnorm_test.ts, line 47 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
also check numTensors
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 8 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)
src/kernels/backend_webgl.ts, line 676 at r2 (raw file):
const unpacked = this.unpackTensor(batchNorm); packedInputs.forEach(d => d.dispose());
fyi, you can do tf.dispose([packedInputs, batchNorm])
. It will walk any object/list and do a deep dispose
src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):
Previously, annxingyuan wrote…
I thought it might make sense to include packing/unpacking for 3D tensors once we have ops that depend on them, just so it would be easier to test. It's a bit difficult to test packing / unpacking in isolation - we could test them as custom ops, but we have to access the texture manager within the WebGLBackend in order to create packed output tensors, which is difficult in a test since the texture manager is a private property of the WebGLBackend.
Makes sense. Future PR then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)
src/kernels/backend_webgl.ts, line 676 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
fyi, you can do
tf.dispose([packedInputs, batchNorm])
. It will walk any object/list and do a deep dispose
+1
src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):
Previously, annxingyuan wrote…
Right now
getSamplerAtOutputCoords
, which also handles broadcasting, doesn't support packed textures. I included a note in the PR description with a link to a GitHub issue I created for adding this support: tensorflow/tfjs#803I thought it might make sense to add a packed version of
getSamplerAtOutputCoords
as a separate PR. What do you think?
sounds good
src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):
Previously, annxingyuan wrote…
Done
setOutput only sets R, but here we need to set RGBA. I added another setOutput snippet to the shaderCompiler that sets all four channels -
makeShader
will now select the appropriate version ofsetOutput
based on whether the output is packed.
Perfect!
src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):
Previously, annxingyuan wrote…
Previously,
modCoord
was computed fromrc.y, rc.x
(note y and x are reversed - I was thinking in terms of width / height). Now the dimensions are in logical order, so the channels are read differently. I should have done it this way originally.
Got it -- wouldn't there be some fundamental problem that we would have caught somewhere since this is a matter of output correctness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @annxingyuan)
src/kernels/backend_webgl.ts, line 676 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
+1
Done
src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
sounds good
Done
src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Perfect!
Done
src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Makes sense. Future PR then.
Done
src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Got it -- wouldn't there be some fundamental problem that we would have caught somewhere since this is a matter of output correctness?
No, because modCoord
was also being fed the coordinates in reverse (rc.y rc.x
). This PR reverses both the coordinates being fed to modCoord
and the order in which we read packedInput
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 8 files at r2.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @nsthorat)
This PR addresses tensorflow/tfjs#774
Changes
shaderCompiler
.pack_gpu
andunpack_gpu
.WEBGL_PACK_BATCHNORMALIZATION
(defaults tofalse
) which determines whether we use the packed batch normalization kernel.Notes
We've been adding environment flags to hide packed kernels that don't have an obvious performance benefit. In the case of batch normalization, the overhead of repeatedly packing its fixed inputs erases the performance gain from the packed kernel. I created an issue for us to think about uploading such inputs as packed in the first place: Fixed inputs to packed ops should be uploaded as packed tfjs#804.
I realize that the unpacked batch normalization kernel uses
get${var}AtOutCoords
rather than managing broadcasting itself. I've created an issue for us to addget${var}AtOutCoords
support for packed textures: ShaderCompiler should support get${Tex}AtOutCoords / broadcasting for packed textures tfjs#803. I think it makes sense as a separate PR.FEATURE, PERF
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: /~https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md
This change is