Skip to content
This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Add packed batchnormalization kernel behind an environment flag #1330

Merged
merged 18 commits into from
Oct 23, 2018

Conversation

annxingyuan
Copy link
Collaborator

@annxingyuan annxingyuan commented Oct 18, 2018

This PR addresses tensorflow/tfjs#774

Changes

  • Added packed 1D and 4D input sampling and output coordinate matching to shaderCompiler.
  • Added support for n-D inputs to pack_gpu and unpack_gpu.
  • Added packed batch normalization kernel.
  • Added environment flag WEBGL_PACK_BATCHNORMALIZATION (defaults to false) which determines whether we use the packed batch normalization kernel.

Notes

  1. We've been adding environment flags to hide packed kernels that don't have an obvious performance benefit. In the case of batch normalization, the overhead of repeatedly packing its fixed inputs erases the performance gain from the packed kernel. I created an issue for us to think about uploading such inputs as packed in the first place: Fixed inputs to packed ops should be uploaded as packed tfjs#804.

  2. I realize that the unpacked batch normalization kernel uses get${var}AtOutCoords rather than managing broadcasting itself. I've created an issue for us to add get${var}AtOutCoords support for packed textures: ShaderCompiler should support get${Tex}AtOutCoords / broadcasting for packed textures tfjs#803. I think it makes sense as a separate PR.

FEATURE, PERF


For repository owners only:

Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY

For more info see: /~https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md


This change is Reviewable

@annxingyuan annxingyuan self-assigned this Oct 18, 2018
@annxingyuan annxingyuan changed the title WIP Pack batchnormalization Add packed batchnormalization kernel behind an environment flag Oct 18, 2018
Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan, @nsthorat, and @dsmilkov)


src/kernels/backend_webgl.ts, line 654 at r1 (raw file):

      x: Tensor4D, mean: Tensor4D|Tensor1D, variance: Tensor4D|Tensor1D,
      varianceEpsilon: number, scale?: Tensor4D|Tensor1D,
      offset?: Tensor4D|Tensor1D): Tensor4D {

don't you need to clean up after yourself here?


src/kernels/packing_util.ts, line 3 at r1 (raw file):

/**
 * @license
 * Copyright 2018 Google Inc. All Rights Reserved.

LLC


src/kernels/webgl/batchnorm_packed_gpu.ts, line 3 at r1 (raw file):

/**
 * @license
 * Copyright 2017 Google Inc. All Rights Reserved.

2018 LLC


src/kernels/webgl/batchnorm_packed_gpu.ts, line 23 at r1 (raw file):

export class BatchNormPackedProgram implements GPGPUProgram {
  variableNames: string[];
  outputShape: number[] = [];

don't initialize this


src/kernels/webgl/batchnorm_packed_gpu.ts, line 26 at r1 (raw file):

  userCode: string;
  supportsBroadcasting = true;
  packedInputs = true;

what is this bit?


src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):

        ${scaleSnippet};

        vec4 x = getX(rc.x, rc.y, rc.z, rc.w);

would this just be getXAtOutCoords(), and same for the other ones?


src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):

        vec4 inv = scale * inversesqrt(variance + vec4(${varianceEpsilon}));

        gl_FragColor = (x - mean) * inv + offset;

why is this not setOutput?


src/kernels/webgl/shader_compiler.ts, line 643 at r1 (raw file):

  return `
    vec4 ${funcName}(int index) {
      vec2 uv = packedUVfrom1D(${packedTexShape[0]}, ${

the auto formatter for backticks is messed up, can you do this:

vec2 uv = packedUVFrom1D(
  ${packedTexShape[0]}, ${packedTexShape[1]}, index);

Same for getPackedSampler4D below


src/kernels/webgl/unpack_gpu.ts, line 42 at r1 (raw file):

        ${dtype} rc = getOutputCoords();
        vec2 modCoord = mod(vec2(${
        rank === 1 ? 'rc' : innerDims.join(',')}), 2.);

can you move this outside userCode so this line reads easier?


src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):

        setOutput(
          modCoord.x == 0. ?
            (modCoord.y == 0. ? packedInput.r : packedInput.g) :

just curious -- why did this change?


src/ops/batchnorm_test.ts, line 47 at r1 (raw file):

    const endNumBytes = tf.memory().numBytes;

    expect(endNumBytes - startNumBytes).toEqual(16);

also check numTensors

Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Few small comments in addition to Nikhil's

Reviewed 5 of 9 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan and @dsmilkov)


src/kernels/backend_webgl.ts, line 655 at r1 (raw file):

      varianceEpsilon: number, scale?: Tensor4D|Tensor1D,
      offset?: Tensor4D|Tensor1D): Tensor4D {
    const packXProgram = new PackProgram(x.shape);

move the 3 lines needed to produce a packed tensor into a packedTensor = packTensor(tensor) method to reuse code.


src/kernels/backend_webgl.ts, line 694 at r1 (raw file):

        this.makePackedTensor<Tensor4D>(packedX.shape));

    const unpackProgram = new UnpackProgram(batchNorm.shape);

similarly here "unpackTensor(tensor)"


src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):

  const shape = inInfo.shapeInfo.logicalShape;
  switch (shape.length) {
    case 1:

while you are touching this code, can we make packaging/unpacking work for any rank (if not too complex - otherwise future PR.)


src/kernels/webgl/shader_compiler.ts, line 846 at r1 (raw file):

  return `
    vec4 ${funcName}(int b2, int b, int row, int col) {
      vec2 uv = packedUVfrom4D(${texNumR}, ${texNumC}, ${texelsInBatch2}, ${

auto-formatter messed up, split manually

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @annxingyuan)


src/kernels/backend_webgl.ts, line 654 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

don't you need to clean up after yourself here?

Done


src/kernels/backend_webgl.ts, line 655 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

move the 3 lines needed to produce a packed tensor into a packedTensor = packTensor(tensor) method to reuse code.

Done


src/kernels/backend_webgl.ts, line 694 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

similarly here "unpackTensor(tensor)"

Done


src/kernels/packing_util.ts, line 3 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

LLC

Done


src/kernels/webgl/batchnorm_packed_gpu.ts, line 3 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

2018 LLC

Done


src/kernels/webgl/batchnorm_packed_gpu.ts, line 23 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

don't initialize this

Done


src/kernels/webgl/batchnorm_packed_gpu.ts, line 26 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

what is this bit?

Done

forgot to clean up after myself from a different branch


src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

would this just be getXAtOutCoords(), and same for the other ones?

Right now getSamplerAtOutputCoords, which also handles broadcasting, doesn't support packed textures. I included a note in the PR description with a link to a GitHub issue I created for adding this support: tensorflow/tfjs#803

I thought it might make sense to add a packed version of getSamplerAtOutputCoords as a separate PR. What do you think?


src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

why is this not setOutput?

Done

setOutput only sets R, but here we need to set RGBA. I added another setOutput snippet to the shaderCompiler that sets all four channels - makeShader will now select the appropriate version of setOutput based on whether the output is packed.


src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

while you are touching this code, can we make packaging/unpacking work for any rank (if not too complex - otherwise future PR.)

I thought it might make sense to include packing/unpacking for 3D tensors once we have ops that depend on them, just so it would be easier to test. It's a bit difficult to test packing / unpacking in isolation - we could test them as custom ops, but we have to access the texture manager within the WebGLBackend in order to create packed output tensors, which is difficult in a test since the texture manager is a private property of the WebGLBackend.


src/kernels/webgl/shader_compiler.ts, line 643 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

the auto formatter for backticks is messed up, can you do this:

vec2 uv = packedUVFrom1D(
  ${packedTexShape[0]}, ${packedTexShape[1]}, index);

Same for getPackedSampler4D below

Done


src/kernels/webgl/shader_compiler.ts, line 846 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

auto-formatter messed up, split manually

Done


src/kernels/webgl/unpack_gpu.ts, line 42 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you move this outside userCode so this line reads easier?

Done


src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

just curious -- why did this change?

Previously, modCoord was computed from rc.y, rc.x (note y and x are reversed - I was thinking in terms of width / height). Now the dimensions are in logical order, so the channels are read differently. I should have done it this way originally.


src/ops/batchnorm_test.ts, line 47 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

also check numTensors

Done

Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 8 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)


src/kernels/backend_webgl.ts, line 676 at r2 (raw file):

    const unpacked = this.unpackTensor(batchNorm);

    packedInputs.forEach(d => d.dispose());

fyi, you can do tf.dispose([packedInputs, batchNorm]). It will walk any object/list and do a deep dispose


src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):

Previously, annxingyuan wrote…

I thought it might make sense to include packing/unpacking for 3D tensors once we have ops that depend on them, just so it would be easier to test. It's a bit difficult to test packing / unpacking in isolation - we could test them as custom ops, but we have to access the texture manager within the WebGLBackend in order to create packed output tensors, which is difficult in a test since the texture manager is a private property of the WebGLBackend.

Makes sense. Future PR then.

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)


src/kernels/backend_webgl.ts, line 676 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

fyi, you can do tf.dispose([packedInputs, batchNorm]). It will walk any object/list and do a deep dispose

+1


src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):

Previously, annxingyuan wrote…

Right now getSamplerAtOutputCoords, which also handles broadcasting, doesn't support packed textures. I included a note in the PR description with a link to a GitHub issue I created for adding this support: tensorflow/tfjs#803

I thought it might make sense to add a packed version of getSamplerAtOutputCoords as a separate PR. What do you think?

sounds good


src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):

Previously, annxingyuan wrote…

Done

setOutput only sets R, but here we need to set RGBA. I added another setOutput snippet to the shaderCompiler that sets all four channels - makeShader will now select the appropriate version of setOutput based on whether the output is packed.

Perfect!


src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):

Previously, annxingyuan wrote…

Previously, modCoord was computed from rc.y, rc.x (note y and x are reversed - I was thinking in terms of width / height). Now the dimensions are in logical order, so the channels are read differently. I should have done it this way originally.

Got it -- wouldn't there be some fundamental problem that we would have caught somewhere since this is a matter of output correctness?

Copy link
Collaborator Author

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan)


src/kernels/backend_webgl.ts, line 676 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

+1

Done


src/kernels/webgl/batchnorm_packed_gpu.ts, line 61 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

sounds good

Done


src/kernels/webgl/batchnorm_packed_gpu.ts, line 67 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Perfect!

Done


src/kernels/webgl/shader_compiler.ts, line 93 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Makes sense. Future PR then.

Done


src/kernels/webgl/unpack_gpu.ts, line 47 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

Got it -- wouldn't there be some fundamental problem that we would have caught somewhere since this is a matter of output correctness?

No, because modCoord was also being fed the coordinates in reverse (rc.y rc.x). This PR reverses both the coordinates being fed to modCoord and the order in which we read packedInput.

Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r2.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @nsthorat)

@annxingyuan annxingyuan merged commit 23d0913 into master Oct 23, 2018
@annxingyuan annxingyuan deleted the pack_batchnormalization branch October 23, 2018 14:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants