This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Optimize addN op by the multiple upload of textures #1538

Merged · 9 commits · Apr 15, 2019

Conversation

@Lewuathe (Contributor) commented Feb 5, 2019:

PERF

Following the same approach as the concat op optimization, we can optimize the addN op by uploading multiple textures per shader pass, up to WEBGL_MAX_TEXTURES_IN_SHADER, which reduces the overhead of texture uploads.
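A minimal sketch of the chunked-reduction idea in plain JavaScript (a sketch only: `addChunk` and the `maxInputs` parameter are hypothetical stand-ins for a single shader pass and WEBGL_MAX_TEXTURES_IN_SHADER, and plain arrays stand in for tensors):

```javascript
// Instead of adding tensors pairwise (one shader pass and one texture
// upload per input), sum them in groups of up to `maxInputs` inputs
// per pass, repeating until a single result remains.

function addChunk(arrays) {
  // One "shader pass": element-wise sum of up to maxInputs arrays.
  const out = new Array(arrays[0].length).fill(0);
  for (const a of arrays) {
    for (let i = 0; i < a.length; i++) out[i] += a[i];
  }
  return out;
}

function addN(arrays, maxInputs) {
  let current = arrays;
  while (current.length > 1) {
    const next = [];
    for (let i = 0; i < current.length; i += maxInputs) {
      next.push(addChunk(current.slice(i, i + maxInputs)));
    }
    current = next;
  }
  return current[0];
}

const inputs = Array.from({length: 10}, () => [1, 2, 3]);
console.log(addN(inputs, 4)); // [10, 20, 30]
```

With `maxInputs = 4`, ten inputs collapse in two rounds (10 → 3 → 1) instead of nine pairwise additions.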

Here is a benchmark of the optimized addN op. It shows a significant performance improvement once more than 4000 tensors are added.

Environment

  • macOS: 10.13.6
  • Chrome: 71.0.3578.98
// Time tf.addN over batches of 1000..9000 tensors of shape [100, 100].
const nums = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const results = nums.map(n => {
  const tensors = [];
  for (let i = 0; i < n * 1000; i++) {
    tensors.push(tf.ones([100, 100]));
  }
  const start = performance.now();
  const res = tf.addN(tensors);
  res.dataSync();  // force execution and readback so the work is timed
  return performance.now() - start;
});
console.log(results);  // milliseconds per batch size

[chart: addN benchmark results]

See: tensorflow/tfjs#989



@Lewuathe changed the title from "Optimize addN op by limit of the number of textures" to "Optimize addN op by the multiple upload of textures" on Feb 5, 2019
@@ -0,0 +1,51 @@
/**
* @license
* Copyright 2017 Google Inc. All Rights Reserved.
@nsthorat (Contributor) commented on the diff:

2019 LLC

@nsthorat (Contributor):

Looks good overall; one small note is that this isn't going to be supported when packing is enabled. If you feel motivated you can add the packed version of this; otherwise we can do it in a follow-up, and this LGTM :)

@Lewuathe (Author):

@nsthorat Thanks! Is there any good reference (or code) for implementing the packed version?

@nsthorat (Contributor):

Check out the code here for binary ops that are packed: /~https://github.com/tensorflow/tfjs-core/blob/master/src/kernels/webgl/binaryop_packed_gpu.ts

@Lewuathe (Author):

@nsthorat Thank you! I'll take a look.

@nsthorat (Contributor) commented Apr 8, 2019:

Let me know when this is ready to review :)

@Lewuathe (Author) commented Apr 9, 2019:

@nsthorat Ah, sorry. It's basically ready for review.

@nsthorat (Contributor) left a review:

Reviewed 1 of 2 files at r1, 2 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @Lewuathe)


src/kernels/webgl/addn_gpu.ts, line 3 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

2019 LLC

This license still needs updating to 2019 Google LLC.


src/kernels/webgl/addn_packed_gpu.ts, line 3 at r2 (raw file):

/**
 * @license
 * Copyright 2017 Google Inc. All Rights Reserved.

2019 Google LLC


src/kernels/webgl/backend_webgl.ts, line 1474 at r2 (raw file):

  addN<T extends Tensor>(tensors: T[]): T {
    if (tensors.length === 1) {
      return tensors[0];

return tensors[0].clone()
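The clone matters because callers own the tensor an op returns: if addN handed back its input unchanged, disposing the result would also destroy the caller's original tensor. A standalone sketch of the hazard, with a hypothetical `FakeTensor` standing in for a real tensor:

```javascript
// Why return tensors[0].clone() rather than tensors[0]:
// ops are expected to return a new tensor the caller owns.
class FakeTensor {
  constructor(data) { this.data = data; this.disposed = false; }
  clone() { return new FakeTensor(this.data.slice()); }
  dispose() { this.disposed = true; }
}

function addNReturningInput(tensors) { return tensors[0]; }   // buggy
function addNReturningClone(tensors) { return tensors[0].clone(); }  // safe

const t = new FakeTensor([1, 2, 3]);
addNReturningInput([t]).dispose();
console.log(t.disposed);  // true: the caller's input was destroyed

const u = new FakeTensor([1, 2, 3]);
addNReturningClone([u]).dispose();
console.log(u.disposed);  // false: only the result was disposed
```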


src/kernels/webgl/backend_webgl.ts, line 1490 at r2 (raw file):

    const shapes = tensors.map(t => t.shape);
    // We can make sure shapes are identical in op level.
    const usePackedOp = ENV.getBool('WEBGL_PACK_BINARY_OPERATIONS');

this isn't really a binary operation; I think we might want to make a new flag for this

@annxingyuan (Collaborator) replied:

Let's just check WEBGL_PACK here rather than checking WEBGL_PACK_BINARY_OPERATIONS or creating a new flag.
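In other words, the packed path would be gated on the general WEBGL_PACK flag. A standalone sketch of that dispatch (the `ENV` object here fakes the flag lookup, and the program names merely echo those in the PR):

```javascript
// Pick the packed or unpacked addN program based on one general flag.
const ENV = {
  flags: { WEBGL_PACK: true },
  getBool(name) { return !!this.flags[name]; },
};

function chooseAddNProgram(shapes) {
  const usePackedOp = ENV.getBool('WEBGL_PACK');
  return usePackedOp
      ? `AddNPackedProgram(${shapes.length} inputs)`
      : `AddNProgram(${shapes.length} inputs)`;
}

console.log(chooseAddNProgram([[2, 2], [2, 2]]));  // AddNPackedProgram(2 inputs)
```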

@Lewuathe (Author) replied:

Thanks for pointing that out!

@Lewuathe (Author):

@nsthorat @annxingyuan I updated it accordingly. Please take a look when you get a chance.

@nsthorat (Contributor) left a review:

Reviewed 4 of 4 files at r4.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @annxingyuan and @Lewuathe)

@annxingyuan annxingyuan merged commit 32b05de into tensorflow:master Apr 15, 2019