
Optimize concat for the WebGL backend #1449
Merged
dsmilkov merged 7 commits into master from optimize-concat on Dec 12, 2018

Conversation

@dsmilkov (Contributor) commented Dec 12, 2018

Speeds up WebGL concat by 5x without warmup and by 2x with shader warmup.

Benchmark used:

import * as tf from '@tensorflow/tfjs-core';

// Stack 400 examples of 696 values each along axis 0.
const tensors = [];
for (let i = 0; i < 400; i++) {
  tensors.push(tf.ones([1, 696]));
}

const start = performance.now();
const res = tf.concat(tensors, 0 /* axis */);
res.dataSync();  // Block until the GPU result is available.
console.log('Took', performance.now() - start);

Without warmup: 626ms (this PR) vs 3066ms (master)
With warmup: 34ms (this PR) vs 65ms (master)

The benchmark above reflects a real workflow of preparing training data (stacking 400 examples of 696 values each), taken from the Audio recognition codelab. Measuring both with and without warmup is important because, in the example-collection use case, the number of examples is dynamic, which forces the shaders to be recompiled.

Details

  • Add a new WebGL flag WEBGL_MAX_TEXTURES_IN_SHADER that gives the maximum number of textures we can bind as uniform samplers in a single shader.
  • Change concat_gpu to take an arbitrary number of inputs.
  • When concatenating a large number of tensors, do binary concatenation (divide-and-conquer) to reduce the number of concat calls; see the sketch below.
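
A minimal sketch of the divide-and-conquer scheme described above. The helper name concatManyTensors and the explicit maxTextures parameter are hypothetical; the real implementation lives in backend_webgl.ts and reads the limit via ENV.get('WEBGL_MAX_TEXTURES_IN_SHADER').

import * as tf from '@tensorflow/tfjs-core';

// Recursively split the input list so that no single shader invocation
// needs more than maxTextures sampler uniforms (assumes maxTextures >= 2).
function concatManyTensors(
    tensors: tf.Tensor[], axis: number, maxTextures: number): tf.Tensor {
  if (tensors.length === 1) {
    return tensors[0];
  }
  if (tensors.length > maxTextures) {
    // Binary concatenation: concat each half, then concat the two results.
    const midIndex = Math.floor(tensors.length / 2);
    const left = concatManyTensors(tensors.slice(0, midIndex), axis, maxTextures);
    const right = concatManyTensors(tensors.slice(midIndex), axis, maxTextures);
    return concatManyTensors([left, right], axis, maxTextures);
  }
  // Small enough to fit in one shader: a single concat program handles it.
  return tf.concat(tensors, axis);
}

In real backend code the intermediate halves would also be disposed (or the whole thing run inside tf.tidy) to avoid leaking GPU memory; that bookkeeping is omitted here.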

PERF



@annxingyuan (Collaborator) left a comment


Yay! Are there any other ops that work like concat used to that we could quickly optimize?

@dsmilkov (Contributor, Author) replied

Good q. A quick search for (Tensor[] or T[]) in backend.ts gives addN() as another candidate.
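
Not part of this PR, but a hypothetical sketch of how the same binary-reduction idea could apply to addN (the helper name addNChunked and the maxInputs parameter are illustrative only):

import * as tf from '@tensorflow/tfjs-core';

// Pairwise-reduce a long list so each addN call sees a bounded number of
// inputs (assumes maxInputs >= 1).
function addNChunked(tensors: tf.Tensor[], maxInputs: number): tf.Tensor {
  if (tensors.length <= maxInputs) {
    return tf.addN(tensors);
  }
  const mid = Math.floor(tensors.length / 2);
  return tf.add(
      addNChunked(tensors.slice(0, mid), maxInputs),
      addNChunked(tensors.slice(mid), maxInputs));
}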

@nsthorat (Contributor) left a comment


Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/kernels/backend_webgl.ts, line 632 at r1 (raw file):

    }
    if (tensors.length > ENV.get('WEBGL_MAX_TEXTURES_IN_SHADER')) {
      const midIndex = tensors.length >> 2;

can you use division instead of bitshifting here for readability?

@dsmilkov (Contributor, Author) left a comment


Reviewable status: :shipit: complete! 2 of 1 approvals obtained


src/kernels/backend_webgl.ts, line 632 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you use division instead of bitshifting here for readability?

Done.
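
For context on the bitshift question: for non-negative 32-bit integers the two forms agree, so switching to explicit division is purely a readability change. A tiny illustration:

// (x >> k) === Math.floor(x / 2 ** k) for non-negative 32-bit integers.
console.log(400 >> 1, Math.floor(400 / 2));  // 200 200
console.log(400 >> 2, Math.floor(400 / 4));  // 100 100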

@dsmilkov dsmilkov merged commit f3ee5aa into master Dec 12, 2018
@dsmilkov dsmilkov deleted the optimize-concat branch December 12, 2018 22:18