
Optimize concat for the WebGL backend #1449
Merged
dsmilkov merged 7 commits into master from optimize-concat on Dec 12, 2018

Conversation

@dsmilkov (Contributor) commented Dec 12, 2018

Speeds up WebGL concat by 5x without warmup and by 2x with shader warmup.

Benchmark used:

import * as tf from '@tensorflow/tfjs-core';

// Stack 400 examples of 696 values each along axis 0.
const tensors = [];
for (let i = 0; i < 400; i++) {
  tensors.push(tf.ones([1, 696]));
}

const start = performance.now();
const res = tf.concat(tensors, 0 /* axis */);
res.dataSync();  // Block until the GPU result is available.
console.log('Took', performance.now() - start);

Without warmup: 626ms (this PR) vs 3066ms (master)
With warmup: 34ms (this PR) vs 65ms (master)

The benchmark above reflects a real workflow of preparing training data (stacking 400 examples of 696 values each), taken from the Audio recognition codelab. Measuring both with and without warmup is important because, in the example-collection use case, the number of examples is dynamic, which forces the shaders to be recompiled.

Details

  • Add a new WebGL flag WEBGL_MAX_TEXTURES_IN_SHADER that gives the maximum number of textures we can bind as uniform samplers in a single shader.
  • Change concat_gpu to take an arbitrary number of inputs.
  • When concatenating a large number of tensors, do binary concatenation (divide-and-conquer) to reduce the number of concat calls; see the sketch below.
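
A minimal sketch of the divide-and-conquer scheme described above. The helper name concatManyTensors and the explicit maxTextures parameter are hypothetical; the real implementation lives in backend_webgl.ts and reads the limit via ENV.get('WEBGL_MAX_TEXTURES_IN_SHADER').

import * as tf from '@tensorflow/tfjs-core';

// Recursively split the input list so that no single shader invocation
// needs more than maxTextures sampler uniforms (assumes maxTextures >= 2).
function concatManyTensors(
    tensors: tf.Tensor[], axis: number, maxTextures: number): tf.Tensor {
  if (tensors.length === 1) {
    return tensors[0];
  }
  if (tensors.length > maxTextures) {
    // Binary concatenation: concat each half, then concat the two results.
    const midIndex = Math.floor(tensors.length / 2);
    const left = concatManyTensors(tensors.slice(0, midIndex), axis, maxTextures);
    const right = concatManyTensors(tensors.slice(midIndex), axis, maxTextures);
    return concatManyTensors([left, right], axis, maxTextures);
  }
  // Small enough to fit in one shader: a single concat program handles it.
  return tf.concat(tensors, axis);
}

In real backend code the intermediate halves would also be disposed (or the whole thing run inside tf.tidy) to avoid leaking GPU memory; that bookkeeping is omitted here.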

PERF



@annxingyuan (Collaborator) left a comment


Yay! Are there any other ops that work like concat used to that we could quickly optimize?

@dsmilkov (Contributor, Author) replied

Good q. A quick search for (Tensor[] or T[]) in backend.ts gives addN() as another candidate.
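
Not part of this PR, but a hypothetical sketch of how the same binary-reduction idea could apply to addN (the helper name addNChunked and the maxInputs parameter are illustrative only):

import * as tf from '@tensorflow/tfjs-core';

// Pairwise-reduce a long list so each addN call sees a bounded number of
// inputs (assumes maxInputs >= 1).
function addNChunked(tensors: tf.Tensor[], maxInputs: number): tf.Tensor {
  if (tensors.length <= maxInputs) {
    return tf.addN(tensors);
  }
  const mid = Math.floor(tensors.length / 2);
  return tf.add(
      addNChunked(tensors.slice(0, mid), maxInputs),
      addNChunked(tensors.slice(mid), maxInputs));
}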

@nsthorat (Contributor) left a comment


Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/kernels/backend_webgl.ts, line 632 at r1 (raw file):

    }
    if (tensors.length > ENV.get('WEBGL_MAX_TEXTURES_IN_SHADER')) {
      const midIndex = tensors.length >> 2;

can you use division instead of bitshifting here for readability?

@dsmilkov (Contributor, Author) left a comment


Reviewable status: :shipit: complete! 2 of 1 approvals obtained


src/kernels/backend_webgl.ts, line 632 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

can you use division instead of bitshifting here for readability?

Done.
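
For context on the bitshift question: for non-negative 32-bit integers the two forms agree, so switching to explicit division is purely a readability change. A tiny illustration:

// (x >> k) === Math.floor(x / 2 ** k) for non-negative 32-bit integers.
console.log(400 >> 1, Math.floor(400 / 2));  // 200 200
console.log(400 >> 2, Math.floor(400 / 4));  // 100 100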

@dsmilkov dsmilkov merged commit f3ee5aa into master Dec 12, 2018
@dsmilkov dsmilkov deleted the optimize-concat branch December 12, 2018 22:18