[Speed] concat operator math kernel improvement #8764

dzhwinter · 2018-03-06T02:37:23Z

This issue is part of #8567
@chengduoZH is doing this job.

Analysis of the `concat` operation

The input is a list of tensors plus an axis that indicates the concatenation axis. The input tensors can have any shape, but only the dimension along the axis may differ between them.
For example, suppose the input is two tensors.

case 1:
- t_a's shape: [9,2,3,4]
- t_b's shape: [3,2,3,4]
- axis = 0

The output's shape is [12,2,3,4]. A simple way to handle this case is to reshape t_a to [9, 24] and t_b to [3, 24], then concatenate the two tensors vertically (stacking rows). The output's shape is [12, 24]. Each input lands in the output as one contiguous block, so only two copies are needed.

case 2:
- t_a's shape: [9,2,3,4]
- t_b's shape: [9,3,3,4]
- axis = 1

A simple way to handle this case is to reshape t_a to [9, 2, 12] and t_b to [9, 3, 12], then concatenate the two tensors on the second axis. The output's shape is [9, 5, 12]. Each of the 9 outer slices needs one contiguous copy per input, so this takes 9 × 2 = 18 copies.

case 3:
- t_a's shape: [9,2,3,4]
- t_b's shape: [9,2,3,3]
- axis = 3

First, reshape t_a to [54, 4] and t_b to [54, 3], then concatenate the two tensors horizontally. The output's shape is [54, 7]. This is the worst case: each of the 54 rows needs one copy per input, so this takes 54 × 2 = 108 copies.
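
The copy counts in the three cases follow one rule: the number of contiguous copies is the product of the dimensions before the axis, times the number of inputs. Here is a minimal sketch of that arithmetic in plain Python. The function name `concat_copy_count` is hypothetical and is not part of the operator's code.

```python
from math import prod


def concat_copy_count(shapes, axis):
    """Count the contiguous block copies a naive concat needs (illustrative helper)."""
    first = shapes[0]
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        # Only the dimension at `axis` is allowed to differ between inputs.
        for i, (a, b) in enumerate(zip(shape, first)):
            if i != axis and a != b:
                raise ValueError("shapes differ outside the concat axis")
    # Dimensions before `axis` collapse into the rows of a 2-D view;
    # each row needs one contiguous copy per input.
    rows = prod(first[:axis])
    return rows * len(shapes)


print(concat_copy_count([[9, 2, 3, 4], [3, 2, 3, 4]], 0))  # case 1
print(concat_copy_count([[9, 2, 3, 4], [9, 3, 3, 4]], 1))  # case 2
print(concat_copy_count([[9, 2, 3, 4], [9, 2, 3, 3]], 3))  # case 3
```

The closer the axis is to the last dimension, the more rows there are and the smaller each copy becomes, which is why case 3 is the most expensive.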

TODO

Use one CUDA kernel to complete those copies. All of these cases can be handled by one strategy.
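
One way to read "one strategy" is to treat every input as a 2-D matrix of shape [rows, cols_i], where rows is the product of the dimensions before the axis and cols_i is the product of the remaining dimensions. Each output element can then find its source on its own. The sketch below emulates this in plain Python over row-major flat lists, with each loop iteration standing in for one GPU thread. It is an illustration under those assumptions, not the kernel from #8669, and `concat_flat` is a hypothetical name.

```python
from bisect import bisect_right
from math import prod


def concat_flat(inputs, shapes, axis):
    """Concatenate row-major flat buffers along `axis` via a 2-D view (illustrative)."""
    rows = prod(shapes[0][:axis])
    cols = [prod(shape[axis:]) for shape in shapes]
    # offsets[i] is the first output column that belongs to input i.
    offsets = [0]
    for c in cols:
        offsets.append(offsets[-1] + c)
    out_cols = offsets[-1]
    out = [None] * (rows * out_cols)
    # Each iteration plays the role of one thread handling one output element.
    for idx in range(rows * out_cols):
        r, c = divmod(idx, out_cols)
        i = bisect_right(offsets, c) - 1  # which input owns column c
        out[idx] = inputs[i][r * cols[i] + (c - offsets[i])]
    return out


a = [1, 2, 3, 4]  # shape [2, 2]
b = [5, 6]        # shape [2, 1]
print(concat_flat([a, b], [[2, 2], [2, 1]], axis=1))
```

With this view, case 1 becomes a single row of 216 + 72 columns and case 3 becomes 54 rows of 4 + 3 columns, so the same indexing covers both extremes.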

chengduoZH · 2018-03-07T06:59:03Z

I have completed the current stage of optimizing the concat_op (#8669).
To optimize further, we may need to use a multi-stream strategy.

dzhwinter changed the title ~~[Operator]concat math kernel improvement~~ [Speed] concat operator math kernel improvement Mar 6, 2018

chengduoZH closed this as completed Mar 7, 2018

chengduoZH self-assigned this Mar 7, 2018
