[Speed] concat operator math kernel improvement #8764

Closed
1 task done
dzhwinter opened this issue Mar 6, 2018 · 1 comment

@dzhwinter
Contributor

dzhwinter commented Mar 6, 2018

This issue is part of #8567.
@chengduoZH is working on this.

Analysis of the concat operation

The input is a list of tensors plus an axis that indicates the concatenation axis. The input tensors can have any shape, but their shapes may differ only in the dimension at axis; all other dimensions must match.
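The shape rule above can be sketched in a few lines of plain Python. This is only an illustration: `concat_output_shape` is a hypothetical helper name, not a function from the Paddle code base.

```python
from typing import List, Sequence


def concat_output_shape(shapes: Sequence[Sequence[int]], axis: int) -> List[int]:
    """Return the output shape of concatenating inputs with `shapes` along `axis`.

    Hypothetical helper for illustration only.
    """
    if not shapes:
        raise ValueError("need at least one input shape")
    first = list(shapes[0])
    if not 0 <= axis < len(first):
        raise ValueError("axis out of range")
    out = list(first)
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        # Every dimension except `axis` must match the first input.
        for d, (a, b) in enumerate(zip(first, shape)):
            if d != axis and a != b:
                raise ValueError(f"dimension {d} differs: {a} vs {b}")
        # The output size along `axis` is the sum over all inputs.
        out[axis] += shape[axis]
    return out


print(concat_output_shape([[9, 2, 3, 4], [3, 2, 3, 4]], axis=0))  # [12, 2, 3, 4]
```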
For example, the input is two tensors.

  • case 1:
    • t_a's shape: [9,2,3,4]
    • t_b's shape: [3,2,3,4]
    • axis = 0

The output's shape is [12,2,3,4]. A simple way to handle this case is to reshape t_a to [9, 24] and t_b to [3, 24], then concatenate the two tensors vertically (stacking rows). The result has shape [12, 24], which can be viewed as [12,2,3,4]. Because each input lands in the output as one contiguous block, only two copies are needed.

  • case 2:
    • t_a's shape: [9,2,3,4]
    • t_b's shape: [9,3,3,4]
    • axis = 1

A simple way to handle this case is to reshape t_a to [9, 2, 12] and t_b to [9, 3, 12], then concatenate the two tensors along the second axis. The output's shape is [9, 5, 12]. Each of the 9 outer slices needs one copy per input, so 18 copies are needed.

  • case 3:
    • t_a's shape: [9,2,3,4]
    • t_b's shape: [9,2,3,3]
    • axis = 3

First, we reshape t_a to [54, 4] and t_b to [54, 3], then concatenate the two tensors horizontally. The output's shape is [54, 7]. This is the worst case: each of the 54 rows needs one copy per input, so 108 copies are needed.
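The copy counts in the three cases follow one pattern: the product of the dimensions before axis gives the number of outer slices, and each slice needs one contiguous copy per input. A minimal sketch of that arithmetic, assuming row-major (C-order) storage; `block_copy_plan` is a hypothetical name used only here:

```python
from math import prod
from typing import List, Sequence, Tuple


def block_copy_plan(shapes: Sequence[Sequence[int]], axis: int) -> Tuple[int, List[int], int]:
    """Return (outer, chunk_sizes, num_copies) for a block-copy concat.

    outer       -- product of the dimensions before `axis` (1 when axis == 0)
    chunk_sizes -- contiguous elements each input contributes per outer slice
    num_copies  -- contiguous block copies needed: outer * number of inputs
    """
    outer = prod(shapes[0][:axis])  # prod of an empty sequence is 1
    chunk_sizes = [prod(s[axis:]) for s in shapes]
    return outer, chunk_sizes, outer * len(shapes)


print(block_copy_plan([[9, 2, 3, 4], [3, 2, 3, 4]], axis=0))  # (1, [216, 72], 2)
print(block_copy_plan([[9, 2, 3, 4], [9, 3, 3, 4]], axis=1))  # (9, [24, 36], 18)
print(block_copy_plan([[9, 2, 3, 4], [9, 2, 3, 3]], axis=3))  # (54, [4, 3], 108)
```

The closer axis is to the last dimension, the more copies there are and the smaller each one is, which is why case 3 is the most expensive when each copy is issued separately.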

TODO

  • Use a single CUDA kernel to perform all of those copies. All of the cases above can be handled by one strategy.
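
One possible reading of "one strategy" is to view every input as a 2-D matrix of shape [rows, cols_i], where rows is the product of the dimensions before axis, and let each GPU thread produce one output element. The sketch below emulates that indexing in plain Python for readability. It is an assumption about the approach, not the kernel from any actual PR, and `concat_per_element` is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def concat_per_element(inputs: Sequence[List[float]], cols: Sequence[int], rows: int) -> List[float]:
    """Concatenate flat row-major buffers, where inputs[i] is viewed as [rows, cols[i]].

    Each loop iteration stands in for one GPU thread (illustrative emulation only).
    """
    col_start = [0] + list(accumulate(cols))  # first output column owned by each input
    total_cols = col_start[-1]
    out = [0.0] * (rows * total_cols)
    for tid in range(rows * total_cols):
        row, col = divmod(tid, total_cols)       # position in the [rows, total_cols] output
        src = bisect_right(col_start, col) - 1   # which input owns this output column
        local_col = col - col_start[src]         # column inside that input
        out[tid] = inputs[src][row * cols[src] + local_col]
    return out


# Two inputs viewed as [2, 2] and [2, 1]; the output is viewed as [2, 3].
print(concat_per_element([[1, 2, 3, 4], [5, 6]], cols=[2, 1], rows=2))  # [1, 2, 5, 3, 4, 6]
```

With this view, case 1 becomes rows = 1, case 2 becomes rows = 9, and case 3 becomes rows = 54, so the same index computation covers all three.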
@dzhwinter dzhwinter changed the title [Operator]concat math kernel improvement [Speed] concat operator math kernel improvement Mar 6, 2018
@chengduoZH
Contributor

I have completed the current stage of optimizing concat_op (#8669).
To optimize further, we may need to use a multi-stream strategy.

@chengduoZH chengduoZH self-assigned this Mar 7, 2018