[OpenCL][kernel] Set PriorBox and PriorBoxVar as const weights in Box coder #5930
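【Problem】
The SSD network contains 6 `prior_box` ops. This op has no OpenCL implementation, so when the SSD network runs on the OpenCL backend it incurs an extra 12 `io_copy` and `layout_cast` operations at these points, which severely hurts performance.

【Analysis】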
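#5788 added an elimination pass, `ssd_boxes_calc_offline_pass`, which computes `prior_box` offline. The same pass also fuses the `reshape`/`flatten` and `concat` ops that follow `prior_box`. This PR only needs to adapt to that pass, i.e. treat the `box_coder` input parameters `PriorBox` and `PriorBoxVar` as constant weights.

【Effect】
Original model structure:
Simplified model structure after applying the pass: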
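
As an illustration of why the prior boxes can be computed offline, here is a minimal Python sketch. It is not the code of `ssd_boxes_calc_offline_pass` or of Paddle-Lite's `prior_box` kernel. The box formula is a simplified SSD-style one, and the function names, the `layers` dictionary layout, and the sizes in the usage note are all hypothetical. The point is only that the result depends on shapes and attributes, never on tensor values, so it can be folded into constants before inference.

```python
from itertools import product


def prior_boxes(feat_h, feat_w, img_h, img_w, min_size, aspect_ratios):
    """Simplified SSD-style priors as normalized (xmin, ymin, xmax, ymax).

    Hypothetical helper: inputs are only shapes and attributes, so the
    result is the same for every inference run.
    """
    step_h = img_h / feat_h
    step_w = img_w / feat_w
    boxes = []
    for y, x in product(range(feat_h), range(feat_w)):
        # Center of the current feature-map cell, in image pixels.
        cx = (x + 0.5) * step_w
        cy = (y + 0.5) * step_h
        for ar in aspect_ratios:
            bw = min_size * ar ** 0.5
            bh = min_size / ar ** 0.5
            boxes.append((
                (cx - bw / 2) / img_w,
                (cy - bh / 2) / img_h,
                (cx + bw / 2) / img_w,
                (cy + bh / 2) / img_h,
            ))
    return boxes


def fold_prior_boxes_offline(layers, img_h, img_w, variances):
    """Compute every layer's priors once and concatenate them.

    Mirrors the idea of folding prior_box + flatten + concat into two
    constant tensors (boxes and variances) that a box coder could read
    as weights. The `layers` format is an assumption for this sketch.
    """
    all_boxes = []
    for layer in layers:
        all_boxes.extend(
            prior_boxes(layer["feat_h"], layer["feat_w"], img_h, img_w,
                        layer["min_size"], layer["aspect_ratios"])
        )
    # One variance tuple per box, matching the shape of the box list.
    all_vars = [tuple(variances)] * len(all_boxes)
    return all_boxes, all_vars
```

For example, calling `fold_prior_boxes_offline` with two hypothetical layers (a 2x2 feature map with two aspect ratios and a 1x1 feature map with one) yields 9 boxes and 9 variance tuples, which could then be stored once as constants instead of being recomputed on every run.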