Enable the detection of subgraph composed of grad ops #21223
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop

* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearange of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
Good code. I noted a few tiny readability issues; everything else LGTM.
@@ -370,6 +373,12 @@ ir::Graph *BuildStrategy::Apply(ir::Graph *graph,
               "GPU, skipped.";
        continue;
      }
    } else if (pass->Type() == "fusion_group_pass") {
      pass->Set("use_gpu", new bool(use_cuda));
Tiny issue:
pass->Set("use_gpu", new bool(use_cuda))
I see that other passes write Set<bool> instead of Set. Is the form without the template argument needed here?
In fact, I'd prefer to use Set<bool>. I will make the code clearer.
      y_shape = y->Var()->GetShape();
    }
    if (x_shape.size() == 0U || x_shape.size() != y_shape.size()) {
static bool IsEqual(const std::vector<int64_t>& l,
- Are you implementing equality for std::vector<int64_t>? I think "l == r" should work, because std::vector already provides operator==.
- The function also returns false if l.size() == 0 or r.size() == 0. If you need that behavior, I suggest renaming the function to EqualAndNotEmpty.
> Are you implementing "equality" for std::vector<int64_t>? I think "l == r" may work because std::vector has "=="

Thanks a lot for this. It helps me a lot.

> If l.size() == 0 or r.size() == 0, also returns "false". So if you need this, I suggest to rename function as: EqualAndNotEmpty

Yes, I need them to be non-empty. I changed the name to IsEqualAndNotEmpty.
class FusionGroupPaddingRNNTest(PaddingRNNTestBase):
    def set_customed_config(self):
        # Enable fusion_group_pass
I don't think we need this comment, because the code that follows conveys the same information.
Done.
class PaddingRNNTestBase(unittest.TestCase):
    def setUp(self):
        self.reader = Reader()
        self.device_count = 1

    def prepare_program(self, config, parallel=True):
        # Default exec_strategy
Can we delete the comment "Default exec_strategy"?
You initialize a default exec_strategy but then set some values on it, so I don't think it is the default any more. The same applies to "Default build_strategy".
I meant that these are the default build_strategy and exec_strategy used for this program. I will refine the comments so that they are not ambiguous.
LGTM
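Background
The subgraph matching method implemented in #19884 is:
X
, then the var is added to the subgraph, and after that the elementwise-type op nodes among the var's outputs are added to the subgraph. The general principle is: starting from a var node, every elementwise-type op node connected to that node, together with the op's input and output nodes, is added to the subgraph. This matching method has a problem, illustrated in the figure below:
In the left figure, A, B, and C are all elementwise-type op nodes and are connected to one another. The original matching method matches A, B, and C into a single subgraph, which produces the right figure and introduces a cycle into the graph.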
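The figure itself is not reproduced here, so the following is only a minimal Python sketch of the cycle problem on a hypothetical graph. It assumes A, B, and C are elementwise ops and adds an extra non-elementwise op D on a second path from A to C; the node D, the edge list, and the helper names `contract` and `has_cycle` are illustrative assumptions, not taken from the PR or from Paddle.

```python
from collections import defaultdict


def contract(edges, group, fused_name="fused"):
    """Merge every node in `group` into one node and drop edges internal to it."""
    def rename(node):
        return fused_name if node in group else node

    return {(rename(u), rename(v)) for u, v in edges if rename(u) != rename(v)}


def has_cycle(edges):
    """Return True if the directed graph given by `edges` contains a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {n: WHITE for n in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge to a node still on the stack
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in sorted(nodes))


# Hypothetical graph: A -> B -> C are elementwise ops, D is a non-elementwise
# op that also sits between A and C.
edges = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")}

fuse_all = contract(edges, {"A", "B", "C"})  # greedy grouping of all connected ops
fuse_ab = contract(edges, {"A", "B"})        # a grouping that leaves C outside
```

In this sketch, merging A, B, and C leaves the edges fused -> D and D -> fused, which form a cycle, whereas merging only A and B keeps the graph acyclic. A matching method therefore has to check for such outside paths before merging nodes.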
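Subgraph matching has long been implemented in Paddle; the TensorRT and ngraph integrations both rely on it. A general-purpose subgraph matching method was originally implemented in
inference/analysis/subgraph_detector.h/.cc
, and #22094 moved the subgraph_detector source code into the framework/ir directory. The work in this PR:
- With subgraph_detector, the forward and backward subgraphs in the graph can be matched successfully.
- An enable_auto_fusion interface is added to BuildStrategy to control whether the fusion_group feature is turned on: build_strategy.enable_auto_fusion = True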