add memory switch mechanism in operator kernel switch #6991
Conversation
paddle/framework/operator.cc
Outdated
// TODO(qijun) get appropriate DeviceContext from DeviceContext pool
platform::DeviceContext* trans_dev_ctx = nullptr;

// TODO(qijun) get appropriate DataTransformFn from global map
This is in progress in /~https://github.com/PaddlePaddle/Paddle/pull/6953/files
Yes, I see. I think that the interface of DataTransformFn should be like this:
using DataTransformFn = std::function<void(
const Variable& in, Variable* out, platform::DeviceContext* ctx)>;
- We should take a Variable as the parameter, since not all data are LOD_TENSOR.
- A DeviceContext should be taken as a parameter to provide the necessary handles (a sketch of such a function follows below).
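For illustration only, here is a minimal sketch of a transform function matching that signature. The header paths, the CopyFrom helper, and the ctx->GetPlace() usage are my assumptions, not the actual implementation:
// Sketch only (assumed to live in namespace paddle::framework).
// CopyFrom and the header paths below are assumptions.
#include <functional>
#include "paddle/framework/lod_tensor.h"
#include "paddle/framework/variable.h"
#include "paddle/platform/device_context.h"

using DataTransformFn = std::function<void(
    const Variable& in, Variable* out, platform::DeviceContext* ctx)>;

// A device-placement transform: copy the input tensor to the place that the
// given DeviceContext runs on.
DataTransformFn place_transform = [](const Variable& in, Variable* out,
                                     platform::DeviceContext* ctx) {
  auto& src = in.Get<LoDTensor>();
  auto* dst = out->GetMutable<LoDTensor>();
  dst->set_lod(src.lod());
  CopyFrom(src, ctx->GetPlace(), *ctx, dst);  // hypothetical copy helper
};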
OK, the interface will be updated.
The two variables may be in two different DeviceContexts; is one DeviceContext enough?
Maybe the interface should be like:
using DataTransformFn = std::function<void(
    const KernelTypePair& pair,
    const ExecutionContext& ctx,
    const Tensor& in,
    Tensor* out)>;
The DataTransformFn should get the device contexts it needs according to the ExecutionContext and the KernelTypePair.
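As a rough sketch of that idea (DeviceContextPool and the OpKernelType member names below are assumptions about how the function could resolve contexts by itself):
// Sketch only: the transform_fn resolves the contexts it needs from the
// kernel type pair instead of receiving them from the caller.
DataTransformFn place_trans_fn = [](const KernelTypePair& pair,
                                    const ExecutionContext& ctx,
                                    const Tensor& in, Tensor* out) {
  auto& pool = platform::DeviceContextPool::Instance();  // assumed global pool
  auto* src_ctx = pool.Get(pair.first.place_);   // place of the actual kernel
  auto* dst_ctx = pool.Get(pair.second.place_);  // place of the expected kernel
  // ... copy / convert `in` into `out` using src_ctx and dst_ctx ...
};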
@jacquesqiao Do you mean this case?
MKLOp --> CUDAOp
We have to get two DeviceContexts from the global DeviceContext pool: one is MKLDNNDeviceContext, the other is CUDADeviceContext. But we cannot transform MKL data to CUDA data directly; we must transform the MKL data to CPU data first, and then transform the CPU data to CUDA data. So here we may have to do the transformation twice.
So the interface could be like this:
using DataTransformFn = std::function<void(
    const Variable& in, Variable* out, std::vector<platform::DeviceContext*> ctx)>;
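To make the two-step case concrete, here is a hedged sketch of chaining two such transforms; mkldnn_to_cpu_fn and cpu_to_cuda_fn are hypothetical registered functions, and the intermediate Variable handling is simplified:
// Sketch only: MKLDNN -> CPU -> CUDA as two chained transforms, each
// receiving only the contexts it needs from the caller.
void TransformMKLDNNToCUDA(const Variable& in, Variable* out,
                           platform::DeviceContext* mkldnn_ctx,
                           platform::DeviceContext* cpu_ctx,
                           platform::DeviceContext* cuda_ctx,
                           const DataTransformFn& mkldnn_to_cpu_fn,
                           const DataTransformFn& cpu_to_cuda_fn) {
  Variable cpu_var;  // intermediate result placed on CPU
  mkldnn_to_cpu_fn(in, &cpu_var, {mkldnn_ctx, cpu_ctx});
  cpu_to_cuda_fn(cpu_var, out, {cpu_ctx, cuda_ctx});
}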
@jacquesqiao Let's make the interface cleaner; we can get the appropriate DeviceContexts according to the ExecutionContext and the KernelTypePair.
Let's do it before DataTransform.
OK, I think preparing a vector of DeviceContexts outside is fine.
I think preparing a vector outside has a problem: Operator::Run() would have to understand how many and what kind of device contexts a certain transform_fn needs. On the other hand, the transform_fn needs to know which DeviceContext in the vector it needs for each variable. Maybe we should let the transform_fn handle this itself; that would be easier and clearer.
DataTransformFn is general: it should work for every operator and can be used in any other case where we want to transform data.
So just let Operator::Run do the dirty work, like getting the appropriate DeviceContext and deciding whether a variable should be transformed. Anyway, we have to write such code.
Regarding "the transform_fn needs to know which DeviceContext in the vector it needs for one variable":
The DataTransformFn does not need to know; the caller of DataTransformFn needs to know. DataTransformFn just transforms data.
using DataTransformFn = std::function<void(
    const Variable& in, Variable* out, std::vector<platform::DeviceContext*> ctx)>;
The caller has to pass the correct variables and device contexts to the DataTransformFn.
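A hedged sketch of that caller side (GetTransformFn, the *_dev_ctx variables, and the output naming are assumptions):
// Sketch only: the caller (e.g. Operator::Run) picks the variables and the
// device contexts, then invokes the registered transform.
for (auto& var_name : input_vars) {
  auto* in_var = scope.FindVar(var_name);
  auto* out_var = op_scope.Var(var_name + "@transformed");  // hypothetical naming
  auto trans_fn = GetTransformFn(actual_kernel_key, expected_kernel_key);
  trans_fn(*in_var, out_var, {actual_dev_ctx, expected_dev_ctx});
}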
paddle/framework/operator.cc
Outdated
// TODO(qijun) get appropriate DataTransformFn from global map
using DataTransformFn = std::function<void(
    const Variable& in, Variable* out, platform::DeviceContext* ctx)>;
Variable => Tensor?
We only use Tensor for kernel computing.
Ok, I see.
I am not quite sure why this needs to be a Variable rather than a Tensor; can you explain a bit?
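For context, a short sketch of why the distinction matters as I understand it: a Variable is only a type-erased holder, so a Variable-based transform has to branch on the concrete type it wraps (treat the exact type checks below as assumptions):
// Sketch only: different ops store different concrete types in a Variable.
if (in.IsType<LoDTensor>()) {
  auto& tensor = in.Get<LoDTensor>();
  // ... transform the LoDTensor ...
} else if (in.IsType<SelectedRows>()) {
  auto& rows = in.Get<SelectedRows>();
  // ... transform rows.value() ...
}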
paddle/framework/operator.cc
Outdated
    const Variable& in, Variable* out, platform::DeviceContext* ctx)>;
DataTransformFn trans_fun = nullptr;

for (auto var_name : input_vars) {
There is a problem here: maybe not all the input vars need to be transformed.
I have not thought of an elegant solution yet. Maybe we can hard-code some cases before the data transform, just like:
auto input_vars = this->InputVars();
if (op_type == "blabla") {
  input_vars.erase(...);
} else if () {
  ...
}
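Purely as a sketch of the filtering being discussed (NeedTransform is a hypothetical helper, not an existing API), one could also decide per variable instead of per op type:
// Sketch only: drop inputs that already match the expected kernel type, so
// only the remaining ones get transformed.
for (auto it = input_vars.begin(); it != input_vars.end();) {
  auto* var = scope.FindVar(*it);
  if (var == nullptr ||
      !NeedTransform(*var, actual_kernel_key, expected_kernel_key)) {
    it = input_vars.erase(it);  // leave this input untouched
  } else {
    ++it;
  }
}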
The branch was updated from commit 027b023 to 2f37231.
if (actual_kernel_key == expected_kernel_key) {
  kernel_iter->second->Compute(ctx);
} else {
  Scope& op_scope = scope.NewScope();
@reyoung @dzhwinter @jacquesqiao I find that we cannot cache the transformed result variables in the current scope to reduce the number of transforms. Here is an example:
        / op2
op1 ---
        \ op3
The output of op1 is the input of both op2 and op3. If we cache in the current scope:
- In the first batch of training, op2 runs first, creates a new variable (var_name + KernelType), and performs the data transform. Then op3 checks whether this variable has already been created; since op2 has created it, op3 uses it directly and does not need to transform the data again.
- In the second batch of training, we have to do the data transform again, but because we still only check whether the new variable exists, the data transform will be skipped.
I checked the Executor: in every batch, the local scope will be deleted, so this problem will not happen. I will change the cache to the local scope instead of the op scope.
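For illustration, a minimal sketch of the caching described here, keyed by var_name plus the expected kernel type in the per-batch local scope (the key format and the KernelTypeToString helper are assumptions):
// Sketch only: transform each input at most once per batch by caching the
// result in the local scope, which the Executor drops after the batch.
std::string cached_name =
    var_name + "@" + KernelTypeToString(expected_kernel_key);
auto* cached_var = local_scope.FindVar(cached_name);
if (cached_var == nullptr) {
  cached_var = local_scope.Var(cached_name);
  trans_fn(*scope.FindVar(var_name), cached_var,
           {actual_dev_ctx, expected_dev_ctx});
}
// cached_var now holds this input in the expected kernel type for the batch.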
Since each batch creates a new local_scope, adding a cache seems workable for our framework.
LGTM!
Fix #6989