Multi-device support #6403

Closed
wangkuiyi opened this issue Dec 8, 2017 · 4 comments

wangkuiyi (Collaborator) commented Dec 8, 2017

TODO 1: Kernel Selection with Fallback

Our current kernel selection mechanism is defined below:

```cpp
void OperatorWithKernel::Run(const Scope& scope,
                             const platform::DeviceContext& dev_ctx) const {
  RuntimeInferShapeContext infer_shape_ctx(*this, scope);
  this->InferShape(&infer_shape_ctx);
  ExecutionContext ctx(*this, scope, dev_ctx);

  // Check if op[type] has any kernel registered.
  auto& all_op_kernels = AllOpKernels();
  auto kernels_iter = all_op_kernels.find(type_);
  if (kernels_iter == all_op_kernels.end()) {
    PADDLE_THROW(
        "There are no kernels which are registered in the %s operator.", type_);
  }

  // Check if op[type] has a kernel for kernel_key.
  OpKernelMap& kernels = kernels_iter->second;
  auto kernel_key = GetKernelType(ctx);
  auto kernel_iter = kernels.find(kernel_key);
  if (kernel_iter == kernels.end()) {
    PADDLE_THROW("The operator %s does not support %s", type_, kernel_key);
  }

  kernel_iter->second->Compute(ctx);
}
```

Please be aware that all our computational operators (i.e., all operators except control-flow operators like WhileOp and IfElseOp and I/O operators like Send, Recv, ListenAndDo, and ReadFile) are derived from the base class OperatorWithKernel.

Each computational operator class has multiple kernels, each a function that makes use of a specific acceleration device or library, e.g., CUDA or MKL.

The OperatorWithKernel::Run method posted above selects a kernel by kernel_key and runs it.

Our current implementation assumes that all computational operators in a program run on the same device. However, this assumption does not always hold. For example, it is technically difficult to implement CRFOp on CUDA, so our CRFOp has only a CPU kernel. If we assigned a program that includes CRFOp to run on a CUDA device, it would crash.

Thus we need a fallback mechanism for finding the right kernel. In particular, we need to change the system so that a program is given a priority list of devices, e.g., [ROCm, CUDA, MKL, CPU], instead of a single device. We also need to change the implementation of OperatorWithKernel::Run to take such a priority list and to find and run the highest-priority kernel that is registered, as in the sketch below.
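
For illustration, here is a minimal sketch of what such a fallback loop could look like. The extra `place_priorities` parameter and the helper `GetKernelTypeForPlace` are hypothetical names invented for this sketch; they are not existing APIs.

```cpp
// Sketch only: `place_priorities` and `GetKernelTypeForPlace` are
// hypothetical and do not exist in the current code base.
void OperatorWithKernel::Run(
    const Scope& scope, const platform::DeviceContext& dev_ctx,
    const std::vector<platform::Place>& place_priorities) const {
  RuntimeInferShapeContext infer_shape_ctx(*this, scope);
  this->InferShape(&infer_shape_ctx);
  ExecutionContext ctx(*this, scope, dev_ctx);

  auto& all_op_kernels = AllOpKernels();
  auto kernels_iter = all_op_kernels.find(type_);
  if (kernels_iter == all_op_kernels.end()) {
    PADDLE_THROW(
        "There are no kernels which are registered in the %s operator.", type_);
  }
  OpKernelMap& kernels = kernels_iter->second;

  // Try each place in priority order, e.g. [ROCm, CUDA, MKL, CPU], and run
  // the first kernel that is actually registered for this operator.
  for (auto& place : place_priorities) {
    auto kernel_key = GetKernelTypeForPlace(ctx, place);  // hypothetical helper
    auto kernel_iter = kernels.find(kernel_key);
    if (kernel_iter != kernels.end()) {
      kernel_iter->second->Compute(ctx);
      return;
    }
  }
  PADDLE_THROW("The operator %s has no kernel for any of the requested places",
               type_);
}
```

With such a change, a program configured with the list [CUDA, CPU] would fall back to CRFOp's CPU kernel instead of crashing, while other operators would still pick their CUDA kernels.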

wangkuiyi (Collaborator, Author) commented Dec 8, 2017

TODO 2: More Specific Types of Places and DeviceContexts

Currently, we have two types of Places: CPUPlace and GPUPlace.

These are far from sufficient. We might need a hierarchy of more places:

```
CPUPlace -- X32Place
            X64Place -- MKLPlace
            ARMPlace -- NeonPlace
GPUPlace -- CUDAPlace
            ROCmPlace
```

Similarly, we have only two device-context classes, CPUDeviceContext and CUDADeviceContext. We need to add more.
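
As a rough sketch, the extended hierarchy could be expressed with plain structs that mirror the diagram above; none of the new types below exist yet, and the final design might instead extend the existing variant-based Place.

```cpp
// Sketch only: new Place types mirroring the proposed hierarchy.
namespace paddle {
namespace platform {

struct CPUPlace {};
struct X32Place : CPUPlace {};
struct X64Place : CPUPlace {};
struct MKLPlace : X64Place {};
struct ARMPlace : CPUPlace {};
struct NeonPlace : ARMPlace {};

struct GPUPlace { int device = 0; };
struct CUDAPlace : GPUPlace {};
struct ROCmPlace : GPUPlace {};

// Each new place would get a matching device context, e.g. a
// ROCmDeviceContext, alongside the existing CPUDeviceContext and
// CUDADeviceContext.

}  // namespace platform
}  // namespace paddle
```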

QiJune (Member) commented Dec 8, 2017

To support multiple devices, we have to make some refinements to our code. I have created several issues here:

I also have a question here:

helinwang (Contributor) commented:

> Thus we need a fallback mechanism for finding the right kernel.

Another approach is to explicitly set the placement in the OpDesc during transpiling, as @QiJune mentioned in "Should we add device field in our framwork.proto" (#6035).

jacquesqiao (Member) commented:

Maybe we need to add placement to the OpDesc during transpiling, because the Program differs depending on whether all operators run on the CPU or some run on the CPU and others on the GPU; in the latter case we need some operators to do data sync. A rough sketch follows.
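
As an illustration of that idea, here is a sketch of a transpiling pass that inserts data-sync (copy) operators whenever an operator consumes a variable produced on a different device. `GetDevice` and `MakeCopyOp` are hypothetical helpers, the OpDesc accessors are only indicative, and the sketch assumes OpDesc already carries a placement field; none of this is existing API.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch only: GetDevice(op) reads the (proposed) placement of an op, and
// MakeCopyOp(var, src, dst) builds an operator that copies `var` between
// devices. Both are hypothetical.
std::vector<OpDesc> InsertDataSyncOps(const std::vector<OpDesc>& ops) {
  std::vector<OpDesc> result;
  std::map<std::string, std::string> var_device;  // variable -> producer's device

  for (const OpDesc& op : ops) {
    for (const std::string& in : op.InputArgumentNames()) {
      auto it = var_device.find(in);
      // The input was produced on a different device: insert a copy op that
      // moves it to the device this operator runs on.
      if (it != var_device.end() && it->second != GetDevice(op)) {
        result.push_back(MakeCopyOp(in, it->second, GetDevice(op)));
      }
    }
    result.push_back(op);
    for (const std::string& out : op.OutputArgumentNames()) {
      var_device[out] = GetDevice(op);
    }
  }
  return result;
}
```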
