Multi-device support #6403

Closed
wangkuiyi opened this issue Dec 8, 2017 · 4 comments

wangkuiyi (Collaborator) commented Dec 8, 2017

TODO 1: Kernel Selection with Fallback

Our current kernel selection mechanism is defined below:

```cpp
void OperatorWithKernel::Run(const Scope& scope,
                             const platform::DeviceContext& dev_ctx) const {
  RuntimeInferShapeContext infer_shape_ctx(*this, scope);
  this->InferShape(&infer_shape_ctx);
  ExecutionContext ctx(*this, scope, dev_ctx);

  // Check if op[type] has any kernel registered.
  auto& all_op_kernels = AllOpKernels();
  auto kernels_iter = all_op_kernels.find(type_);
  if (kernels_iter == all_op_kernels.end()) {
    PADDLE_THROW(
        "There are no kernels which are registered in the %s operator.", type_);
  }

  // Check if op[type] has a kernel for kernel_key.
  OpKernelMap& kernels = kernels_iter->second;
  auto kernel_key = GetKernelType(ctx);
  auto kernel_iter = kernels.find(kernel_key);
  if (kernel_iter == kernels.end()) {
    PADDLE_THROW("The operator %s does not support %s", type_, kernel_key);
  }

  kernel_iter->second->Compute(ctx);
}
```

Please be aware that all our computational operators (i.e., all operators except control-flow operators like WhileOp and IfElseOp and I/O operators like Send, Recv, ListenAndDo, and ReadFile) are derived from the base class OperatorWithKernel.

Each computational operator class has multiple kernels, each a function that makes use of a specific acceleration device or library, e.g., CUDA or MKL.

The OperatorWithKernel::Run method posted above selects a kernel by kernel_key and runs it.

Our current implementation assumes that all computational operators in a program run on the same device. However, this assumption does not always hold. For example, it is technically difficult to implement CRFOp on CUDA, so our CRFOp has only a CPU kernel. If we assigned a program that includes CRFOp to run on a CUDA device, it would crash.

Thus we need a fallback mechanism for finding the right kernel. In particular, we need to change the system so that a program is given a priority list of devices, e.g., [ROCm, CUDA, MKL, CPU], instead of a single device. We also need to change the implementation of OperatorWithKernel::Run to take such a priority list and to find and run the highest-priority kernel that is registered, as in the sketch below.
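
For illustration, here is a minimal sketch of what such a fallback loop could look like. The extra `place_priorities` parameter and the helper `GetKernelTypeForPlace` are hypothetical names invented for this sketch; they are not existing APIs.

```cpp
// Sketch only: `place_priorities` and `GetKernelTypeForPlace` are
// hypothetical and do not exist in the current code base.
void OperatorWithKernel::Run(
    const Scope& scope, const platform::DeviceContext& dev_ctx,
    const std::vector<platform::Place>& place_priorities) const {
  RuntimeInferShapeContext infer_shape_ctx(*this, scope);
  this->InferShape(&infer_shape_ctx);
  ExecutionContext ctx(*this, scope, dev_ctx);

  auto& all_op_kernels = AllOpKernels();
  auto kernels_iter = all_op_kernels.find(type_);
  if (kernels_iter == all_op_kernels.end()) {
    PADDLE_THROW(
        "There are no kernels which are registered in the %s operator.", type_);
  }
  OpKernelMap& kernels = kernels_iter->second;

  // Try each place in priority order, e.g. [ROCm, CUDA, MKL, CPU], and run
  // the first kernel that is actually registered for this operator.
  for (auto& place : place_priorities) {
    auto kernel_key = GetKernelTypeForPlace(ctx, place);  // hypothetical helper
    auto kernel_iter = kernels.find(kernel_key);
    if (kernel_iter != kernels.end()) {
      kernel_iter->second->Compute(ctx);
      return;
    }
  }
  PADDLE_THROW("The operator %s has no kernel for any of the requested places",
               type_);
}
```

With such a change, a program configured with the list [CUDA, CPU] would fall back to CRFOp's CPU kernel instead of crashing, while other operators would still pick their CUDA kernels.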

wangkuiyi (Collaborator, Author) commented Dec 8, 2017

TODO 2: More Specific Types of Places and DeviceContexts

Currently, we have two types of Places: CPUPlace and GPUPlace.

These are far from sufficient. We might need a hierarchy of more places:

```
CPUPlace -- X32Place
            X64Place -- MKLPlace
            ARMPlace -- NeonPlace
GPUPlace -- CUDAPlace
            ROCmPlace
```

Similarly, we have only two device-context classes, CPUDeviceContext and CUDADeviceContext. We need to add more.
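
As a rough sketch, the extended hierarchy could be expressed with plain structs that mirror the diagram above; none of the new types below exist yet, and the final design might instead extend the existing variant-based Place.

```cpp
// Sketch only: new Place types mirroring the proposed hierarchy.
namespace paddle {
namespace platform {

struct CPUPlace {};
struct X32Place : CPUPlace {};
struct X64Place : CPUPlace {};
struct MKLPlace : X64Place {};
struct ARMPlace : CPUPlace {};
struct NeonPlace : ARMPlace {};

struct GPUPlace { int device = 0; };
struct CUDAPlace : GPUPlace {};
struct ROCmPlace : GPUPlace {};

// Each new place would get a matching device context, e.g. a
// ROCmDeviceContext, alongside the existing CPUDeviceContext and
// CUDADeviceContext.

}  // namespace platform
}  // namespace paddle
```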

QiJune (Member) commented Dec 8, 2017

To support multiple devices, we have to make some refinements to our code. I have created several issues here:

I also have a question here:

helinwang (Contributor) commented:

> Thus we need a fallback mechanism for finding the right kernel.

Another approach is to explicitly set the placement in the OpDesc during transpiling, as @QiJune mentioned in "Should we add device field in our framwork.proto" (#6035).

jacquesqiao (Member) commented:

Maybe we need to add placement to the OpDesc during transpiling, because the Program differs depending on whether all operators run on the CPU or some run on the CPU and others on the GPU; in the latter case we need some operators to do data sync. A rough sketch follows.
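
As an illustration of that idea, here is a sketch of a transpiling pass that inserts data-sync (copy) operators whenever an operator consumes a variable produced on a different device. `GetDevice` and `MakeCopyOp` are hypothetical helpers, the OpDesc accessors are only indicative, and the sketch assumes OpDesc already carries a placement field; none of this is existing API.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch only: GetDevice(op) reads the (proposed) placement of an op, and
// MakeCopyOp(var, src, dst) builds an operator that copies `var` between
// devices. Both are hypothetical.
std::vector<OpDesc> InsertDataSyncOps(const std::vector<OpDesc>& ops) {
  std::vector<OpDesc> result;
  std::map<std::string, std::string> var_device;  // variable -> producer's device

  for (const OpDesc& op : ops) {
    for (const std::string& in : op.InputArgumentNames()) {
      auto it = var_device.find(in);
      // The input was produced on a different device: insert a copy op that
      // moves it to the device this operator runs on.
      if (it != var_device.end() && it->second != GetDevice(op)) {
        result.push_back(MakeCopyOp(in, it->second, GetDevice(op)));
      }
    }
    result.push_back(op);
    for (const std::string& out : op.OutputArgumentNames()) {
      var_device[out] = GetDevice(op);
    }
  }
  return result;
}
```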
