Multi-device support #6403
TODO 2: More Specific Types of Places and DeviceContexts
Currently, we have two types of Places: CPUPlace and GPUPlace. These are far from sufficient. We might need a hierarchy of more places:
Similarly, we have two device-context classes, CPUDeviceContext and CUDADeviceContext. We need to add more.
To support multiple devices, we need to refine our current code. I have opened several issues here:
I also have a question here:
Maybe we need to add placement to OpDesc during transpiling, because the Program differs depending on whether all operators run on the CPU or some run on the CPU and others on the GPU. In the mixed case, we need some operators to do data sync.
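The placement idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: OpDescSketch, its place field, the "data_sync" op type, and InsertDataSync are all hypothetical names, not part of the existing framework, and the sketch assumes a linear program in which each operator consumes the output of the previous one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for OpDesc that carries a placement.
struct OpDescSketch {
  std::string type;   // operator type, e.g. "mul"
  std::string place;  // assumed placement label, e.g. "CPU" or "GPU"
};

// Walks a linear program and inserts a hypothetical "data_sync" operator
// wherever two consecutive operators are placed on different devices.
// The inserted operator is placed on the device of the consuming operator.
std::vector<OpDescSketch> InsertDataSync(const std::vector<OpDescSketch>& ops) {
  std::vector<OpDescSketch> out;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (i > 0 && ops[i].place != ops[i - 1].place) {
      out.push_back({"data_sync", ops[i].place});
    }
    out.push_back(ops[i]);
  }
  return out;
}
```

With this sketch, a program placed entirely on one device comes back unchanged, while a program that mixes CPU and GPU operators gains one sync operator at each device boundary, which is why the two Programs differ.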
TODO 1. Kernel Selection with Fallback
Our current kernel selection mechanism is defined here:
Paddle/paddle/framework/operator.cc, lines 404 to 429 in 7d85b6d
Please be aware that all our computational operators (i.e., all operators except control-flow operators like WhileOp and IfElseOp and I/O operators like Send, Recv, ListenAndDo, and ReadFile) are derived from the base class OperatorWithKernel.
Each computational operator class has multiple kernels, each a function that makes use of a specific acceleration device, e.g., MKL or CUDA.
The OperatorWithKernel::Run referenced above selects a kernel by kernel_key and runs it.
Our current implementation assumes that all computational operators in a program run on the same device. However, this does not always hold. For example, it is technically difficult to implement CRFOp on CUDA, so our CRFOp has only a CPU kernel. So, if we assign a program that includes CRFOp to run on a CUDA device, it would crash.
Thus we need a fallback mechanism for finding the right kernel. In particular, we need to change the system to provide a priority list of devices, instead of a single device, to a program. For example, [ROCm, CUDA, MKL, CPU]. And we need to change the implementation of OperatorWithKernel::Run to take such a priority list and to find and run the available kernel of the highest priority.
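A minimal sketch of the fallback lookup described above, assuming a much simplified operator class. The names Device, Kernel, OperatorSketch, RegisterKernel, and FindKernel are hypothetical and only illustrate the idea. The real OperatorWithKernel::Run works with kernel keys and an execution context, which are omitted here.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device tags standing in for Places.
enum class Device { kROCm, kCUDA, kMKL, kCPU };

// A kernel is modeled as a function returning a label, for illustration only.
using Kernel = std::function<std::string()>;

// Hypothetical stand-in for a computational operator with per-device kernels.
class OperatorSketch {
 public:
  explicit OperatorSketch(std::string type) : type_(std::move(type)) {}

  void RegisterKernel(Device device, Kernel kernel) {
    kernels_[device] = std::move(kernel);
  }

  // Returns the kernel of the first device in `priority` that has one
  // registered, or nullptr if none of the listed devices has a kernel.
  const Kernel* FindKernel(const std::vector<Device>& priority) const {
    for (Device device : priority) {
      auto it = kernels_.find(device);
      if (it != kernels_.end()) return &it->second;
    }
    return nullptr;
  }

  // Runs the highest-priority available kernel instead of crashing when the
  // preferred device has no kernel; throws only if no listed device has one.
  std::string Run(const std::vector<Device>& priority) const {
    const Kernel* kernel = FindKernel(priority);
    if (kernel == nullptr) {
      throw std::runtime_error("no kernel for operator " + type_ +
                               " on any device in the priority list");
    }
    return (*kernel)();
  }

 private:
  std::string type_;
  std::map<Device, Kernel> kernels_;
};
```

In this sketch, an operator that registers only a CPU kernel (as CRFOp does) still runs when the priority list is [ROCm, CUDA, MKL, CPU], because the lookup falls through to the CPU entry. An operator that has both a CUDA and a CPU kernel picks the CUDA one under the same list.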