Is the reason that the CUDA context is not shared across different processes?
LGTM. Thank you for the fix!
Yes, CUDA doesn't work well with fork. @arcadiaphy this PR looks good to me. Feel free to open another PR to improve the docs on forking. Thank you!
@wkcn Naively sharing a CUDA context across processes by forking will not work; I'm not sure it's possible at all.
I found an answer about
@wkcn I've read this answer: it's possible to use a separate device in the forked process. But in MXNet it still doesn't work, even with a different device, if the main process has already created a context. Perhaps some cleanup needs to be done in pthread_atfork.
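To illustrate the kind of fork-time cleanup mentioned above, here is a minimal Python sketch that uses `os.register_at_fork` (Python's analogue of `pthread_atfork`) to drop an inherited context handle in the child. The "context" is a plain dict standing in for a real CUDA context, and `get_context` / `_reset_in_child` are hypothetical names, not MXNet code. It only shows the shape of such a hook; whether any cleanup can actually make CUDA usable in a child whose parent already initialized it is exactly the open question here.

```python
import os

# Hypothetical stand-in for a process-wide GPU context handle. A real
# framework would hold a CUDA context here; this is just a dict.
_gpu_context = None


def get_context():
    """Lazily create the simulated context, recording which PID owns it."""
    global _gpu_context
    if _gpu_context is None:
        _gpu_context = {"owner_pid": os.getpid()}
    return _gpu_context


def _reset_in_child():
    # Runs in the child right after os.fork(): forget the handle inherited
    # from the parent so the child lazily builds its own.
    global _gpu_context
    _gpu_context = None


# Python's analogue of pthread_atfork (Unix only, Python 3.7+).
os.register_at_fork(after_in_child=_reset_in_child)


if __name__ == "__main__":
    parent_ctx = get_context()
    pid = os.fork()
    if pid == 0:
        # Child: the inherited handle was cleared, so a fresh one is built.
        child_ctx = get_context()
        ok = child_ctx is not parent_ctx and child_ctx["owner_pid"] == os.getpid()
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    print("child exit code:", os.WEXITSTATUS(status))
```

The reset is lazy on purpose: the child creates nothing at fork time, it just forgets the parent's handle so its next `get_context()` call builds a fresh one.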
* fix custom op fork test
* trigger CI
Description
The custom op fork test introduced in #14451 causes an error when run on GPU. The common situation is:
When a CUDA context has already been created in the main process, the forked process tries to access the same context, causing an initialization error.
This PR adds a check on the exit code of the forked process and removes this test from the GPU tests.
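The exit-code check can be sketched with Python's `multiprocessing` and the `fork` start method. `_child_work` and `run_in_fork` are hypothetical names standing in for the real test, not the code from #14451 or this PR. The point is that an exception in the child does not propagate to the parent, so the parent has to inspect `exitcode` itself.

```python
import multiprocessing as mp


def _child_work(fail):
    # Hypothetical stand-in for the work done in the forked process
    # (in the real test, running the custom op).
    if fail:
        raise RuntimeError("simulated initialization error")


def run_in_fork(fail=False):
    """Run _child_work in a forked process and return its exit code."""
    ctx = mp.get_context("fork")
    p = ctx.Process(target=_child_work, args=(fail,))
    p.start()
    p.join()
    # An uncaught exception in the child only shows up here, as a non-zero code.
    return p.exitcode


if __name__ == "__main__":
    # Without this check, a crash in the child would go unnoticed
    # and the test would pass anyway.
    assert run_in_fork() == 0
    print("failing child exit code:", run_in_fork(fail=True))
```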
BTW, right now the correct way to fork MXNet is to do it before any CUDA context is created; otherwise a CUDA error is very likely. Maybe we should add a warning to the docs?
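One possible shape for such a warning, sketched in Python under the assumption that the framework can record when it first creates a CUDA context. The `_cuda_context_created` flag, `mark_cuda_context_created`, and `checked_fork` are all hypothetical, not an existing MXNet API, and nothing here touches a real GPU.

```python
import os
import warnings

# Hypothetical flag a framework would set when it first creates a CUDA context.
_cuda_context_created = False


def mark_cuda_context_created():
    """Hypothetical hook called on the first CUDA initialization."""
    global _cuda_context_created
    _cuda_context_created = True


def checked_fork():
    """os.fork() that warns if a CUDA context was created before forking."""
    if _cuda_context_created:
        warnings.warn(
            "forking after a CUDA context was created; CUDA calls in the "
            "child process are likely to fail. Fork before using the GPU.",
            RuntimeWarning,
            stacklevel=2,
        )
    return os.fork()
```

A caller that forks through `checked_fork()` before touching the GPU sees no warning; one that forks afterwards gets a `RuntimeWarning` up front rather than only discovering the problem through a CUDA initialization error in the child.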