Migrating to use native Pytorch AMP #2827
Conversation
First of all, I support removing nvidia apex and adding pytorch amp. 👍 Doing quick research regarding this decision, it's what seems to be the preferred, future-proof way: https://discuss.pytorch.org/t/torch-cuda-amp-vs-nvidia-apex/74994/2
Where should this documentation be added?
Hi @sjrl we are planning a release in the next two weeks. Could this PR maybe make it into the new release? Did you have the chance to test multi-gpu training with AMP? In the todo list there is another open item: "For torch.nn.DataParallel it looks like we would need to 'apply autocast as part of your model’s forward method to ensure it’s enabled in side threads.'"
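For reference, the pattern that quoted statement describes, entering `autocast` inside the model's `forward` so that the side threads spawned by `torch.nn.DataParallel` also see it, looks roughly like this minimal sketch (the module below is illustrative, not Haystack code):

```python
import torch
from torch.cuda.amp import autocast


class ToyModel(torch.nn.Module):
    """Illustrative module, not part of Haystack."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 2)

    def forward(self, x):
        # autocast is entered inside forward, so the per-GPU side threads
        # that torch.nn.DataParallel spawns also run with it enabled
        with autocast():
            return self.linear(x)


model = torch.nn.DataParallel(ToyModel().cuda())
outputs = model(torch.randn(8, 10).cuda())
```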
FYI: we might upgrade to torch 1.13.1 once it's released.
Hey @julian-risch. Sorry, there was a small miscommunication. I have verified that AMP works on a single GPU, but I haven't had time to test the multi-gpu training with `torch.nn.DataParallel` yet.
Hey @julian-risch I first tried to get distributed training working; the relevant code is in haystack/haystack/modeling/model/optimization.py, lines 71 to 79 (at 77cea8b). And even trying to set this to true when using haystack/haystack/nodes/reader/farm.py, line 266 (at 77cea8b), causes a multiprocessing error. So it looks like we would need to first fix the distributed training feature before confirming that AMP works with it as well.
So what's your opinion on the best way forward? I'd say we merge the changes that we have up to now and support just non-distributed training for now.
@julian-risch Yes, I completely agree.
Looks very good to me! 👍 Thanks for putting in the extra effort and adding a fast test!
Related Issue(s): Issue #1512, Issue #1222
Proposed changes:
Migrating to PyTorch's native AMP (https://pytorch.org/docs/stable/notes/amp_examples.html) because it is much easier to use (no additional dependency on apex), needs fewer code changes to support, and is the recommended approach (https://discuss.pytorch.org/t/torch-cuda-amp-vs-nvidia-apex/74994).
Using the native AMP support in PyTorch requires using `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` together. These can easily be "turned off" so that no autocasting or scaling occurs by passing the option `enabled=False`. For example, the following code performs a standard training loop because we have passed the option `enabled=False` to both `autocast` and `GradScaler`. Similarly, it is easy to turn AMP on by passing `enabled=True`.
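A minimal sketch of such a loop (the model, optimizer, loss function, and data below are illustrative placeholders, not code from the PR):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Illustrative setup; these names are placeholders.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
data = [(torch.randn(4, 10).cuda(), torch.randint(0, 2, (4,)).cuda()) for _ in range(8)]

use_amp = False  # with enabled=False, autocast and GradScaler are effectively no-ops

scaler = GradScaler(enabled=use_amp)
for inputs, targets in data:
    optimizer.zero_grad()
    with autocast(enabled=use_amp):
        outputs = model(inputs)          # forward pass runs in mixed precision only if enabled
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()        # scaling is skipped when enabled=False
    scaler.step(optimizer)               # unscales gradients (if needed) and calls optimizer.step()
    scaler.update()
```

Flipping `use_amp = True` turns the same loop into a mixed-precision one without any further code changes.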
There are breaking changes: `use_amp` in the `FARMReader.train` method is now of type `bool` instead of type `str`.
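For illustration only, a call under the new API might look like the sketch below; the model name and file paths are placeholders, and the old `str` values are assumed to have been apex optimization levels such as "O1":

```python
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# use_amp is now a plain boolean (previously assumed to be an apex opt-level string such as "O1")
reader.train(
    data_dir="data/squad20",            # placeholder paths
    train_filename="train-v2.0.json",
    use_amp=True,
)
```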
Pre-flight checklist
TODO
- `FARMReader.train` works with `use_amp=True` (and `False`) on a single GPU
- Training works with `use_amp` turned on (and off) and with `grad_acc_steps`
- Add `use_amp` to the trainer state dict so that when restarting from a checkpoint AMP is set up as expected (see the sketch after this list)
- Check memory usage with `nvidia-smi`
- For `torch.nn.DataParallel`: it appears the "apply autocast as part of your model's forward method to ensure it's enabled in side threads." statement only refers to when we use multiple GPUs per process. Docs here. From what I understand, I believe we only use one GPU per process, which is recommended.
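Regarding the trainer state dict item above, a generic way to carry the AMP state across a checkpoint, following the pattern from the PyTorch AMP examples rather than Haystack's actual Trainer code, is to save and restore the `GradScaler` state together with the flag (the `model`, `optimizer`, `scaler`, and `use_amp` names continue the illustrative loop sketch earlier):

```python
import torch
from torch.cuda.amp import GradScaler

# Saving: include the scaler state and the flag alongside model and optimizer
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),  # empty dict if the scaler is disabled
    "use_amp": use_amp,
}
torch.save(checkpoint, "checkpoint.pt")

# Restoring: recreate the scaler with the saved flag, then load all states
checkpoint = torch.load("checkpoint.pt")
use_amp = checkpoint["use_amp"]
scaler = GradScaler(enabled=use_amp)
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scaler.load_state_dict(checkpoint["scaler"])
```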