
Migrating to use native Pytorch AMP #2827

Merged: 85 commits merged into main from issue_1512 on Jan 5, 2023
Conversation

sjrl (Contributor) commented on Jul 15, 2022

Related Issue(s): Issue #1512 Issue #1222

Proposed changes:
Migrating to Pytorch's native AMP (https://pytorch.org/docs/stable/notes/amp_examples.html) because it is much easier to use (no additional dependency on apex), requires fewer code changes to support, and is the recommended approach (https://discuss.pytorch.org/t/torch-cuda-amp-vs-nvidia-apex/74994).

Using the native AMP support in Pytorch requires using torch.cuda.amp.autocast and torch.cuda.amp.GradScaler together. Both can easily be "turned off", so that no autocasting or scaling occurs, by passing the option enabled=False.

For example, the following code performs a standard training loop because we have passed enabled=False to both autocast and GradScaler:

import torch.optim as optim
from torch.cuda.amp import GradScaler, autocast

use_amp = False

# Creates model and optimizer in default precision
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

# Creates a GradScaler once at the beginning of training.
scaler = GradScaler(enabled=use_amp)

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Runs the forward pass with autocasting.
        with autocast(enabled=use_amp):
            output = model(input)
            loss = loss_fn(output, target)

        # Scales loss.  Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()

And similarly it is easy to turn on AMP by passing enabled=True.
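
The same scaler pattern also composes with gradient accumulation, which the TODO list below exercises via grad_acc_steps. The following is only a minimal sketch (not the exact Haystack trainer code), reusing scaler and autocast from above; accumulation_steps is an illustrative value:

accumulation_steps = 4  # illustrative, not a Haystack default

for epoch in epochs:
    for i, (input, target) in enumerate(data):
        # Forward pass under autocast, as before.
        with autocast(enabled=use_amp):
            output = model(input)
            # Average over the accumulation window so gradients match a larger effective batch.
            loss = loss_fn(output, target) / accumulation_steps

        scaler.scale(loss).backward()

        # Unscale/step/update only once per accumulation window.
        if (i + 1) % accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()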

There are breaking changes:

  1. The input type for use_amp in the FARMReader.train method is now bool instead of str (see the usage sketch after this list).
  2. This PR deprecates apex; it does not attempt to support both Pytorch AMP and Apex AMP side by side.
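
For illustration, a hedged sketch of how the new boolean flag would be passed to FARMReader.train; the other arguments and values shown are only illustrative and not specific to this PR:

from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# use_amp is now a plain bool (previously a str)
reader.train(
    data_dir="data/squad20",
    train_filename="train-v2.0.json",
    n_epochs=1,
    use_amp=True,
)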

Pre-flight checklist

  • I have read the contributors guidelines
  • If this is a code change, I added tests or updated existing ones
  • If this is a code change, I updated the docstrings

TODO

  • Checked that FARMReader.train works with use_amp=True (and False) on a single GPU
  • Test tutorial 9 DPR training in GPU environment with use_amp turned on (and off) and with grad_acc_steps
  • Add use_amp to trainer state dict so when restarting from a checkpoint AMP is set up as expected
  • Test multi-gpu training with AMP
    • Works with torch.nn.DistributedDataParallel (Link to docs)
    • Works with torch.nn.DataParallel. Tested on 4 GPUs (g4dn.12xlarge instance); confirmed usage with nvidia-smi. It appears the "apply autocast as part of your model’s forward method to ensure it’s enabled in side threads." statement only applies when using multiple GPUs per process. Docs here. From what I understand, we only use one GPU per process, which is the recommended setup (see the sketch after this list).
  • Open PR in /~https://github.com/deepset-ai/haystack-website editing this file explaining AMP
    • I could imagine a section on that page where you briefly describe what AMP is, when to use it and how to use it. Perhaps also give readers an idea of how big a trade off in accuracy / speed it is.
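
For reference, the autocast-in-forward pattern that the PyTorch docs recommend for the multiple-GPUs-per-process case (not the setup used here) would look roughly like the following sketch; MyModel is only a stand-in for the actual model class:

import torch
from torch import nn
from torch.cuda.amp import autocast

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 2)

    # Applying autocast inside forward ensures mixed precision is also active
    # in the side threads spawned by nn.DataParallel.
    @autocast()
    def forward(self, x):
        return self.linear(x)

model = nn.DataParallel(MyModel().cuda())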

CLAassistant commented on Jul 15, 2022

CLA assistant check
All committers have signed the CLA.

@sjrl changed the title from "Started making changes to use native Pytorch AMP" to "Migrating to use native Pytorch AMP" on Jul 15, 2022
@sjrl sjrl requested a review from MichelBartels July 15, 2022 12:41
@sjrl sjrl requested a review from julian-risch July 15, 2022 13:18
julian-risch (Member) commented
First of all, I support removing nvidia apex and adding pytorch amp. 👍 Doing quick research regarding this decision, it seems to be the preferred, future-proof way: https://discuss.pytorch.org/t/torch-cuda-amp-vs-nvidia-apex/74994/2
What we should do once this PR is ready to be merged is a small benchmark. And we would also need to ensure that the documentation explains when to use this feature, how to use this feature and what to expect from it.

sjrl (Contributor, Author) commented on Jul 19, 2022

And we would also need to ensure that the documentation explains when to use this feature, how to use this feature and what to expect from it.

Where should this documentation be added?

@masci masci linked an issue Nov 28, 2022 that may be closed by this pull request
julian-risch (Member) commented
Hi @sjrl we are planning a release in the next two weeks. Could this PR maybe make it into the new release? Did you have the chance to test multi-gpu training with AMP? In the todo list there is another open item "For torch.nn.DataParallel it looks like we would need to "apply autocast as part of your model’s forward method to ensure it’s enabled in side threads.""

julian-risch (Member) commented
FYI: we might upgrade to torch 1.13.1 once it's released.

sjrl (Contributor, Author) commented on Dec 8, 2022

Hi @sjrl we are planning a release in the next two weeks. Could this PR maybe make it into the new release? Did you have the chance to test multi-gpu training with AMP? In the todo list there is another open item "For torch.nn.DataParallel it looks like we would need to "apply autocast as part of your model’s forward method to ensure it’s enabled in side threads.""

Hey @julian-risch. Sorry there was a small miscommunication. I have verified that amp works with torch.nn.DataParallel, but I have not verified that it works with torch.nn.DistributedDataParallel. I have updated the checklist in the top message to reflect this.

I haven't had time to test the multi-gpu training with torch.nn.DistributedDataParallel. I can try to get this done before the next release, but I am fairly busy at the moment so I'd also be happy to receive help here if you have the time.

sjrl (Contributor, Author) commented on Dec 8, 2022

Hey @julian-risch, I first tried to get DistributedDataParallel to work today by turning on distributed training (without amp). Right now this option is hardcoded to be off, since we are using the default value in the call to initialize_optimizer:

def initialize_optimizer(
    model: AdaptiveModel,
    n_batches: int,
    n_epochs: int,
    device: torch.device,
    learning_rate: float,
    optimizer_opts: Optional[Dict[Any, Any]] = None,
    schedule_opts: Optional[Dict[Any, Any]] = None,
    distributed: bool = False,

And even trying to set this to True when calling initialize_optimizer in the training loop

model, optimizer, lr_schedule = initialize_optimizer(

causes a multiprocessing error. So it looks like we would need to first fix the distributed training feature before confirming that amp works with it as well.
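
(For context, torch.nn.parallel.DistributedDataParallel also expects each worker process to join a process group before the model is wrapped, which the current training loop does not set up. A minimal sketch of that setup, assuming a torchrun-style launcher that sets LOCAL_RANK, would look roughly like this:)

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model_for_ddp(model: torch.nn.Module) -> DDP:
    # Each worker must join the process group before wrapping the model.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # autocast/GradScaler are used unchanged once the model is wrapped.
    return DDP(model.to(local_rank), device_ids=[local_rank])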

julian-risch (Member) commented
So what's your opinion on the best way forward? I'd say we merge the changes that we have up to now and support just torch.nn.DataParallel but not torch.nn.DistributedDataParallel. Investigating the multiprocessing error and supporting torch.nn.DistributedDataParallel should then become the topic of a separate issue that we can add to the backlog.

sjrl (Contributor, Author) commented on Jan 4, 2023

@julian-risch Yes, I completely agree.

julian-risch (Member) left a comment
Looks very good to me! 👍 Thanks for putting in the extra effort and adding a fast test!

@julian-risch julian-risch added this to the 1.13.0 milestone Jan 4, 2023
@sjrl sjrl merged commit e84fae2 into main Jan 5, 2023
@sjrl sjrl deleted the issue_1512 branch January 5, 2023 08:14
Successfully merging this pull request may close these issues: Pytorch Native AMP support