Low bit optimizers quality #744
Comments
There was a bug with how we handled LR schedulers; the fix (#736) also makes things go faster now. |
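For readers following along, here is a minimal sketch of the usage pattern under discussion: a torchao low-bit optimizer driven by an LR scheduler that updates the learning rate every step. The import path is assumed from the file paths cited later in this thread, and the model and hyperparameters are placeholders.

```python
# Minimal sketch (not from the thread): 8-bit AdamW plus a per-step LR scheduler.
# Assumes torchao is installed and a CUDA device is available.
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # import path assumed from this thread

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = AdamW8bit(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)

for step in range(1_000):
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # the LR changes every step, which is the scheduler interaction #736 touched
```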
So nothing functionally changed and the optimizers are just faster now? Were you able to test on a larger model? |
The largest model we tried was on the order of 500M parameters; for 1B+, try it out and feel free to share loss curves. |
Did you test pretraining, or just finetuning? Thanks |
This was a finetuning benchmark: the benchmark script for fine-tuning a timm model on the resisc45 dataset is available at benchmarks/benchmark_low_bit_adam.py. We were hoping to expand to more fine-tuning benchmarks with /~https://github.com/pytorch/torchtune, and we're also doing fully quantized training here: /~https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training. If you're doing research in this space feel free to reach out; the focus of the team right now is more on infra and making sure things run fast. |
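For a rough idea of what such a fine-tuning run looks like, here is an illustrative sketch (not the actual benchmarks/benchmark_low_bit_adam.py script; the model name, class count, and hyperparameters are placeholders, and data loading is omitted):

```python
# Hypothetical fine-tuning sketch: a timm ViT with 45 output classes (resisc45 has
# 45 categories) optimized with the torchao 8-bit AdamW.
import timm
import torch
import torch.nn.functional as F
from torchao.prototype.low_bit_optim import AdamW8bit  # import path assumed

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=45).cuda()
optimizer = AdamW8bit(model.parameters(), lr=1e-4, weight_decay=0.01)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    # images: (B, 3, 224, 224) float32 on GPU, labels: (B,) int64 on GPU
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```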
I tried using the AO 8-bit optimizer in nanoGPT by swapping out AdamW with Adam8bit here, and keeping the embedding and non-decayed parameters (layernorms, etc.) in a regular torch AdamW instance. This strategy seems to work with the lpmm codebase, where I get close to FP32 performance, but with AO I get severely degraded performance. Is there anything special I need to do with the AO low bit optimizers? Also, I wasn't able to get the AO 4-bit optimizer to work out of the box. I had to disable the compile call here, or compile would complain about the input to |
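A sketch of the parameter-group split described above (illustrative only; the module names, the ndim heuristic, and the hyperparameters are assumptions, not the exact nanoGPT configuration):

```python
# Decayed 2-D weight matrices go to the torchao 8-bit optimizer, while embeddings,
# layernorms, and biases stay in a full-precision torch.optim.AdamW.
# (As pointed out later in the thread, AdamW8bit, not Adam8bit, is the fair
# drop-in replacement for torch.optim.AdamW when weight decay is used.)
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # import path assumed

model = torch.nn.ModuleDict({
    "emb": torch.nn.Embedding(50_304, 768),
    "fc": torch.nn.Linear(768, 768),
    "ln": torch.nn.LayerNorm(768),
}).cuda()

decay_params, no_decay_params = [], []
for name, p in model.named_parameters():
    if p.ndim >= 2 and not name.startswith("emb"):
        decay_params.append(p)        # weight matrices
    else:
        no_decay_params.append(p)     # embeddings, layernorm weights, biases

low_bit_opt = AdamW8bit(decay_params, lr=6e-4, weight_decay=0.1)
full_prec_opt = torch.optim.AdamW(no_decay_params, lr=6e-4, weight_decay=0.0)
# A training step then calls .step() and .zero_grad() on both optimizers.
```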
Yeah, for sure try to use ao and PyTorch nightlies. In the meantime, @gau-nernst it might make sense to do convergence benchmarks with at least Llama 8B. @tsengalb99 do you mind sharing more details on what you're observing too? What size of model? Loss curves? |
@tsengalb99 Yes, having more details about your training would be great to debug the issue. Is it the input to ao/torchao/prototype/low_bit_optim/adam.py, lines 145 to 146 (at af68031)?
Would you mind trying bnb AdamW8bit too? Our implementation should match bnb's exactly. Also, I notice you are using ao Adam8bit; for a fair comparison, shouldn't AdamW8bit be used instead? @msaroufim Regarding Llama 8B, are you thinking pre-training or fine-tuning? It should be a drop-in replacement for torchtune and torchtitan. I think @awgu ran a small test before? |
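A quick sketch of the apples-to-apples comparison suggested here (same parameters and hyperparameters, only the 8-bit AdamW implementation swapped; both packages are assumed to be installed, and the hyperparameters are placeholders):

```python
import torch
import bitsandbytes as bnb
from torchao.prototype.low_bit_optim import AdamW8bit as AOAdamW8bit  # import path assumed

def make_optimizer(params, impl: str):
    # Keep lr/weight_decay identical across implementations so only the optimizer changes.
    if impl == "ao":
        return AOAdamW8bit(params, lr=3e-4, weight_decay=0.1)
    if impl == "bnb":
        return bnb.optim.AdamW8bit(params, lr=3e-4, weight_decay=0.1)
    return torch.optim.AdamW(params, lr=3e-4, weight_decay=0.1)  # fp32 baseline
```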
I was thinking of prioritizing larger-scale finetuning experiments first, with pretraining more of a hail mary. |
Yes, that's where the error is being thrown
Ah, I totally forgot to use AdamW and not Adam. That's probably the issue - thanks! |
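For anyone hitting the same confusion: Adam folds weight decay into the gradient (L2 regularization), while AdamW applies decoupled weight decay, so swapping one for the other with a nonzero weight_decay changes the training dynamics. A tiny self-contained demonstration with the stock PyTorch optimizers:

```python
import torch

# Same weight_decay, zero gradient: the two algorithms move the weights differently,
# because Adam routes the decay through the adaptive update while AdamW applies it
# directly to the parameters.
for opt_cls in (torch.optim.Adam, torch.optim.AdamW):
    p = torch.nn.Parameter(torch.ones(4))
    opt = opt_cls([p], lr=0.1, weight_decay=0.1)
    p.grad = torch.zeros_like(p)
    opt.step()
    print(opt_cls.__name__, p.data)  # the resulting weights differ between the two
```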
@tsengalb99 is this still an issue? |
No - it turns out the issue was Adam vs AdamW, so not AO related. AO 4-bit matches lpmm on my pretraining experiment.
|
@tsengalb99 Was your error with Adam4bit like this?
I'm seeing this error with PyTorch 2.4. Seems like a torch.compile issue. @msaroufim Turns out our CI skips the 4-bit optimizer for 2.4 (see ao/test/prototype/test_low_bit_optim.py, lines 85 to 86, at 37276d6).
I think I initially set this line to ... @SunMarc Did you manage to successfully run 4-bit Adam when you implemented huggingface/transformers#31865? It seems like you added a test, but it didn't run in CI since torchao was not installed? The only way to run the 4-bit optim right now is to use PyTorch nightly 🌚. As of now, the HF trainer is still guarding against PyTorch>=2.3. I will update the doc and make the test clearer. If I have time, maybe I will try to make it work for 2.4 also. If our 4-bit Adam doesn't work for the latest stable release, I feel like it will hinder people from trying it out. |
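For reference, the kind of version guard being discussed looks roughly like this. This is a hypothetical reconstruction, not the exact check in test/prototype/test_low_bit_optim.py, and the version bound shown is illustrative:

```python
import pytest
import torch
from packaging.version import Version

# Skip 4-bit optimizer tests on PyTorch versions that don't support it.
# The exact bound in the repo may differ (and was the source of the gap described above).
requires_recent_torch = pytest.mark.skipif(
    Version(torch.__version__.split("+")[0]) < Version("2.3"),
    reason="4-bit optimizer requires a newer PyTorch",
)

@requires_recent_torch
def test_adamw_4bit_smoke():
    from torchao.prototype.low_bit_optim import AdamW4bit  # import path assumed

    model = torch.nn.Linear(32, 32).cuda()
    opt = AdamW4bit(model.parameters(), lr=1e-3)
    model(torch.randn(4, 32, device="cuda")).sum().backward()
    opt.step()
    opt.zero_grad()
```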
Indeed, I'm facing the same issue as you without PyTorch nightly. When I opened this PR, I still needed to install the torchao nightly for the 4-bit optimizer, and it might have worked with a prior version of torch. I will update the requirements to let users know that they need to install the torch nightly. |
Yep, that's the issue I'm seeing. |
Sorry for the regression! I have fixed it in #755, so 4-bit optim works in PyTorch 2.3 again now. It slipped through our tests previously, and I have updated the tests to make sure this won't happen again. |
I saw that the quality numbers in /~https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim changed recently. Was there a bug in the AO low bit optimizer implementation before?