Fix LR scheduler issue with CPU offload optimizer #1649

Merged (4 commits) on Feb 2, 2025

@gau-nernst (Collaborator) commented Feb 1, 2025

As @ngc92 pointed out, there should be a synchronization to make sure the parameter host-to-device (H2D) copy finishes before the next forward pass, so I added one. Thank you for the notice!

I am also taking this opportunity to address an issue with PyTorch's built-in LR schedulers (#959, #1209).

Edit: since the synchronization issue is fixed by #1650, this PR only addresses the LR scheduler issue.
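
For readers unfamiliar with why a wrapper optimizer can trip up LR schedulers, here is a minimal pure-Python sketch. It is not the code from this PR and does not use torchao or PyTorch: `InnerOptimizer`, `OffloadWrapper`, and `HalvingScheduler` are hypothetical stand-ins. It relies only on the fact that PyTorch's built-in schedulers update the learning rate by writing to the entries of `optimizer.param_groups` in place. If a wrapper that owns several inner optimizers exposes copies of their groups, or none at all, those writes never reach the optimizers that actually perform the update. One plausible remedy, assumed here, is to expose the inner group dicts themselves.

```python
# Hypothetical stand-ins for illustration; none of these classes are
# torchao or PyTorch APIs.

class InnerOptimizer:
    """Stand-in for one per-parameter optimizer that runs on the CPU."""

    def __init__(self, name, lr):
        self.param_groups = [{"name": name, "lr": lr}]


class OffloadWrapper:
    """Stand-in for a wrapper that owns many inner optimizers."""

    def __init__(self, names, lr):
        self.inner = [InnerOptimizer(n, lr) for n in names]

    @property
    def param_groups(self):
        # Return the inner group dicts themselves (not copies), so that
        # in-place writes to group["lr"] reach every inner optimizer.
        return [g for opt in self.inner for g in opt.param_groups]


class HalvingScheduler:
    """Stand-in scheduler: halves lr on each step by mutating param_groups."""

    def __init__(self, optimizer):
        self.optimizer = optimizer

    def step(self):
        for group in self.optimizer.param_groups:
            group["lr"] *= 0.5


wrapper = OffloadWrapper(["w1", "w2"], lr=0.1)
sched = HalvingScheduler(wrapper)
sched.step()

# The inner optimizers see the new learning rate.
inner_lrs = [opt.param_groups[0]["lr"] for opt in wrapper.inner]
print(inner_lrs)
```

Note that PyTorch's built-in schedulers also check that they are given a `torch.optim.Optimizer` instance, which this sketch does not model.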

pytorch-bot bot commented Feb 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1649

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit dc7bbce with merge base 122eb73 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Feb 1, 2025
@gau-nernst added the "topic: bug fix" label (use this tag for PRs that fix bugs) on Feb 1, 2025
@gau-nernst changed the title from "Fix some CPU offload optimizer issues" to "Fix LR scheduler issue with CPU offload optimizer" on Feb 2, 2025
@gau-nernst merged commit 6ffe236 into pytorch:main on Feb 2, 2025
16 of 17 checks passed
@gau-nernst deleted the optim_offload_fix branch on February 2, 2025 12:36
3 participants