
[DeepSeek] Enable checkpoint load from HF #908

Merged · 1 commit · Mar 1, 2025
Conversation

@kwen2501 (Contributor) commented Mar 1, 2025

Enable loading weights from an HF checkpoint.

1. Download
   • Added download.py to let the user download an HF checkpoint into the local disk cache (sketched after this list).
   • Usage: python download.py {model_id}
2. Load weights
   • Added checkpoint.py to load tensors from the HF cache dir into a model.
   • The model can be a model chunk or the full model.
3. Various code refactors
   • Moved ModelArgs into a separate file, adding a DeepSeek config registry.
4. Added support for DeepSeek-V2 (routing sketched after this list)
   • Greedy routing
   • Softmax score function
5. Added an example run.py based on DeepSeek-V2-Lite, a 16B toy MoE.
   torchrun --standalone --nproc-per-node 4 run.py
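
For context, here is a minimal sketch of what the download and load steps could look like, assuming huggingface_hub for the download and safetensors-format shards in the cache; the actual download.py and checkpoint.py in this PR may differ:

```python
# Hypothetical sketch, not the PR's actual code: download an HF checkpoint
# into the local disk cache using huggingface_hub.
import sys

from huggingface_hub import snapshot_download


def main(model_id: str) -> None:
    # Fetches every file in the repo into the HF cache
    # (~/.cache/huggingface/hub by default) and returns its local path.
    local_dir = snapshot_download(repo_id=model_id)
    print(f"Checkpoint cached at: {local_dir}")


if __name__ == "__main__":
    main(sys.argv[1])  # e.g. python download.py deepseek-ai/DeepSeek-V2-Lite
```

And the loading side, where skipping names the model does not own is what lets the same loop serve both a full model and a pipeline-stage chunk:

```python
# Hypothetical sketch of loading safetensors shards into a (possibly
# partial) model; checkpoint.py's real logic may differ.
import glob
import os

import torch
from safetensors.torch import load_file


def load_weights(model: torch.nn.Module, cache_dir: str) -> None:
    own = model.state_dict()  # tensors the (chunk of the) model owns
    for shard in sorted(glob.glob(os.path.join(cache_dir, "*.safetensors"))):
        for name, tensor in load_file(shard).items():
            if name in own:  # skip tensors owned by other model chunks
                with torch.no_grad():
                    own[name].copy_(tensor)
```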
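The DeepSeek-V2 routing in item 4 reduces to a softmax score function followed by a greedy top-k pick; a hedged sketch (identifiers illustrative, not the PR's code):

```python
import torch


def greedy_route(router_logits: torch.Tensor, top_k: int):
    # Softmax score function: normalize router logits over the experts.
    scores = torch.softmax(router_logits, dim=-1)
    # Greedy routing: each token independently picks its top_k experts.
    weights, expert_ids = torch.topk(scores, top_k, dim=-1)
    return weights, expert_ids
```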

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Mar 1, 2025
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# torchrun --standalone --nproc-per-node 4 run.py
Contributor
Is this file named 'run' instead of 'train' because it's not supposed to train? It looks like it is using a training-oriented pipeline schedule (not fwd-only) but is missing a loss function, etc. Maybe it's just a temporary file for bootstrapping...

@kwen2501 (Contributor Author) commented Mar 1, 2025

It is a temp file for bootstrapping. And it will diverge into generate.py and train.py later.

@@ -0,0 +1,183 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

nit: model_args.py? since ModelArgs

Contributor Author
Hmm, I think both work. My purpose in creating this file is to store the configs of different DeepSeek flavors, such as DeepSeek-V2-Lite.
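
For illustration only, such a registry could be as small as a dataclass plus a dict keyed by HF model id (the field names and values below are placeholders, not necessarily what the PR's file contains):

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Placeholder fields; the real ModelArgs has many more knobs.
    dim: int = 2048
    n_layers: int = 27
    n_routed_experts: int = 64


# Registry of DeepSeek flavors, keyed by HF model id.
deepseek_configs: dict[str, ModelArgs] = {
    "deepseek-ai/DeepSeek-V2-Lite": ModelArgs(),
}
```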


if pp_rank == pp_size - 1:
    print(y.shape)

Contributor
Maybe add a simple "Success - forward completed" or similar print, just to show that the forward completed properly?
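
Something like this, perhaps (message wording is just a suggestion):

```python
if pp_rank == pp_size - 1:
    print(f"Success - forward completed, output shape {y.shape}")
```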

@lessw2020 (Contributor) left a comment

Looks great - also nice to see the Lite version being supported, as that will be better for single-node work.
And the SPMD weight loading logic is superb... :) Happy to see we are getting some reuse out of that previous work.

@kwen2501 merged commit b291ad6 into main on Mar 1, 2025
6 checks passed