[DeepSeek] Enable checkpoint load from HF #908
Conversation
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# torchrun --standalone --nproc-per-node 4 run.py
Is this file named 'run' instead of 'train' because it's not supposed to train? It looks like it is using a training-oriented pipeline schedule (not fwd-only) but is missing a loss function, etc. Maybe it's just a temporary file for bootstrapping...
It is a temp file for bootstrapping, and it will diverge into generate.py and train.py later.
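For context, a minimal sketch of how such a bootstrapping script can drive a training-oriented pipeline schedule without a loss function, assuming it uses PyTorch's torch.distributed.pipelining API; `stage`, `x`, `pp_rank`, `pp_size`, and the microbatch count are placeholders, not the PR's actual code:

```python
# Sketch only: a schedule built without loss_fn runs forward only, which is
# enough to smoke-test weights loaded from the HF checkpoint.
from torch.distributed.pipelining import ScheduleGPipe

schedule = ScheduleGPipe(stage, n_microbatches=4, loss_fn=None)

if pp_rank == 0:
    schedule.step(x)        # first stage feeds the input microbatches
elif pp_rank == pp_size - 1:
    y = schedule.step()     # last stage receives the model output
else:
    schedule.step()         # middle stages just relay activations
```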
@@ -0,0 +1,183 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
nit: name it model_args.py, since it defines ModelArgs?
Hmm, I think both work. My purpose in creating this file is to store configs of different DeepSeek flavors, like DeepSeek-V2-Lite, in it.
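For illustration, a hypothetical sketch of what such a config registry could look like; the field names and the DeepSeek-V2-Lite values below are assumptions, not the PR's actual ModelArgs definition:

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Illustrative fields; values approximate the DeepSeek-V2-Lite config.
    vocab_size: int = 102400
    dim: int = 2048
    n_layers: int = 27
    n_heads: int = 16
    n_routed_experts: int = 64


# Registry keyed by HF model id, so run.py can pick a flavor by name.
deepseek_configs = {
    "deepseek-ai/DeepSeek-V2-Lite": ModelArgs(),
    # Larger flavors would register their own ModelArgs overrides here.
}
```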
if pp_rank == pp_size - 1:
    print(y.shape)
Maybe add a simple "Success - forward completed" or similar print, just to show the forward completed properly?
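Something like the following (a sketch of the suggested print, reusing the pp_rank/pp_size/y names from the diff above):

```python
if pp_rank == pp_size - 1:
    print(f"Success - forward completed, output shape: {tuple(y.shape)}")
```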
Looks great - also nice to see the Lite version being supported, as that will be better for single-node work.
Also, the SPMD weight loading logic is superb... :) Happy to see we are getting some reuse out of that previous work.
Enable loading weights from an HF checkpoint.
- Added download.py to let the user download an HF checkpoint into the local disk cache: python download.py {model_id}
- Added checkpoint.py to load tensors from the HF cache dir into a model.
- Moved ModelArgs into a separate file, adding a DeepSeek config registry.
- Run: torchrun --standalone --nproc-per-node 4 run.py
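For readers following along, a minimal sketch of what the two helpers might look like, assuming they wrap huggingface_hub and safetensors directly; the function names and shard-globbing logic are assumptions, not the PR's actual implementation:

```python
import glob
import os
import sys

from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def download(model_id: str) -> str:
    # Download (or reuse) the checkpoint in the local HF disk cache and
    # return the cache directory, e.g. for "deepseek-ai/DeepSeek-V2-Lite".
    return snapshot_download(repo_id=model_id)


def load_hf_weights(model, cache_dir: str) -> None:
    # Merge all *.safetensors shards and copy matching tensors into the model.
    state = {}
    for shard in sorted(glob.glob(os.path.join(cache_dir, "*.safetensors"))):
        state.update(load_file(shard))
    model.load_state_dict(state, strict=False)


if __name__ == "__main__":
    # python download.py {model_id}
    print(download(sys.argv[1]))
```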