
[DeepSeek] Enable checkpoint load from HF #908

Merged · 1 commit · Mar 1, 2025
Conversation

@kwen2501 (Contributor) commented Mar 1, 2025

Enable loading weights from an HF checkpoint.

1. Download
   • Added download.py to let the user download an HF checkpoint into the local disk cache (sketched after this list).
   • Usage: python download.py {model_id}
2. Load weights
   • Added checkpoint.py to load tensors from the HF cache dir into a model.
   • The model can be a model chunk or the full model.
3. Various code refactors
   • Moved ModelArgs into a separate file, adding a DeepSeek config registry.
4. Added support for DeepSeek-V2 (routing sketched after this list)
   • Greedy routing
   • Softmax score function
5. Added an example run.py based on DeepSeek-V2-Lite, a 16B toy MoE.
   torchrun --standalone --nproc-per-node 4 run.py
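
For context, here is a minimal sketch of what the download and load steps could look like, assuming huggingface_hub for the download and safetensors-format shards in the cache; the actual download.py and checkpoint.py in this PR may differ:

```python
# Hypothetical sketch, not the PR's actual code: download an HF checkpoint
# into the local disk cache using huggingface_hub.
import sys

from huggingface_hub import snapshot_download


def main(model_id: str) -> None:
    # Fetches every file in the repo into the HF cache
    # (~/.cache/huggingface/hub by default) and returns its local path.
    local_dir = snapshot_download(repo_id=model_id)
    print(f"Checkpoint cached at: {local_dir}")


if __name__ == "__main__":
    main(sys.argv[1])  # e.g. python download.py deepseek-ai/DeepSeek-V2-Lite
```

And the loading side, where skipping names the model does not own is what lets the same loop serve both a full model and a pipeline-stage chunk:

```python
# Hypothetical sketch of loading safetensors shards into a (possibly
# partial) model; checkpoint.py's real logic may differ.
import glob
import os

import torch
from safetensors.torch import load_file


def load_weights(model: torch.nn.Module, cache_dir: str) -> None:
    own = model.state_dict()  # tensors the (chunk of the) model owns
    for shard in sorted(glob.glob(os.path.join(cache_dir, "*.safetensors"))):
        for name, tensor in load_file(shard).items():
            if name in own:  # skip tensors owned by other model chunks
                with torch.no_grad():
                    own[name].copy_(tensor)
```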
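The DeepSeek-V2 routing in item 4 reduces to a softmax score function followed by a greedy top-k pick; a hedged sketch (identifiers illustrative, not the PR's code):

```python
import torch


def greedy_route(router_logits: torch.Tensor, top_k: int):
    # Softmax score function: normalize router logits over the experts.
    scores = torch.softmax(router_logits, dim=-1)
    # Greedy routing: each token independently picks its top_k experts.
    weights, expert_ids = torch.topk(scores, top_k, dim=-1)
    return weights, expert_ids
```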

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Mar 1, 2025
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# torchrun --standalone --nproc-per-node 4 run.py
Contributor
Is this file named 'run' instead of 'train' because it's not supposed to train? It looks like it is using a training-oriented pipeline schedule (not fwd-only) but is missing a loss function, etc. Maybe it's just a temporary file for bootstrapping...

@kwen2501 (Contributor Author) commented Mar 1, 2025

It is a temp file for bootstrapping. And it will diverge into generate.py and train.py later.

@@ -0,0 +1,183 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

nit: model_args.py? since ModelArgs

Contributor Author
Hmm, I think both work. My purpose in creating this file is to store the configs of different DeepSeek flavors, such as DeepSeek-V2-Lite.
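
For illustration only, such a registry could be as small as a dataclass plus a dict keyed by HF model id (the field names and values below are placeholders, not necessarily what the PR's file contains):

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Placeholder fields; the real ModelArgs has many more knobs.
    dim: int = 2048
    n_layers: int = 27
    n_routed_experts: int = 64


# Registry of DeepSeek flavors, keyed by HF model id.
deepseek_configs: dict[str, ModelArgs] = {
    "deepseek-ai/DeepSeek-V2-Lite": ModelArgs(),
}
```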


if pp_rank == pp_size - 1:
    print(y.shape)

Contributor
Maybe add a simple "Success - forward completed" or similar print, just to show that the forward completed properly?
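
Something like this, perhaps (message wording is just a suggestion):

```python
if pp_rank == pp_size - 1:
    print(f"Success - forward completed, output shape {y.shape}")
```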

@lessw2020 (Contributor) left a comment

Looks great - also nice to see the Lite version being supported, as that will be better for single-node work.
And the SPMD weight loading logic is superb... :) Happy to see we are getting some reuse out of that previous work.

@kwen2501 merged commit b291ad6 into main on Mar 1, 2025
6 checks passed