[Possible PR discuss] Will a PR of training HF model be welcomed? #903

Open
junjzhang opened this issue Feb 28, 2025 · 4 comments

Comments
@junjzhang

junjzhang commented Feb 28, 2025

Hi! We are developing a novel training framework for Reinforcement Learning (RL) on top of TorchTitan. Recently, we added a feature that supports training directly from Hugging Face (HF) models, loading safetensors in an online, sharded fashion. This can substantially cut the cost of adapting a new model: all you have to do is implement the parallelism-applying function. A rough sketch of the sharded loading follows.
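For illustration only, here is a minimal sketch of the online sharded loading idea, assuming the standard HF `model.safetensors.index.json` multi-shard layout and DTensor-sharded parameters; `load_hf_shards_into_sharded_model` is a hypothetical helper name, not our actual implementation:

```python
# Hypothetical sketch: stream HF safetensors shards one file at a time and
# copy tensors into an already-parallelized (DTensor) model, so the full
# state dict never materializes at once. Buffers and tied weights are
# ignored for brevity.
import json
from pathlib import Path

import torch
from safetensors import safe_open
from torch.distributed.tensor import DTensor, distribute_tensor  # PyTorch 2.4+


def load_hf_shards_into_sharded_model(model: torch.nn.Module, ckpt_dir: str) -> None:
    index = json.loads((Path(ckpt_dir) / "model.safetensors.index.json").read_text())
    params = dict(model.named_parameters())
    # Group parameter names by shard file so each shard is opened only once.
    shard_to_keys: dict[str, list[str]] = {}
    for key, shard in index["weight_map"].items():
        shard_to_keys.setdefault(shard, []).append(key)
    for shard, keys in shard_to_keys.items():
        with safe_open(str(Path(ckpt_dir) / shard), framework="pt", device="cpu") as f:
            for key in keys:
                param = params.get(key)
                if param is None:
                    continue
                full = f.get_tensor(key)
                if isinstance(param.data, DTensor):
                    # Re-shard the full tensor to match this parameter's
                    # placements, then copy only the local portion.
                    param.data.copy_(
                        distribute_tensor(full, param.data.device_mesh, param.data.placements)
                    )
                else:
                    param.data.copy_(full)
```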
Given this, I wonder whether a PR with the relevant code and an example of training Hugging Face's Llama model would be welcome. I think this addition would benefit many in the community.
By the way, in my testing the HF Llama model achieves competitive TPS (tokens per second) compared to the model implemented in TorchTitan.

@lessw2020
Contributor

lessw2020 commented Feb 28, 2025

Hi @junjzhang - I can only speak for myself, but generally anything that helps Titan enable RL-type training would be of significant interest.
We are also opening up a new "experimental" folder with the idea of giving more contributions a home ... so that's another angle that may help your PR land. The first PR landing there also uses HF aspects for reference (see /~https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/deepseek_v3/attn_mask_utils.py).

Thus, while I don't think anyone can say an unseen PR will 100% be accepted, I can say it would definitely be of interest, and I think it would be worth the effort to post the PR so it can be reviewed/discussed/considered for inclusion.
Thanks very much for opening up the discussion!
Maybe @tianyu-l can weigh in here as well.

@junjzhang
Author

Thanks for replying! I'll clean up my code and open a draft PR to the experiments dir first.

@tianyu-l
Contributor

tianyu-l commented Mar 2, 2025

Hey @junjzhang thanks for proposing! We agree this feature is good to have.

As @lessw2020 suggested, let's create a new folder hosting HF training under the experiments folder, which would:

  1. load HF model weights
  2. showcase a training example by "implementing the parallelism applying function" and reusing TrainSpec (see the sketch after this list)
  3. support converting weights back to HF format
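
To make step 2 concrete, here is a minimal sketch of what a parallelism-applying function for an HF Llama could look like, assuming FSDP2's per-module `fully_shard` (importable from `torch.distributed.fsdp` in recent PyTorch) and transformers' `LlamaForCausalLM` module layout; `parallelize_hf_llama` is an illustrative name, not a torchtitan API:

```python
# A minimal sketch of the "parallelism applying function" for an HF Llama,
# assuming FSDP2 (fully_shard) and the transformers LlamaForCausalLM layout
# (model.model.layers holds the transformer blocks).
import torch
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.fsdp import fully_shard


def parallelize_hf_llama(model: torch.nn.Module, mesh: DeviceMesh) -> torch.nn.Module:
    # Shard each transformer block individually so parameters are gathered
    # layer by layer during forward/backward.
    for layer in model.model.layers:
        fully_shard(layer, mesh=mesh)
    # Shard the remaining parameters (embeddings, final norm, lm_head) at the root.
    fully_shard(model, mesh=mesh)
    return model
```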

Relevant discussions:

Maybe we can work on this project with other people who've shown interest and made offline progress.
cc: @yzhangcs @neeldani @huyiwen @bkchang

@junjzhang
Author


I've finished features 1 and 2, and I think feature 3 can be implemented fairly easily by reusing PreTrainedModel's weight-saving utilities (e.g. save_pretrained); a rough sketch is below. I'll try to clean up the relevant code and open a PR this week. BTW, this feature will introduce extra dependencies such as transformers. How would you expect those to be handled in the experiments dir?
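
For what it's worth, a minimal sketch of that conversion, assuming the model is a transformers `PreTrainedModel` whose parameters are DTensors after parallelization; `save_as_hf_checkpoint` is an illustrative name, not a finalized API:

```python
# Hypothetical sketch of feature 3: gather full tensors from the sharded
# model, then reuse transformers' PreTrainedModel.save_pretrained with an
# explicit state_dict so the checkpoint lands in HF format.
import torch
from torch.distributed.tensor import DTensor


def save_as_hf_checkpoint(model, out_dir: str) -> None:
    state_dict = {}
    for key, value in model.state_dict().items():
        if isinstance(value, DTensor):
            # Materialize the full tensor on every rank (fine for modest
            # models; very large models would want rank-0-only or
            # per-shard streaming saves instead).
            value = value.full_tensor()
        state_dict[key] = value.cpu()
    # `model` is assumed to be a transformers PreTrainedModel here.
    model.save_pretrained(out_dir, state_dict=state_dict, safe_serialization=True)
```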
