Llama models with custom configurations and uploading to Hugging Face #420

Open

bkchang opened this issue Jun 24, 2024 · 1 comment
bkchang commented Jun 24, 2024

It would be great if torchtitan could support 1) training Llama models with custom configurations (e.g., different numbers of KV heads, numbers of layers, etc.) and 2) directly uploading the trained weights to the HF hub, where people could download and run the model simply by referencing the HF model repo id. This would greatly help the community investigate the trade-offs between size, speed, and accuracy across a range of models.

  1. Currently, torchtitan only allows a fixed set of classic Llama model architectures, which are hard-coded here and here. Enabling custom model parameters in the config files and feeding them to ModelArgs should be straightforward, perhaps with a script or a helper function (a rough sketch follows this list).

  2. For uploading to the HF hub, a script from HF could help convert torchtitan's output weights to the HF format (thanks @tianyu-l for mentioning this), but the script needs a params.json file and a tokenizer.model file. tokenizer.model is already downloaded before running torchtitan, so it only needs to be linked; params.json, on the other hand, can easily be written by inspecting the training config (also sketched below).
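
For 1, a minimal sketch of what such a helper could look like, assuming a hypothetical `[model.custom]` table in the training `.toml` and torchtitan's current `ModelArgs` import path (both are assumptions, not existing torchtitan features):

```python
# Hypothetical sketch only: torchtitan does not currently read custom model
# parameters from the training config. The section and field names below
# ([model.custom], dim, n_layers, n_heads, n_kv_heads) are illustrative.
import tomllib  # Python 3.11+; use the third-party `tomli` package on older versions

from torchtitan.models.llama import ModelArgs  # assumed import path


def model_args_from_toml(path: str) -> ModelArgs:
    """Build a ModelArgs from a [model.custom] table in a training .toml file."""
    with open(path, "rb") as f:
        cfg = tomllib.load(f)
    # e.g. [model.custom] with dim = 2048, n_layers = 16, n_heads = 16, n_kv_heads = 4
    return ModelArgs(**cfg["model"]["custom"])
```

The same dictionary could instead be registered as a new entry in the existing flavor mapping, so the rest of the training loop stays untouched.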
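
For 2, writing params.json might look roughly like the following. The key names mirror the Meta-style params.json that HF's Llama conversion script expects, and the sketch assumes ModelArgs is a dataclass whose field names match those keys; both assumptions should be checked against the versions in use:

```python
# Hedged sketch: emit a Meta-style params.json for HF's Llama conversion script.
# Assumes torchtitan's ModelArgs is a dataclass whose field names match the
# params.json keys; verify both against the code you are running.
import dataclasses
import json


def write_params_json(model_args, out_path: str = "params.json") -> None:
    fields = dataclasses.asdict(model_args)
    # Keys typically found in Meta's params.json (assumption; trim/extend as needed).
    keys = ("dim", "n_layers", "n_heads", "n_kv_heads", "vocab_size",
            "multiple_of", "ffn_dim_multiplier", "norm_eps", "rope_theta")
    with open(out_path, "w") as f:
        json.dump({k: fields[k] for k in keys if k in fields}, f, indent=2)
```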

I can help implement these features, but I'm wondering whether the torchtitan team would be interested in having them in the torchtitan repo.

Thanks.

tianyu-l (Contributor) commented Jun 25, 2024

@bkchang Thanks for the note. The use case sounds very interesting!

For 2, a simple script should suffice. I agree it would be good to have one, although whether or not we should land it in (the main branch of) torchtitan is something we need to discuss further. (I can report back once we have a better idea.)

To summarize, among the three popular formats (torch.save, DCP as the torchtitan default, and HF), we currently have:

Ideally, we should also have:

  • DCP -> HF (this probably should sit in HF instead of PyTorch/torchtitan)
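
As a reference point for the DCP -> torch.save leg, recent PyTorch releases ship a small format utility; a minimal sketch is below (paths are illustrative), and the resulting torch.save file could then be adapted for HF's existing Llama conversion script:

```python
# Minimal sketch of the DCP -> torch.save step using PyTorch's checkpoint
# format utilities (available in recent PyTorch releases). Paths are illustrative.
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_to_torch_save("outputs/checkpoint/step-1000", "checkpoint.pt")
```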

For 1, it is trickier. The goal of torchtitan is to demonstrate distributed training techniques rather than to become a general-purpose trainer, and the preset configs are mainly for showcasing purposes. Besides, training-loss convergence has been verified for each of the config files (including the hyperparameters). So although we might not want to support this feature directly, it should be fairly easy for users to add new configs.
