Releases: hiyouga/LLaMA-Factory
v0.9.1: Many Vision Models, Qwen2.5 Coder, Gradient Fix
New features
- 🔥Support Llama-3.2 and Llama-3.2-Vision by @marko1616 in #5547 and #5555
- 🔥Support LLaVA-NeXT, LLaVA-NeXT-Video and Video-LLaVA by @BUAADreamer in #5574
- 🔥Support Pixtral model by @Kuangdd01 in #5581
- Support EXAONE3.0 by @shing100 in #5585
- Support Index-series models by @Cuiyn in #5910
- Support Liger-Kernel for Qwen2-VL by @aliencaocao in #5438
- Support downloading models from ModelHub by @huniu20 in #5642
- Fix abnormal loss values in transformers 4.46 by @hiyouga in #5852 #5871
- Support multi-image inference by @hiyouga in #5895
- Support calculating effective tokens for SFT and DPO by @wtmlon in #6078
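As a rough illustration of the effective-token counting added in #6078, the sketch below counts the tokens in each training example (both responses for DPO pairs) and turns the total into a throughput figure. The function name, the field names input_ids, chosen_input_ids and rejected_input_ids, and the exact formula are assumptions for illustration, not necessarily how LLaMA-Factory computes the metric.

```python
from typing import Dict, List, Sequence


def effective_tokens_per_sec(
    dataset: Sequence[Dict[str, List[int]]],
    stage: str,
    num_epochs: float,
    train_runtime_sec: float,
) -> float:
    """Hypothetical helper: tokens processed during training divided by wall-clock time.

    Assumed fields: SFT examples hold "input_ids"; DPO examples hold
    "chosen_input_ids" and "rejected_input_ids".
    """
    total_tokens = 0
    for example in dataset:
        if stage == "sft":
            total_tokens += len(example["input_ids"])
        elif stage == "dpo":
            # Both responses of a preference pair go through the model.
            total_tokens += len(example["chosen_input_ids"]) + len(example["rejected_input_ids"])
        else:
            raise ValueError(f"unsupported stage: {stage}")
    # Every example is seen once per epoch.
    return total_tokens * num_epochs / train_runtime_sec


sft_data = [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}]
print(effective_tokens_per_sec(sft_data, "sft", num_epochs=2, train_runtime_sec=10.0))  # 1.0
```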
Note: you can now install transformers>=4.46.0,<=4.46.1 to enable the gradient accumulation fix.
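For background on the gradient accumulation note: when each micro-batch's loss is averaged over its own tokens and those averages are then averaged across accumulation steps, tokens in short micro-batches get more weight than they would in one large batch. The arithmetic sketch below contrasts that with normalizing by the total number of label tokens across all micro-batches. It is a simplified illustration with hypothetical helper names, not the code in transformers or LLaMA-Factory.

```python
from typing import List


def naive_accumulated_loss(micro_batches: List[List[float]]) -> float:
    """Average of per-micro-batch mean losses (the behavior the fix addresses)."""
    means = [sum(tokens) / len(tokens) for tokens in micro_batches]
    return sum(means) / len(means)


def token_normalized_loss(micro_batches: List[List[float]]) -> float:
    """Sum every token loss, then divide once by the total number of label tokens."""
    total_tokens = sum(len(tokens) for tokens in micro_batches)
    return sum(sum(tokens) for tokens in micro_batches) / total_tokens


# Two micro-batches with different numbers of label tokens (per-token losses).
batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]
print(naive_accumulated_loss(batches))  # (1.0 + 3.0) / 2 = 2.0
print(token_normalized_loss(batches))   # 7.0 / 5 = 1.4
```

The two results coincide only when every micro-batch contains the same number of label tokens.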
New models
- Base models
  - Qwen2.5 (0.5B/1.5B/3B/7B/14B/32B/72B) 📄
  - Qwen2.5-Coder (0.5B/1.5B/3B/7B/14B/32B) 📄🖥️
  - Llama-3.2 (1B/3B) 📄
  - OpenCoder (1.5B/8B) 📄🖥️
  - Index (1.9B) 📄
- Instruct/Chat models
  - Qwen2.5-Instruct (0.5B/1.5B/3B/7B/14B/32B/72B) 📄🤖
  - Qwen2.5-Coder-Instruct (0.5B/1.5B/3B/7B/14B/32B) 📄🤖🖥️
  - Llama-3.2-Instruct (1B/3B) 📄🤖
  - OpenCoder-Instruct (1.5B/8B) 📄🤖🖥️
  - Index-Chat (1.9B) 📄🤖
  - LLaVA-NeXT (7B/8B/13B/34B/72B/110B) 📄🤖🖼️
  - LLaVA-NeXT-Video (7B/34B) 📄🤖🖼️
  - Video-LLaVA (7B) 📄🤖🖼️
  - Pixtral (12B) 📄🤖🖼️
  - EXAONE-3.0-Instruct (8B) 📄🤖
Security fix
- Fix CVE-2024-52803 by @superboy-zjc in aa6a174
Bug fix
- Update version of rocm docker by @HardAndHeavy in #5427
- Fix Phi-3-small template by @menibrief in #5475
- Fix function call dataset process function by @whybeyoung in #5483
- Add docker args by @StrangeBytesDev in #5533
- Fix logger by @chengchengpei in #5546
- Fix Gemma2 flash attention warning by @amrear in #5580
- Update setup by @johnnynunez in #5615 #5665
- Add project by @NLPJCL in #5801
- Fix saving Qwen2-VL processor by @hiyouga in #5857
- Support changing the base image in the Dockerfile by @sd3ntato in #5880
- Fix template replace behaviour by @hiyouga in #5907
- Add image_dir argument by @hiyouga in #5909
- Add rank0 logger by @hiyouga in #5912
- Fix DPO metrics by @hiyouga in #5913 #6052
- Update datasets version by @hiyouga in #5926
- Fix chat engines by @hiyouga in #5927
- Fix vllm 0.6.3 by @hiyouga in #5970
- Fix extra args in llamaboard by @hiyouga in #5971
- Fix vllm input args by @JJJJerry in #5973
- Add vllm_config args by @hiyouga in #5982 #5990
- Add shm_size in docker compose config by @XYZliang in #6010
- Fix tyro version by @hiyouga in #6065
- Fix ci by @hiyouga in #6120
- Fix Qwen2-VL inference on vLLM by @hiyouga in #6123 #6126
- Release v0.9.1 by @hiyouga in #6124
- Fix #3881 #4712 #5411 #5542 #5549 #5611 #5668 #5705 #5747 #5749 #5768 #5796 #5797 #5883 #5904 #5966 #5988 #6050 #6061
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Qwen2-VL, Liger-Kernel, Adam-mini
Congratulations on 30,000 stars 🎉 Follow us on X (Twitter)
New features
- 🔥Support fine-tuning Qwen2-VL model on multi-image datasets by @simonJJJ in #5290
- 🔥Support time- and memory-efficient Liger-Kernel via the enable_liger_kernel argument by @hiyouga
- 🔥Support memory-efficient Adam-mini optimizer via the use_adam_mini argument by @relic-yuexi in #5095
- Support fine-tuning Qwen2-VL model on video datasets by @hiyouga in #5365 and @BUAADreamer in #4136 (requires the patch in huggingface/transformers#33307)
- Support fine-tuning vision language models (VLMs) using RLHF/DPO/ORPO/SimPO approaches by @hiyouga
- Support Unsloth's asynchronous activation offloading method via the use_unsloth_gc argument
- Support vLLM 0.6.0
- Support MFU calculation by @yzoaim in #5388
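MFU (model FLOPs utilization) is generally defined as the FLOPs a model actually performs per second divided by the hardware's peak FLOPs per second. The sketch below uses the common 6 * N FLOPs-per-token estimate for training a dense model with N parameters; the helper name and the example throughput and peak-FLOPS figures are hypothetical, and the formula used in #5388 may account for more terms.

```python
def model_flops_utilization(
    num_params: float,
    tokens_per_sec: float,
    num_devices: int,
    peak_flops_per_device: float,
) -> float:
    """Estimate MFU with the common 6 * N FLOPs-per-token training approximation.

    6 * N covers the forward pass (~2N) and backward pass (~4N) of a dense model;
    attention FLOPs and activation recomputation are ignored in this sketch.
    """
    achieved_flops = 6 * num_params * tokens_per_sec
    peak_flops = num_devices * peak_flops_per_device
    return achieved_flops / peak_flops


# Hypothetical numbers: a 7B model at 3,000 tokens/s on one device rated at 312 TFLOPS.
mfu = model_flops_utilization(7e9, 3_000, 1, 312e12)
print(f"{mfu:.1%}")  # 40.4%
```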
New models
- Base models
  - Qwen2-Math (1.5B/7B/72B) 📄🔢
  - Yi-Coder (1.5B/9B) 📄🖥️
  - InternLM2.5 (1.8B/7B/20B) 📄
  - Gemma-2-2B 📄
  - Meta-Llama-3.1 (8B/70B) 📄
- Instruct/Chat models
  - MiniCPM/MiniCPM3 (1B/2B/4B) by @LDLINGLINGLING in #4996 #5372 📄🤖
  - Qwen2-Math-Instruct (1.5B/7B/72B) 📄🤖🔢
  - Yi-Coder-Chat (1.5B/9B) 📄🤖🖥️
  - InternLM2.5-Chat (1.8B/7B/20B) 📄🤖
  - Qwen2-VL-Instruct (2B/7B) 📄🤖🖼️
  - Gemma-2-2B-it by @codemayq in #5037 📄🤖
  - Meta-Llama-3.1-Instruct (8B/70B) 📄🤖
  - Mistral-Nemo-Instruct (12B) 📄🤖
New datasets
- Supervised fine-tuning datasets
  - Magpie-ultra-v0.1 (en) 📄
  - Pokemon-gpt4o-captions (en&zh) 📄🖼️
- Preference datasets
  - RLHF-V (en) 📄🖼️
  - VLFeedback (en) 📄🖼️
Changes
- For compatibility reasons, fine-tuning vision language models (VLMs) requires transformers>=4.45.0.dev0; try pip install git+https://github.com/huggingface/transformers.git to install it. visual_inputs has been deprecated; you no longer need to specify this argument.
- LlamaFactory now adopts lazy loading for multimodal inputs; see #5346 for details. Please use preprocessing_batch_size to restrict the batch size in dataset pre-processing (supported by @naem1023 in #5323).
- LlamaFactory now supports lmf (equivalent to llamafactory-cli) as a shortcut command.
Bug fix
- Fix LlamaBoard export by @liuwwang in #4950
- Add ROCm dockerfiles by @HardAndHeavy in #4970
- Fix deepseek template by @piamo in #4892
- Fix PiSSA save callback by @codemayq in #4995
- Add Korean display language in LlamaBoard by @Eruly in #5010
- Fix deepseekcoder template by @relic-yuexi in #5072
- Fix examples by @codemayq in #5109
- Fix mask_history truncation from the last turn by @YeQiuO in #5115
- Fix Jinja template by @YeQiuO in #5156
- Fix PPO optimizer and lr scheduler by @liu-zichen in #5163
- Add SailorLLM template by @chenhuiyu in #5185
- Fix XPU device count by @Zxilly in #5188
- Fix bf16 check in NPU by @Ricardo-L-C in #5193
- Update NPU docker image by @MengqingCao in #5230
- Fix image input api by @marko1616 in #5237
- Add liger-kernel link by @ByronHsu in #5317
- Fix #4684 #4696 #4917 #4925 #4928 #4944 #4959 #4992 #5035 #5048 #5060 #5092 #5228 #5252 #5292 #5295 #5305 #5307 #5308 #5324 #5331 #5334 #5338 #5344 #5366 #5384
v0.8.3: Neat Packing, Split Evaluation
New features
- 🔥Support contamination-free packing via the neat_packing argument by @chuan298 in #4224
- 🔥Support split evaluation via the eval_dataset argument by @codemayq in #4691
- 🔥Support HQQ/EETQ quantization via the quantization_method argument by @hiyouga
- 🔥Support ZeRO-3 when using BAdam by @Ledzy in #4352
- Support training on the last turn only via the mask_history argument by @aofengdaxia in #4878
- Add NPU Dockerfile by @MengqingCao in #4355
- Support building FlashAttention2 in Dockerfile by @hzhaoy in #4461
- Support batch_eval_metrics in evaluation by @hiyouga
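The idea behind contamination-free packing is that sequences sharing one packed row should not attend to each other. A minimal sketch, assuming a dense boolean mask: positions restart at 0 for each sequence and the causal mask becomes block-diagonal. The helper name is hypothetical, and the actual neat_packing implementation (which may encode sequence boundaries differently, e.g. for FlashAttention) is not reproduced here.

```python
from typing import List, Tuple


def pack_without_contamination(
    sequences: List[List[int]],
) -> Tuple[List[int], List[int], List[List[bool]]]:
    """Pack sequences into one row with per-sequence position ids and a block-diagonal causal mask."""
    input_ids: List[int] = []
    position_ids: List[int] = []
    segment_ids: List[int] = []  # which original sequence each token came from
    for seg, seq in enumerate(sequences):
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))  # positions restart for each sequence
        segment_ids.extend([seg] * len(seq))

    n = len(input_ids)
    # mask[i][j] is True when token i may attend to token j:
    # same sequence (no cross-contamination) and j not in the future (causal).
    mask = [
        [segment_ids[i] == segment_ids[j] and j <= i for j in range(n)]
        for i in range(n)
    ]
    return input_ids, position_ids, mask


ids, pos, mask = pack_without_contamination([[11, 12, 13], [21, 22]])
print(ids)      # [11, 12, 13, 21, 22]
print(pos)      # [0, 1, 2, 0, 1]
print(mask[3])  # token 21 sees only itself: [False, False, False, True, False]
```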
New models
- Base models
  - InternLM2.5-7B 📄
  - Gemma2 (9B/27B) 📄
- Instruct/Chat models
Changes
- Fix DPO cutoff length and deprecate the reserved_label_len argument
- Improve loss function for reward modeling
Bug fix
- Fix numpy version by @MengqingCao in #4382
- Improve cli by @kno10 in #4409
- Add tool_format parameter to control the prompt by @mMrBun in #4417
- Automatically label NPU issues by @MengqingCao in #4445
- Fix flash_attn args by @stceum in #4446
- Fix docker-compose path by @MengqingCao in #4544
- Fix torch-npu dependency by @hashstone in #4561
- Fix deepspeed + pissa by @hzhaoy in #4580
- Improve cli by @injet-zhou in #4590
- Add project by @wzh1994 in #4662
- Fix docstring by @hzhaoy in #4673
- Fix Windows command preview in WebUI by @marko1616 in #4700
- Fix vllm 0.5.1 by @T-Atlas in #4706
- Fix save value head model callback by @yzoaim in #4746
- Fix CUDA Dockerfile by @hzhaoy in #4781
- Fix examples by @codemayq in #4804
- Fix evaluation data split by @codemayq in #4821
- Fix CI by @codemayq in #4822
- Fix #2290 #3974 #4113 #4379 #4398 #4402 #4410 #4419 #4432 #4456 #4458 #4549 #4556 #4579 #4592 #4609 #4617 #4674 #4677 #4683 #4684 #4699 #4705 #4731 #4742 #4779 #4780 #4786 #4792 #4820 #4826
v0.8.2: PiSSA, Parallel Functions
New features
- Support GLM-4 tools and parallel function calling by @mMrBun in #4173
- Support PiSSA fine-tuning by @hiyouga in #4307
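PiSSA initializes the LoRA factors from the principal singular components of a weight matrix W and freezes the residual, so that W = W_res + A B holds at the start of training. The pure-Python sketch below shows the rank-1 case, using power iteration to find the top singular triplet; the helper names are hypothetical, and a practical implementation would use a library SVD on tensors with a larger rank.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def top_singular_triplet(w: Matrix, iters: int = 200) -> Tuple[float, List[float], List[float]]:
    """Power iteration on W^T W for the largest singular value and its singular vectors."""
    m, n = len(w), len(w[0])
    v = [1.0] * n  # assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = W v
        v = [sum(w[i][j] * u[i] for i in range(m)) for j in range(n)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("power iteration collapsed; try another start vector")
        v = [x / norm for x in v]
    wv = [sum(w[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in wv))
    u = [x / sigma for x in wv]
    return sigma, u, v


def pissa_rank1_init(w: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Split W into trainable A (m x 1), B (1 x n) and a frozen residual W_res = W - A B."""
    sigma, u, v = top_singular_triplet(w)
    root = math.sqrt(sigma)
    a = [[root * ui] for ui in u]   # U_r * sqrt(S_r)
    b = [[root * vj for vj in v]]   # sqrt(S_r) * V_r^T
    ab = matmul(a, b)
    w_res = [[w[i][j] - ab[i][j] for j in range(len(w[0]))] for i in range(len(w))]
    return a, b, w_res


a, b, w_res = pissa_rank1_init([[3.0, 0.0], [0.0, 1.0]])
print(w_res)  # approximately [[0.0, 0.0], [0.0, 1.0]]
```

Only A and B receive gradients; W_res stays frozen, so the adapter starts from the most significant direction of W instead of from zero.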
New models
- Base models
  - DeepSeek-Coder-V2 (16B MoE/236B MoE) 📄
- Instruct/Chat models
  - MiniCPM-2B 📄🤖
  - DeepSeek-Coder-V2-Instruct (16B MoE/236B MoE) 📄🤖
New datasets
- Supervised fine-tuning datasets
  - Neo-sft (zh)
  - Magpie-Pro-300K-Filtered (en) by @EliMCosta in #4309
  - WebInstruct (en) by @EliMCosta in #4309
Bug fix
- Fix DPO+ZeRO3 problem by @hiyouga
- Add MANIFEST.in by @iamthebot in #4191
- Fix eos_token in llama3 pretrain by @dignfei in #4204
- Fix vllm version by @kimdwkimdw and @hzhaoy in #4234 and #4246
- Fix Dockerfile by @EliMCosta in #4314
- Fix pandas version by @zzxzz12345 in #4334
- Fix #3162 #3196 #3778 #4198 #4209 #4221 #4227 #4238 #4242 #4271 #4292 #4295 #4326 #4346 #4357 #4362
v0.8.1: Patch release
v0.8.0: GLM-4, Qwen2, PaliGemma, KTO, SimPO
Stronger LlamaBoard 💪😀
- Support single-node distributed training in Web UI
- Add dropdown menu for easily resuming from checkpoints and picking saved configurations by @hiyouga and @hzhaoy in #4053
- Support selecting checkpoints of full/freeze tuning
- Add throughput metrics to LlamaBoard by @injet-zhou in #4066
- Faster UI loading
New features
- Add KTO algorithm by @enji-zhou in #3785
- Add SimPO algorithm by @hiyouga
- Support passing max_lora_rank to the vLLM backend by @jue-jue-zi in #3794
- Support preference datasets in sharegpt format and remove big files from git repo by @hiyouga in #3799
- Support setting system messages in CLI inference by @ycjcl868 in #3812
- Add num_samples option in dataset_info.json by @seanzhang-zhichen in #3829
- Add NPU docker image by @dongdongqiang2018 in #3876
- Improve NPU document by @MengqingCao in #3930
- Support SFT packing with greedy knapsack algorithm by @AlongWY in #4009
- Add llamafactory-cli env for bug reports
- Support image input in the API mode
- Support random initialization via the train_from_scratch argument
- Initialize CI
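The greedy knapsack packing from #4009 groups sequences so that each packed row stays within the cutoff length. The sketch below shows one greedy strategy with that goal: fill one pack at a time, repeatedly taking the longest remaining sequence that still fits. The function name and the exact strategy are assumptions and may differ from LLaMA-Factory's code.

```python
import bisect
from typing import List


def greedy_knapsack(lengths: List[int], capacity: int) -> List[List[int]]:
    """Group sequence lengths into packs whose totals do not exceed `capacity`.

    Each pack is filled by repeatedly taking the longest remaining sequence
    that still fits in the pack's leftover space.
    """
    if any(length > capacity for length in lengths):
        raise ValueError("every sequence must fit within the packing capacity")
    remaining = sorted(lengths)  # ascending, so bisect can find the largest fit
    packs: List[List[int]] = []
    while remaining:
        pack: List[int] = []
        space = capacity
        while True:
            # Index of the largest length <= space, or -1 if nothing fits.
            index = bisect.bisect_right(remaining, space) - 1
            if index < 0:
                break
            space -= remaining[index]
            pack.append(remaining.pop(index))
        packs.append(pack)
    return packs


print(greedy_knapsack([500, 300, 800, 200, 700, 100], capacity=1024))
# [[800, 200], [700, 300], [500, 100]]
```

Rejecting over-long sequences up front keeps the outer loop from spinning forever on an item that can never fit; a real pipeline would truncate such sequences first.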
New models
- Base models
  - Qwen2 (0.5B/1.5B/7B/72B/MoE) 📄
  - PaliGemma-3B (pt/mix) 📄🖼️
  - GLM-4-9B 📄
  - Falcon-11B 📄
  - DeepSeek-V2-Lite (16B) 📄
- Instruct/Chat models
New datasets
- Pre-training datasets
  - FineWeb (en)
  - FineWeb-Edu (en)
- Supervised fine-tuning datasets
  - Ruozhiba-GPT4 (zh)
  - STEM-Instruction (zh)
- Preference datasets
  - Argilla-KTO-mix-15K (en)
  - UltraFeedback (en)
Bug fix
- Fix RLHF for multimodal finetuning
- Fix LoRA target in multimodal finetuning by @BUAADreamer in #3835
- Fix yi template by @Yimi81 in #3925
- Fix abort issue in LlamaBoard by @injet-zhou in #3987
- Pass scheduler_specific_kwargs to get_scheduler by @Uminosachi in #4006
- Fix hyperparameter help text by @xu-song in #4007
- Update issue template by @statelesshz in #4011
- Fix vllm dtype parameter
- Fix exporting hyperparameters by @MengqingCao in #4080
- Fix DeepSpeed ZeRO3 in PPO trainer
- Fix #3108 #3387 #3646 #3717 #3764 #3769 #3803 #3807 #3818 #3837 #3847 #3853 #3873 #3900 #3931 #3965 #3971 #3978 #3992 #4005 #4012 #4013 #4022 #4033 #4043 #4061 #4075 #4077 #4079 #4085 #4090 #4120 #4132 #4137 #4139
v0.7.1: Ascend NPU Support, Yi-VL Models
🚨🚨 Core refactor 🚨🚨
- Add CLI usage: we now recommend using llamafactory-cli to launch training and inference; the entry point is located in cli.py
- Rename files: train_bash.py -> train.py, train_web.py -> webui.py, api_demo.py -> api.py
- Remove files: cli_demo.py, evaluate.py, export_model.py, web_demo.py; use llamafactory-cli chat/eval/export/webchat instead
- Use YAML configs instead of shell scripts in examples for better readability
- Remove the sha1 hash check when loading datasets
- Rename arguments: num_layer_trainable -> freeze_trainable_layers, name_module_trainable -> freeze_trainable_modules
The above changes were made by @hiyouga in #3596
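If you keep older argument files around, the argument renames above can be applied mechanically. A minimal sketch, assuming the arguments have already been loaded into a Python dict (for example from a YAML config); migrate_arguments and RENAMED_ARGUMENTS are hypothetical names, not part of LLaMA-Factory.

```python
from typing import Any, Dict

# Renames listed in the v0.7.1 notes (old name -> new name).
RENAMED_ARGUMENTS = {
    "num_layer_trainable": "freeze_trainable_layers",
    "name_module_trainable": "freeze_trainable_modules",
}


def migrate_arguments(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an argument dict with pre-0.7.1 names replaced by the new ones."""
    migrated: Dict[str, Any] = {}
    for key, value in config.items():
        new_key = RENAMED_ARGUMENTS.get(key, key)
        # Refuse configs that set both the old and the new name.
        if new_key in migrated:
            raise ValueError(f"both {key} and {new_key} are set")
        migrated[new_key] = value
    return migrated


old = {"finetuning_type": "freeze", "num_layer_trainable": 2, "name_module_trainable": "all"}
print(migrate_arguments(old))
# {'finetuning_type': 'freeze', 'freeze_trainable_layers': 2, 'freeze_trainable_modules': 'all'}
```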
REMINDER: Installation is now mandatory to use LLaMA Factory
New features
- Support training and inference on the Ascend NPU 910 devices by @zhou-wjjw and @statelesshz (docker images are also provided)
- Support the stop parameter in the vLLM engine by @zhaonx in #3527
- Support fine-tuning token embeddings in freeze tuning via the freeze_extra_modules argument
- Add Llama3 quickstart to README
New models
- Base models
  - Yi-1.5 (6B/9B/34B) 📄
  - DeepSeek-V2 (236B) 📄
- Instruct/Chat models
  - Yi-1.5-Chat (6B/9B/34B) 📄🤖
  - Yi-VL-Chat (6B/34B) by @BUAADreamer in #3748 📄🖼️🤖
  - Llama3-Chinese-Chat (8B/70B) 📄🤖
  - DeepSeek-V2-Chat (236B) 📄🤖
Bug fix
- Add badam arguments to LlamaBoard by @codemayq in #3487
- Add OpenAI data format to README by @khazic in #3490
- Fix slow operation in dpo/orpo trainer by @hiyouga
- Fix badam examples by @pha123661 in #3578
- Fix download link of the nectar_rm dataset by @ZeyuTeng96 in #3588
- Add project by @Katehuuh in #3601
- Fix dockerfile by @gaussian8 in #3604
- Fix full tuning of MLLMs by @BUAADreamer in #3651
- Fix gradio environment variables by @cocktailpeanut in #3654
- Fix typo and add log in API by @Tendo33 in #3655
- Fix download link of the phi-3 model by @YUUUCC in #3683
- Fix #3559 #3560 #3602 #3603 #3606 #3625 #3650 #3658 #3674 #3694 #3702 #3724 #3728
v0.7.0: LLaVA Multimodal LLM Support
Congratulations on 20k stars 🎉 We were #1 on GitHub Trending on Apr. 23rd 🔥 Follow us on X
New features
- Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
- Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
- Support vLLM+LoRA inference for partial models (see support list)
- Support 2x faster generation of the QLoRA model based on UnslothAI's optimization
- Support adding new special tokens to the tokenizer via the new_special_tokens argument
- Support choosing the device to merge LoRA in LlamaBoard via the export_device argument
- Add a Colab notebook for getting started with fine-tuning the Llama-3 model on a free T4 GPU
- Automatically enable SDPA attention and fast tokenizer for higher performance
New models
- Base models
  - OLMo-1.7-7B
  - Jamba-v0.1-51B
  - Qwen1.5-110B
  - DBRX-132B-Base
- Instruct/Chat models
  - Phi-3-mini-3.8B-instruct (4k/128k)
  - LLaVA-1.5-7B
  - LLaVA-1.5-13B
  - Qwen1.5-110B-Chat
  - DBRX-132B-Instruct
New datasets
- Supervised fine-tuning datasets
  - LLaVA mixed (en&zh) by @BUAADreamer in #3471
- Preference datasets
  - DPO mixed (en&zh) by @hiyouga
Bug fix
v0.6.3: Llama-3 and 3x Longer QLoRA
New features
- Support Meta Llama-3 (8B/70B) models
- Support UnslothAI's long-context QLoRA optimization (56,000 context length for Llama-2 7B in 24GB)
- Support previewing local datasets in directories in LlamaBoard by @codemayq in #3291
New algorithms
New models
- Base models
  - CodeGemma (2B/7B)
  - CodeQwen1.5-7B
  - Llama-3 (8B/70B)
  - Mixtral-8x22B-v0.1
- Instruct/Chat models
  - CodeGemma-7B-it
  - CodeQwen1.5-7B-Chat
  - Llama-3-Instruct (8B/70B)
  - Command R (35B) by @marko1616 in #3254
  - Command R+ (104B) by @marko1616 in #3254
  - Mixtral-8x22B-Instruct-v0.1
Bug fix
- Fix full-tuning batch prediction examples by @khazic in #3261
- Fix output_router_logits of Mixtral by @liu-zichen in #3276
- Fix automodel from pretrained with attn implementation (see huggingface/transformers#30298)
- Fix the convergence issue in the layerwise GaLore optimizer (see huggingface/transformers#30371)
- Fix #3184 #3238 #3247 #3273 #3316 #3317 #3324 #3348 #3352 #3365 #3366
v0.6.2: ORPO and Qwen1.5-32B
New features
- Support ORPO algorithm by @hiyouga in #3066
- Support inferring BNB 4-bit models on multiple GPUs via the quantization_device_map argument
- Reorganize README files, move example scripts to the examples folder
- Support saving & loading arguments quickly in LlamaBoard by @hiyouga and @marko1616 in #3046
- Support loading alpaca-format datasets from the hub without dataset_info.json by specifying --dataset_dir ONLINE
- Add the moe_aux_loss_coef parameter to control the coefficient of the auxiliary loss in MoE models
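For reference, the ORPO objective from the original paper adds an odds-ratio penalty to the SFT loss: L = L_SFT + lambda * (-log sigmoid(log odds(y_w) - log odds(y_l))), where odds(y) = P(y) / (1 - P(y)) and P is computed from the length-normalized log-probability of a response. The scalar sketch below evaluates that formula for one preference pair; the function names and the lambda value are illustrative assumptions, and LLaMA-Factory's trainer works on batched tensors and may differ in details.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0 so that p < 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one pair: SFT NLL on the chosen response plus lam * odds-ratio term."""
    sft_loss = -chosen_avg_logp
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    odds_ratio_loss = softplus(-ratio)  # -log(sigmoid(ratio)) == log(1 + exp(-ratio))
    return sft_loss + lam * odds_ratio_loss


# Equal chosen/rejected likelihoods: odds ratio is 1, penalty is log(2).
print(round(orpo_loss(-1.0, -1.0), 4))  # 1.0693
```

Lowering the rejected response's likelihood raises its odds gap to the chosen response and shrinks the penalty term, which is how ORPO folds preference alignment into a single SFT-style objective.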
New models
- Base models
  - Breeze-7B-Base
  - Qwen1.5-MoE-A2.7B (14B)
  - Qwen1.5-32B
- Instruct/Chat models
  - Breeze-7B-Instruct
  - Qwen1.5-MoE-A2.7B-Chat (14B)
  - Qwen1.5-32B-Chat
Bug fix
- Fix pile dataset download config by @lealaxy in #3053
- Fix model generation config by @marko1616 in #3057
- Fix qwen1.5 models DPO training by @changingivan and @hiyouga in #3083
- Support Qwen1.5-32B by @sliderSun in #3160
- Support Breeze-7B by @codemayq in #3161
- Fix additional_target in unsloth by @kno10 in #3201
- Fix #2807 #3022 #3023 #3046 #3077 #3085 #3116 #3200 #3225