Vision Language Model Chain-of-thought Reasoning and Reward

This is an unofficial repo for the paper: Improve Vision Language Model Chain-of-thought Reasoning

Release

  • [12/24 - 01/25] SFT and DPO pipelines, GPT distillation, inference + eval.
  • [10.22] We will provide a third-party implementation of the arXiv paper.

Dataset

ShareGPT4o-reasoning: 193k CoT predictions + filtered direct predictions

ShareGPT4o-reasoning-dpo: 66k DPO data on 3 domains: aokvqa, math and chartqa

Model ckpt

Open-LLaVA-NeXT: same as https://github.com/xiaoachen98/Open-LLaVA-NeXT, used as our base model

LLaVA-Reasoner-SFT-preview: SFT with direct + CoT

LLaVA-Reasoner-SFT: SFT with direct + CoT (more math data than the preview above)

LLaVA-Reasoner-DPO-preview: DPO from SFT-preview

setup

# setup environment, need to fill in the required fields
source setup/setup_env.sh

# data
source setup/setup_train_data.sh 

sft

cd llava_reasoner
bash scripts_sft/sft_direct+cot_preview.sh \
$SAVE_DIR/sft/LLaVA-Reasoner-SFT-preview

dpo

cd llava_reasoner
bash scripts_dpo/dpo_llava_reasoner_preview.sh \
$SAVE_DIR/dpo/LLaVA-Reasoner-DPO-preview
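
Both training commands above take an output path under $SAVE_DIR. A minimal pre-flight sketch, assuming $SAVE_DIR is meant to be set by setup/setup_env.sh (the README does not confirm this) and that creating the output directories ahead of time is harmless; check_save_dir is a hypothetical helper, not part of this repo:

```shell
# Hypothetical helper, not part of this repo. SAVE_DIR is the only
# variable name taken from the README; everything else is an assumption.
check_save_dir() {
  if [ -z "${SAVE_DIR:-}" ]; then
    echo "SAVE_DIR is not set; fill in setup/setup_env.sh and source it first" >&2
    return 1
  fi
  # Create the sft/ and dpo/ parents used by the commands above.
  mkdir -p "$SAVE_DIR/sft" "$SAVE_DIR/dpo"
}
```

Calling check_save_dir before either training script fails early with a readable message instead of passing a path that starts with "/sft" or "/dpo" to the script.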

citation

@article{zhang2024improve,
  title={Improve vision language model chain-of-thought reasoning},
  author={Zhang, Ruohong and Zhang, Bowen and Li, Yanghao and Zhang, Haotian and Sun, Zhiqing and Gan, Zhe and Yang, Yinfei and Pang, Ruoming and Yang, Yiming},
  journal={arXiv preprint arXiv:2410.16198},
  year={2024}
}

Acknowledgements

Thanks to

Open-LLaVA-NeXT (https://github.com/xiaoachen98/Open-LLaVA-NeXT): for the base model and SFT training

LLaVA-Hound (https://github.com/RifleZhang/LLaVA-Hound-DPO/tree/main): for the DPO-related code