
Question about gradient accumulation and learning rate decay in PPO training #299

Closed
mmbwf opened this issue Jul 16, 2023 · 2 comments
Labels
solved This problem has been already solved.

Comments

@mmbwf

mmbwf commented Jul 16, 2023

In the PPO training code, an outer loop manually performs gradient accumulation, while the actual gradient accumulation and learning rate decay are handled inside the TRL library. This raises a few questions.

1. Since version 0.4.5, TRL zeroes the gradients inside its gradient-accumulation section:
/~https://github.com/lvwerra/trl/blob/388bdc03ac40a42dfb77dbbc416b31a3d076b18e/trl/trainer/ppo_trainer.py#L685
With the default ppo_config, where ppo_epoch=1 and batch_size = mini_batch_size = training_args.per_device_train_batch_size, doesn't this make gradient accumulation a no-op? The previously accumulated gradients are zeroed out, and only the gradient from the last backward pass is kept (see the first sketch below).
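
A minimal PyTorch sketch (hypothetical toy code, not TRL's actual implementation) of why zeroing inside the loop defeats accumulation:

```python
import torch

# Toy model/optimizer, stand-ins for the PPO policy and its optimizer.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [torch.randn(8, 4) for _ in range(4)]

for batch in batches:
    optimizer.zero_grad()              # zeroing here wipes the gradients accumulated so far
    loss = model(batch).pow(2).mean()
    loss.backward()

optimizer.step()  # updates with the gradient of the LAST batch only, not the accumulated sum
```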

2. For learning rate decay, TRL steps the scheduler here: /~https://github.com/lvwerra/trl/blob/388bdc03ac40a42dfb77dbbc416b31a3d076b18e/trl/trainer/ppo_trainer.py#L746
But the PPO workflow.py computes the number of training steps with gradient accumulation factored in, so won't the step counts disagree? In practice, during gradient accumulation the learning rate is decayed on every loop iteration (see the second sketch below).
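
A hypothetical sketch of this step-count mismatch (plain PyTorch, not the actual workflow.py): if the scheduler steps once per mini-batch rather than once per optimizer update, the schedule advances gradient_accumulation_steps times faster than the computed training-step count assumes:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=8
)

grad_accum_steps = 4
batches = [torch.randn(2, 4) for _ in range(8)]

for step, batch in enumerate(batches):
    loss = model(batch).pow(2).mean()
    loss.backward()
    scheduler.step()                        # decays 8 times (once per mini-batch) ...
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()               # ... but only 2 optimizer updates occur
```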

@hiyouga hiyouga added the pending This problem is yet to be addressed. label Jul 17, 2023
hiyouga added a commit that referenced this issue Jul 17, 2023
@hiyouga
Owner

hiyouga commented Jul 17, 2023

Thanks for pointing this out. Both issues were fixed in 7e6a5eb: we removed the outer gradient-accumulation loop, ensuring that gradient accumulation and learning rate decay behave correctly.
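
For reference, a sketch of the resulting setup (assuming TRL's PPOConfig, whose gradient_accumulation_steps field delegates accumulation to accelerate; the stand-in variable values are illustrative, not taken from the commit): drop the manual outer loop and let TRL accumulate internally, so the optimizer and scheduler step once per full batch:

```python
from trl import PPOConfig

per_device_batch_size = 4   # stands in for training_args.per_device_train_batch_size
grad_accum_steps = 4        # stands in for training_args.gradient_accumulation_steps

ppo_config = PPOConfig(
    batch_size=per_device_batch_size * grad_accum_steps,
    mini_batch_size=per_device_batch_size,
    gradient_accumulation_steps=grad_accum_steps,
    ppo_epochs=1,
)
# ppo_trainer.step(queries, responses, rewards) is then called once per full batch,
# with no manual accumulation loop wrapped around it.
```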

@mmbwf mmbwf closed this as completed Jul 18, 2023
@hiyouga hiyouga added solved This problem has been already solved. and removed pending This problem is yet to be addressed. labels Jul 18, 2023
@hannlp

hannlp commented Aug 9, 2023

@mmbwf Hi, is your learning rate curve normal now? I'm still seeing problems on my end: hiyouga/LLaMA-Factory#424
