multitask training cost calculation is wrong (does not affect training) #1757
Comments
It would be better to print all of the costs, such as cost_a and cost_b. That helps when analyzing a problem.
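The cost argument of trainer.train should not accept multiple costs; it should accept exactly one. The reasoning:
To combine several costs, one can write
    cost_a = paddle.layer.xe(name="cost_a", ...)
    cost_b = paddle.layer.xe(...)
    cost_sum = paddle.layer.add_to(cost_a, cost_b)  # Here we could overload the '+' operator; writing `cost_sum = 0.3 * cost_a + 0.5 * cost_b` would be nicer.
    paddle.train(cost=cost_sum, ...)
To print one particular cost, one can write
    cost_a = paddle.layer.xe(...)
    paddle.evaluators.print(cost_a)
    ...
The advantages of this approach are:
The problem with this approach is that Paddle does not currently support this style of cost design. The CostLayer concept should be removed from Paddle entirely, so that every layer is just an ordinary layer.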
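The backward pass of a cost layer is handled differently from that of an ordinary layer, so we need to verify whether this works without any code changes.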
@reyoung Agreed, I also think this is simpler and clearer. I strongly agree with:
I feel something like `cost_sum = paddle.layer.sum_scalar((cost_a, 5), cost_b)` would work.
There is one more issue: with the sum_cost approach, the log only shows the cost after summation.
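In Paddle, a layer like add_to cannot be used as the last layer, can it? Do we need to write a new add_cost layer? add_to needs another layer to pass a gradient back to it, and it then passes that gradient unchanged to each of its inputs. Only a cost layer produces an error signal, so the current add_to layer probably cannot add costs directly.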
Actually, we can overload the operators, so that what the user really writes is `paddle.train(cost=0.5*cost_1 + 0.3*cost_2, ...)`.
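The operator-overloading idea can be sketched in plain Python. This is only an illustration: `CostExpr`, `leaf`, and `evaluate` are hypothetical names invented here, not Paddle classes or functions, and the sketch only tracks coefficients rather than building real layers.

```python
class CostExpr:
    """Hypothetical stand-in for a cost layer; NOT part of Paddle."""

    def __init__(self, terms):
        # terms maps a cost name to its coefficient in the weighted sum
        self.terms = dict(terms)

    @classmethod
    def leaf(cls, name):
        # A single named cost with coefficient 1
        return cls({name: 1.0})

    def __mul__(self, scalar):
        # Scale every coefficient by the scalar
        return CostExpr({n: c * scalar for n, c in self.terms.items()})

    __rmul__ = __mul__  # makes `0.5 * cost_1` work as well as `cost_1 * 0.5`

    def __add__(self, other):
        # Merge the two weighted sums, adding coefficients of shared names
        merged = dict(self.terms)
        for name, coeff in other.terms.items():
            merged[name] = merged.get(name, 0.0) + coeff
        return CostExpr(merged)

    def evaluate(self, values):
        # Forward value: the coefficients are applied here, too
        return sum(coeff * values[name] for name, coeff in self.terms.items())


cost_1 = CostExpr.leaf("cost_1")
cost_2 = CostExpr.leaf("cost_2")
total = 0.5 * cost_1 + 0.3 * cost_2
print(total.terms)
print(total.evaluate({"cost_1": 2.0, "cost_2": 10.0}))
```

Because the combined expression keeps the coefficients, the same object can drive both the reported forward cost and the gradient scaling, which is the point of passing a single cost to the trainer.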
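I think this is a problem in Paddle. A neural network should be able to optimize any scalar: as long as the last layer is a scalar, it can be optimized. I think Paddle's implementation should not have the Cost Layer concept at all. As long as the last layer is a scalar, backward can be computed correctly.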
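The claim that any scalar output can be backpropagated can be illustrated with a toy reverse-mode sketch. `Scalar` is a hypothetical class written for this example and says nothing about how Paddle implements layers; it only shows that an addition node passes its upstream gradient unchanged to each input, and that a scaling node multiplies it by the coefficient.

```python
class Scalar:
    """Hypothetical scalar node with reverse-mode gradients; NOT Paddle code."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Scalar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __rmul__(self, k):
        # d(k * a)/da = k, for a plain Python number k
        return Scalar(k * self.value, ((self, k),))

    def backward(self, upstream=1.0):
        # The final scalar seeds itself with gradient 1, no cost layer needed
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


cost_a = Scalar(2.0)
cost_b = Scalar(10.0)
total = 0.5 * cost_a + 0.3 * cost_b
total.backward()
print(total.value, cost_a.grad, cost_b.grad)
```

In this sketch the gradient reaching `cost_a` is its coefficient 0.5 and the one reaching `cost_b` is 0.3, so the weighting appears in both the forward value and the backward pass.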
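If a value really needs to be printed, a print evaluator can be attached to it. If a particular layer must always be printed, a PrintEvaluator can be attached to it by default.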
So for now this still has to be attached manually, right?
One can do multitask training of
cost = e0*cost_a + e1*cost_b
by:
This gives the correct result in training (backward pass), but when the cost is calculated in the forward pass the coefficients e0 and e1 are not used. So the reported cost is wrong.
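A small numeric sketch of the discrepancy follows. All values are made up, the function names are hypothetical, and the sketch assumes the forward pass reports the plain unweighted sum; the issue itself only states that the coefficients are not used.

```python
def intended_cost(costs, coeffs):
    # What should be reported: e0*cost_a + e1*cost_b
    return sum(e * c for e, c in zip(coeffs, costs))


def reported_cost(costs):
    # Assumption for illustration: the coefficients are dropped in the
    # forward pass and the individual costs are simply added up
    return sum(costs)


def cost_gradients(coeffs):
    # Gradient of the weighted sum with respect to each individual cost.
    # The backward pass uses these coefficients, so training is unaffected.
    return list(coeffs)


costs = [2.0, 4.0]    # made-up values for cost_a and cost_b
coeffs = [0.3, 0.5]   # made-up values for e0 and e1

print("intended:", intended_cost(costs, coeffs))
print("reported:", reported_cost(costs))
print("gradients:", cost_gradients(coeffs))
```

Under that assumption the logged value (6.0 here) differs from the quantity actually being minimized (about 2.6), even though the gradients are scaled correctly.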