Add reduce op #4086

Merged · guoshengCS merged 9 commits into PaddlePaddle:develop on Sep 28, 2017
Conversation

guoshengCS (Contributor):

Resolves #4060

ops::ReduceKernel<paddle::platform::CPUPlace, float, ops::MinFunctor>);
REGISTER_OP_CPU_KERNEL(reduce_min_grad,
ops::ReduceGradKernel<paddle::platform::CPUPlace, float,
ops::MaxOrMinGradFunctor>);
Contributor:

I think it would be better to make min, max, etc. an attribute and register a single kernel; that would make the .cc files much shorter and cleaner.

At the moment ReduceMinOpMaker, ReduceMaxOpMaker, and the rest are essentially duplicates of one another. Four separate kernels are currently written; merged into one, nearly 3/4 of the code could be saved.
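(Editor's illustration: a minimal, self-contained C++ sketch of the attribute-style dispatch being suggested here, i.e. one kernel that switches on the reduce type at runtime. The ReduceType enum and Reduce function are hypothetical stand-ins, not the PR's code; in a real kernel the choice would come from an op attribute.)

#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical: one kernel, the reduce type chosen at runtime instead of
// registering a separate kernel per reduce op.
enum class ReduceType { kSum, kMean, kMax, kMin };

float Reduce(const std::vector<float>& x, ReduceType type) {
  assert(!x.empty());
  switch (type) {
    case ReduceType::kSum:
      return std::accumulate(x.begin(), x.end(), 0.0f);
    case ReduceType::kMean:
      return std::accumulate(x.begin(), x.end(), 0.0f) / x.size();
    case ReduceType::kMax:
      return *std::max_element(x.begin(), x.end());
    case ReduceType::kMin:
      return *std::min_element(x.begin(), x.end());
  }
  return 0.0f;
}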

guoshengCS (Author):

Sorry for the late reply. I followed TensorFlow's approach here: /~https://github.com/tensorflow/tensorflow/blob/216dcbf1e08c02d87774120ebd5b251c5c30c56c/tensorflow/core/kernels/reduction_ops_sum.cc#L26 , and PyTorch has similar reduce operations as well: /~https://github.com/pytorch/pytorch/blob/master/torch/autograd/_functions/reduce.py . Splitting this into multiple ops feels semantically clearer, and making min/max an attribute would make the kernel code rather long. Other reduce operations may be added later, and a further potential advantage of functors is that they can be reused, whereas the current kernels are not easy to reuse. I'm not sure which approach is better. Thanks for the comments and suggestions.
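(Editor's illustration: a self-contained sketch of the "one templated kernel, swappable functor" pattern the author describes, mirroring the spirit of REGISTER_OP_CPU_KERNEL(reduce_min, ops::ReduceKernel<..., ops::MinFunctor>). The functor and kernel shapes below are simplified stand-ins, not the PR's actual code.)

#include <algorithm>
#include <numeric>
#include <vector>

struct SumFunctor {
  float operator()(const std::vector<float>& x) const {
    return std::accumulate(x.begin(), x.end(), 0.0f);
  }
};

struct MinFunctor {
  float operator()(const std::vector<float>& x) const {
    return *std::min_element(x.begin(), x.end());
  }
};

// One kernel template is reused for every reduce op by swapping the functor;
// adding a new reduce op then only needs a new functor plus a registration line.
template <typename Functor>
float ReduceKernel(const std::vector<float>& x) {
  return Functor()(x);
}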

break;
case 6:
ReduceCompute<6>(context);
break;
Contributor:

What are these six cases for?

guoshengCS (Author):

These correspond to the possible ranks of EigenTensor. Since EigenTensor takes the rank as a template parameter, the cases are written out explicitly here.
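(Editor's illustration: why the explicit cases are needed. EigenTensor's rank is a compile-time template parameter, so a runtime rank has to be switched onto explicit instantiations; the names below are illustrative.)

#include <cstdio>

template <int D>
void ReduceCompute() {
  std::printf("reduce over a rank-%d tensor\n", D);
}

void Dispatch(int rank) {
  switch (rank) {
    case 1: ReduceCompute<1>(); break;
    case 2: ReduceCompute<2>(); break;
    case 3: ReduceCompute<3>(); break;
    case 4: ReduceCompute<4>(); break;
    case 5: ReduceCompute<5>(); break;
    case 6: ReduceCompute<6>(); break;  // rank 6 is the maximum supported here
  }
}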

auto out = EigenTensor<T, D == 1 ? 1 : (D - 1)>::From(*output, dims);
auto& place = context.GetEigenDevice<Place>();
Functor functor;
functor(place, x, out, reduce_dim);
Contributor:

If this were written as an attribute, no functor would be needed here either; a switch would do the job, and about half of the code in lines 29-86 could be saved.

guoshengCS (Author):

A further potential advantage of the functor approach is that it should be fairly easy to reuse; the current kernel does not seem easy to reuse.

ops::ReduceKernel<paddle::platform::GPUPlace, float, ops::MinFunctor>);
REGISTER_OP_GPU_KERNEL(reduce_min_grad,
ops::ReduceGradKernel<paddle::platform::GPUPlace, float,
ops::MaxOrMinGradFunctor>);
Contributor:

In the current implementation, each loop iteration launches an Eigen GPU kernel, which will be rather slow. Consider writing a dedicated kernel.

guoshengCS (Author):

I'm not quite sure what this means; could you explain in a bit more detail?

qingqing01 requested review from QiJune and removed the request for Superjomn on September 18, 2017 08:33
gongweibao (Contributor) left a comment:

Some of the code could be trimmed.

self.attrs = {'dim': 1}
self.outputs = {'Out': self.inputs['X'].mean(axis=self.attrs['dim'])}

def test_check_output(self):
Contributor:

Define a parent class, then have TestMeanOp(xxxx) and the others inherit from it, so that test_check_output and the rest of the functions can all be reused.

guoshengCS (Author):

Good suggestion, thanks. Since reduce_max and reduce_min skip the gradient check, the test contents are not entirely uniform across these ops, so I'll leave this unchanged for now.

"(Tensor) The input tensor. Tensors with rank at most 6 are supported");
AddOutput("Out", "(Tensor) The result tensor.");
AddComment(R"DOC(
ReduceMean operator computes the sum of input tensor along the given dimension.
Contributor:

The OpMakers for Sum, Mean, Max, and Min are essentially identical. They could be implemented with a single base class, with subclasses handling only the parts that differ.
See #4139 ElementwiseOpMaker for reference.
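(Editor's illustration: a rough sketch of the suggested factoring, written in the style of the OpMaker code quoted in this review. A shared base adds the common Input/Output/Attr wiring and each subclass only supplies its own comment; the names and exact signatures here are illustrative, not the code that landed in the PR.)

class ReduceOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
  ReduceOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    // Common wiring shared by all reduce ops.
    AddInput("X",
             "(Tensor) The input tensor. Tensors with rank at most 6 are supported");
    AddOutput("Out", "(Tensor) The result tensor.");
    AddAttr<int>("dim",
                 "(int, default 0) The dimension to reduce. "
                 "Must be in the range [-rank(input), rank(input))")
        .SetDefault(0);
  }
};

class ReduceSumOpMaker : public ReduceOpMaker {
 public:
  ReduceSumOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
      : ReduceOpMaker(proto, op_checker) {
    // Only the per-op documentation differs.
    AddComment("ReduceSum operator computes the sum of the input tensor along the given dimension.");
  }
};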

guoshengCS (Author):

Done. Thanks~

: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(
"X",
"(Tensor) The input tensor. Tensors with rank at most 6 are supported");
Contributor:

Our Tensor supports up to 9 dimensions. Where does this limit of 6 come from?

guoshengCS (Author) commented on Sep 24, 2017:

This currently follows crop_op: /~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/crop_op.h . It probably does need to be unified.

Contributor:

And then?

"(Tensor) The input tensor. Tensors with rank at most 6 are supported");
AddOutput("Out", "(Tensor) The result tensor.");
AddComment(R"DOC(
ReduceMean operator computes the sum of input tensor along the given dimension.
Contributor:

ReduceMean -> ReduceSum

namespace operators {

using framework::Tensor;
using framework::LoDTensor;
Contributor:

remove this line.

guoshengCS (Author):

Done.

dims_vector.erase(dims_vector.begin() + dim);
}
auto out_dims = framework::make_ddim(dims_vector);
ctx.Output<framework::LoDTensor>("Out")->Resize(out_dims);
Contributor:

framework::LoDTensor -> framework::Tensor

guoshengCS (Author):

Done.

AddOutput("Out", "(Tensor) The result tensor.");
AddAttr<int>("dim",
"(int, default 0) The dimension to reduce. "
"Must be in the range [-rank(input), rank(input))")
Contributor:

Add more comments for the dim, or add some examples in the Doc.
A negative dim counts from the last dimension here, right? Please add a comment.
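(Editor's illustration: the usual convention is that a negative dim counts from the end, e.g. dim = -1 means the last dimension. A hedged sketch of the normalization; the PR's actual handling may differ.)

// Map a possibly-negative dim into [0, rank).
int NormalizeDim(int dim, int rank) {
  if (dim < 0) dim += rank;  // e.g. dim = -1, rank = 4  ->  dim = 3
  return dim;
}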

guoshengCS (Author):

Done.

auto equals = x == y.broadcast(dim);
auto ones = dx.constant(1);
auto zeros = dx.constant(0);
dx.device(place) = dy.broadcast(dim) * equals.select(ones, zeros);
Contributor:

When there are multiple max/min values, this implementation propagates the gradient of the max to every one of them during backward, rather than setting the gradients of some of them to zero. Also, after discussing with @guoshengCS: TF averages the gradient over the multiple max/min values here: /~https://github.com/tensorflow/tensorflow/blob/37f7ad75bbd2ca140d1092342eb3590d54193bc8/tensorflow/cc/gradients/math_grad.cc#L711

Please add a comment explaining how we handle this.
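(Editor's note, a worked example of the difference being discussed: reducing max over [3, 5, 5] with upstream gradient g, the select-based formula quoted above yields the gradient [0, g, g], while TF's averaging scheme would yield [0, g/2, g/2].)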

guoshengCS (Author):

Done.

auto* input2 = context.Input<Tensor>(framework::GradVarName("Out"));
auto* output = context.Output<Tensor>(framework::GradVarName("X"));

if (output != nullptr) {
Contributor:

If a backward op has only one output, the implementation does not need to handle the output == nullptr case; /~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/backward.cc#L67 returns a NOP for that case.

guoshengCS (Author):

Done.


private:
template <size_t D>
void ReduceCompute(const framework::ExecutionContext& context) const {
Contributor:

To distinguish it from ReduceCompute above, should this one be named ReduceGradCompute?

guoshengCS (Author):

Done.


// For EigenTensor unsupported reduce
template <typename T, typename Functor>
class ReduceGradEigenFreeKernel : public framework::OpKernel {
Contributor:

Where is this kernel used?

guoshengCS (Author):

Done. Removed it for now.

luotao1 (Contributor) left a comment:

LGTM.

gongweibao (Contributor) left a comment:

LGTM

guoshengCS merged commit ecef2e6 into PaddlePaddle:develop on Sep 28, 2017