Accuracy op #3907
Conversation
Maybe it's better to split this into smaller PRs. :)
paddle/operators/accuracy_op.cc
Outdated
auto *label = ctx.Input<framework::Tensor>("Label");

// label must be a vector
PADDLE_ENFORCE_EQ(label->dims().size(), 1);
Please put the comment into the assertion message:
PADDLE_ENFORCE_EQ(label->dims().size(), 1, "label must be a vector")
Done.
paddle/operators/accuracy_op.cc
Outdated
// label must be a vector
PADDLE_ENFORCE_EQ(label->dims().size(), 1);
PADDLE_ENFORCE_EQ(inference->dims()[0], label->dims()[0]);
An assertion should contain some information in case it core dumps.
Done.
paddle/operators/accuracy_op.h
Outdated
auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy");
const size_t topk = 1;
// static_cast<AttrType>(ctx.op_.GetAttr<AttrType>("topk"));
Please remove the commented-out code if it is unused.
Done.
paddle/operators/accuracy_op.h
Outdated
}

// FIXME(typhoonzero): we don't accumulate the accuracy for now.
*accuracy_data = static_cast<T>(num_correct) / static_cast<T>(num_samples);
Why does this need to cast to type T? I think accuracy_data should always be float or double, depending on the required precision. Suppose we are serving online; then T will be fp16 for acceleration, and accuracy_data will end up with the wrong type.
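For illustration, a minimal sketch of that suggestion, reusing the variable names from the snippet above (not necessarily the code that was finally merged):

// Keep the accuracy output in float regardless of the input type T,
// so an fp16 inference path does not degrade the reported accuracy.
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
*accuracy_data =
    static_cast<float>(num_correct) / static_cast<float>(num_samples);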
Done.
paddle/operators/accuracy_op.h
Outdated
auto* inference = ctx.Input<Tensor>("Inference");
auto* label = ctx.Input<Tensor>("Label");
auto* accuracy = ctx.Output<Tensor>("Accuracy");
const size_t topk = 1;
It seems topk is unused in the Accuracy operator.
Removed.
paddle/operators/top_k_op.cc
Outdated
// input must have >= 1d shape.
PADDLE_ENFORCE_GE(input->dims().size(), 1);
// input must have >= k columns.
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k);
Same as above: the user should be able to read the enforce message without tracing into the source code.
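A possible sketch of the suggested change; the messages here are illustrative wording, not necessarily what was merged:

PADDLE_ENFORCE_GE(input->dims().size(), 1,
                  "input must have >= 1d shape");
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k,
                  "input must have >= k columns");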
paddle/operators/top_k_op.cu
Outdated
}

template <typename T, int BlockSize>
__device__ __forceinline__ void GetTopK(Pair<T> topk[], const T* src, int idx,
Do we really need to put __forceinline__ explicitly? According to its documentation, inlining is added by the nvcc compiler. To the best of my knowledge, a function that contains while or if/else control-flow loops can never be inlined, in either C++ or CUDA code.
paddle/operators/top_k_op.h
Outdated
// reshape input to a flattened matrix (like flat_inner_dims)
framework::DDim inputdims = input->dims();
const size_t row = framework::product(
There will be a flatten_to_2d interface in DDim; maybe we should put a TODO here to replace this in the future.
Also, maybe we need different ways to flatten: flatten_to_2d_inner and flatten_to_2d_outter.
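As an illustrative sketch only (not the PR's code): how the manual flattening might read with the TODO in place; the slice_ddim-based continuation is a guess at the surrounding code, and flatten_to_2d is the proposed interface that does not yet exist at review time.

framework::DDim inputdims = input->dims();
// TODO: replace with framework::flatten_to_2d(inputdims, inputdims.size() - 1)
// once that interface lands in DDim.
// Current manual style: collapse all leading dims into rows, keep the last as columns.
const size_t row = framework::product(
    framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];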
paddle/operators/top_k_op.h
Outdated
X.reshape(flat2dims);

for (size_t i = 0; i < row; i++) {
  // TODO(typhoonzero): make this more efficient
I don't think we have a better choice, since partial_sort is a heap sort. Top-k is such a classic problem that I don't think efficiency is an issue here.
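For reference, a self-contained illustration of top-k via std::partial_sort, independent of the PR's code; the function name and vector-based interface are just for the example:

#include <algorithm>
#include <functional>
#include <vector>

// Keep only the k largest values of one row, in descending order.
// std::partial_sort is heap-based, roughly O(n log k) comparisons.
std::vector<float> TopK(std::vector<float> row, size_t k) {
  k = std::min(k, row.size());
  std::partial_sort(row.begin(), row.begin() + k, row.end(),
                    std::greater<float>());
  row.resize(k);
  return row;
}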
I was thinking of trying not to copy the memory, but since the input is const, we cannot do that currently.
paddle/operators/accuracy_op.cc
Outdated
AddOutput("Accuracy", "The accuracy of current batch"); | ||
|
||
AddComment( | ||
R"DOC(Accuracy. It will print accuracy rate for classification."); |
Bad comment here. R"DOC(...)DOC" is just like Python's """...""", where R"DOC( is the left """ and )DOC" is the right """.
Actually, I had been confused about what R"abc(....)abc" means (abc can be anything; it is just a delimiter). I found these two docs that solved my questions. Just sharing.
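To make the delimiter point concrete, a minimal C++ example of a raw string literal as it might appear in an op comment (the text is illustrative only):

#include <iostream>

int main() {
  // R"DOC( ... )DOC" behaves like Python's """ ... """; "DOC" is only a
  // delimiter and can be any short token not appearing in the body.
  const char* doc = R"DOC(Accuracy Operator.
It computes the accuracy rate for classification.)DOC";
  std::cout << doc << std::endl;
  return 0;
}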
paddle/operators/accuracy_op.cu
Outdated
auto* accuracy = ctx.Output<Tensor>("Accuracy");
const int* inference_data = inference->data<int>();
const int* label_data = label->data<int>();
T* accuracy_data = accuracy->mutable_data<T>(ctx.GetPlace());
The accuracy_data here should also be changed.
Sorry, what do you mean by "change"? accuracy_data is passed to the CUDA kernel, and the result is assigned inside the kernel.
It's just the same as the CPU code below:
float* accuracy_data = accuracy->mutable_data<float>(ctx.GetPlace());
Here the accuracy_data type is T, which may lose precision; maybe it's better to write it as mutable_data<float>. And I noticed that AccuracyDivideKernel receives a parameter of float.
// FIXME(typhoonzero): we don't accumulate the accuracy for now.
*accuracy_data =
    static_cast<float>(num_correct) / static_cast<float>(num_samples);
Do we need an ENFORCE check that num_samples is not equal to zero? If a user misuses this operator, num_samples may be zero. I'm not sure it's useful.
Done, this returns 0 if num_samples == 0.
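A minimal sketch of that guard, assuming the variable names from the CPU snippet above; not necessarily the exact code that was committed:

// Report an accuracy of 0 when there are no samples, to avoid dividing by zero.
if (num_samples == 0) {
  *accuracy_data = 0.0f;
} else {
  *accuracy_data =
      static_cast<float>(num_correct) / static_cast<float>(num_samples);
}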
paddle/operators/accuracy_op.cu
Outdated
size_t num_samples = inference->dims()[0];
size_t infer_width = inference->dims()[1];

AccuracyDivideKernel<<<1, 1>>>(num_samples, infer_width, 1, inference_data,
I'm not very familiar with CUDA, but it seems a kernel launched with <<<1, 1>>> cannot utilize the GPU's capacity. This operator is written similarly to crossEntropy, but I don't know where block = 512 comes from.
@qingqing01
Well, I'm still trying to let the kernel use more threads; I'll enhance the kernel in the next PR. I've had some problems writing the kernel with atomicAdd.
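For illustration only, a hypothetical sketch of a many-thread counting kernel with atomicAdd; this is not the PR's kernel, and the names, the block size of 512, and the int-label layout are assumptions:

// Each thread checks one sample; matches are accumulated with atomicAdd.
__global__ void AccuracyCountKernel(const int num_samples, const int infer_width,
                                    const int* inference, const int* label,
                                    int* num_correct) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= num_samples) return;
  for (int j = 0; j < infer_width; ++j) {
    if (inference[i * infer_width + j] == label[i]) {
      atomicAdd(num_correct, 1);  // thread-safe increment of the shared counter
      break;
    }
  }
}

// Hypothetical launch, mirroring the block = 512 convention mentioned above:
//   int block = 512;
//   int grid = (num_samples + block - 1) / block;
//   AccuracyCountKernel<<<grid, block>>>(num_samples, infer_width,
//                                        inference_data, label_data, d_num_correct);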
LGTM++
Fix #3840