Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aysnc send/recv, seriliaze/deserialize using threadpool. #7705

Merged
merged 2 commits into from
Jan 24, 2018

Conversation

helinwang
Copy link
Contributor

@helinwang helinwang commented Jan 20, 2018

This PR added performance optimization: send/recv can use threadpool to serialize/deserialize tensor. (before this PR tries to add retry logic, now this PR only handles the threadpool performance optimization).

Fix: #7801

auto rpc =
s->stub_->AsyncSendVariable(s->context_.get(), *(req.get()), &cq_);
// Finish will block until failed or the response is received.
rpc->Finish(&s->reply_, &s->status_, (void*)s);
Copy link
Contributor

@gongweibao gongweibao Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我看了一下文档,忽然有一个疑惑,既然这里是block住的(以前我看例子以为是Finish只是注册一下reply,status, tag等,Next才是block的。)

This function will return when either:

when the server's response message and status have been received.
when the server has returned a non-OK status (no message expected in this case).
when the call failed for some reason and the library generated a non-OK status.

那为何还需要completion_queue.Next(), 这里的Finish是说收到了消息但是消息还没有全部返回完毕?
我觉得grpc这个接口这么设计有点反人类。

我们还是需要找一个正经的Async接口。^_^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉既然叫Finish也是有道理的。

Copy link
Contributor

@gongweibao gongweibao Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我测试了一下,这里的server response应该不是user level的消息,应该是grpc的框架把caller的数据发送完毕以后接收到response,提示数据发送完毕或者其他情况。

This function will return when either:
    when the server's response message and status have been received.
    when the server has returned a non-OK status (no message expected in this case).
    when the call failed for some reason and the library generated a non-OK status.

测试方式是:即便server端不处理client端发送的消息,只要框架成功接收,client端也是收到了server response. Server response is not user RPC reply.

不过以前的实现确实不是fully async。需要thread_pool去发送。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另外,猜测:Status可能是被复用的。RPC的过程中一直在被更新。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

谢谢!看来Finish并不收到处理结果。

new sendrecv::VariableMessage());
SerializeToMessage(var_name, var, ctx, req.get());

std::thread thread([req, var_h, ch, time_out, this] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果参数多会有太多的thread生成出来,可能会有问题。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯,不建议在每次call中创建thread或者可以用threadpool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

Copy link
Contributor

@Yancey1989 Yancey1989 Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for later review, the threads number was set by std::thread::hardware_concurrency in the global ThreadPool, it's usually equal to CPU core numbers but not always. I think it's too few for IO threads, maybe we need another threadpool to deal with the IO thread. But we can merge this PR firstly, and do more improve for future.

@helinwang helinwang changed the title Send OP: send / get var asynchronous, retry when send / get failed. Send OP: send / get var asynchronous. Jan 23, 2018
@helinwang
Copy link
Contributor Author

@gongweibao @typhoonzero thanks for the feedback! I removed the retry code since the error can not be reproduced anymore on the latest develop branch. And the retry logic need more thinking before we implement it.

This PR now only do concurrent send / recv, serialization / deserialization.

@helinwang helinwang changed the title Send OP: send / get var asynchronous. aysnc send/recv, seriliaze/deserialize using threadpool. Jan 23, 2018
Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just for reminding, parrallel_for also use the same thread pool, it's a singleton. When we need to do multi-thread + multi-node training we must separate these two threadpools.

Copy link
Contributor

@gongweibao gongweibao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


LGTM++

@Yancey1989 Yancey1989 merged commit 1ab1181 into PaddlePaddle:develop Jan 24, 2018
@helinwang
Copy link
Contributor Author

helinwang commented Jan 24, 2018

Thanks! @typhoonzero @Yancey1989 I would not worry too much about the threadpool, because this PR for the most part only run serialization and deserialization on the threadpool. The send and recv are scheduled internally by grpc, since @gongweibao implemented grpc aysnc send/recv. I know there is a grpc::Finish() call in our threadpool, but @gongweibao tested it does not block until the server returns a response.

@gongweibao
Copy link
Contributor

gongweibao commented Jan 25, 2018

but @gongweibao tested it does not block until the server returns a response.

Some explanations:
The client does blocks until the server returns a response. The client doesn't block until the server returns an RPC reply.
A response is not an RPC reply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants