Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix pserver sub-blocks #11585

Merged
merged 10 commits into from
Jun 24, 2018
Merged

Conversation

Yancey1989
Copy link
Contributor

Fixed #11429

@Yancey1989 Yancey1989 removed the request for review from cxysteven June 20, 2018 06:38
@velconia
Copy link
Collaborator

velconia commented Jun 20, 2018

Should we remove these sub blocks from blk_list in the RunAsyncLoop method of listen_and_serv_op, like what we do in the RunSyncLoop method?

@Yancey1989
Copy link
Contributor Author

Should we remove these sub blocks from blk_list in the RunAsyncLoop method of listen_and_serv_op, like what we do in the RunSyncLoop method?

May not, RunAsyncLoop use grad_to_block_id to determinate which optimize block to be execute by gradient var, and also don't support lr decay in async update.

if not op.has_attr('sub_block'):
return
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who uses the return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

PADDLE_ENFORCE_GE(num_blocks, 2,
"server program should have at least 2 blocks");

std::vector<int> optimize_block_id_list;
for (int blkid = 1; blkid < num_blocks; ++blkid) {
if (std::find(prefetch_block_id_list.begin(), prefetch_block_id_list.end(),
blkid) == prefetch_block_id_list.end()) {
blkid) == prefetch_block_id_list.end() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make it beautiful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please follow the comment: #11585 (review)

@@ -101,13 +101,16 @@ void ListenAndServOp::RunSyncLoop(
framework::Scope *recv_scope,
const std::vector<int> &prefetch_block_id_list) const {
size_t num_blocks = program->Size();
auto skip_sub_blks = Attr<std::vector<int>>("skip_sub_blks");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record only the start block id and end block id for listen_and_serv op to run parallelly. Sub-blocks append to the end to make the program in a strict order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is that we record the parallel block ids, the benefit is don't need insert block to the specified idx, just append block to pserver_program and record the optimize block id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse that deprecated optimize_block attribute could be cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that.

typhoonzero
typhoonzero previously approved these changes Jun 22, 2018
Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks this should work.

@Yancey1989 Yancey1989 merged commit 0d4b376 into PaddlePaddle:develop Jun 24, 2018
@Yancey1989 Yancey1989 deleted the fix_pserver_sub_blocks branch June 24, 2018 05:05
Yancey1989 pushed a commit to Yancey1989/Paddle that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants