try to fix unit test test_ProtoServer, test_TrainerOnePass #2414
fix error: /paddle/paddle/.common_test_util.sh: line 97: netstat: command not found
e894459 to c03214e
LGTM
paddle/pserver/LightNetwork.cpp
      break;
    }

    if (errno == ECONNREFUSED) {
      LOG(WARNING) << "connection refused by pserver, try again!";
      if (retry_second++ >= 7) {
        LOG(FATAL) << "connection refused by pserver, maybe pserver failed!";
      }
      std::this_thread::sleep_for(std::chrono::seconds(1));
It seems that retry_second is defined but never used as a duration here. Should we change this line to the following?
std::this_thread::sleep_for(std::chrono::seconds(retry_second));
I think retry_second in the code actually means "how many seconds we have been retrying", not a backoff strategy like "how many seconds to wait before the current retry". Renaming it to retry_count might be clearer.
Done. Changed to retry_count.
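For reference, the distinction discussed in this thread (a counter of retries versus a per-retry wait time) can be sketched as below. This is only an illustration under stated assumptions, not the code in LightNetwork.cpp: connectWithRetry, try_connect, max_retries, and wait are hypothetical names, and the sketch returns false where the reviewed code calls LOG(FATAL).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of Paddle: keeps calling try_connect until it
// succeeds or until max_retries retries have been used up.
// Returns true on success and false once the retries are exhausted.
bool connectWithRetry(const std::function<bool()>& try_connect,
                      int max_retries,
                      std::chrono::milliseconds wait) {
  int retry_count = 0;  // counts retries so far; it is not a sleep duration
  while (!try_connect()) {
    if (retry_count++ >= max_retries) {
      return false;  // the reviewed code would call LOG(FATAL) at this point
    }
    std::this_thread::sleep_for(wait);  // fixed wait between attempts, no backoff
  }
  return true;
}
```

With max_retries set to 7 and wait set to one second, this mirrors the limit in the diff above: the connection is attempted up to eight times in total, with a constant one-second pause between attempts.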
Fixes: #2401