-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement pserver RPC part, and simple parameter partition. #2243
Conversation
paddle/go/cclient/cclient.go
Outdated
type selector bool | ||
|
||
func (s selector) Select() bool { | ||
return bool(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type switch for clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this is clear because the underlying type for selector
is bool
, but I could be wrong.
I don't fully understand how to do type switch here, could you paste the code here?
paddle/go/cclient/cclient.go
Outdated
type lister []pserver.Server | ||
|
||
func (l lister) List() []pserver.Server { | ||
return l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should implement this interface bases on ectd? for example, list pserver names from etcd value for fault tolerant pserver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will write a module who implements this interface that talks to etcd.
However, this PR only includes a module with static pserver addresses. To be used for MPI cluster.
I will send a separate PR for the etcd module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get it
paddle/go/pserver/client.go
Outdated
return nil | ||
} | ||
|
||
type result struct { | ||
idx int | ||
p Parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe name with "param" better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
paddle/go/pserver/client.go
Outdated
ticker := time.NewTicker(10 * time.Second) | ||
monitor := func() { | ||
curServers := make([]Server, pserverNum) | ||
list := l.List() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to check the pserverNum == len(list) here to ensure that alive pserver instance equal the desired number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok if number of pservers still did not reach the desired number. We will only connect to pservers that are alive. For pservers not alive yet, we will not connect to them, and all calls to them will be blocked until connected. (See: /~https://github.com/PaddlePaddle/Paddle/pull/2243/files#diff-b1a8f38187aa603e0efc16791f282d87R61)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explain. I got it. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Easier to view using github split diff :)
Please use the following command to build, because there is some problem with the go cmake (the cmake checks out a new Paddle from github for Go dependency, so the changes in local filesystem are not reflected. Fix is in another PR: #2294).
open new teminal and run:
./test/main # run the example program we just build