Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change script to support tensorflow distribution on k8s. #617

Merged
merged 20 commits into from
Feb 27, 2018

Conversation

gongweibao
Copy link
Collaborator

No description provided.


r = []
for ip in ips:
if port != "0":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use port != None as it's the default value of the function arg.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pod_list = fetch_pods_info(label_selector)
pserver_ips = [item[1] for item in pod_list]
return ",".join(pserver_ips)
def fetch_ips(label_selector, port=None, phase=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called fetch_ips but trying to append ports, try using fetch_endpoints

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

trainer_ips.sort()

def fetch_id(label_selector, phase=None):
pod_list = fetch_pods_info(label_selector, phase)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are the same as fetch_ips, try put them in one function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gongweibao gongweibao requested a review from helinwang February 24, 2018 07:03
return fetch_endpoints(label_selector, port, phase)


def fetch_pserver_ips(label_selector, port=None, phase=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused fetch_pserver_ips called fetch_endpoints function, fetch_endpoints would return IP:PORT rather than IP.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_xxx_ips can have no argument port because we always use None. And, maybe it's better to add some comments to tell if port=None, fetch_endpoints will return a list of IPs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

api_response = v1.list_namespaced_pod(
namespace=NAMESPACE, pretty=True, label_selector=label_selector)
pod_list = []
for item in api_response.items:
if phase is not None and item.status.phase != phase:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If phase == "Running" then need to check if it's Terminating

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return fetch_endpoints(label_selector, port, phase)


def fetch_pserver_ips(label_selector, port=None, phase=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_xxx_ips can have no argument port because we always use None. And, maybe it's better to add some comments to tell if port=None, fetch_endpoints will return a list of IPs.

Copy link
Collaborator

@Yancey1989 Yancey1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gongweibao gongweibao merged commit 30426c9 into PaddlePaddle:develop Feb 27, 2018
@gongweibao gongweibao deleted the fixtf branch February 27, 2018 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants