rebase to kubernetes v0.5x+ #102

Merged: 28 commits into master from rebase_05x, Jan 5, 2015

Conversation

@jdef commented Dec 8, 2014

This is a WIP -- use this branch at your own risk.

TODOs

  • single pod
  • replicated pod, pod failover
  • service backed by replicated pod
  • working guestbook pods, controllers, and services
  • guestbook UI accessible to Internet (see comments below)
  • update usage, example docs
  • update GCE/guestbook tutorial (new master params, firewall steps, etc.)
  • lots of testing
  • post-merge: rebuild jdef/redis, jdef/php-redis images
  • post-merge: request merge of GCE tutorial PR (/~https://github.com/mesosphere/website/pull/416)

@jdef (author) commented Dec 8, 2014

Launched pod-nginx.json from examples/, and it looks like the new events code is confused (selfLink?):
[[EDIT]] resolved

E1208 22:30:18.236930     980 event.go:106] Could not construct reference to: '&api.Pod{TypeMeta:api.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"nginx-id-01", Namespace:"default", SelfLink:"", UID:"c9b25465-7f29-11e4-8a01-04012f416701", ResourceVersion:"", CreationTimestamp:util.Time{Time:time.Time{sec:63553674618, nsec:0xd65ec03, loc:(*time.Location)(0xfaa2c0)}}, Labels:map[string]string{"cluster":"gce", "name":"foo"}, Annotations:map[string]string(nil)}, DesiredState:api.PodState{Manifest:api.ContainerManifest{Version:"v1beta1", ID:"nginx-id-01", UUID:"c9b212da-7f29-11e4-8a01-04012f416701", Volumes:[]api.Volume(nil), Containers:[]api.Container{api.Container{Name:"nginx-01", Image:"dockerfile/nginx", Command:[]string(nil), WorkingDir:"", Ports:[]api.Port{api.Port{Name:"", HostPort:31000, ContainerPort:80, Protocol:"TCP", HostIP:""}}, Env:[]api.EnvVar{api.EnvVar{Name:"KUBERNETES_SERVICE_HOST", Value:"10.132.100.170"}, api.EnvVar{Name:"KUBERNETES_SERVICE_PORT", Value:"443"}, api.EnvVar{Name:"KUBERNETES_PORT", Value:"tcp://10.132.100.170:443"}, api.EnvVar{Name:"KUBERNETES_PORT_443_TCP", Value:"tcp://10.132.100.170:443"}, api.EnvVar{Name:"KUBERNETES_PORT_443_TCP_PROTO", Value:"tcp"}, api.EnvVar{Name:"KUBERNETES_PORT_443_TCP_PORT", Value:"443"}, api.EnvVar{Name:"KUBERNETES_PORT_443_TCP_ADDR", Value:"10.132.100.170"}, api.EnvVar{Name:"KUBERNETES_RO_SERVICE_HOST", Value:"10.132.100.17"}, api.EnvVar{Name:"KUBERNETES_RO_SERVICE_PORT", Value:"80"}, api.EnvVar{Name:"KUBERNETES_RO_PORT", Value:"tcp://10.132.100.17:80"}, api.EnvVar{Name:"KUBERNETES_RO_PORT_80_TCP", Value:"tcp://10.132.100.17:80"}, api.EnvVar{Name:"KUBERNETES_RO_PORT_80_TCP_PROTO", Value:"tcp"}, api.EnvVar{Name:"KUBERNETES_RO_PORT_80_TCP_PORT", Value:"80"}, api.EnvVar{Name:"KUBERNETES_RO_PORT_80_TCP_ADDR", Value:"10.132.100.17"}}, Memory:0, CPU:0, VolumeMounts:[]api.VolumeMount(nil), LivenessProbe:(*api.LivenessProbe)(0xc2083cf640), Lifecycle:(*api.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", Privileged:false, ImagePullPolicy:""}}, RestartPolicy:api.RestartPolicy{Always:(*api.RestartPolicyAlways)(0xfa8f18), OnFailure:(*api.RestartPolicyOnFailure)(nil), Never:(*api.RestartPolicyNever)(nil)}}, Status:"Running", Message:"", Host:"10.132.189.243", HostIP:"", PodIP:"", Info:api.PodInfo(nil)}, CurrentState:api.PodState{Manifest:api.ContainerManifest{Version:"", ID:"", UUID:"", Volumes:[]api.Volume(nil), Containers:[]api.Container(nil), RestartPolicy:api.RestartPolicy{Always:(*api.RestartPolicyAlways)(nil), OnFailure:(*api.RestartPolicyOnFailure)(nil), Never:(*api.RestartPolicyNever)(nil)}}, Status:"Pending", Message:"", Host:"", HostIP:"", PodIP:"", Info:api.PodInfo(nil)}, NodeSelector:map[string]string(nil)}' due to: 'unexpected self link format: ''; got version '[]''. Will not report event: 'Pending' 'scheduled' 'Successfully assigned nginx-id-01 to 10.132.189.243'

@jdef added the WIP label Dec 9, 2014
@jdef (author) commented Dec 9, 2014

Ugly pod condition -- it eventually resolves itself, but that process should not take so long. To reproduce:

  1. start a replication-controller pod and wait for it to transition to Running
  2. ssh to the slave and docker stop all containers related to the pod, including 'net' (see the sketch below)
  3. watch the replication controller faithfully create a new pod instance
  4. observe that the pod remains in Pending status for the 5-minute "grace period"
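
A rough sketch of step 2 on the slave (the pod UID comes from the logs below; the container IDs are placeholders):

# stop every docker container belonging to the pod, including the 'net' (pause) container
docker ps | grep 44b07373-7fbb-11e4-b793-04012f416701
docker stop <nginx-container-id> <net-container-id>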

kubelet logs:

I1209 15:59:31.650987   25562 log.go:151] GET /podInfo?podID=44b07373-7fbb-11e4-b793-04012f416701&podNamespace=default: (4.799485ms) 200
I1209 15:59:41.461771   25562 executor.go:225] Kill task value:"44b0b656-7fbb-11e4-b793-04012f416701" 
W1209 15:59:41.461872   25562 executor.go:248] Cannot remove Unknown pod 44b07373-7fbb-11e4-b793-04012f416701.default.mesos for task 44b0b656-7fbb-11e4-b793-04012f416701
I1209 15:59:41.703927   25562 executor.go:215] Task 44b0b656-7fbb-11e4-b793-04012f416701 no longer registered, stop monitoring for lost pods
I1209 15:59:46.523493   25562 executor.go:75] Launch task name:"PodTask" task_id:<value:"5cbf3239-7fbc-11e4-b793-04012f416701" > slave_id:<value:"20141205-004157-4055729162-5050-3567-4" > resources:<name:"cpus" type:SCALAR scalar:<value:0.25 > > resources:<name:"mem" type:SCALAR scalar:<value:64 > > resources:<name:"ports" type:RANGES ranges:<range:<begin:31001 end:31001 > > > executor:<executor_id:<value:"KubeleteExecutorID" > framework_id:<value:"20141205-004157-4055729162-5050-3567-0011" > command:<uris:<value:"http://10.132.189.241:9000/proxy" > uris:<value:"http://10.132.189.241:9000/kubernetes-executor" > value:"./kubernetes-executor -v=2 -hostname_override=0.0.0.0 -etcd_servers=http://10.132.189.241:4001" > name:"Kubelet Executor" source:"kubernetes" > data:"metadata:\n  name: 5cbf1a27-7fbc-11e4-b793-04012f416701\n  namespace: default\n  selfLink: /api/v1beta1/boundPods/5cbf1a27-7fbc-11e4-b793-04012f416701\n  uid: 5cbf22e4-7fbc-11e4-b793-04012f416701\n  creationTimestamp: 2014-12-09T15:59:31Z\nspec:\n  volumes: []\n  containers:\n  - name: nginx\n    image: dockerfile/nginx\n    ports:\n    - hostPort: 31001\n      containerPort: 80\n      protocol: TCP\n    env:\n    - name: KUBERNETES_SERVICE_HOST\n      value: 10.132.100.170\n    - name: KUBERNETES_SERVICE_PORT\n      value: \"443\"\n    - name: KUBERNETES_PORT\n      value: tcp://10.132.100.170:443\n    - name: KUBERNETES_PORT_443_TCP\n      value: tcp://10.132.100.170:443\n    - name: KUBERNETES_PORT_443_TCP_PROTO\n      value: tcp\n    - name: KUBERNETES_PORT_443_TCP_PORT\n      value: \"443\"\n    - name: KUBERNETES_PORT_443_TCP_ADDR\n      value: 10.132.100.170\n    - name: KUBERNETES_RO_SERVICE_HOST\n      value: 10.132.100.17\n    - name: KUBERNETES_RO_SERVICE_PORT\n      value: \"80\"\n    - name: KUBERNETES_RO_PORT\n      value: tcp://10.132.100.17:80\n    - name: KUBERNETES_RO_PORT_80_TCP\n      value: tcp://10.132.100.17:80\n    - name: KUBERNETES_RO_PORT_80_TCP_PROTO\n      value: tcp\n    - name: KUBERNETES_RO_PORT_80_TCP_PORT\n      value: \"80\"\n    - name: KUBERNETES_RO_PORT_80_TCP_ADDR\n      value: 10.132.100.17\n    - name: NGINXSERVICE_SERVICE_HOST\n      value: 10.132.100.222\n    - name: NGINXSERVICE_SERVICE_PORT\n      value: \"8000\"\n    - name: NGINXSERVICE_PORT\n      value: tcp://10.132.100.222:8000\n    - name: NGINXSERVICE_PORT_8000_TCP\n      value: tcp://10.132.100.222:8000\n    - name: NGINXSERVICE_PORT_8000_TCP_PROTO\n      value: tcp\n    - name: NGINXSERVICE_PORT_8000_TCP_PORT\n      value: \"8000\"\n    - name: NGINXSERVICE_PORT_8000_TCP_ADDR\n      value: 10.132.100.222\n    terminationMessagePath: /dev/termination-log\n    imagePullPolicy: \"\"\n  restartPolicy:\n    always: {}\n" 
W1209 15:59:46.527971   25562 kubelet.go:927] Pod 5cbf1a27-7fbc-11e4-b793-04012f416701.default.mesos: HostPort is already allocated, ignoring: [[0].port: duplicate value '31001']
I1209 15:59:51.450282   25562 log.go:151] GET /podInfo?podID=5cbf1a27-7fbc-11e4-b793-04012f416701&podNamespace=default: (3.467739ms) 404
...
W1209 16:04:46.650710   25562 executor.go:138] Launch expired grace period of '5m0s'
W1209 16:04:46.650773   25562 executor.go:277] Cannot remove Unknown pod 5cbf1a27-7fbc-11e4-b793-04012f416701.default.mesos for lost task 5cbf3239-7fbc-11e4-b793-04012f416701
...
I1209 16:04:51.495566   25562 event.go:62] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"1b79bfde-7fbd-11e4-b793-04012f416701", UID:"1b79cda7-7fbd-11e4-b793-04012f416701", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'pulled' Successfully pulled image kubernetes/pause:latest
I1209 16:04:51.529423   25562 event.go:62] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"1b79bfde-7fbd-11e4-b793-04012f416701", UID:"1b79cda7-7fbd-11e4-b793-04012f416701", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'created' Created with docker id 7f586477f1b21bd2abf41b79ab34c5ffcfc4ec83188f4f8cce417cb64523fd1f
E1209 16:04:51.601116   25562 kubelet.go:648] Failed to introspect network container. (API error (500): Cannot start container 7f586477f1b21bd2abf41b79ab34c5ffcfc4ec83188f4f8cce417cb64523fd1f: Bind for 0.0.0.0:31001 failed: port is already allocated)  Skipping pod 1b79bfde-7fbd-11e4-b793-04012f416701.default.mesos
E1209 16:04:51.601158   25562 kubelet.go:868] Error syncing pod, skipping: API error (500): Cannot start container 7f586477f1b21bd2abf41b79ab34c5ffcfc4ec83188f4f8cce417cb64523fd1f: Bind for 0.0.0.0:31001 failed: port is already allocated
I1209 16:04:51.601197   25562 event.go:62] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"1b79bfde-7fbd-11e4-b793-04012f416701", UID:"1b79cda7-7fbd-11e4-b793-04012f416701", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'failed', reason: 'failed' Failed to start with docker id 7f586477f1b21bd2abf41b79ab34c5ffcfc4ec83188f4f8cce417cb64523fd1f with error: API error (500): Cannot start container 7f586477f1b21bd2abf41b79ab34c5ffcfc4ec83188f4f8cce417cb64523fd1f: Bind for 0.0.0.0:31001 failed: port is already allocated
I1209 16:04:51.661385   25562 kubelet.go:883] Killing unwanted container {podFullName:44b07373-7fbb-11e4-b793-04012f416701.default.mesos uuid:44b0a1b3-7fbb-11e4-b793-04012f416701 containerName:net}
I1209 16:04:51.661428   25562 kubelet.go:533] Killing: e60df33a71e9a09606a933ebd7d239fc5a3bb07e7d4191fa98fe8f17436d0072
I1209 16:04:51.661616   25562 event.go:62] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"44b07373-7fbb-11e4-b793-04012f416701", UID:"44b0a1b3-7fbb-11e4-b793-04012f416701", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container nginx"}): status: 'terminated', reason: 'killing' Killing e35c7f899a9a628a09da1c71a0fcb35b2e5f5be238b79ed2f6ee5a1b84c4f9db - /k8s_nginx.7782ab18_44b07373-7fbb-11e4-b793-04012f416701.default.mesos_44b0a1b3-7fbb-11e4-b793-04012f416701_dc5fd203
I1209 16:04:51.800085   25562 executor.go:151] Found pod info: 'map[net:{{0xc2081dcbf0 <nil> <nil>} 0  kubernetes/pause:latest} nginx:{{0xc2081dcfa0 <nil> <nil>} 0  }]'
I1209 16:04:51.849596   25562 event.go:62] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"44b07373-7fbb-11e4-b793-04012f416701", UID:"44b0a1b3-7fbb-11e4-b793-04012f416701", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'terminated', reason: 'killing' Killing e60df33a71e9a09606a933ebd7d239fc5a3bb07e7d4191fa98fe8f17436d0072 - /k8s_net.77718d69_44b07373-7fbb-11e4-b793-04012f416701.default.mesos_44b0a1b3-7fbb-11e4-b793-04012f416701_5a018f3a
I1209 16:04:51.919326   25562 log.go:151] GET /podInfo?podID=1b79bfde-7fbd-11e4-b793-04012f416701&podNamespace=default: (9.400702ms) 200

[[EDIT]] this was further aggravated by a bug fixed in @bda0d35

@jdef (author) commented Dec 9, 2014

Observation: with service portals, kube-proxy allocates ephemeral ports on the slaves to back the virtual IP addresses that iptables is pre-routing. These allocations can consume ports within the "ports" resource ranges that slaves are offering to other schedulers. We should find a way to configure kube-proxy to either (a) avoid certain ranges, or (b) work within a specified range.

Spec.ProxyPort is currently always zero (meaning random), though there's a TODO to eliminate it:
/~https://github.com/GoogleCloudPlatform/kubernetes/blob/release-0.5/pkg/registry/service/rest.go#L115

It's worth noting that on Linux, at least, the default Mesos "ports" resource range lies outside the OS's default ephemeral port range, meaning that, by default, random ports allocated by kube-proxy should not squash Mesos jobs consuming "ports" resources.

~# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

Should probably document the potential for disaster here as a known issue.
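
As a stopgap, an operator could keep the two ranges disjoint; a sketch (the specific range below is only an example, not a recommendation):

# check the ephemeral range the kernel hands out to kube-proxy and friends
cat /proc/sys/net/ipv4/ip_local_port_range
# advertise a "ports" resource to Mesos that stays clear of that range
mesos-slave --resources="ports:[31000-32000]" ...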

@jdef (author) commented Dec 9, 2014

After spending some time wading through various issues in the upstream tracker, it's unclear to me how best to approach a generic "least-surprise" solution for mesos users wanting to expose a service on a public address. So now I'm starting to think about hacks.

One hack that occurs to me is to "filter" the /services REST API and rewrite only the empty PublicIP values in the returned service Specs so that they point to the public IP address of the machine the master is running on. This would also require that kube-proxy run on the master. I'm not terribly fond of this approach, but it does offer convenience to mesos users running in environments where external load balancers may not be readily available. This hack could be enabled by default and disabled with a flag (-master_loadbalancer=false or some such).

@jdef added this to the M2 milestone Dec 9, 2014
@jdef (author) commented Dec 11, 2014

Looks like the proxier has some changes in 0.6x re: public IPs working better on non-GCE clouds. It may be worth changing this PR to a 0.6 upgrade to pick those up.
[[EDIT]] v0.6.1 was just labelled as an official release

@jdef changed the title from "rebase to kubernetes v0.5x" to "rebase to kubernetes v0.5x+" Dec 14, 2014
@jdef (author) commented Dec 15, 2014

Note: during the v0.6.2 rebase, a dependency on golang.org/x/net/context was added. The revision used by k8s was giving godep some problems (it couldn't find the commit), so I just changed the revision to the latest master.
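
Roughly, the bump amounted to something like this (a sketch; the exact commands may have differed):

# fetch current master of the package and re-vendor it with godep
go get -u golang.org/x/net/context
godep update golang.org/x/net/context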

@jdef (author) commented Dec 15, 2014

Running kube-proxy on the master node to act as a front-end load balancer for external clients:

sudo ./bin/kube-proxy -bind_address=${servicehost} -etcd_servers=http://${servicehost}:4001 \
  -logtostderr=true -v=4 >proxy.log 2>&1 &

Configure frontend-service with publicIPs pointing to the master node. Poke a hole in the GCE firewall for port 9998 for nodes tagged master (see the sketch after the service definition below). Add a master tag to the master node (it would be nice if the mesosphere scripts did this automatically).

{
  "id": "frontend",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 9998,
  "selector": {
    "name": "frontend"
  },
  "publicIPs": [
    "146.148.86.181"
  ]
}
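
The firewall hole and the master tag might look something like this (a sketch using the gcloud CLI; the rule name, instance name, and zone are placeholders):

gcloud compute firewall-rules create k8sm-frontend --allow tcp:9998 --target-tags master
gcloud compute instances add-tags <master-instance> --tags master --zone <zone>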

Then (WIP) open up iptables on the master:

sudo iptables -A INPUT -i eth0 -p tcp -m state --state NEW,ESTABLISHED \
  -m tcp --dport 9998 -j ACCEPT

(Apparently I need something more, since this isn't working yet, but it does stop the logging of rejected incoming packets.)

@jdef (author) commented Dec 16, 2014

Got the UI working for external clients on GCE. publicIPs has to match $servicehost for masters:

jclouds@development-2863-77a:~$ cat /tmp/frontend-service.json
{
  "id": "frontend",
  "kind": "Service",
  "apiVersion": "v1beta1",
  "port": 9998,
  "selector": {
    "name": "frontend"
  },
  "publicIPs": [
    "10.57.172.200"
  ]
}
jclouds@development-2863-77a:~$ echo $servicehost
10.57.172.200

Next, determine the port that the proxy is listening on for the frontend service:

jclouds@development-2863-77a:~$ sudo iptables-save|grep -e frontend
-A KUBE-PROXY -d 10.10.10.79/32 -p tcp -m comment --comment frontend -m tcp --dport 9998 -j DNAT --to-destination 10.57.172.200:56640
-A KUBE-PROXY -d 10.57.172.200/32 -p tcp -m comment --comment frontend -m tcp --dport 9998 -j DNAT --to-destination 10.57.172.200:56640

Then, add an iptables rule to poke a hole in the firewall for that service:

jclouds@development-2863-77a:~$ sudo iptables -A INPUT -i eth0 -p tcp \
  -m state --state NEW,ESTABLISHED -m tcp --dport 56640 -j ACCEPT

@jdef (author) commented Dec 16, 2014

Wow. k8s v0.7 was just RC'd.

@jdef (author) commented Dec 16, 2014

I was trying to access the k8s UI from a web browser, but the iptables prerouting rules aren't set up to DNAT requests that way. Not critical.

@jdef mentioned this pull request Dec 17, 2014
@jdef (author) commented Dec 26, 2014

Testing on GCE; params used for the k8sm framework (use these when updating the tutorial):

# -proxy_path -> kube-proxy
# -portal_net -> 10.10.10.0/24
# -mesos_user -> root
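
A hypothetical invocation folding these in (assuming the framework binary is ./bin/kubernetes-mesos, which is not shown in this thread; all other flags are omitted and would come from the existing setup):

./bin/kubernetes-mesos \
  -proxy_path=./bin/kube-proxy \
  -portal_net=10.10.10.0/24 \
  -mesos_user=root \
  ...  # plus the usual master/etcd arguments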

@vladimirvivien commented Dec 26, 2014

@jdef let me know if there's something specific you want me to review before merging.

@jdef (author) commented Dec 26, 2014

Thanks. This PR is the next one to be merged, though I've been holding off until I get a chance to put together a PR for the GCE-based tutorial. I've gotten access to the website repo that hosts the tutorial, so it's just a matter of doing the updates at this point. I've started working through the process but haven't put together a PR yet.

jdef added a commit that referenced this pull request Jan 5, 2015
@jdef merged commit 604cdb6 into master Jan 5, 2015
@jdef deleted the rebase_05x branch Jan 6, 2015