Only listen to configured k8s namespaces. #1895
Conversation
This is still WIP, mostly because a proper integration test is still missing (#1889). Nevertheless, I'd be happy to hear some early feedback (from everyone in general and in particular from @containous/kubernetes).
It looks like getting Kubernetes integration tests up and running will still require a bit of effort. Nevertheless, merging the PR is advisable IMO as Traefik can be operated in a much more secure way on Kubernetes with namespace-specific watches. Plus, we stop spamming the logs with tons of unrelated events. I'll remove the WIP label for now and suggest we test this manually like we did with the previous changes.
(Force-pushed from cd3445a to 4f60171.)
Note to myself: The PR currently sets the ...
Wow @timoreimann!
@timoreimann this is on my radar and I will have a thorough review done by early next week. RE the all-namespaces thing, my setup currently depends on this, so we should certainly keep it as the default IMO.
(Force-pushed from 24172a9 to 98e6ba8.)
Looks very nice and clean to me...
Subject to @emilevauge's concerns about benchmarking / soak testing, LGTM.
Yeah, I'll definitely try to work out some kind of benchmark. Probably not going to happen before the 1.4 freeze, but not critical either.
This looks super clean and boss.
LGTM
provider/kubernetes/client.go (outdated)
@@ -19,31 +20,55 @@ import (
 	"k8s.io/client-go/tools/cache"
 )

-const resyncPeriod = time.Minute * 5
+const resyncPeriod = 10 * time.Second
@timoreimann this is too low, it should be more like every 10 minutes. Just keep in mind that with this you are forcing a full sync (from the API server) every 10 seconds.
AFAIU, there will be syncs only for the restricted items each informer is watching. Or am I missing something?
I'm okay with increasing the period in general. Syncing every 10 minutes though sounds like we'd not be catching up on missed Ingresses, Services, and Endpoints for a fairly long time. I wonder if that would still be acceptable?
full resync != delta updates (like Ingresses, Services, and Endpoints)
Edit: please check the gce ingress controller
/~https://github.com/kubernetes/kubernetes/blob/master/cluster/saltbase/salt/l7-gcp/glbc.manifest#L47
AFAIK, the glbc watches all namespaces. With this PR, Traefik will be watching selective namespaces only. What's being pulled in by a full sync should be subject to the list part of the ListWatch interface. So to my understanding, the sync volume should be much more restricted.
Please let me know if my assumptions are wrong somehow.
So to my understanding, the sync volume should be much more restricted.
Yes, that's correct, but you are assuming the namespace only contains a few objects (ingresses, services, secrets, etc.). What happens if the user has hundreds of objects in the namespace(s)?
I think it's a trade-off between not overloading the cluster and being able to compensate for ephemeral errors in time. Maybe there's no one size fits all, and it'd be best to make the parameter configurable.
I agree though that we can probably bump the interval somewhat for now. Will go with a minute or so.
@aledbf you might be interested in this thread I started on the kubernetes-sig-api-machinery mailing list. Some takeaways for me:
- The resync period only causes the local cache content to get replayed; in particular, it does not hit the API server (that's what an explicit relist is for).
- First and foremost, the resync period is a means to compensate for controller (not watch) bugs, though re-queuing techniques available in more recent informer versions take away a lot of that rationale. That said, resyncs may also be used for other cases. (Daniel describes how you can use them as a clock to initiate polls for external APIs periodically.)
- It's a viable option to disable resyncs completely in order to surface controller bugs more quickly instead of risking hiding them. One example component that does this is the Kubernetes scheduler.
Hope this helps.
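Since this thread keeps coming back to where the resync period actually lives, here is a minimal client-go sketch. It is hypothetical: `newServiceInformer` and its parameters are invented for illustration and are not Traefik's code. The period is simply an argument to the informer constructor, and passing 0 disables periodic resyncs.

```go
package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newServiceInformer (hypothetical helper) builds an informer for Services in
// a single namespace. resyncPeriod controls how often the locally cached
// objects are replayed to the event handler; it does not cause extra calls to
// the API server. A value of 0 disables periodic resyncs entirely.
func newServiceInformer(clientset kubernetes.Interface, namespace string, resyncPeriod time.Duration, handler cache.ResourceEventHandler) (cache.Store, cache.Controller) {
	listWatch := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(),
		"services",
		namespace, // both the initial list and the watch are scoped to this namespace
		fields.Everything(),
	)
	return cache.NewInformer(listWatch, &corev1.Service{}, resyncPeriod, handler)
}
```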
We previously watched all namespaces, which was causing issues for users that want to apply restrictive RBAC rules. Hence, we change the logic to hook up one informer per namespace and kind we observe. Note that this moves the namespace filtering into the client/informers (where it properly belongs).

A nice side effect of this is that we stop receiving tons of events from unrelated namespaces (most notably kube-system), which has been a source of confusion for users in the past. We also don't need the extra event notifications as "retry signals" in case of ephemeral API failures anymore since we can now safely rely on a fairly low sync period for recovery purposes. This is preferable to getting hammered with event updates at sub-second intervals, only to have Traefik's back-off sleep of 2 seconds kicking in very often.

A few additional improvements that made it in:
- Wait for the store synchronization to happen during WatchAll instead of when we process events. The wait needs to happen exactly once.
- Do not close the event channel until all controllers have stopped processing. This was a pending race condition that could yield a crash.
- Remove one superfluous event/watch channel (name changes depending on which method you're looking at).
- Remove the namespace-related unit tests. (Must be replaced by to-be-written integration tests.)
- Group methods logically.
- Improve GoDocs.
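To make the per-namespace informer idea and the channel-closing fix above more tangible, here is a hedged sketch assuming a plain client-go setup. `watchNamespaces` and the channel wiring are invented for illustration and do not mirror the PR's actual code.

```go
package example

import (
	"sync"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNamespaces (hypothetical) starts one Services informer per configured
// namespace and closes the shared event channel only after every controller
// has stopped, so no handler can send on a closed channel.
func watchNamespaces(clientset kubernetes.Interface, namespaces []string, resyncPeriod time.Duration, events chan interface{}, stopCh <-chan struct{}) {
	handler := cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { events <- obj },
		UpdateFunc: func(_, newObj interface{}) { events <- newObj },
		DeleteFunc: func(obj interface{}) { events <- obj },
	}

	var wg sync.WaitGroup
	for _, ns := range namespaces {
		// The list/watch is scoped to a single namespace, so events from
		// unrelated namespaces (e.g. kube-system) never arrive at all.
		listWatch := cache.NewListWatchFromClient(
			clientset.CoreV1().RESTClient(), "services", ns, fields.Everything(),
		)
		_, controller := cache.NewInformer(listWatch, &corev1.Service{}, resyncPeriod, handler)

		wg.Add(1)
		go func() {
			defer wg.Done()
			controller.Run(stopCh) // blocks until stopCh is closed
		}()
	}

	// Close the event channel only once all controllers have returned.
	go func() {
		wg.Wait()
		close(events)
	}()
}
```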
As a consequence, the lookup key is now dependent on whether we use NamespaceAll or more specific namespaces.
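Purely to illustrate that point (the type and method below are hypothetical, not the PR's actual structure): metav1.NamespaceAll is the empty string, so a catch-all informer and namespace-specific informers naturally end up under different map keys.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// informerStores (hypothetical) holds one store per watched namespace.
// metav1.NamespaceAll is "", so a catch-all informer is simply keyed by the
// empty string.
type informerStores struct {
	stores map[string]cache.Store
}

// storeFor returns the store that can answer lookups for the given namespace:
// the catch-all store if we watch all namespaces, the per-namespace store
// otherwise.
func (s *informerStores) storeFor(namespace string) (cache.Store, bool) {
	if store, ok := s.stores[metav1.NamespaceAll]; ok {
		return store, true
	}
	store, ok := s.stores[namespace]
	return store, ok
}
```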
Shared informers are the standard informer types in more modern releases of client-go, so we can only benefit by using them now as well (even though we don't share any caches and presumably won't do so in the near future). Additionally, we replace the individual Watch* methods by a single WatchObjects method.
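A sketch of what that can look like with a recent client-go (the factory options API postdates this PR, and `watchObjects` is an invented stand-in for the PR's WatchObjects method, so treat the shape as illustrative only):

```go
package example

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchObjects (illustrative stand-in) creates one shared informer factory per
// configured namespace and registers the same event handler for every kind we
// care about.
func watchObjects(clientset kubernetes.Interface, namespaces []string, resync time.Duration, handler cache.ResourceEventHandler, stopCh <-chan struct{}) {
	for _, ns := range namespaces {
		factory := informers.NewSharedInformerFactoryWithOptions(
			clientset, resync, informers.WithNamespace(ns),
		)
		factory.Core().V1().Services().Informer().AddEventHandler(handler)
		factory.Core().V1().Endpoints().Informer().AddEventHandler(handler)

		factory.Start(stopCh)            // start every informer created above
		factory.WaitForCacheSync(stopCh) // block until the local caches are primed
	}
}
```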
(Force-pushed from 6c85c79 to d0ac8f5.)
I finally came around to running benchmarks. This is what I did:
Here's what I observed:
I also verified that the number of backend servers matches the pod count. Overall, this looks good to me. We surely want to gain more real-world experience to solidify the new implementation. I do think though that the tests demonstrate basic correctness. Happy to run more tests if desired.
Shortly going back to WIP as I realized the RBAC documentation updates are still missing...
...aaaaand it's done.
To all involved reviewers: Please indicate by Monday midnight CEST (i.e., in a good 22 hours from now) if your existing review should no longer hold -- I'm happy to address any further comments. Otherwise, I'll proceed with merging the PR.
@timoreimann I think this is a good example that ...
@emilevauge strictly speaking, I guess every new commit should invalidate all existing LGTMs. Practically speaking, I'm not sure if we want to be that strict, especially since it'd take another three approvals. For the PR at hand, the changes that happened after the LGTMs were received are rather simple / insignificant. So personally I feel confident moving forward even without another round of eyeballs. Either way, we may want to consider tuning the process to add clarity.
Yes, but here I'm talking about the fact that this PR was already reviewed by 3 maintainers, and still in ... Just saying we should really be cautious with ...
Watch configured namespaces only.
We previously watched all namespaces, which was causing issues for users that want to apply restrictive RBAC rules. Hence, we change the logic to hook up one informer per namespace and kind we observe.
Note that this moves the namespace filtering into the client/informers (where it properly belongs). A nice side effect of this is that we stop receiving tons of events from unrelated namespaces (most notably kube-system), which has been a source of confusion for users in the past. We also don't need the extra event notifications as "retry signals" in case of ephemeral API failures anymore since we can now safely rely on a fairly low sync period for recovery purposes. This is preferable to getting hammered with event updates at sub-second intervals, only to have Traefik's back-off sleep of 2 seconds kicking in very often.
A few additional improvements that made it in:
- Wait for the store synchronization to happen during WatchAll instead of when we process events. The wait needs to happen exactly once.
- Do not close the event channel until all controllers have stopped processing. This was a pending race condition that could yield a crash.
- Remove one superfluous event/watch channel (name changes depending on which method you're looking at).
- Remove the namespace-related unit tests. (Must be replaced by to-be-written integration tests.)
- Group methods logically.
- Improve GoDocs.
Finally, we stop logging complete events as we receive them to reduce noise and stop leaking sensitive secrets data into the logs.
Fixes #1626.
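On the point above about no longer logging complete events: a hedged sketch of what a metadata-only handler can look like (the helper name and channel wiring are invented, not Traefik's actual code).

```go
package example

import (
	"log"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// metadataOnlyHandler (hypothetical) forwards events downstream but logs only
// the namespace/name of the object involved, never the full payload, which
// for Secrets would include the secret data itself.
func metadataOnlyHandler(events chan<- interface{}) cache.ResourceEventHandlerFuncs {
	logAndForward := func(obj interface{}) {
		if accessor, err := meta.Accessor(obj); err == nil {
			log.Printf("kubernetes event for %s/%s", accessor.GetNamespace(), accessor.GetName())
		}
		events <- obj // downstream code still gets the full object; we just don't log it
	}
	return cache.ResourceEventHandlerFuncs{
		AddFunc:    logAndForward,
		UpdateFunc: func(_, newObj interface{}) { logAndForward(newObj) },
		DeleteFunc: logAndForward,
	}
}
```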