Ingressroute status updates can cause update loop #854
Comments
That's interesting @dbason. How many services/endpoints do you have in the cluster? In performance testing we've encountered high memory, but that was in the thousands-of-services range, due to some gRPC implementations that should be resolved in Envoy soon (or may be already). However, it does seem like that listener update is the culprit. Could you share the
@stevesloka here it is
Thanks for your report. I'm sorry you're having issues. Can you please provide the logs from the
@davecheney the only thing I see in the envoy container after adding this is the log line in my original report; it is repeated every second or so. I've had to pull the routes out of my cluster as they were impacting our QA environment, but I will try to get the logs from the contour container when I add them again.
Thanks. That will help me understand whether the contour container is wedging, whether we're sending constant updates to envoy, or whether envoy is wedged on a single update.
Attached is an extract of the contour logs from around the time of the issue. Edit: the test also seemed repeatable with the yaml I have included. If it helps I can set up a test k8s cluster and try to reproduce the issue there with the same services (I may not be able to set this up until tomorrow though).
Thank you. From what I can see contour is not trying to send continuous updates to envoy, so the problem could be that envoy is jammed updating itself.
@dbason I think I might have reproduced the fault. Can you please confirm for me which contour tag you have deployed, e.g. v0.8.1, latest, or master. If you're using master, can you please downgrade to a released version and let me know if that fixes the problem.
I've confirmed this happens with 0.8.1 as well, which is currently the :latest tag. The bug, I believe, is that contour is updating the status on an ingressroute document, which causes the ingressroute update to fire again, forming a loop. The errant status is, I think
In your example, if the services are all present, this should fix the loop, I think. Obviously I'm going to suppress the looping; this is unhelpful.
@davecheney we're on 0.8.1 - did you want me to downgrade to 0.8.0 and try again?
I don't think it'll help; I suspect the problem has been there for a few releases. The underlying cause is that contour is constantly updating an ingressroute record's status, which triggers contour to refresh, push a change to envoy, update the status ... rinse, repeat.
You can figure out which ingressroute record is being pummelled with something like
k get ingressroutes --all-namespaces -w
and then watching which object is being constantly rewritten.
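As a concrete variant of that command (a sketch; the status.currentStatus column path assumes the v1beta1 IngressRoute status layout), printing the resourceVersion makes the constantly-rewritten object stand out:

```sh
# The record stuck in the loop prints a fresh line with a new resourceVersion
# every second or so, while the healthy ones stay quiet.
kubectl get ingressroutes --all-namespaces -w \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.currentStatus,RV:.metadata.resourceVersion
```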
I'll have this fixed for 0.9 ASAP
I've updated the ticket to reflect what I think is happening. The problem is that contour can get into a loop where writing a status update to the IngressRoute object causes another contour to see the update, which causes it to process the ingressroute, find it invalid, and update the status. Normally the status update should be idempotent, but in the case where there is more than one invalid part of the ingressroute spec, the patch operation succeeds, generating a new onUpdate message, and things start again. I'm testing a patch where I selectively filter out parts of the IngressRoute object during the onUpdate calculation. That should unblock 0.9, but a more permanent solution will be needed for 0.10.
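A minimal sketch of that interim approach, assuming an update handler that compares the old and new IngressRoute objects before triggering a rebuild. This is illustrative only, not Contour's actual code, and field names such as status.currentStatus are just for the example:

```go
// Drop status and metadata.resourceVersion (the fields Contour's own status
// writes mutate) before comparing, so a pure status patch no longer looks
// like a new update and the loop is broken.
package main

import (
	"fmt"
	"reflect"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// significantChange reports whether anything other than status and
// metadata.resourceVersion differs between the two objects.
func significantChange(oldObj, newObj *unstructured.Unstructured) bool {
	a, b := oldObj.DeepCopy(), newObj.DeepCopy()
	for _, u := range []*unstructured.Unstructured{a, b} {
		unstructured.RemoveNestedField(u.Object, "status")
		unstructured.RemoveNestedField(u.Object, "metadata", "resourceVersion")
	}
	return !reflect.DeepEqual(a.Object, b.Object)
}

func main() {
	oldObj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "contour.heptio.com/v1beta1",
		"kind":       "IngressRoute",
		"metadata":   map[string]interface{}{"name": "example", "resourceVersion": "100"},
		"spec":       map[string]interface{}{"routes": []interface{}{}},
		"status":     map[string]interface{}{"currentStatus": "invalid"},
	}}

	// Simulate Contour's own status patch: only status and resourceVersion move.
	newObj := oldObj.DeepCopy()
	unstructured.SetNestedField(newObj.Object, "101", "metadata", "resourceVersion")
	unstructured.SetNestedField(newObj.Object, "valid", "status", "currentStatus")

	fmt.Println("rebuild needed:", significantChange(oldObj, newObj)) // false -> no re-trigger
}
```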
Updates projectcontour#854 Depending on the order in which parts of an ingressroute document are processed, the invalid status message generated may differ over runs. Specifically, two or more invalid components of an ingressroute will patch status in a random order, defeating patch's "is this the same" check and causing Contour to see the update to the ingressroute's document, thus triggering a new OnUpdate event, and the cycle starts again. For the interim, filter out fields which are known to change -- status and metadata.resourceversion -- during the OnUpdate comparison. Signed-off-by: Dave Cheney <dave@cheney.net>
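To illustrate the ordering problem the commit message describes, here is a tiny self-contained Go example (not Contour code): building a status description by ranging over a map produces a differently ordered string on different runs, so two semantically identical status writes can compare as unequal.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Two invalid parts of a hypothetical ingressroute spec.
	problems := map[string]string{
		"route /a": "service missing",
		"route /b": "service missing",
	}

	// Go randomises map iteration order, so successive runs may emit
	// "route /a: ...; route /b: ..." or the reverse -- defeating a naive
	// "is this the same status?" comparison.
	var parts []string
	for k, v := range problems {
		parts = append(parts, fmt.Sprintf("%s: %s", k, v))
	}
	fmt.Println(strings.Join(parts, "; "))
}
```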
Moving to 0.12 because it's blocked on #881.
I think I've found a cause of this (there might be more than one). If a certificate is missing then, depending on the order in which it is processed by the dag builder, the Ingressroute may flip between valid and invalid, and those status updates cause the dag to be regenerated. We already filter out the status field when comparing Ingressroute objects, but this looping may be caused by the
For those interested, this is relatively easy to recreate in a cluster with some Ingressroute records; after #1608 change
ok, it's likely not the
Digging into this a bit more, I believe the root cause is multiple contours trying to update status concurrently. Moving to a model where the status writer is the leader is the solution.
I'm going to close this issue in favour of #1425. The underlying issue is multiple contours attempting to update status. The move to a single status writer driven by leader election is, I feel, the solution.
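For readers wondering what a single, leader-elected status writer could look like, here is a hedged sketch built on client-go's leaderelection package. The lease name, namespace, and the startStatusWriter hook are assumptions for illustration, not Contour's implementation:

```go
// Every instance keeps serving xDS, but only the instance holding the lease
// writes IngressRoute status, so concurrent writers cannot fight each other.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // the pod name works as a unique identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "contour-status-writer", Namespace: "heptio-contour"}, // hypothetical names
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the leader performs status writes; the others stay passive.
				log.Printf("%s is now the status writer", id)
				// startStatusWriter(ctx) // hypothetical hook
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Printf("%s lost the status-writer lease", id)
			},
		},
	})
}
```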
What steps did you take and what happened:
Added non-root IngressRoutes, then added a root IngressRoute delegating to those. This caused relatively high CPU and memory usage (150% CPU and 2GB RAM; the container is limited to 2GB), which seems quite high for a single IngressRoute.
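For reference, a minimal sketch of the kind of setup described; the names, namespace, port, and TLS secret are hypothetical, not the reporter's actual yaml:

```yaml
# Root IngressRoute: owns the virtual host and delegates path routing.
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: root
  namespace: default
spec:
  virtualhost:
    fqdn: app.example.com
    tls:
      secretName: app-tls   # hypothetical secret; a missing cert becomes relevant later in this thread
  routes:
    - match: /
      delegate:
        name: app
        namespace: default
---
# Non-root IngressRoute: no virtualhost, just routes to a service.
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: app
  namespace: default
spec:
  routes:
    - match: /
      services:
        - name: app-svc
          port: 80
```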
What did you expect to happen:
Envoy continuing as normal
Anything else you would like to add:
After adding the IngressRoute with delegates it seems the envoy container is constantly logging the following message:
[2019-01-22 02:20:05.816][1][info][upstream] source/server/lds_api.cc:80] lds: add/update listener 'ingress_https'
Environment:
- Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:39:33Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release): CentOS 7