
Improve response times in Kubeapps APIs when using Pinniped-proxy #5407

Closed
castelblanque opened this issue Sep 28, 2022 · 11 comments · Fixed by #5518 or #5583

@castelblanque
Collaborator

castelblanque commented Sep 28, 2022

Summary
Some requests (e.g. core.packages.v1alpha1.PackagesService.GetAvailablePackageSummaries) are slow when using pinniped-proxy. Investigate and improve those times.

Description
After the work done for #5362 and #4919, response times have improved thanks to no longer creating multiple unused K8s Go clients.
However, there still appears to be a throttling problem when using pinniped-proxy. Multiple requests are made from kubeapps-apis to the K8s API (with correct QPS and Burst values), but they go through pinniped-proxy.
Log from the proxy:

2022-09-28T13:48:38.784778 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:38.787947 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.788917 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.789350 [INFO] - GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.880150 [INFO] - GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.880827 [INFO] - GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.885347 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:38.885365 [INFO] - GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.886346 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.887521 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:38.887959 [INFO] - GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.888758 [INFO] - GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.889212 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.980185 [INFO] - GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
2022-09-28T13:48:38.981594 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.982572 [INFO] - GET https://kubernetes.default/api/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984052 [INFO] - GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984456 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
2022-09-28T13:48:38.984862 [INFO] - GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:38.984924 [INFO] - GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.985329 [INFO] - GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.985651 [INFO] - GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.986150 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK
2022-09-28T13:48:38.986452 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
2022-09-28T13:48:39.080254 [INFO] - GET https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.081032 [INFO] - GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.081128 [INFO] - GET https://kubernetes.default/apis/apps/v1?timeout=32s 200 OK
2022-09-28T13:48:39.083395 [INFO] - GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.084938 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.086317 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
2022-09-28T13:48:39.088025 [INFO] - GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
2022-09-28T13:48:39.180407 [INFO] - GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.180475 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-09-28T13:48:39.181346 [INFO] - GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.182219 [INFO] - GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.183452 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-09-28T13:48:39.186020 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
2022-09-28T13:48:39.187938 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK

When kubeapps-apis is configured without pinniped-proxy, response times for those requests are far faster. This might indicate a throttling problem between pinniped-proxy and the actual K8s API server.

Acceptance criteria

  • Response times for Kubeapps APIs are improved when using pinniped-proxy, to levels similar to those obtained without pinniped-proxy.
@kubeapps-bot kubeapps-bot moved this to 🗂 Backlog in Kubeapps Sep 28, 2022
@castelblanque castelblanque added kind/enhancement An issue that reports an enhancement for an implemented feature rust Pull requests that update Rust code component/pinniped-proxy Issue related to kubeapps integration with pinniped-proxy labels Sep 28, 2022
@castelblanque castelblanque added this to the Technical debt milestone Sep 28, 2022
@ppbaena ppbaena added the next-iteration Issues to be discussed in planning session label Sep 29, 2022
@ppbaena ppbaena moved this from 🗂 Backlog to 🗒 Todo in Kubeapps Oct 10, 2022
@absoludity
Contributor

Thanks @castelblanque. Yeah, that's intriguing given that we're caching the credential. I'll take a look.

@absoludity absoludity moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Oct 17, 2022
@absoludity
Contributor

absoludity commented Oct 17, 2022

So I started looking at this today - and it seems quite an interesting problem/bug. For what it's worth, I don't think it's related to throttling of requests to the kubernetes-api server from pinniped-proxy, but rather, an issue in the way async requests are handled by pinniped-proxy itself.

In summary, tokio, the asynchronous runtime used with hyper to provide many concurrent tasks per CPU thread (similar to goroutines, just not built into the language), creates a new task for each request as it arrives, then marks the task as idle while it is waiting for I/O (in our case, the network request for the proxied response). When the network request completes, the task is marked runnable again and should be put at the front of the queue (the next-task slot), but it seems to be put at the back of the queue instead.

Let's see this in detail: I added a few extra logging statements during the proxy operation so we can tell when the request begins, when the id/cert exchange happens, and when the actual downstream request begins (or see this gist for logging with a trace of tokio's mio crate).

The first three queries are proxied and returned (the last being /apis), with each taking between 30 and 40ms:

2022-10-17T03:25:43.657974 [INFO] - Beginning proxy request for https://kubernetes.default/api?timeout=32s
2022-10-17T03:25:43.659170 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.659756 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authorization.k8s.io/v1/selfsubjectaccessreviews
2022-10-17T03:25:43.659807 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.682590 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.687346 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.692050 [INFO] - GET https://kubernetes.default/api?timeout=32s 200 OK
2022-10-17T03:25:43.692896 [INFO] - Beginning proxy request for https://kubernetes.default/apis?timeout=32s
2022-10-17T03:25:43.692962 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.701520 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.702344 [INFO] - POST https://kubernetes.default/apis/authorization.k8s.io/v1/selfsubjectaccessreviews 201 Created
2022-10-17T03:25:43.707072 [INFO] - GET https://kubernetes.default/apis?timeout=32s 200 OK

It is this response which, when returned upstream to the kubeapps-apis service (which is building the dynamic client and dynamically determining all the available APIs as part of the REST mapper creation), triggers the barrage of requests for all the available APIs. The important point is that each request creates a new task, and all these tasks appear to be processed FIFO, apparently not concurrently (?!), with no responses returned until all the new request tasks have been marked as idle:

2022-10-17T03:25:43.708332 [INFO] - Beginning proxy request for https://kubernetes.default/api/v1?timeout=32s
2022-10-17T03:25:43.708410 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.716813 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.717562 [INFO] - Beginning proxy request for https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.717602 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.726343 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.727344 [INFO] - Beginning proxy request for https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.727406 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.811001 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.812148 [INFO] - Beginning proxy request for https://kubernetes.default/apis/policy/v1?timeout=32s
2022-10-17T03:25:43.812213 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.824686 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.825403 [INFO] - Beginning proxy request for https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:43.825441 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.913587 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.914525 [INFO] - Beginning proxy request for https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s
2022-10-17T03:25:43.914582 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:43.923789 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:43.924695 [INFO] - Beginning proxy request for https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:43.924740 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.009504 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.010713 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:44.010772 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.019752 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.020884 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.020932 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.107160 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.108268 [INFO] - Beginning proxy request for https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.108326 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.117917 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.118979 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s
2022-10-17T03:25:44.119037 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.204251 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.205376 [INFO] - Beginning proxy request for https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.205427 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.214138 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.215287 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s
2022-10-17T03:25:44.215355 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.224067 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.225180 [INFO] - Beginning proxy request for https://kubernetes.default/apis/policy/v1beta1?timeout=32s
2022-10-17T03:25:44.225219 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.309400 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.310438 [INFO] - Beginning proxy request for https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.310487 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.319451 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.320575 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.320613 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.405115 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.406333 [INFO] - Beginning proxy request for https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s
2022-10-17T03:25:44.406387 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.415010 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.416044 [INFO] - Beginning proxy request for https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.416089 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.424744 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.426010 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v1?timeout=32s
2022-10-17T03:25:44.501688 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.510537 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.511581 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s
2022-10-17T03:25:44.511631 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.520423 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.521591 [INFO] - Beginning proxy request for https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.521640 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.606678 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.608375 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.608436 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.617552 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.618658 [INFO] - Beginning proxy request for https://kubernetes.default/apis/batch/v1?timeout=32s
2022-10-17T03:25:44.618710 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.628218 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.629265 [INFO] - Beginning proxy request for https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s
2022-10-17T03:25:44.629315 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.711106 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.712410 [INFO] - Beginning proxy request for https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.712458 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.721111 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.801966 [INFO] - Beginning proxy request for https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.802032 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.811083 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.812333 [INFO] - Beginning proxy request for https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.812383 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.821086 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.822338 [INFO] - Beginning proxy request for https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s
2022-10-17T03:25:44.822397 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.907995 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.909164 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apps/v1?timeout=32s
2022-10-17T03:25:44.909213 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:44.917759 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:44.918973 [INFO] - Beginning proxy request for https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s
2022-10-17T03:25:44.919030 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.005963 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.007736 [INFO] - Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2?timeout=32s
2022-10-17T03:25:45.007832 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.020117 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.021137 [INFO] - Beginning proxy request for https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.021178 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.105791 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.107111 [INFO] - Beginning proxy request for https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.107169 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.115979 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.117122 [INFO] - Beginning proxy request for https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.117171 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.205532 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.207104 [INFO] - Beginning proxy request for https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
2022-10-17T03:25:45.207180 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.222913 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.224620 [INFO] - Beginning proxy request for https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.224708 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.308181 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.309547 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.309600 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.318831 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.320041 [INFO] - Beginning proxy request for https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s
2022-10-17T03:25:45.320090 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.405111 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.406595 [INFO] - Beginning proxy request for https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s
2022-10-17T03:25:45.406644 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.415655 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.416844 [INFO] - Beginning proxy request for https://kubernetes.default/apis/batch/v1beta1?timeout=32s
2022-10-17T03:25:45.416886 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.425878 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.502684 [INFO] - Beginning proxy request for https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s
2022-10-17T03:25:45.502744 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.512051 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.513227 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s
2022-10-17T03:25:45.513272 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.525748 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.602727 [INFO] - Beginning proxy request for https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s
2022-10-17T03:25:45.602787 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.612447 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:45.613777 [INFO] - Beginning proxy request for https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s
2022-10-17T03:25:45.613831 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:45.623089 [INFO] - Requesting k8s API endpoint with cert

So that barrage began at 43.708, with the "Beginning - Exchanging - Requesting" sequence for each request taking only around 10ms, but no requests completed until all the concurrent incoming requests had been processed. Only at 45.718, a full 2s later, did the proxy start returning the responses, finishing at 45.902, i.e. within a period of about 200ms:

2022-10-17T03:25:45.718840 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.724800 [INFO] - GET https://kubernetes.default/api/v1?timeout=32s 200 OK
2022-10-17T03:25:45.734030 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.804941 [INFO] - GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.810390 [INFO] - GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.811114 [INFO] - GET https://kubernetes.default/apis/policy/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.813519 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
2022-10-17T03:25:45.814883 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.816179 [INFO] - GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
2022-10-17T03:25:45.816962 [INFO] - GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.817745 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s 200 OK
2022-10-17T03:25:45.818542 [INFO] - GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK
2022-10-17T03:25:45.819341 [INFO] - GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.820121 [INFO] - GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.820910 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.821708 [INFO] - GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.822515 [INFO] - GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.822631 [INFO] - GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.823474 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.824299 [INFO] - GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.825144 [INFO] - GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.825296 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
2022-10-17T03:25:45.826143 [INFO] - GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.826982 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.829095 [INFO] - GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
2022-10-17T03:25:45.829914 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.830026 [INFO] - GET https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.902639 [INFO] - GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
2022-10-17T03:25:45.902752 [INFO] - GET https://kubernetes.default/apis/apps/v1?timeout=32s 200 OK
2022-10-17T03:25:45.902949 [INFO] - GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.904062 [INFO] - GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.904242 [INFO] - GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
2022-10-17T03:25:45.904390 [INFO] - GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.905383 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.906250 [INFO] - GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.907137 [INFO] - GET https://kubernetes.default/apis/scheduling.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.908026 [INFO] - GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.908946 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.909809 [INFO] - GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.910668 [INFO] - GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
2022-10-17T03:25:45.911502 [INFO] - GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
2022-10-17T03:25:45.916386 [INFO] - GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
2022-10-17T03:25:45.917290 [INFO] - GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.917425 [INFO] - GET https://kubernetes.default/apis/batch/v1beta1?timeout=32s 200 OK
2022-10-17T03:25:45.919930 [INFO] - Beginning proxy request for https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/kubeapps/helmreleases
2022-10-17T03:25:45.919980 [INFO] - Exchanging id_token for X509 client identity using pinniped
2022-10-17T03:25:46.004163 [INFO] - Requesting k8s API endpoint with cert
2022-10-17T03:25:46.009001 [INFO] - GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/kubeapps/helmreleases 200 OK

Read the in-depth overview of the tokio scheduler. In particular, the addition of the next_task slot sounds like a solution to this exact problem, but I'm not yet sure why we're not seeing the desired behaviour (i.e. when a proxied request completes, the task should be marked runnable and put in the next_task slot so it can jump the queue and be returned, rather than the scheduler continuing with the next incoming requests from the queue).
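As a toy model of that description (this is not tokio's actual implementation, just a sketch of the idea from the scheduler overview, with made-up names), a per-worker run queue with a next-task LIFO slot behaves roughly like this:

```rust
use std::collections::VecDeque;

// Toy model of a per-worker run queue with a "next task" (LIFO) slot:
// a task woken by an I/O completion takes the slot and jumps ahead of
// tasks already waiting in the FIFO queue.
pub struct RunQueue {
    lifo_slot: Option<u32>,
    fifo: VecDeque<u32>,
}

impl RunQueue {
    pub fn new() -> Self {
        RunQueue { lifo_slot: None, fifo: VecDeque::new() }
    }

    // Newly arriving work goes to the back of the FIFO queue.
    pub fn push_back(&mut self, task: u32) {
        self.fifo.push_back(task);
    }

    // A task made runnable by an I/O event takes the LIFO slot; any task
    // already in the slot is demoted to the back of the FIFO queue.
    pub fn wake(&mut self, task: u32) {
        if let Some(prev) = self.lifo_slot.replace(task) {
            self.fifo.push_back(prev);
        }
    }

    // The scheduler polls the LIFO slot first, then the FIFO queue.
    pub fn next(&mut self) -> Option<u32> {
        self.lifo_slot.take().or_else(|| self.fifo.pop_front())
    }
}

fn main() {
    let mut q = RunQueue::new();
    q.push_back(1); // incoming request tasks queue FIFO...
    q.push_back(2);
    q.wake(42); // ...but a task woken by a completed proxied response
    assert_eq!(q.next(), Some(42)); // ...is scheduled next, jumping the queue
    assert_eq!(q.next(), Some(1));
    assert_eq!(q.next(), Some(2));
}
```

The behaviour in the logs above looks as if woken tasks were landing at the back of the FIFO queue rather than in the slot.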

It'll probably be a simple configuration fix, but I'll continue tomorrow.

@castelblanque
Collaborator Author

castelblanque commented Oct 17, 2022

Interesting findings!

When the network request is completed, it is marked as runnable again, and should be put into the front of the queue (the next-task slot), but seems to be put on the back of the queue.

Is that an actual bug in tokio?
In the Java world, this kind of problem sometimes occurs when the thread pool is too small. But if no requests complete until all the concurrent incoming requests have been processed, it does not seem related to pool size.

@absoludity
Contributor

Is that an actual bug in tokio?

I doubt it, more likely just a config option or something I've missed, but I'll hopefully find out tomorrow.

In the Java world, sometimes this kind of problems occur when the pool of threads is too small. But if no requests are completed until all the concurrent incoming requests have been processed, it does not seem like related to the pool size.

Yeah - all incoming requests get processed before it moves on to return the results for each, though it confuses me that they seem to be handled serially. Interesting anyway :)

@absoludity
Contributor

So, summary of findings after creating a minimal working example:

  • There is no issue in tokio (as expected). It is correct that it processes I/O events in a FIFO manner.
  • The issue is a combination of a CPU-intensive calculation in the request path (I suspect calculating hashes to look up the cached credential, but will verify) and CPU resource starvation for the service itself, so that actually initiating the requests takes many seconds.

Using my minimal working example, I can reproduce the results seen by explicitly:

  1. Adding a 100ms (or otherwise) CPU-blocking call right before each request is made, and
  2. Restricting the number of CPUs available to the runtime.

If only one of these is done, the queries can still execute quickly.
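A hand-rolled sketch of that interaction (using plain OS threads rather than the tokio tasks of my minimal example, with made-up numbers, so only an analogy):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Each "request" starts with a blocking call (the sleep stands in for the
// CPU-bound work done before the proxied request is initiated). With spare
// workers the requests overlap; with a single starved worker they serialize.
pub fn run_requests(workers: usize, requests: usize, block_ms: u64) -> Duration {
    let start = Instant::now();
    let per_worker = (requests + workers - 1) / workers;
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..per_worker {
                    thread::sleep(Duration::from_millis(block_ms));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    // One worker: the 8 blocking calls serialize, roughly 8 x 10ms.
    println!("starved:  {:?}", run_requests(1, 8, 10));
    // Four workers: roughly 2 x 10ms, since the blocking calls overlap.
    println!("parallel: {:?}", run_requests(4, 8, 10));
}
```

Only when both conditions hold (blocking work per request and too few workers) does total latency blow out, matching the two-ingredient reproduction above.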

I'll profile the credential cache access used in pinniped-proxy next.

@absoludity
Contributor

...and found it: 2 things to note in the logs below:

  1. The request begins, but after my log message of "Exchanging id_token for..." there must be an unexpected I/O event that causes the task to go idle (even though it should be a cache request) - hence all requests get started before any request actually finishes the exchange and begins "Requesting k8s API endpoint with cert",
  2. Perhaps related, but the calls to call_pinniped are apparently stacking (possibly because of the sync_writes option I'm using on the cache), starting at 15ms and growing from there to almost 2s per request.

Finally I can now actually try to fix the issue :)

2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Beginning proxy request for https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Exchanging id_token for X509 client identity using pinniped
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Beginning proxy request for https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s
 2022-10-19T06:02:32.757Z INFO  pinniped_proxy::service > Exchanging id_token for X509 client identity using pinniped
 2022-10-19T06:02:32.760Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 16ms, of which 15ms was call_pinniped
 2022-10-19T06:02:32.765Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.765Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 20ms, of which 20ms was call_pinniped
 2022-10-19T06:02:32.835Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.836Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 91ms, of which 91ms was call_pinniped
 2022-10-19T06:02:32.860Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.860Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 115ms, of which 115ms was call_pinniped
 2022-10-19T06:02:32.945Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.945Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 199ms, of which 199ms was call_pinniped
 2022-10-19T06:02:32.953Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:32.953Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 207ms, of which 207ms was call_pinniped
 2022-10-19T06:02:33.039Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.039Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 293ms, of which 293ms was call_pinniped
 2022-10-19T06:02:33.135Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.135Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 388ms, of which 388ms was call_pinniped
 2022-10-19T06:02:33.157Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.157Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 411ms, of which 411ms was call_pinniped
 2022-10-19T06:02:33.255Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.255Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 508ms, of which 508ms was call_pinniped
 2022-10-19T06:02:33.355Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.355Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 608ms, of which 607ms was call_pinniped
 2022-10-19T06:02:33.438Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.438Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 691ms, of which 691ms was call_pinniped
 2022-10-19T06:02:33.444Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.444Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 696ms, of which 696ms was call_pinniped
 2022-10-19T06:02:33.448Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.448Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 700ms, of which 700ms was call_pinniped
 2022-10-19T06:02:33.454Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.454Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 705ms, of which 705ms was call_pinniped
 2022-10-19T06:02:33.539Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.539Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 790ms, of which 790ms was call_pinniped
 2022-10-19T06:02:33.636Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.636Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 887ms, of which 887ms was call_pinniped
 2022-10-19T06:02:33.653Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.653Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 904ms, of which 903ms was call_pinniped
 2022-10-19T06:02:33.760Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.760Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1011ms, of which 1010ms was call_pinniped
 2022-10-19T06:02:33.837Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.838Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1088ms, of which 1088ms was call_pinniped
 2022-10-19T06:02:33.842Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.842Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1092ms, of which 1092ms was call_pinniped
 2022-10-19T06:02:33.847Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.847Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1097ms, of which 1097ms was call_pinniped
 2022-10-19T06:02:33.852Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.852Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1101ms, of which 1101ms was call_pinniped
 2022-10-19T06:02:33.947Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:33.948Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1197ms, of which 1197ms was call_pinniped
 2022-10-19T06:02:34.035Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.035Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1284ms, of which 1283ms was call_pinniped
 2022-10-19T06:02:34.133Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.134Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1382ms, of which 1382ms was call_pinniped
 2022-10-19T06:02:34.157Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.157Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1405ms, of which 1405ms was call_pinniped
 2022-10-19T06:02:34.238Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.238Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1487ms, of which 1486ms was call_pinniped
 2022-10-19T06:02:34.243Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.243Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1491ms, of which 1491ms was call_pinniped
 2022-10-19T06:02:34.248Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.248Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1495ms, of which 1495ms was call_pinniped
 2022-10-19T06:02:34.253Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.253Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1500ms, of which 1499ms was call_pinniped
 2022-10-19T06:02:34.335Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.336Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1582ms, of which 1582ms was call_pinniped
 2022-10-19T06:02:34.435Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.435Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1681ms, of which 1680ms was call_pinniped
 2022-10-19T06:02:34.452Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.452Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1697ms, of which 1697ms was call_pinniped
 2022-10-19T06:02:34.535Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.535Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1780ms, of which 1780ms was call_pinniped
 2022-10-19T06:02:34.542Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.542Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1787ms, of which 1787ms was call_pinniped
 2022-10-19T06:02:34.549Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.549Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1793ms, of which 1793ms was call_pinniped
 2022-10-19T06:02:34.634Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.634Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1878ms, of which 1878ms was call_pinniped
 2022-10-19T06:02:34.640Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.640Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1883ms, of which 1883ms was call_pinniped
 2022-10-19T06:02:34.645Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.645Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1888ms, of which 1888ms was call_pinniped
 2022-10-19T06:02:34.651Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.651Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1894ms, of which 1893ms was call_pinniped
 2022-10-19T06:02:34.657Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.657Z INFO  pinniped_proxy::pinniped > prep-and-call-pinniped took: 1899ms, of which 1899ms was call_pinniped
 2022-10-19T06:02:34.752Z INFO  pinniped_proxy::service  > Requesting k8s API endpoint with cert
 2022-10-19T06:02:34.941Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:34.943Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/node.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:34.946Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/config.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:34.948Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.036Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/api/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.037Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/login.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.040Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/storage.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.043Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.046Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/storage.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.046Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kubeapps.com/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.049Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha2?timeout=32s 200 OK
 2022-10-19T06:02:35.050Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2?timeout=32s 200 OK
 2022-10-19T06:02:35.052Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/networking.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.053Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/policy/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.056Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authorization.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.135Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authentication.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.136Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/policy/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.139Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/apiextensions.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.142Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.145Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/batch/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.148Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/autoscaling/v2beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.152Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/events.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.155Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/discovery.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.234Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/notification.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.234Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/events.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s 200 OK
 2022-10-19T06:02:35.252Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.253Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/admissionregistration.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.257Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/authentication.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.335Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/batch/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.338Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/kustomize.toolkit.fluxcd.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.341Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/identity.concierge.pinniped.dev/v1alpha1?timeout=32s 200 OK
 2022-10-19T06:02:35.344Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/helm.toolkit.fluxcd.io/v2beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.347Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/certificates.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.350Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/rbac.authorization.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.351Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/coordination.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.354Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/discovery.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.357Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK
 2022-10-19T06:02:35.434Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/node.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.435Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s 200 OK
 2022-10-19T06:02:35.436Z INFO  pinniped_proxy::service  > GET https://kubernetes.default/apis/image.toolkit.fluxcd.io/v1alpha1?timeout=32s 200 OK

@absoludity
Contributor

And the winner is the sync_writes option to the cached macro: I had misunderstood what it does. It turns out it causes each call to try to acquire a write lock before executing the function, so the first query gets the write lock, and every subsequent query is I/O-blocked at this point, waiting to get the credential until the previous one releases the lock.

Basically, the requests form a queue, each one sequentially getting the credential before it can make its proxied request.

Without sync_writes = true it works as expected.
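
For illustration, a stdlib-only Rust sketch (not the cached macro itself) of the effect described: when every caller takes an exclusive write lock around a slow credential exchange, n concurrent callers holding the lock for d ms each take roughly n*d ms end to end, which matches the stacking durations in the logs above.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Simulates the reported effect of `sync_writes = true`: every caller
// acquires an exclusive write lock around the (simulated) credential
// exchange, so concurrent callers queue up and run one after another.
fn serialized_elapsed_ms(callers: u32, hold_ms: u64) -> u128 {
    let lock = Arc::new(RwLock::new(()));
    let start = Instant::now();
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                // Exclusive lock: the next caller blocks until this one is done.
                let _guard = lock.write().unwrap();
                thread::sleep(Duration::from_millis(hold_ms)); // "credential exchange"
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed().as_millis()
}

fn main() {
    // Four callers holding the lock for 50ms each cannot finish in
    // under ~200ms, despite running on separate threads.
    println!("elapsed_ms={}", serialized_elapsed_ms(4, 50));
}
```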

@castelblanque
Collaborator Author

And the winner is the sync_writes option to the cached macro

Awesome work Michael. So the write-lock was blocking all subsequent requests.

This is something that still amazes me from programming: such a simple fix, but soo hard to pinpoint 😄

@ppbaena
Collaborator

ppbaena commented Oct 19, 2022

Awesome finding Michael!

@absoludity
Contributor

Thanks - it was a lot of fun to track down (good chance to learn quite a bit). I just had to share here because I'm working all on my own in my TZ with no-one to high-five with :)

@absoludity
Contributor

Just an update on this: as mentioned on the linked PR, the issue that I found last week (mentioned above) fixes one aspect, but the requests still take way too long. I've spent the past couple of days digging into the issue and finding that:

  1. the call to the actual cached function is sometimes a few ms, other times 100ms:
2022-10-23T22:31:22.999358 [INFO] - prep_and_call took 4ms, 4 of which was call_pinniped
2022-10-23T22:31:23.098464 [INFO] - prep_and_call took 94ms, 94 of which was call_pinniped
2022-10-23T22:31:23.292432 [INFO] - prep_and_call took 96ms, 96 of which was call_pinniped
2022-10-23T22:31:23.484841 [INFO] - prep_and_call took 94ms, 94 of which was call_pinniped
2022-10-23T22:31:23.600418 [INFO] - prep_and_call took 18ms, 18 of which was call_pinniped
2022-10-23T22:31:23.782121 [INFO] - prep_and_call took 93ms, 93 of which was call_pinniped
2022-10-23T22:31:23.792623 [INFO] - prep_and_call took 4ms, 4 of which was call_pinniped
  2. The actual cache lookup is consistently less than a millisecond.
  3. The longer durations appear to be CPU starvation - but why?

I didn't have much success figuring out how the above could be true until this morning, when I realised that the cached function call_pinniped is still called asynchronously, regardless of whether the result is cached or not. So even though the value is cached after the first call, every request going through pinniped-proxy is put to sleep when .await is called (it's an I/O wait), with the task manager switching to a different task (rather than just pulling the value out of the cache), and so each request depends on its task being scheduled again by the task manager.

At least, this is what I will check next, though I need to take a break from this to get some other things done (hence dumping thoughts here). The approach will be, rather than decorating the async function with the cached macro, to have a cache available and only call the async function on a cache miss. This should avoid sleeping the task unnecessarily.
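
The "only call the expensive function on a cache miss" idea can be sketched with stdlib primitives roughly like this (illustrative names, and a synchronous stand-in for the async fetch; this is not the actual pinniped-proxy code):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Illustrative credential cache: look the value up under a shared read
// lock first, and only run the expensive fetch (async in the real
// proxy, synchronous here) on a miss.
struct CredentialCache {
    map: RwLock<HashMap<String, String>>,
}

impl CredentialCache {
    fn new() -> Self {
        CredentialCache { map: RwLock::new(HashMap::new()) }
    }

    fn get_or_fetch<F: FnOnce() -> String>(&self, key: &str, fetch: F) -> String {
        // Fast path: many tasks can read concurrently without blocking.
        if let Some(v) = self.map.read().unwrap().get(key) {
            return v.clone();
        }
        // Slow path: do the expensive work, then take the write lock
        // only long enough to store the result.
        let v = fetch();
        self.map.write().unwrap().insert(key.to_string(), v.clone());
        v
    }
}

fn main() {
    let cache = CredentialCache::new();
    let first = cache.get_or_fetch("id-token", || "x509-cred".to_string());
    // Cache hit: the fetch closure is never invoked this time.
    let second = cache.get_or_fetch("id-token", || unreachable!());
    println!("{first} {second}");
}
```

On a hit, the caller never reaches an await point at all, so the task is not slept unnecessarily.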

absoludity added a commit that referenced this issue Oct 24, 2022
Signed-off-by: Michael Nelson <minelson@vmware.com>


### Description of the change

This PR is the result of the investigation described in #5407, the end result of which is a single-line removal, though I've kept some other changes that were helpful during debugging (logging-related).

### Benefits

Pinniped-proxy does not stack all requests behind a write lock when called concurrently.

### Possible drawbacks

None.

### Applicable issues


- ref #5407

### Additional information

It does not solve the slow requests when running in a local kind cluster: each task still sleeps while accessing the cache, leading to anywhere between 4 and 100ms just to get the cached response.

Signed-off-by: Michael Nelson <minelson@vmware.com>
Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Oct 24, 2022
@absoludity absoludity reopened this Oct 24, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Oct 24, 2022
@absoludity absoludity moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Oct 27, 2022
@absoludity absoludity moved this from 🏗 In Progress to 🔎 In Review in Kubeapps Oct 28, 2022
absoludity added a commit that referenced this issue Nov 18, 2022
Signed-off-by: Michael Nelson <minelson@vmware.com>


Follows on from #5518, this time replacing the `cached` crate with a
custom credential cache.

### Description of the change

After further digging, I found that one cause of the slow handling of 50
concurrent requests going through the pinniped-proxy was that:
- We were caching a function with an async/await signature which means
that even the cached version must have that signature as well - which
means a blocking i/o call (which switches the task), and
- The `Cached` trait specifies that even a `cache_get` operation mutates
the cache (in our case, just for statistics of hits/misses), which, as a
result, requires acquiring a *write* lock to the cache to read a cached
value.

For more details, please see the [discussion with the `Cached`
author](jaemk/cached#133).

To avoid both of those issues, this PR:
1. Adds a `cache` module that provides a generic read/write
`LockableCache` (for multiple readers, single writer) and builds on that
with a `PruningCache` that will prune entries (given a test function)
when they should no longer be cached,
2. Uses (1) to create a single `CredentialCache` on startup (in
`main.rs`) specifically for caching `TokenCredentialRequest` objects and
pruning expired entries, and then passes this through for use in
different threads concurrently.
3. Uses the cache to fetch the credentials.
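
A rough stdlib-only sketch of the shape described in (1) and (2). This is illustrative only, not the actual pinniped-proxy cache module; all names and the keep-test are assumptions for the example:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// A read/write-locked cache that prunes entries failing a keep-test,
// mirroring the LockableCache/PruningCache split described above.
struct PruningCache<V> {
    entries: RwLock<HashMap<String, V>>,
    keep: fn(&V) -> bool,
}

impl<V: Clone> PruningCache<V> {
    fn new(keep: fn(&V) -> bool) -> Self {
        PruningCache { entries: RwLock::new(HashMap::new()), keep }
    }

    fn insert(&self, key: &str, value: V) {
        self.entries.write().unwrap().insert(key.to_string(), value);
    }

    // Reads take only the shared read lock, so concurrent lookups
    // do not serialize behind each other.
    fn get(&self, key: &str) -> Option<V> {
        self.entries.read().unwrap().get(key).cloned()
    }

    // Drop every entry the keep-test rejects (e.g. expired credentials).
    fn prune(&self) {
        self.entries.write().unwrap().retain(|_, v| (self.keep)(v));
    }
}

fn main() {
    // Cache i64 "time-to-live" values; keep only the non-expired ones.
    let cache = PruningCache::new(|ttl: &i64| *ttl > 0);
    cache.insert("fresh", 60);
    cache.insert("expired", -1);
    cache.prune();
    println!("fresh={:?} expired={:?}", cache.get("fresh"), cache.get("expired"));
}
```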

### Benefits

Fetching from the cache is now non-blocking (generally, except when an
entry is being added) and so leads to less task switching, improving the
total query time by ~2s (down to 3-4s).

There is still something else using significant CPU when creating the
client itself (cert-related), which I'm investigating now in a separate
PR.

### Possible drawbacks


### Applicable issues


- Ref #5407 

### Additional information


Example log when using `RUST_LOG=info,pinniped_proxy::pinniped=debug`
which shows the cache being used after the first request. I've not
included it in the output generally, but the cache get is now always
under a millisecond. As above, the significant delays (some calls to
prepare_and_call_pinniped_exchange only 4ms, others 98ms) are what I'll
look at next.

```
2022-10-27T01:42:47.820245 [INFO] - Listening on http://0.0.0.0:3333
2022-10-27T01:43:05.077116 [DEBUG] - prepare_and_call_pinniped_exchange took 17ms. Used cache?: false
2022-10-27T01:43:05.085273 [INFO] - GET https://kubernetes.default/api?timeout=32s 200 OK
2022-10-27T01:43:05.091663 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.100437 [INFO] - GET https://kubernetes.default/apis?timeout=32s 200 OK
2022-10-27T01:43:05.106005 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.209952 [DEBUG] - prepare_and_call_pinniped_exchange took 21ms. Used cache?: true
2022-10-27T01:43:05.299424 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.311599 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.493269 [DEBUG] - prepare_and_call_pinniped_exchange took 98ms. Used cache?: true
2022-10-27T01:43:05.593683 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.604348 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.697828 [DEBUG] - prepare_and_call_pinniped_exchange took 87ms. Used cache?: true
2022-10-27T01:43:05.811590 [DEBUG] - prepare_and_call_pinniped_exchange took 20ms. Used cache?: true
2022-10-27T01:43:06.004358 [DEBUG] - prepare_and_call_pinniped_exchange took 94ms. Used cache?: true
2022-10-27T01:43:06.098603 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:06.108756 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true

```

Signed-off-by: Michael Nelson <minelson@vmware.com>
Repository owner moved this from 🔎 In Review to ✅ Done in Kubeapps Nov 21, 2022
absoludity added a commit that referenced this issue Nov 21, 2022
Signed-off-by: Michael Nelson <minelson@vmware.com>


Reduces the proxying of the ~50 API calls (when the go-lang
kubeapps-apis initialises its REST mapper) from 4-6s down to a
consistent 1.3-1.6s (on my single computer running a cluster including
pinniped-proxy).

### Description of the change


After more profiling, the remaining offender was the call to create a
`kube::Client` which ranged from 4-100ms (of CPU + interrupts - not sure
if there was any I/O) for every single request. So when 50 requests are
issued concurrently, it was starving the CPU available (on my single
laptop running a cluster etc.).

~~The call to create the `kube::Client` is dependent on the target k8s
api server only, so it's easy to cache as the number of different
results is equal to the number of clusters targeted by Kubeapps.~~ EDIT:
Actually, I just realised we may not even need to cache it, since we can
simply move where it's evaluated so that we only evaluate it when the
token is not cached. I'll try that before marking ready.
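
The "move where it's evaluated" idea can be sketched like this (a synchronous, stdlib-only stand-in; names are illustrative, not the real pinniped-proxy code): the expensive exchange closure is only evaluated on a cache miss, so repeated requests pay the construction cost once.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::RwLock;

// Returns the cached credential for a token, running the expensive
// exchange (standing in for building the kube client config plus the
// pinniped exchange) only when the token is not already cached.
fn credential_for(
    cache: &RwLock<HashMap<String, String>>,
    token: &str,
    expensive_exchange: impl FnOnce() -> String,
) -> String {
    if let Some(cred) = cache.read().unwrap().get(token) {
        return cred.clone(); // hit: expensive_exchange is never evaluated
    }
    let cred = expensive_exchange(); // miss: pay the cost once
    cache.write().unwrap().insert(token.to_string(), cred.clone());
    cred
}

fn main() {
    let cache = RwLock::new(HashMap::new());
    let exchanges = Cell::new(0);
    for _ in 0..50 {
        let _ = credential_for(&cache, "id-token", || {
            exchanges.set(exchanges.get() + 1);
            "x509-cred".to_string()
        });
    }
    // 50 sequential requests, but the expensive path runs only once.
    println!("exchanges={}", exchanges.get());
}
```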

### Benefits


Pinniped proxy handles the 50 concurrent request/responses in a more
reasonable time.

### Possible drawbacks


### Applicable issues


- fixes #5407 (finally)

### Additional information


Without this change, the main cause of CPU was the call to
`get_client_config`, which then delays all the proxy requests starting:

```
2022-10-28T01:53:54.191216 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.291704 [TRACE] - prepare_and_call_pinniped_exchange took 22ms. Used cache?: true. 22ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.470193 [TRACE] - prepare_and_call_pinniped_exchange took 84ms. Used cache?: true. 84ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.585938 [TRACE] - prepare_and_call_pinniped_exchange took 14ms. Used cache?: true. 14ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.674279 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.683590 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.769326 [TRACE] - prepare_and_call_pinniped_exchange took 80ms. Used cache?: true. 80ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.

```

After the change the call is effectively zero.

```
2022-10-28T05:28:18.635146 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.635306 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.688264 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.688497 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.692989 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.

```

Signed-off-by: Michael Nelson <minelson@vmware.com>
Labels
component/pinniped-proxy Issue related to kubeapps integration with pinniped-proxy kind/enhancement An issue that reports an enhancement for an implemented feature rust Pull requests that update Rust code