Improve response times in Kubeapps APIs when using Pinniped-proxy #5407
Comments
Thanks @castelblanque. Yeah, that's intriguing given that we're caching the credential. I'll take a look.
So I started looking at this today, and it seems quite an interesting problem/bug. For what it's worth, I don't think it's related to throttling of requests to the kubernetes-api server from pinniped-proxy, but rather an issue in the way async requests are handled by the proxy itself. Let's see this in detail: I added a few extra logging statements during the proxy operation so we can tell when the request begins, when the id/cert exchange happens, and when the actual downstream request begins (or see this gist for logging with a trace of tokio's mio crate). The first three queries are proxied and returned (the last being /apis), and each takes between 30-40ms:
It is this response which, when returned upstream to `kubeapps-apis`, triggers the ~50 concurrent requests as its REST mapper initialises:
So that began at 43.708, with the "Beginning - Exchanging - Requesting" sequence for each request taking only around 10ms, but no requests complete until all the concurrent incoming requests have been processed. Only at 45.718, a full 2s later, do the requests start being returned by the proxy, finishing at 45.902, so over a period of around 200ms.
I read an in-depth overview of the tokio scheduler. It'll probably be a simple configuration issue, but I'll continue tomorrow.
Interesting findings!
Is that an actual bug in tokio?
I doubt it, more likely just a config option or something I've missed, but I'll hopefully find out tomorrow.
Yeah - all incoming requests get processed before it moves on to return the results for each, though it confuses me that they seem to be done serially. Interesting anyway :)
So, summary of findings after creating a minimal working example:
Using my minimal working example, I can reproduce the results seen when certain conditions are combined explicitly. If only one of these is present, the queries still execute quickly. I'll profile the credential cache access used in pinniped-proxy next.
...and found it: 2 things to note in the logs below:
Finally I can now actually try to fix the issue :)
And the winner is the write lock on the credential cache: it basically forms a queue, with each request sequentially getting the credential before it can make the proxied request. Without the write lock, the requests proceed concurrently.
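To make the write-lock behaviour concrete, here is a minimal std-only sketch (not the actual pinniped-proxy code): a cache whose lookups take only a read lock, so many requests can read the cached credential at once. The demonstration uses a barrier that all reader threads can only pass while simultaneously holding the lock - something that would deadlock if each lookup needed an exclusive write lock, which is exactly the queueing described above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

// A minimal read-optimised cache: lookups take only a read lock,
// so concurrent readers do not queue behind one another.
struct ReadCache<V> {
    map: RwLock<HashMap<String, V>>,
}

impl<V: Clone> ReadCache<V> {
    fn new() -> Self {
        ReadCache { map: RwLock::new(HashMap::new()) }
    }
    fn insert(&self, k: &str, v: V) {
        self.map.write().unwrap().insert(k.to_string(), v);
    }
    fn get(&self, k: &str) -> Option<V> {
        self.map.read().unwrap().get(k).cloned()
    }
}

fn main() {
    let cache = Arc::new(ReadCache::new());
    cache.insert("credential", "token-abc".to_string());

    // All 8 threads hold a read guard at the same time: each reaches the
    // barrier while still holding the lock. If every lookup required a
    // *write* lock instead (as happens when a cache_get needs `&mut self`),
    // only one thread could hold the lock, and the barrier would never
    // be passed - the requests would queue exactly as observed.
    let n = 8;
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let c = Arc::clone(&cache);
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                let guard = c.map.read().unwrap();
                b.wait(); // passable only if all readers hold the lock concurrently
                guard.get("credential").cloned()
            })
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), Some("token-abc".to_string()));
    }
    println!("8 concurrent readers held the lock simultaneously");
}
```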
Awesome work Michael. So the write-lock was blocking all subsequent requests. This is something that still amazes me about programming: such a simple fix, but so hard to pinpoint 😄
Awesome finding Michael!
Thanks - it was a lot of fun to track down (good chance to learn quite a bit). I just had to share here because I'm working all on my own in my TZ with no-one to high-five with :) |
Just an update on this: as mentioned on the linked PR, the issue that I found last week (mentioned above) fixes one aspect, but the requests still take way too long. I've spent the past couple of days digging to find the cause.
I didn't have much success finding out how the requests could still be so slow until this morning, when I realised that because the cached function keeps its async signature, even a cache hit involves a blocking call that switches the task. At least, this is what I will check next, though I need to take a break from this to get some other things done (hence dumping thoughts here). The approach will be, rather than decorating the async function with the `cached` macro, to cache the credential itself.
Signed-off-by: Michael Nelson <minelson@vmware.com>

### Description of the change
This PR is the result of the investigation described on #5407, the end result of which is a single-line removal, though I've left some other changes that were helpful during debugging (logging related).

### Benefits
Pinniped-proxy does not stack all requests behind a write lock when called concurrently.

### Possible drawbacks
None.

### Applicable issues
- ref #5407

### Additional information
It does not solve the slow requests when running in a local kind cluster, as each task is sleeping while accessing the cache, leading to anywhere between 4 and 100ms just to get the cached response.
Signed-off-by: Michael Nelson <minelson@vmware.com>

Follows on from #5518, this time replacing the `cached` package with a custom credential cache.

### Description of the change
After further digging, I found that one cause of the slow handling of 50 concurrent requests going through the pinniped-proxy was that:

- We were caching a function with an async/await signature, which means that even the cached version must have that signature as well - which means a blocking i/o call (which switches the task), and
- The `Cached` trait specifies that even a `cache_get` operation mutates the cache (in our case, just for statistics of hits/misses), which, as a result, requires acquiring a *write* lock to the cache to read a cached value.

For more details, please see the [discussion with the `Cached` author](jaemk/cached#133).

To avoid both of those issues, this PR:

1. Adds a `cache` module that provides a generic read/write `LockableCache` (for multiple readers, single writer) and builds on that with a `PruningCache` that will prune entries (given a test function) when they should no longer be cached,
2. Uses (1) to create a single `CredentialCache` on startup (in `main.rs`) specifically for caching `TokenCredentialRequest` objects and pruning expired entries, and then passes this through for use in different threads concurrently.
3. Uses the cache to fetch the credentials.

### Benefits
Fetching from the cache is now non-blocking (generally, except when an entry is being added) and so leads to less task switching, improving the total query time by ~2s (down to 3-4s).

There is still something else using significant CPU when creating the client itself (cert-related), which I'm investigating now in a separate PR.

### Possible drawbacks

### Applicable issues
- Ref #5407

### Additional information
Example log when using `RUST_LOG=info,pinniped_proxy::pinniped=debug`, which shows the cache being used after the first request. I've not included it in the output generally, but the cache get is now always under a millisecond. As above, the significant delays (some calls to prepare_and_call_pinniped_exchange only 4ms, others 98ms) are what I'll look at next.

```
2022-10-27T01:42:47.820245 [INFO] - Listening on http://0.0.0.0:3333
2022-10-27T01:43:05.077116 [DEBUG] - prepare_and_call_pinniped_exchange took 17ms. Used cache?: false
2022-10-27T01:43:05.085273 [INFO] - GET https://kubernetes.default/api?timeout=32s 200 OK
2022-10-27T01:43:05.091663 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.100437 [INFO] - GET https://kubernetes.default/apis?timeout=32s 200 OK
2022-10-27T01:43:05.106005 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.209952 [DEBUG] - prepare_and_call_pinniped_exchange took 21ms. Used cache?: true
2022-10-27T01:43:05.299424 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.311599 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.493269 [DEBUG] - prepare_and_call_pinniped_exchange took 98ms. Used cache?: true
2022-10-27T01:43:05.593683 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.604348 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.697828 [DEBUG] - prepare_and_call_pinniped_exchange took 87ms. Used cache?: true
2022-10-27T01:43:05.811590 [DEBUG] - prepare_and_call_pinniped_exchange took 20ms. Used cache?: true
2022-10-27T01:43:06.004358 [DEBUG] - prepare_and_call_pinniped_exchange took 94ms. Used cache?: true
2022-10-27T01:43:06.098603 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:06.108756 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
```
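The two-layer design named in the PR (`LockableCache` and `PruningCache`) can be sketched as follows. The type names come from the PR description; the implementation details below are illustrative assumptions only, using a std `RwLock` so that `get` needs only a read lock (multiple readers, single writer), and the hypothetical credential type `(token, expiry)` stands in for `TokenCredentialRequest`:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::RwLock;

// Layer 1: multiple readers, single writer - gets take only a read lock.
struct LockableCache<K, V> {
    map: RwLock<HashMap<K, V>>,
}

impl<K: Eq + Hash, V: Clone> LockableCache<K, V> {
    fn new() -> Self {
        LockableCache { map: RwLock::new(HashMap::new()) }
    }
    fn get(&self, k: &K) -> Option<V> {
        self.map.read().unwrap().get(k).cloned()
    }
    fn insert(&self, k: K, v: V) {
        self.map.write().unwrap().insert(k, v);
    }
}

// Layer 2: builds on LockableCache, pruning entries that a test
// function says should no longer be cached (e.g. expired credentials).
struct PruningCache<K, V> {
    inner: LockableCache<K, V>,
    still_valid: fn(&V) -> bool,
}

impl<K: Eq + Hash, V: Clone> PruningCache<K, V> {
    fn new(still_valid: fn(&V) -> bool) -> Self {
        PruningCache { inner: LockableCache::new(), still_valid }
    }
    fn insert(&self, k: K, v: V) {
        self.inner.insert(k, v);
    }
    fn get(&self, k: &K) -> Option<V> {
        // Never hand back an entry that has already expired.
        self.inner.get(k).filter(|v| (self.still_valid)(v))
    }
    fn prune(&self) {
        let valid = self.still_valid;
        self.inner.map.write().unwrap().retain(|_, v| valid(v));
    }
}

fn main() {
    // Hypothetical credential: (token, expiry in epoch seconds).
    let cache: PruningCache<String, (String, u64)> =
        PruningCache::new(|&(_, expiry)| expiry > 100);
    cache.insert("cluster-a".into(), ("tok1".into(), 200)); // still valid
    cache.insert("cluster-b".into(), ("tok2".into(), 50)); // expired
    cache.prune();
    assert!(cache.get(&"cluster-a".to_string()).is_some());
    assert!(cache.get(&"cluster-b".to_string()).is_none());
    println!("expired entries pruned");
}
```

Created once on startup and shared across tasks (wrapped in an `Arc` in practice), this gives the non-blocking reads described above: only inserts and prunes take the write lock.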
Signed-off-by: Michael Nelson <minelson@vmware.com>

Reduces the proxying of the ~50 API calls (when the go-lang kubeapps-apis initialises its REST mapper) from 4-6s down to a consistent 1.3-1.6s (on my single computer running a cluster including pinniped-proxy).

### Description of the change
After more profiling, the remaining offender was the call to create a `kube::Client`, which ranged from 4-100ms (of CPU + interrupts - not sure if there was any I/O) for every single request. So when 50 requests are issued concurrently, it was starving the CPU available (on my single laptop running a cluster etc.).

~~The call to create the `kube::Client` is dependent on the target k8s api server only, so it's easy to cache as the number of different results is equal to the number of clusters targeted by Kubeapps.~~ EDIT: Actually, I just realised we may not even need to cache it, since we can simply move where it's evaluated so we only evaluate it when the token was not cached. I'll try that before marking ready.

### Benefits
Pinniped-proxy handles the 50 concurrent request/responses in a more reasonable time.

### Possible drawbacks

### Applicable issues
- fixes #5407 (finally)

### Additional information
Without this change, the main cause of CPU was the call to `get_client_config`, which then delays all the proxy requests starting:

```
2022-10-28T01:53:54.191216 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.291704 [TRACE] - prepare_and_call_pinniped_exchange took 22ms. Used cache?: true. 22ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.470193 [TRACE] - prepare_and_call_pinniped_exchange took 84ms. Used cache?: true. 84ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.585938 [TRACE] - prepare_and_call_pinniped_exchange took 14ms. Used cache?: true. 14ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.674279 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.683590 [TRACE] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true. 4ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T01:53:54.769326 [TRACE] - prepare_and_call_pinniped_exchange took 80ms. Used cache?: true. 80ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
```

After the change the call is effectively zero:

```
2022-10-28T05:28:18.635146 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.635306 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.688264 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.688497 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
2022-10-28T05:28:18.692989 [TRACE] - prepare_and_call_pinniped_exchange took 0ms. Used cache?: true. 0ms for get_client_config, 0ms for creating empty cred data, 0ms to fetch the cred.
```
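The "move where it's evaluated" fix above can be sketched as follows. This is an illustrative, std-only model, not the pinniped-proxy source: `build_client_config` and `exchange_token_for_credential` are hypothetical stand-ins for the expensive `kube::Client`/config construction and the pinniped credential exchange. The point is purely the control flow: the expensive construction happens only on a cache miss, rather than unconditionally on every request.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Counts how often the expensive setup actually runs.
static CLIENT_CONFIGS_BUILT: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the expensive client/config construction
// (4-100ms of CPU per call in the logs above).
fn build_client_config(api_server: &str) -> String {
    CLIENT_CONFIGS_BUILT.fetch_add(1, Ordering::SeqCst);
    format!("client-config-for-{}", api_server)
}

// Stand-in for the credential exchange, which needs the client.
fn exchange_token_for_credential(_client: &str, token: &str) -> String {
    format!("credential-for-{}", token)
}

// Look up the cached credential first; build the client config
// only on a cache miss. Before the change, the client config was
// built unconditionally on every request.
fn get_credential(
    cache: &RwLock<HashMap<String, String>>,
    api_server: &str,
    token: &str,
) -> String {
    if let Some(cred) = cache.read().unwrap().get(token).cloned() {
        return cred; // cache hit: no client config needed at all
    }
    let client = build_client_config(api_server); // only on a miss
    let cred = exchange_token_for_credential(&client, token);
    cache.write().unwrap().insert(token.to_string(), cred.clone());
    cred
}

fn main() {
    let cache = RwLock::new(HashMap::new());
    for _ in 0..50 {
        get_credential(&cache, "https://kubernetes.default", "user-token");
    }
    // 50 requests, but the expensive construction ran exactly once.
    assert_eq!(CLIENT_CONFIGS_BUILT.load(Ordering::SeqCst), 1);
    println!("client config built once for 50 requests");
}
```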
### Summary
Some requests (e.g. `core.packages.v1alpha1.PackagesService.GetAvailablePackageSummaries`) are slow when using `pinniped-proxy`. Investigate and improve those times.

### Description
After work done for #5362 and #4919, response times have improved due to no longer creating multiple unused K8s Go clients.
However, there still seems to be a throttling problem when using `pinniped-proxy`. Multiple requests are made from `kubeapps-apis` to the K8s API (with correct QPS and Burst values), but the requests go through `pinniped-proxy`.

Log from the proxy:
When `kubeapps-apis` is configured without `pinniped-proxy`, response times for those requests are much faster. This might indicate that there is a throttling problem from `pinniped-proxy` towards the actual K8s API server.

### Acceptance criteria
- Response times are improved for requests going through `pinniped-proxy`, to similar times obtained without `pinniped-proxy`.