Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance: minimize Update calls #1194

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

Tal-or
Copy link
Collaborator

@Tal-or Tal-or commented Feb 18, 2025

Many of the Update calls done by the operator
are changing nothing.

This is happening because the API server defaults some of the fields
while our code leave them empty.

This result in a diff in the comparison function and leads to redundant calls.

This package purpose it to set the default values in our objects similar to what
the API server is doing.

The code is based on upstream code that is
not in staging so we can't import it unfortunately.

Instead we are pulling all the relevant files,
preforming some manipulations on them and generating using the code-generator,
similar to how k8s is doing it.
Then we are coping the generated files and calling them generated function from our code.

Below benchmarks shows the benefit of those changes:

Simple patch against numaresources CR (before change):

oc patch numaresourcesoperators.nodetopology.openshift.io numaresourcesoperator \
  --type='json' -p '[{"op": "add", "path": "/spec/nodeGroups/0/annotations", "value": {"key3":"value6"}}]'
numaresourcesoperator.nodetopology.openshift.io/numaresourcesoperator patched
./kubectl-dev_tool audit -f audit.log --after 2025-02-18T17:10:00+02:00 -otop --by resource --verb=update --verb=create  --user=system:serviceaccount:openshift-numaresources:numaresources-controller-manager  --resource=*.* --resource=-leases.* 
count: 7, first: 2025-02-18T17:12:26+02:00, last: 2025-02-18T17:12:26+02:00, duration: 59.093ms
2x                   security.openshift.io/v1/securitycontextconstraints
1x                   apiextensions.k8s.io/v1/customresourcedefinitions
1x                   nodetopology.openshift.io/v1/numaresourcesoperators
1x                   v1/services
1x                   apps/daemonsets
1x                   v1/serviceaccounts

Simple patch against numaresources CR (after change):

oc patch numaresourcesoperators.nodetopology.openshift.io numaresourcesoperator \
  --type='json' -p '[{"op": "add", "path": "/spec/nodeGroups/0/annotations", "value": {"key3":"value6"}}]'
numaresourcesoperator.nodetopology.openshift.io/numaresourcesoperator patched
./kubectl-dev_tool audit -f audit.log --after 2025-02-18T15:30:00+02:00 -otop --by resource --verb=update --verb=create  --user=system:serviceaccount:openshift-numaresources:numaresources-controller-manager  --resource=*.* --resource=-leases.* 
count: 1, first: 2025-02-18T17:02:34+02:00, last: 2025-02-18T17:02:34+02:00, duration: 0s
1x                   nodetopology.openshift.io/v1/numaresourcesoperators

Realistic benchmark against serial suite (before change):

E2E_NROP_INSTALL_SKIP_KC=true ./bin/e2e-nrop-serial.test  --ginkgo.skip='reboot_required|schedrst' --ginkgo.label-filter='!(feature: containsAny { ngpoolname, nganns })' --ginkgo.label-filter='tier0||tier1' --ginkgo.dry-run

Ran 32 of 120 Specs in 2895.370 seconds
SUCCESS! -- 32 Passed | 0 Failed | 0 Pending | 88 Skipped
./kubectl-dev_tool audit -f audit.log --after 2025-02-18T14:10:00+02:00 -otop --by resource --verb=update --verb=create --user=system:serviceaccount:openshift-numaresources:numaresources-controller-manager --resource=*.* --resource=-leases.* 
count: 113, first: 2025-02-18T14:40:20+02:00, last: 2025-02-18T15:07:09+02:00, duration: 26m48.655819s
32x                  security.openshift.io/v1/securitycontextconstraints
16x                  apps/daemonsets
16x                  v1/serviceaccounts
16x                  apiextensions.k8s.io/v1/customresourcedefinitions
16x                  nodetopology.openshift.io/v1/numaresourcesoperators
16x                  v1/services
1x                   v1/events

Realistic benchmark against serial suite (after change):

E2E_NROP_INSTALL_SKIP_KC=true ./bin/e2e-nrop-serial.test  --ginkgo.skip='reboot_required|schedrst' --ginkgo.label-filter='!(feature: containsAny { ngpoolname, nganns })' --ginkgo.label-filter='tier0||tier1' --ginkgo.dry-run

Ran 32 of 120 Specs in 2901.802 seconds
SUCCESS! -- 32 Passed | 0 Failed | 0 Pending | 88 Skipped
./kubectl-dev_tool audit -f audit.log --after 2025-02-18T16:00:00+02:00 -otop --by resource --verb=update --verb=create --user=system:serviceaccount:openshift-numaresources:numaresources-controller-manager --resource=*.* --resource=-leases.* 

// return empty. no redundant call at all

NOTE: we must add a CI lane that tracks the amount of update call and failed if we're exceeding a certain threshold.
This is needed because atm we don't have a better way to track changes in the set-default logic of the API server.

@Tal-or Tal-or requested a review from ffromani February 18, 2025 15:32
@openshift-ci openshift-ci bot requested a review from shajmakh February 18, 2025 15:32
Copy link
Contributor

openshift-ci bot commented Feb 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2025
@Tal-or Tal-or changed the title performance: minimize update calls performance: minimize Update calls Feb 18, 2025
@Tal-or Tal-or mentioned this pull request Feb 18, 2025
Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive PR. My only concern is to depend on external defaulter package we need to cut and paste here. Can we avoid it with smarter merge functions?

@@ -0,0 +1,310 @@
package setdefault
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing copyright boilerplate (hack/boilerplate.go.txt)

what is the canonical name of the same functionality in other projects in the kube ecosystem? defaulter maybe? Do we have a canonical name at all?

Copy link
Collaborator Author

@Tal-or Tal-or Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Merge the desired object with what actually exists
merged, err := objState.Merge(objState.Existing, objState.Desired)
merged, err := objState.Merge(objState.Existing, desiredWithDefault)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option could be to make our merge functions smarter. We should only copy the desired fields in the existing object. This allows us to avoid defaults (and the need to depend on defaulter) and was the original intent of the merge package, which was however lost in the initial implementation.

We can take the chance to rethink the Compare step. Can we do Merge+Compare in one single go? Can the merging step infer if a field we care about changed during the merge?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining a list of fields we care about using a black/white list is possible however it has its own cons:

  1. We need to define the fields we care about which is not that easy because we don't know what will happen in case that a field "we didn't care about" is misconfigured, could it break anything?
  2. We need to write a custom compare/merge functions that will be smart enough to skip and focus on the fields according to our needs, which is basically similar to setting the defaults and goes through everything.
  3. What should we do if user is tampering with fields we didn't care about and messing things up? we'll never be able to fix it automatically, a manual intervention will be needed.

@ffromani
Copy link
Member

some historical context.
The person who wrote the merge code (cough cough) implemented the bare minimum merge code back in time. That was not very smart and admittedly pretty raw. The thought was like "we will optimize later" and later, of course, never came.

I'm not against using defaulter. This is actually a good solution and another very valid approach which complements smart and precise merge functions. The problem is having easy access to that package.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Many of the `Update` calls done by the operator
are changing nothing.

This is happening because the API server defaults some of the fields
while our code leave them empty.

This result in a diff in the comparison function and leads to redundant calls.

This package purpose it to set the default values in our objects similar to what
the API server is doing.

The code is based on upstream code that is
not in staging so we can't import it unfortunately.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
@Tal-or Tal-or force-pushed the minimize_update_calls branch from d299325 to f23a5b5 Compare February 25, 2025 18:07
A script to automate the generation of default functions.
In order to use, call: `make defaults`

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
call `make defaults`

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Use the generated functions instead the
one that we copy/pasted manually.

This should make the maintenace and track
of changes much easier.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The status is a subresource which is managed by other controllers
and which we don't care about during the comparison for `Update`.

So we suggest to merge the status fields from the existing object,
(for objects that has status)
to avoid diff during comparison.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
add default function  to the objectstate struct that will
be called during the ApplyObject process.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
use the default object created per each resource managed by the operator
in the ObjectState struct.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Merge the IP configuration from the existing object,
since those fields are being controlled by external operator.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Set defaults for initial object.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
@Tal-or Tal-or force-pushed the minimize_update_calls branch from f23a5b5 to eaafb55 Compare February 25, 2025 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants