
initial commit of multicluster config connector #1991

Open

jingyih wants to merge 3 commits into master from multi_cluster
Conversation

@jingyih (Collaborator) commented Jun 11, 2024

Contributor:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jingyih. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jingyih force-pushed the multi_cluster branch 2 times, most recently from 5ac02f4 to 57f2e1a on June 14, 2024 at 22:15
}

func (r *MultiClusterLeaseReconciler) updateStatus(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, leaderIdentity string, isLeader bool) error {
mcl, err := util.GetMultiClusterLease(ctx, r.client, types.NamespacedName{Name: mcl.Name, Namespace: mcl.Namespace})
Collaborator: I don't think we should do this: it breaks optimistic concurrency if we read the object more than once. Rather, we should pass in the object (it looks like we already pass it in!).

Collaborator (Author): Good point. Fixed.
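
For reference, a minimal sketch of that shape, assuming r.client is a controller-runtime client; the status fields other than LastObservedTime (which appears in the next hunk) are illustrative:

```go
// updateStatus writes status on the object the caller already fetched, so the
// update is guarded by that object's resourceVersion (optimistic concurrency)
// instead of re-reading the MultiClusterLease here.
func (r *MultiClusterLeaseReconciler) updateStatus(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, leaderIdentity string, isLeader bool) error {
	mcl.Status.LeaderIdentity = leaderIdentity // illustrative field names
	mcl.Status.IsLeader = isLeader
	mcl.Status.LastObservedTime = time.Now().Format(time.RFC3339)
	// A conflict means another writer got in since the caller read the object;
	// surface it rather than re-reading and retrying inside this helper.
	return r.client.Status().Update(ctx, mcl)
}
```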

LastObservedTime: time.Now().Format(time.RFC3339),
}

// TODO: retry in case there is a conflict
Collaborator: In other controllers the advice is to return an error and let the controller's retry-with-backoff logic take over.

That might be problematic here, because we must "heartbeat" while we are the leader.

Still, I think the answer might be to start a goroutine for each MultiClusterLease ... which I think you're doing and storing in leaderElectors?
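
A sketch of the return-the-error pattern inside a controller-runtime Reconcile body; apierrors is k8s.io/apimachinery/pkg/api/errors, and mcl, leaderIdentity, isLeader, and renewInterval stand in for values the reconcile loop already has:

```go
if err := r.updateStatus(ctx, mcl, leaderIdentity, isLeader); err != nil {
	if apierrors.IsConflict(err) {
		// Someone else wrote the object since we read it; requeue immediately
		// so the heartbeat is not delayed by exponential backoff.
		return ctrl.Result{Requeue: true}, nil
	}
	// Any other error goes back to the workqueue, which retries with backoff.
	return ctrl.Result{}, err
}
// While we hold the lease, schedule the next renewal explicitly.
return ctrl.Result{RequeueAfter: renewInterval}, nil
```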

)

// These tests use Ginkgo (BDD-style Go testing framework). Refer to
// http://onsi.github.io/ginkgo/ to learn more about Ginkgo.
Collaborator: We should just delete the auto-generated kubebuilder tests; they only test that the Kubernetes CRDs work :-)

Collaborator (Author): Deleted!

succeeded := le.tryAcquireOrRenew(ctx)
if !succeeded {
log.Info("failed to acquire lease")
return fmt.Errorf("failed to acquire lease")
Collaborator: Nit: I suggest returning (bool, error). I think it will be handy to differentiate between, e.g., a network error and "someone else has the lease".

Collaborator (Author): Thanks! I added a TODO for future improvement.
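
A sketch of what the call site could look like with a (bool, error) signature; whether "not the leader" should still be reported as an error by this caller is a separate design choice:

```go
succeeded, err := le.tryAcquireOrRenew(ctx)
if err != nil {
	// We could not determine who holds the lease (e.g. a network or storage
	// failure); surface the error so the caller retries.
	return fmt.Errorf("checking lease: %w", err)
}
if !succeeded {
	// Another candidate legitimately holds the lease.
	log.Info("another candidate holds the lease")
	return nil
}
```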

}

// update the lock itself
if err = le.config.Lock.Update(ctx, leaderElectionRecord); err != nil {
Collaborator: This API should probably support compare-and-swap (i.e. take the old object as a parameter).

Collaborator (Author): Yes, you are right! In Kubernetes the update is guarded by resourceVersion; we need a similar mechanism here. Added a TODO to fix this in a follow-up PR.
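
One way to express that compare-and-swap contract at the interface level; the interface and record names here are assumptions modeled on le.config.Lock from the snippet above:

```go
// Lock abstracts the lease storage backend. Update takes the previously read
// record so the backend can reject the write if the record changed in the
// meantime (a generation precondition in GCS, resourceVersion in Kubernetes).
type Lock interface {
	Get(ctx context.Context) (*LeaderElectionRecord, error)
	Create(ctx context.Context, record LeaderElectionRecord) error
	Update(ctx context.Context, old, updated LeaderElectionRecord) error
}
```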

// TODO: add metrics
return fmt.Errorf("error writing record because the precondition failed: %v", err)
}
default:
Collaborator: Go switch cases don't fall through by default, so it looks like we might be ignoring errors that aren't StatusPreconditionFailed.

I would put the default path outside of the switch (and convert it to `fmt.Errorf("error creating gs://%s/%s: %w", bucket, object, err)`).

Collaborator (Author): Good catch! Fixed.
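
A sketch of the corrected shape, assuming the error comes from closing a GCS object writer and is unwrapped with errors.As; googleapi is google.golang.org/api/googleapi, and the bucket and lease fields follow the BucketLease type used elsewhere in this PR:

```go
if err := w.Close(); err != nil {
	var gerr *googleapi.Error
	if errors.As(err, &gerr) && gerr.Code == http.StatusPreconditionFailed {
		// TODO: add metrics
		return fmt.Errorf("error writing record because the precondition failed: %w", err)
	}
	// Every other failure lands here, outside the precondition check, so no
	// error class is silently ignored.
	return fmt.Errorf("error writing gs://%s/%s: %w", l.bucketName, l.leaseName, err)
}
```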

if err != nil {
return fmt.Errorf("error getting object attributes: %v", err)
}
w := b.Object(l.leaseName).If(storage.Conditions{GenerationMatch: objAttrs.Generation}).NewWriter(ctx)
Collaborator: This looks more like what I'd expect to see in the create!
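
If the point is that the conditional write belongs in the create path as well, GCS supports that directly with a DoesNotExist precondition. A sketch, where b is the bucket handle from the snippet above and recordBytes is the serialized lease record:

```go
// Create the lease object only if it does not exist yet, so two candidates
// racing to create it cannot both succeed.
w := b.Object(l.leaseName).If(storage.Conditions{DoesNotExist: true}).NewWriter(ctx)
if _, err := w.Write(recordBytes); err != nil {
	return fmt.Errorf("error writing gs://%s/%s: %w", l.bucketName, l.leaseName, err)
}
if err := w.Close(); err != nil {
	// A precondition failure here means another candidate created the lease first.
	return fmt.Errorf("error creating gs://%s/%s: %w", l.bucketName, l.leaseName, err)
}
```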

}

func (l *BucketLease) Describe() string {
return fmt.Sprintf("%v/%v", l.bucketName, l.leaseName)
Collaborator: This is good; it could be even more self-documenting if it were gs://%s/%s.

Collaborator (Author): Thanks! Updated.
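
Presumably the updated form reads:

```go
func (l *BucketLease) Describe() string {
	return fmt.Sprintf("gs://%s/%s", l.bucketName, l.leaseName)
}
```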

)

func GetMultiClusterLease(ctx context.Context, c client.Client, nn types.NamespacedName) (*multiclusterv1alpha1.MultiClusterLease, error) {
mcl := &multiclusterv1alpha1.MultiClusterLease{}
Collaborator: My suggestion is that if you're only calling one function in a method, it's easier just to inline the method.

Collaborator (Author): Sounds good. Moved it inline.
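
The inlined call would look roughly like this at the top of Reconcile; client here is sigs.k8s.io/controller-runtime/pkg/client and req is the ctrl.Request:

```go
mcl := &multiclusterv1alpha1.MultiClusterLease{}
if err := r.client.Get(ctx, req.NamespacedName, mcl); err != nil {
	// The object may have been deleted between enqueue and reconcile,
	// which is not a failure, so NotFound errors are dropped.
	return ctrl.Result{}, client.IgnoreNotFound(err)
}
```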


func (r *MultiClusterLeaseReconciler) handleRenewFailed(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, le *leaderelection.LeaderElector, lh lifecyclehandler.LifecycleHandler) (ctrl.Result, error) {
if err := lh.OnStoppedLeading(ctx); err != nil {
r.log.Error(err, "error calling stopped leading function")
Collaborator: I think we have to handle the error; I don't think we can just log it and move on. (There isn't much we can do other than try again, but ... I assume this means that we have two leaders.)

Collaborator (Author): Agreed. Updated the code so that we retry immediately. Also added a rate limiter to the controller so that we back off if necessary.
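
For context, this is roughly how a rate limiter is attached with controller-runtime's builder. A sketch only: the delays are illustrative and the exact rate limiter types differ between controller-runtime and client-go versions (controller is sigs.k8s.io/controller-runtime/pkg/controller, workqueue is k8s.io/client-go/util/workqueue):

```go
func (r *MultiClusterLeaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&multiclusterv1alpha1.MultiClusterLease{}).
		WithOptions(controller.Options{
			// Failed reconciles retry with exponential backoff: quickly at first
			// (so OnStoppedLeading is re-attempted almost immediately), capped so
			// we keep trying without hot-looping.
			RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(100*time.Millisecond, 30*time.Second),
		}).
		Complete(r)
}
```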
