initial commit of multicluster config connector #1991
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
5ac02f4 to 57f2e1a
}

func (r *MultiClusterLeaseReconciler) updateStatus(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, leaderIdentity string, isLeader bool) error {
	mcl, err := util.GetMultiClusterLease(ctx, r.client, types.NamespacedName{Name: mcl.Name, Namespace: mcl.Namespace})
So I don't think we should do this; it breaks optimistic concurrency if we read the object more than once. Rather, we should pass in the object (it looks like we already pass it in!)
Good point. Fixed.
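A minimal sketch of the pass-the-object approach, for reference. The status field names (LeaderIdentity, IsLeader) are assumptions; LastObservedTime and the method signature come from the diff.

```go
// Sketch only: mutate and write the object we were handed instead of
// re-reading it, so a stale resourceVersion surfaces as a conflict rather
// than being silently overwritten.
func (r *MultiClusterLeaseReconciler) updateStatus(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, leaderIdentity string, isLeader bool) error {
	mcl.Status.LeaderIdentity = leaderIdentity // assumed field
	mcl.Status.IsLeader = isLeader             // assumed field
	mcl.Status.LastObservedTime = time.Now().Format(time.RFC3339)
	// No extra Get here: the apiserver rejects the update if
	// mcl.ResourceVersion is stale, preserving optimistic concurrency.
	return r.client.Status().Update(ctx, mcl)
}
```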
	LastObservedTime: time.Now().Format(time.RFC3339),
}

// TODO: retry in case there is a conflict
In other controllers the advice is to return an error and allow the controller retry-with-backoff logic to take over.
That might be problematic here, because we must "heartbeat" when we are the leader.
Still, I think the answer on that might be to start a goroutine for each MultiClusterLease ... which I think you're doing and storing in leaderElectors?
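A rough sketch of that shape, using only standard-library pieces; startHeartbeat, retryPeriod, and passing tryAcquireOrRenew as a closure are assumptions, not the PR's actual API. The point is that each lease gets its own long-running heartbeat loop, so a conflict just means "try again on the next tick" instead of depending on the reconcile queue.

```go
// Hypothetical helper: one goroutine per MultiClusterLease that keeps
// heartbeating until the context is cancelled.
func startHeartbeat(ctx context.Context, retryPeriod time.Duration, tryAcquireOrRenew func(context.Context) bool) {
	go func() {
		ticker := time.NewTicker(retryPeriod)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				// lease deleted or manager shutting down
				return
			case <-ticker.C:
				if !tryAcquireOrRenew(ctx) {
					// conflict or lost lease: nothing to do but retry on the
					// next tick; status is reported by the reconciler
					log.Printf("failed to acquire or renew lease")
				}
			}
		}
	}()
}
```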
)

// These tests use Ginkgo (BDD-style Go testing framework). Refer to
// http://onsi.github.io/ginkgo/ to learn more about Ginkgo.
We should just delete the auto-generated kubebuilder tests; these tests are just testing that the kubernetes CRDs work :-)
Deleted!
succeeded := le.tryAcquireOrRenew(ctx)
if !succeeded {
	log.Info("failed to acquire lease")
	return fmt.Errorf("failed to acquire lease")
Nit: I suggest returning (bool, error) - I think it'll be handy to differentiate between e.g. a network error vs "someone else has the lease"
Thanks! I added a TODO for future improvement.
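For reference, a sketch of what the call site could look like once the (bool, error) split lands; the acquired variable, retryPeriod, and the surrounding Reconcile shape are assumptions.

```go
acquired, err := le.tryAcquireOrRenew(ctx)
if err != nil {
	// infrastructure failure (e.g. network/storage error): return it so the
	// controller's retry-with-backoff takes over
	return ctrl.Result{}, fmt.Errorf("error acquiring or renewing lease: %w", err)
}
if !acquired {
	// not an error: another candidate currently holds the lease
	log.Info("lease is held by another candidate")
	return ctrl.Result{RequeueAfter: retryPeriod}, nil
}
```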
}

// update the lock itself
if err = le.config.Lock.Update(ctx, leaderElectionRecord); err != nil {
This API should probably support compare-and-swap (i.e. take the old object as a parameter)
Yes, you are right! In k8s, the update is guarded by resourceVersion; we need a similar mechanism here. Added a TODO to fix this in a follow-up PR.
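A sketch of what a resourceVersion-like guard could look like on the lock interface; the version token and its semantics are assumptions, not the PR's actual API (only Update and Describe appear in the diff).

```go
// Get returns the record together with an opaque version token (for a GCS
// lock this could be the object generation). Update writes the new record
// only if the stored version still matches, otherwise it fails with a
// precondition error, giving compare-and-swap semantics.
type Lock interface {
	Get(ctx context.Context) (record *LeaderElectionRecord, version int64, err error)
	Update(ctx context.Context, version int64, record *LeaderElectionRecord) error
	Describe() string
}
```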
	// TODO: add metrics
	return fmt.Errorf("error writing record because the precondition failed: %v", err)
}
default:
I don't think we fall-through by default in go, so looks like we might be ignoring errors that aren't StatusPreconditionFailed.
I would put the default path outside of the switch (and convert it to `fmt.Errorf("error creating gs://%s/%s: %w", bucket, object, err)`).
Good catch! Fixed.
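A sketch of the fixed shape: handle the precondition failure explicitly and return every other error instead of letting it fall out of an unmatched switch. A googleapi.Error with HTTP 412 is how the GCS client reports a failed precondition; the surrounding variable names are assumed from the diff.

```go
if err := w.Close(); err != nil {
	var gerr *googleapi.Error
	if errors.As(err, &gerr) && gerr.Code == http.StatusPreconditionFailed {
		// TODO: add metrics
		// someone else wrote the lease object since we read it
		return fmt.Errorf("error writing record because the precondition failed: %w", err)
	}
	// any other error (network, permissions, ...) must not be swallowed
	return fmt.Errorf("error writing gs://%s/%s: %w", l.bucketName, l.leaseName, err)
}
```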
if err != nil {
	return fmt.Errorf("error getting object attributes: %v", err)
}
w := b.Object(l.leaseName).If(storage.Conditions{GenerationMatch: objAttrs.Generation}).NewWriter(ctx)
This looks more like what I'd expect to see in the create!
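For contrast, a sketch of the two preconditions the storage client offers for this pattern: DoesNotExist guards the initial create, while GenerationMatch guards an update of an object we have already read. Names are reused from the diff; which method each belongs in is left to the PR.

```go
// Create: only succeeds if no lease object exists yet.
w := b.Object(l.leaseName).If(storage.Conditions{DoesNotExist: true}).NewWriter(ctx)

// Update: only succeeds if the object is still at the generation we read,
// i.e. nobody else has written it since we fetched objAttrs.
w = b.Object(l.leaseName).If(storage.Conditions{GenerationMatch: objAttrs.Generation}).NewWriter(ctx)
```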
}

func (l *BucketLease) Describe() string {
	return fmt.Sprintf("%v/%v", l.bucketName, l.leaseName)
This is good; it could be even more self-documenting if it was gs://%s/%s
Thanks! Updated.
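Roughly, the updated form (sketch):

```go
func (l *BucketLease) Describe() string {
	return fmt.Sprintf("gs://%s/%s", l.bucketName, l.leaseName)
}
```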
)

func GetMultiClusterLease(ctx context.Context, c client.Client, nn types.NamespacedName) (*multiclusterv1alpha1.MultiClusterLease, error) {
	mcl := &multiclusterv1alpha1.MultiClusterLease{}
My suggestion is that if you're only calling one function in a method, it's easier to just inline the method.
Sounds good. Moved to inline.
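The inlined form would look something like this inside Reconcile; req and the IgnoreNotFound handling are assumptions.

```go
mcl := &multiclusterv1alpha1.MultiClusterLease{}
if err := r.client.Get(ctx, req.NamespacedName, mcl); err != nil {
	// the lease may have been deleted; don't requeue in that case
	return ctrl.Result{}, client.IgnoreNotFound(err)
}
```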
func (r *MultiClusterLeaseReconciler) handleRenewFailed(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, le *leaderelection.LeaderElector, lh lifecyclehandler.LifecycleHandler) (ctrl.Result, error) {
	if err := lh.OnStoppedLeading(ctx); err != nil {
		r.log.Error(err, "error calling stopped leading function")
I think we have to handle the error; I don't think we can just log it and move on. (I don't think there's much we can do other than try again, but ... I assume this means that we have two leaders.)
Agreed. Updated the code so that we retry immediately. Also added a rate limiter to the controller so that we back off if necessary.
cc @600lyy @cheftako @justinsb
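For reference, a sketch of that combination: handleRenewFailed returns the error so the work item is requeued immediately, and the controller is built with an exponential rate limiter so repeated failures back off. The rate-limiter bounds and the SetupWithManager wiring are illustrative, not the PR's exact code.

```go
func (r *MultiClusterLeaseReconciler) handleRenewFailed(ctx context.Context, mcl *multiclusterv1alpha1.MultiClusterLease, le *leaderelection.LeaderElector, lh lifecyclehandler.LifecycleHandler) (ctrl.Result, error) {
	if err := lh.OnStoppedLeading(ctx); err != nil {
		// Returning the error requeues this lease; we cannot just log it,
		// because until OnStoppedLeading succeeds another cluster may also
		// believe it is the leader.
		return ctrl.Result{}, fmt.Errorf("error calling stopped leading function: %w", err)
	}
	return ctrl.Result{}, nil
}

func (r *MultiClusterLeaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&multiclusterv1alpha1.MultiClusterLease{}).
		WithOptions(controller.Options{
			// immediate first retry, then exponential backoff up to 2 minutes
			RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 2*time.Minute),
		}).
		Complete(r)
}
```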