✨ APIExportEndpointSlice reconciliation #2432
Conversation
Force-pushed from 900b199 to 785e70a
pkg/server/server.go (outdated)
@@ -456,6 +456,12 @@ func (s *Server) Run(ctx context.Context) error {
		}
	}

	if s.Options.Cache.Enabled && (s.Options.Controllers.EnableAll || enabled.Has("apiexportendpointslice")) {
So this doesn't work if the cache server is disabled? What is the fallback option for a service provider wanting to know which URLs to use?
Yes. I should have raised that. Here are my views:
- a sharded environment implies a cache server
- the construct (APIExportEndpointSlice) is of little value for a non-sharded environment
- it is not a good UX for service providers to have a different API for sharded and non-sharded environments. Most probably they want to develop their service once and have it work for both
I would propose that in a non-sharded / no-cache-server environment, informers for the local instance are passed to the controller instead of the ones for the cache server, e.g.:
c, err := apiexportendpointslice.NewController(
	s.KcpSharedInformerFactory.Apis().V1alpha1().APIExportEndpointSlices(),
	// ClusterWorkspaceShards and APIExports get retrieved from the cache server
	s.CacheKcpSharedInformerFactory.Tenancy().V1alpha1().ClusterWorkspaceShards(),
	s.CacheKcpSharedInformerFactory.Apis().V1alpha1().APIExports(),
	kcpClusterClient,
)
vs
c, err := apiexportendpointslice.NewController(
	s.KcpSharedInformerFactory.Apis().V1alpha1().APIExportEndpointSlices(),
	// ClusterWorkspaceShards and APIExports get retrieved from the local shard
	s.KcpSharedInformerFactory.Tenancy().V1alpha1().ClusterWorkspaceShards(),
	s.KcpSharedInformerFactory.Apis().V1alpha1().APIExports(),
	kcpClusterClient,
)
As the cache server is Kubernetes API compliant, the same code should work with both. As there is a single shard, all the relevant information should be provided by it.
What do you think?
We usually pass in both and have the controller try the local ones first, then fall back to the cache server ones. Right @stevekuznetsov @p0lyn0mial @sttts ?
I had the same question here: #2432 (comment)
It's a fundamental question of design here
We usually pass in both and have the controller try the local ones first, then fall back to the cache server ones.
With the approach you describe, you need the informers for the cache server to be started. This is not the case today if the cache server is disabled. This could be done, but I feel that we are conflating two scenarios here:
1. Do we want kcp to work without any cache server at all? My understanding is that the answer is yes, but it is then a single-sharded kcp. My understanding is that the cache server is the communication mechanism between shards. My proposal here, passing local informers instead of cache informers, was to address this specific scenario.
2. What is the best degradation we can offer when the cache server is supposed to be reachable but is not? My thinking on this scenario was the answer here: ✨ APIExportEndpointSlice reconciliation #2432 (comment)
Let's go through (2). We have 10 shards and 1 shard gets updated:
- If the cache server is not reachable before the reconciliation loop starts, the loop does not start for any APIExportEndpointSlice with the current implementation in this PR. With secondary local informers it would only start either for the root shard (current implementation) or for the modified shard (if the ClusterWorkspaceShard gets created locally instead of on the root shard). With the root-shard approach, the APIExportEndpointSlices in the root shard get the correct endpoints; the others are stale. That is 10% better, or 1% if we have 100 shards. With ClusterWorkspaceShards created locally, the reconciliation logic sees only one shard, the local one, through the secondary informers. Any change would wipe out the endpoints for the remaining 9 shards and possibly be 9 times worse for the APIExportEndpointSlices on the modified shard. APIExportEndpointSlices on other shards don't see the change.
- If the cache server is not reachable during the reconciliation loop, no change is applied with the current implementation in this PR. With secondary informers, the information for APIExports on other shards won't be accessible, so 90% stay unchanged; for the remaining 10% the APIExport information is available. With ClusterWorkspaceShards created in the root shard, again the APIExportEndpointSlices on the root shard get properly populated, but the controller for APIExportEndpointSlices on other shards sees zero ClusterWorkspaceShard resources and would wipe out all the endpoints, which would be far worse than doing nothing. With ClusterWorkspaceShards created locally, the controller sees only its local shard and would wipe out 9 of the 10 endpoints, again far worse than doing nothing.
Let's not diverge between local and sharded mode. Let's run the cache server even in single-shard mode.
Let's run the cache server even in single-shard mode.
This is already implemented: we can run the cache server with a kcp instance in the same process (embedded mode).
As discussed, I will modify the flags so that the cache server is started in embedded mode by default, and as a separate process if additional information is provided.
				apisv1alpha1.APIExportValid,
				apisv1alpha1.InternalErrorReason,
				conditionsv1alpha1.ConditionSeverityError,
				"Error getting APIExport %s|%s: %v",
This error is something other than NotFound from a lister - unclear what it might even be, but does it make sense to update status here? Can we retry the reconciliation? What might a user do with the information here?
It's either impossible or essentially impossible to get other kinds of errors, given the current implementation of a lister. But in any case, I do think updating a condition is appropriate - it provides feedback to the user that something went wrong. They may not have any recourse, but it's better than giving them no information and having the key stuck in an endless exponential backoff loop.
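For context, the diff hunk above shows the arguments of such a condition update. A minimal sketch of how it might be set, assuming kcp's vendored cluster-api conditions helpers; the surrounding variable names (apiExportEndpointSlice, clusterName, apiExportName) are assumptions for illustration, not the PR's code:

// Hypothetical sketch: mark APIExportValid false on the APIExportEndpointSlice
// when the APIExport lookup fails with a non-NotFound error; the workqueue
// then retries the key with exponential backoff.
conditions.MarkFalse(
	apiExportEndpointSlice,
	apisv1alpha1.APIExportValid,
	apisv1alpha1.InternalErrorReason,
	conditionsv1alpha1.ConditionSeverityError,
	"Error getting APIExport %s|%s: %v",
	clusterName, apiExportName, err,
)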
pkg/server/controllers.go (outdated)

	c, err := apiexportendpointslice.NewController(
		s.KcpSharedInformerFactory.Apis().V1alpha1().APIExportEndpointSlices(),
		// ClusterWorkspaceShards and APIExports get retrieved from cache server
Does a shard have some local storage for its own definition? Surely there will also be APIExports on the shard that do not need to be requested from the cache - we can progress in processing that data even when the cache is down, right?
Does a shard have some local storage for its own definition?
Currently not; it is all in the root workspace of the root shard. As per the above, I think we would need it.
There will be APIExports on the shard that do not need to be requested from the cache - we can progress in processing that data even when the cache is down, right?
That's true, but I am not sure it is worth it. We need the shard information to get more than a single endpoint, and that can only be retrieved from the cache server. I would rather keep things "frozen" when the cache server cannot be reached: existing services are not impacted by the unavailability of the cache server as long as the shards don't change. Looking only at the information on the local shard, which is currently only available for the root shard if the APIExport has been created there, would only give a partial picture that may be worse than the stale snapshot captured in the APIExportEndpointSlices before the cache server became unavailable.
I think that is against our principle of single-shard functionality without external data. We need to support "working as much as possible" when the larger system has network partitions.
@p0lyn0mial would appreciate your thoughts on this
Does a shard have some local storage for its own definition? Surely there will also be APIExports on the shard that do not need to be requested from the cache - we can progress in processing that data even when the cache is down, right?
Yes. As of today our approach is to use two informers (maybe in the future we could unify them and provide a single instance/interface).
The approach so far has been to first check the local informer and fall back to the second one, backed by the cache server. For example:
getAPIExport: func(clusterName logicalcluster.Name, name string) (*apisv1alpha1.APIExport, error) {
	apiExport, err := localAPIExportInformer.Lister().Cluster(clusterName).Get(name)
	if errors.IsNotFound(err) {
		return cacheApiExportInformer.Lister().Cluster(clusterName).Get(name)
	}
	return apiExport, err
}
I think we could apply this strategy to all places that require working with the caching layer.
When it comes to working with Shard resources in the event of a failure, we could at least register the shard we are currently running on, since all we need is Spec.VirtualWorkspaceURL.
Does it make sense?
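For illustration, a hypothetical sketch of that fallback; the variable names (cacheShardInformer, localShard) are assumptions, not taken from this PR:

// Hypothetical sketch: build the shard list from the cache-server-backed
// informer and, if it comes back empty, fall back to registering at least the
// shard we are running on, since only Spec.VirtualWorkspaceURL is needed.
shards, err := cacheShardInformer.Lister().List(labels.Everything())
if err != nil {
	return err
}
if len(shards) == 0 && localShard != nil {
	shards = []*tenancyv1alpha1.ClusterWorkspaceShard{localShard}
}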
I think we could apply this strategy to all places that require working with the caching layer.
I think that we need to distinguish between two types of requests:
- I am looking for APIExport A, or another global resource type; the existing approach (check locally and then the cache) looks OK to me.
- I want to build a complete view and am looking for all shards. This is the scenario here. In that case I don't think that local requests bring any added value.
When it comes to working with Shard resources in the event of a failure, we could at least register the shard we are currently running on, since all we need is Spec.VirtualWorkspaceURL.
If the APIExportEndpointSlice was just created, nothing is broken; it is just not yet working. If the cache server is "temporarily" not accessible it is debatable, but my gut feeling is that it is simpler, clearer, and cleaner to wait for it to become reachable again and for the reconciliation to happen.
If the APIExportEndpointSlice already existed and had already been reconciled once, looking at the local shard only may (not today) return a single shard. Reconciling against that may mean removing hundreds of valid existing URLs, plus possibly one invalid one (if a shard change triggered the reconciliation), and replacing them with a single URL (the local shard). I would argue that this is far worse than doing nothing: keeping the stale state and waiting for the cache server to be reachable again so that the state finally becomes up-to-date.
To come back to:
Does a shard have some local storage for its own definition?
As per my previous answer: no. I am proposing to change that, to stop having the root shard be special, with the cache server becoming the preferred communication mechanism between shards. I feel, however, that this is orthogonal to this pull request.
In general we won't gate a shard when the cache server is not ready/available. That means that a local cache (informer) might give you an empty list.
Also, nothing will remove the content received (by the informer) from the cache server when it becomes unavailable. In the future we might provide users with "staleness" information that would help them drive some decisions.
Given the above, we need to decide what to do after a restart and without access to the cache server.
I was proposing to add at least a local shard instead of wiping the list (/~https://github.com/kcp-dev/kcp/pull/2432/files#diff-a028d37be8294707fd6fdcb63d85f41486832f61eb70c14605fc5fc712958f48R134).
I'm fine with not changing an already populated list, but that would make the code more complicated because we need to distinguish between a list of shards with only 1 item and a list with only a local shard.
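For comparison, a hypothetical sketch of the "do not touch an already populated list" alternative; the names (endpointSlice, listShards, endpointsFor) and the status field are assumptions for illustration only:

// Hypothetical sketch: if the cache-backed shard listing is empty (e.g. after
// a restart without access to the cache server) and endpoints were already
// populated, keep the stale list rather than wiping it or shrinking it to the
// local shard only.
shards, err := listShards()
if err != nil {
	return err
}
if len(shards) == 0 && len(endpointSlice.Status.APIExportEndpoints) > 0 {
	// nothing trustworthy to reconcile against; keep the stale endpoints
	return nil
}
endpointSlice.Status.APIExportEndpoints = endpointsFor(shards)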
Added a commit following our Slack discussion. My preference would be without it.
Force-pushed from 4868e4b to 374454b
/milestone v0.11
Force-pushed from b55f98b to 2d05eb3
Force-pushed from 2d05eb3 to bb00beb
Force-pushed from bb00beb to 152d9c7
/retest
1 similar comment
/retest
Force-pushed from 152d9c7 to 09bfca0
/retest
@ncdc @sttts I have completed the changes; the tests pass except for ppc64le, but that is an infrastructure issue.
Force-pushed from f175b5c to 3f9673c
Force-pushed from ecb6df4 to 666d657
			runtime.HandleError(err)
			continue
		} else if !exists {
			runtime.HandleError(fmt.Errorf("APIBinding %q does not exist", key))
This is not an error. The index can change at any time, especially since the IndexKeys call above.
ack
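A minimal sketch of the suggested handling, with the indexer, keys, and process helper assumed from the surrounding loop: absence after GetByKey is treated as normal rather than reported.

// Sketch: the keys come from an IndexKeys call, so an object may legitimately
// be gone by the time GetByKey runs; skip silently instead of reporting it.
for _, key := range keys {
	obj, exists, err := indexer.GetByKey(key)
	if err != nil {
		runtime.HandleError(err)
		continue
	}
	if !exists {
		// the index changed since the IndexKeys call; nothing to do for this key
		continue
	}
	process(obj) // hypothetical helper
}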
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sttts. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/hold
	return utilerrors.NewAggregate(errs)
}

func filterShardEvent(oldObj, newObj interface{}) bool {
nit: in the filtered event handler for informers the logic is inverse, i.e. false for drop, true for keep. Maybe worth following that logic here too?
Yes, I have now inverted the logic.
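For illustration, a hypothetical sketch of a filter following that convention (true to keep, false to drop), comparing the fields mentioned in the commit messages below; it is not a copy of the PR's code:

// Sketch: keep a shard update event only if something the reconciler cares
// about changed; this matches client-go's filtered event handler convention.
func filterShardEvent(oldObj, newObj interface{}) bool {
	oldShard, ok := oldObj.(*tenancyv1alpha1.ClusterWorkspaceShard)
	if !ok {
		return false
	}
	newShard, ok := newObj.(*tenancyv1alpha1.ClusterWorkspaceShard)
	if !ok {
		return false
	}
	return oldShard.Spec.VirtualWorkspaceURL != newShard.Spec.VirtualWorkspaceURL ||
		!reflect.DeepEqual(oldShard.Labels, newShard.Labels)
}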
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
…mote server when cache-server-kubeconfig-file is specified Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
This prevents SSA from working with kcp when the cache server is embedded. Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
… enqueue logic but not from the reconcile one. Remove unnecessary getters. Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
Force-pushed from 666d657 to 4a55275
…VirtualWorkspaceURL or labels have changed Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
Force-pushed from 4a55275 to 12d0473
/lgtm
/retest
/hold cancel
/retest
Summary
This PR introduces reconciliation logic to populate one endpoint per shard in the status of APIExportEndpointSlices. This replaces what was previously populated directly in the status of APIExports.
Background
This PR also adds conditions to the APIExportEndpointSlice status.
Filtering will be introduced in a subsequent PR.
Related issue(s)
Contributes to #2332