✨ Adding ClusterWorkspaceShard to the resources stored in the cache server #2381
Conversation
Force-pushed from ef7f921 to cb49b09
/retest
Looks very good.
I'd like to get approval from @sttts before we merge it.
			Name: resourceName,
		},
		Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
			BaseURL: "https://base.kcp.test.dev",
I guess it is okay for now, but it might break when this test is run on a kcp instance that is shared with other tests.
This is something worth avoiding up front. What about using GenerateName instead of Name?
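For illustration, a minimal sketch of the GenerateName suggestion, assuming the test builds the shard roughly like the snippet above (import paths reflect kcp's layout at the time; the name prefix is arbitrary):

```go
package replication_test

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	tenancyv1alpha1 "github.com/kcp-dev/kcp/pkg/apis/tenancy/v1alpha1"
)

// newTestShard builds a ClusterWorkspaceShard whose name gets a random suffix
// from the API server, e.g. "test-shard-x7k2p", instead of a fixed resourceName,
// so parallel tests sharing one kcp instance cannot collide on the shard name.
func newTestShard() *tenancyv1alpha1.ClusterWorkspaceShard {
	return &tenancyv1alpha1.ClusterWorkspaceShard{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "test-shard-",
		},
		Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
			BaseURL: "https://base.kcp.test.dev",
		},
	}
}
```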
I think we need a real address or a way to disable this shard from scheduling. We might try to put something on that shard in a multi-shard env.
Maybe for this particular test scenario we should actually use framework.PrivateKcpServer?
My gut feeling is that we should build the tests and the business logic in a way that is not sensitive to that. If a node is not ready in Kubernetes, no new pod gets scheduled onto it. I am wondering whether we will need a similar mechanism in kcp for the shards at some point.
Nevertheless, we should forge the multi-shard tests so that placement is constrained to a shard which is known to be functional (or not, for negative tests). It is good to have as few explicit or implicit dependencies between the test cases as possible.
The alternative is to start an additional kcp server to have a "real address", which may make the tests slower. I would favor the other approach for this reason. What do you think?
NACK to private KCP in every case. Why don't we have shard scheduling control?
Would it be okay to add an annotation to the shard created by this test that would tell the scheduler not to consider the shard?
I think we would have to update the isValidShard function (/~https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/tenancy/clusterworkspace/clusterworkspace_reconcile_scheduling.go#L229) to look for the special annotation.
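A rough sketch of the proposed check; the annotation key and helper name below are hypothetical, and the real isValidShard linked above inspects shard conditions rather than annotations:

```go
package clusterworkspace

import (
	tenancyv1alpha1 "github.com/kcp-dev/kcp/pkg/apis/tenancy/v1alpha1"
)

// unschedulableAnnotationKey is a hypothetical annotation a test could put on a
// shard it creates so the workspace scheduler skips that shard entirely.
const unschedulableAnnotationKey = "experimental.tenancy.kcp.dev/unschedulable"

// excludedFromScheduling sketches the extra check isValidShard could perform
// before treating a shard as a scheduling target.
func excludedFromScheduling(shard *tenancyv1alpha1.ClusterWorkspaceShard) bool {
	if shard == nil {
		return false
	}
	_, found := shard.Annotations[unschedulableAnnotationKey]
	return found
}
```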
That might be a good approach in the near term, but do we have any idea whether we're going to add some sort of scheduling attributes to the shards in the long term?
I think that by default we will schedule randomly, and we will allow specifying a selector for those that have opinions.
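As a purely hypothetical illustration of that model (this is not an existing kcp API): shards are picked at random by default, and callers with opinions can narrow the candidate set with a label selector:

```go
package clusterworkspace

import (
	"math/rand"

	"k8s.io/apimachinery/pkg/labels"

	tenancyv1alpha1 "github.com/kcp-dev/kcp/pkg/apis/tenancy/v1alpha1"
)

// pickShard filters shards by an optional label selector supplied by "those
// that have opinions" and then chooses one of the remaining candidates at
// random; a nil selector means every shard is a candidate.
func pickShard(shards []*tenancyv1alpha1.ClusterWorkspaceShard, selector labels.Selector) *tenancyv1alpha1.ClusterWorkspaceShard {
	candidates := make([]*tenancyv1alpha1.ClusterWorkspaceShard, 0, len(shards))
	for _, shard := range shards {
		if selector != nil && !selector.Matches(labels.Set(shard.Labels)) {
			continue
		}
		candidates = append(candidates, shard)
	}
	if len(candidates) == 0 {
		return nil
	}
	return candidates[rand.Intn(len(candidates))]
}
```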
Force-pushed from cb49b09 to 5344ed2
/retest
Force-pushed from 5344ed2 to 88c4b8f
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
Force-pushed from 88c4b8f to 93e69af
/retest
Broadly LGTM
@@ -80,10 +82,19 @@ func NewController(
		return nil, err
	}

	if err := cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().AddIndexers(cache.Indexers{
We have a helper for this, indexers.AddIfNotPresentOrDie. Should we use it for these, and if not, why?
I don't see any reason not to use it, but I will let @p0lyn0mial comment on it, as this is just an addition following the existing pattern in the controller.
I didn't know we had indexers.AddIfNotPresentOrDie. I don't like its side effects:
- it removes entries from the toAdd map
- it panics on error; why can't it simply return an error?
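For readers outside this thread, a sketch of what a helper of this shape typically does; this is not kcp's actual indexers.AddIfNotPresentOrDie, but it reproduces the two side effects mentioned above:

```go
package indexers

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// AddIfNotPresentOrDie (sketch): register only the indexers that are not
// already present, then add the rest in one call. It mutates the caller's
// toAdd map and panics on error, the two side effects discussed above.
func AddIfNotPresentOrDie(indexer cache.Indexer, toAdd cache.Indexers) {
	for name := range toAdd {
		if _, exists := indexer.GetIndexers()[name]; exists {
			delete(toAdd, name) // side effect: entries vanish from the caller's map
		}
	}
	if err := indexer.AddIndexers(toAdd); err != nil {
		// side effect: panics instead of returning the error to the caller
		panic(fmt.Sprintf("failed to add indexers: %v", err))
	}
}
```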
@@ -445,8 +548,10 @@ func (b *replicateResourceScenario) getCachedResourceHelper(ctx context.Context,
		return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIExports().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
	case "APIResourceSchema":
		return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
	case "ClusterWorkspaceShard":
		return b.cacheKcpClusterClient.Cluster(cluster).TenancyV1alpha1().ClusterWorkspaceShards().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
Broader question: why are we using strongly typed (as opposed to dynamic) clients and informers here?
It leads to more readable code in general, and we also make sure that strongly typed clients/informers can be used with the cache server.
Are we doing special logic for each? Or the same? Convince me that we're not going to gain a case in this switch statement for every type ... ?
We won't, because caching is expensive :)
My main goal was to test the kcp client with the cache.
Plus the individual tests read well, e.g.:
_, err := cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Update(cacheclient.WithShardInContext(ctx, shard.New("root")), cachedSchema, metav1.UpdateOptions{})
If we were to use the dynamic client we would have to work with unstructured objects.
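To make the trade-off concrete, a hedged sketch of the same read through a dynamic client; the plain client-go interface is used here and the kcp shard/cluster wrapping from the test is omitted, so treat it as an approximation rather than the test's real code:

```go
package replication_test

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"

	apisv1alpha1 "github.com/kcp-dev/kcp/pkg/apis/apis/v1alpha1"
)

// getSchemaDynamically shows the extra ceremony a dynamic client imposes:
// build the GVR by hand, receive an unstructured object, and convert it back
// to a typed one before the test can assert on concrete fields.
func getSchemaDynamically(ctx context.Context, client dynamic.Interface, name string) (*apisv1alpha1.APIResourceSchema, error) {
	gvr := schema.GroupVersionResource{Group: "apis.kcp.dev", Version: "v1alpha1", Resource: "apiresourceschemas"}

	u, err := client.Resource(gvr).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}

	schemaObj := &apisv1alpha1.APIResourceSchema{}
	if err := runtime.DefaultUnstructuredConverter.FromUnstructured(u.UnstructuredContent(), schemaObj); err != nil {
		return nil, err
	}
	return schemaObj, nil
}
```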
OK - having two or three types here is fine. If we end up with fifteen, then not so much :)
I was also concerned that we're rewriting tons and tons of boilerplate for each type when the logic to handle it is identical. Won't we need to use dynamic clients to handle replication claims?
I was also concerned that we're rewriting tons and tons of boilerplate for each type when the logic to handle it is identical
Usually creation and a spec update are different. Initially I captured it with a baseScenario idea, but Stefan didn't like it (#2240 (comment)).
Maybe we could have a separate test for testing only the kcp client and then change the replication tests to be more dynamic. I think I could do that, wdyt?
Whichever you think is better.
Okay, I can prepare something and show it to you; we can always trash it if we don't like it.
This can be done in a separate PR; let's not block this PR on it.
…ent cache and global resources (cache server) using indexers.AddIfNotPresentOrDie for consistency's sake
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@ncdc @stevekuznetsov provided that the change from typed to dynamic client in the test file is made in a separate PR, is there anything left preventing this PR from getting merged?
Nope, this looks good to me - @p0lyn0mial PTAL
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: stevekuznetsov
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
/hold cancel
Summary
ClusterWorkspaceShards are created and stored in the root workspace. They contain the URLs to access the shards. This information is necessary to construct the URLs for accessing APIExport services that will get stored into the status of APIExportEndpointSlices.
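As a rough illustration of why the shard BaseURL needs to be globally visible, here is a sketch of how such an endpoint URL might be assembled; the path layout is an assumption for illustration, not a guaranteed kcp contract:

```go
package endpoints

import (
	"net/url"
	"path"
)

// apiExportEndpointURL sketches how an endpoint recorded in an
// APIExportEndpointSlice status could be derived: the shard's BaseURL supplies
// scheme and host, and the APIExport virtual-workspace path is appended, e.g.
// ("https://shard-1.example.dev", "root:org:ws", "my-export") becomes
// "https://shard-1.example.dev/services/apiexport/root:org:ws/my-export".
func apiExportEndpointURL(shardBaseURL, clusterName, exportName string) (string, error) {
	u, err := url.Parse(shardBaseURL)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, "services", "apiexport", clusterName, exportName)
	return u.String(), nil
}
```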
APIExports and APIExportEndpointSlices can be created in any workspace on any shard. This is the reason for making ClusterWorkspaceShards globally available.
The number of ClusterWorkspaceShards is also not high and changes should be rare, hence the number of writes and the cost should be low.
Signed-off-by: Frederic Giloux fgiloux@redhat.com
Related issue(s)
#2332