
✨ Adding ClusterWorkspaceShard to the resources stored in the cache server #2381

Merged: 2 commits merged into kcp-dev:main on Nov 28, 2022

Conversation

@fgiloux (Contributor) commented Nov 18, 2022:

Summary

ClusterWorkspaceShards are created and stored in the root workspace. They contain the URLs to access the shards. This information is necessary to construct the URLs for accessing APIExport services, which get stored in the status of APIExportEndpointSlices.

APIExports and APIExportEndpointSlices can be created in any workspace on any shard, which is why ClusterWorkspaceShards need to be globally available.

The number of ClusterWorkspaceShards is also small and changes should be rare, so the number of writes and the cost should be low.
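
As a purely illustrative sketch of that flow, a controller could join a shard's BaseURL with an APIExport-specific path when filling in endpoint URLs; the helper name and the path layout below are assumptions for illustration, not taken from this PR:

    // endpointURLFor is a hypothetical helper: it appends an (assumed)
    // APIExport virtual-workspace path to the BaseURL advertised by a
    // replicated ClusterWorkspaceShard.
    // Requires "net/url" and "path" from the standard library.
    func endpointURLFor(shardBaseURL, workspacePath, exportName string) (string, error) {
    	u, err := url.Parse(shardBaseURL) // e.g. "https://base.kcp.test.dev"
    	if err != nil {
    		return "", err
    	}
    	// The "/services/apiexport/..." layout is an assumption for illustration only.
    	u.Path = path.Join(u.Path, "services", "apiexport", workspacePath, exportName)
    	return u.String(), nil
    }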

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>

Related issue(s)

#2332

@fgiloux requested a review from p0lyn0mial on November 18, 2022 10:46
The openshift-ci bot requested review from csams and s-urbaniak on November 18, 2022 10:46
@fgiloux force-pushed the shards-2-cacheserver branch from ef7f921 to cb49b09 on November 18, 2022 11:22
@fgiloux (Contributor, Author) commented Nov 18, 2022:

/retest

@p0lyn0mial (Contributor) left a comment:

Looks very good.
I'd like to get approval from @sttts before we merge it.

pkg/cache/server/bootstrap/bootstrap.go (outdated)
Name: resourceName,
},
Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
BaseURL: "https://base.kcp.test.dev",
Contributor:
I guess it is okay for now, it might break when this test is run on a kcp instance that is shared with other tests.

Member:
This is something worth avoiding up front. What about using GenerateName instead of Name?
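
For context, the suggestion boils down to something like the following in the fixture (a sketch; it assumes the usual metav1 and kcp tenancy v1alpha1 imports):

    shard := &tenancyv1alpha1.ClusterWorkspaceShard{
    	ObjectMeta: metav1.ObjectMeta{
    		// GenerateName instead of Name: the API server appends a random
    		// suffix, so tests sharing a kcp instance cannot collide on a
    		// fixed shard name.
    		GenerateName: "test-shard-",
    	},
    	Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
    		BaseURL: "https://base.kcp.test.dev",
    	},
    }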

Contributor:
I think we need a real address or a way to disable this shard from scheduling. We might try to put something on that shard in a multi shard env.

Maybe for this particular test scenario we should actually use framework.PrivateKcpServer?

Contributor Author (@fgiloux):
My gut feeling is that we should build the tests and the business logic in a way that is not sensitive to that. If a node is not ready in Kubernetes, no new pod gets scheduled onto it. I am wondering whether we would need a similar mechanism in kcp for the shards at some point.
Nevertheless, we should forge the multi-shard tests so that placement is constrained to a shard which is known to be functional (or not, for negative tests). It is good to have as few explicit or implicit dependencies between the test cases as possible.
The alternative is to start an additional kcp server to have a "real address", which may make the tests slower. I would favor the other approach for this reason. What do you think?

Contributor:
NACK to private KCP in every case. Why don't we have shard scheduling control?

Contributor:
Would it be okay to add an annotation to the shard created by this test that would tell the scheduler not to consider the shard?

I think we would have to update the isValidShard function (https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/tenancy/clusterworkspace/clusterworkspace_reconcile_scheduling.go#L229) to look for the special annotation.
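
A rough sketch of that idea, with a made-up annotation key (the real isValidShard function may take different arguments):

    // Hypothetical annotation key, for illustration only.
    const unschedulableAnnotation = "experimental.tenancy.kcp.dev/unschedulable"

    // isSchedulable stands in for the kind of check isValidShard could grow:
    // shards carrying the annotation are ignored by workspace scheduling.
    func isSchedulable(shard *tenancyv1alpha1.ClusterWorkspaceShard) bool {
    	_, unschedulable := shard.Annotations[unschedulableAnnotation]
    	return !unschedulable
    }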

Contributor:
That might be a good approach in the near-term, but do we have any idea if we're going to add some sort of scheduling attributes to the shards for the long-term?

Contributor:
I think that by default we will schedule randomly, and we will allow specifying a selector for those that have opinions.
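
As a purely hypothetical illustration of that direction (nothing below existed in the API at the time of this PR):

    // candidateShards filters shards by an optional label selector; with no
    // selector every shard is a candidate and one is picked at random.
    // Uses labels from k8s.io/apimachinery/pkg/labels.
    func candidateShards(shards []*tenancyv1alpha1.ClusterWorkspaceShard, selector labels.Selector) []*tenancyv1alpha1.ClusterWorkspaceShard {
    	if selector == nil || selector.Empty() {
    		return shards
    	}
    	var out []*tenancyv1alpha1.ClusterWorkspaceShard
    	for _, s := range shards {
    		if selector.Matches(labels.Set(s.Labels)) {
    			out = append(out, s)
    		}
    	}
    	return out
    }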

test/e2e/reconciler/cache/replication_test.go (3 outdated review threads, resolved)
@fgiloux force-pushed the shards-2-cacheserver branch from cb49b09 to 5344ed2 on November 18, 2022 16:06
@fgiloux (Contributor, Author) commented Nov 18, 2022:

/retest

The openshift-merge-robot added the needs-rebase label (PR cannot be merged because it has merge conflicts with HEAD) on Nov 22, 2022
@fgiloux force-pushed the shards-2-cacheserver branch from 5344ed2 to 88c4b8f on November 22, 2022 12:57
The openshift-merge-robot removed the needs-rebase label on Nov 22, 2022
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@fgiloux force-pushed the shards-2-cacheserver branch from 88c4b8f to 93e69af on November 22, 2022 13:21
@fgiloux (Contributor, Author) commented Nov 22, 2022:

/retest

@stevekuznetsov (Contributor) left a comment:

Broadly LGTM

@@ -80,10 +82,19 @@ func NewController(
return nil, err
}

if err := cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().AddIndexers(cache.Indexers{
Contributor:
We have a helper for indexers.AddIfNotPresentOrDie - should we use it for these, and if not, why?

Contributor Author (@fgiloux):
I don't see any reason not to use it, but I will let @p0lyn0mial comment on it as it is just an addition following the existing pattern in the controller.

Contributor:
I didn't know we had indexers.AddIfNotPresentOrDie. I don't like its side effects:

  • it removes entries from the toAdd map
  • it panics on error; why can't it simply return an error?
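
For reference, the two patterns under discussion look roughly like this; the helper's signature is inferred from the comments here and may not match kcp's pkg/indexers exactly:

    // Direct registration, as in the diff above: returns an error the caller handles.
    if err := cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().AddIndexers(cache.Indexers{
    	byWorkspaceIndex: byWorkspaceIndexFunc, // placeholder index name and func
    }); err != nil {
    	return nil, err
    }

    // Helper variant (adopted in the follow-up commit): skips indexers that are
    // already registered and panics on failure instead of returning an error.
    indexers.AddIfNotPresentOrDie(
    	cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().GetIndexer(),
    	cache.Indexers{
    		byWorkspaceIndex: byWorkspaceIndexFunc, // placeholder
    	},
    )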

@@ -445,8 +548,10 @@ func (b *replicateResourceScenario) getCachedResourceHelper(ctx context.Context,
return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIExports().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
case "APIResourceSchema":
return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
case "ClusterWorkspaceShard":
return b.cacheKcpClusterClient.Cluster(cluster).TenancyV1alpha1().ClusterWorkspaceShards().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
Contributor:
broader question: why are we using strongly typed (as opposed to dynamic) clients and informers in here?

Contributor:
It leads to more readable code in general, and we also make sure that strongly typed clients/informers can be used with the cache server.

Contributor:
Are we doing special logic for each? Or the same? Convince me that we're not going to gain a case in this switch statement for every type ... ?

Contributor:
We won't, because caching is expensive :)

My main goal was to test the kcp client with the cache.
Plus the individual tests read well, e.g.

_, err := cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Update(cacheclient.WithShardInContext(ctx, shard.New("root")), cachedSchema, metav1.UpdateOptions{})

If we were to use the dynamic client we would have to use unstructured objects.
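
For comparison, the unstructured route would look something like this (a sketch using the plain Kubernetes dynamic client; dynamicClient is a hypothetical dynamic.Interface already scoped to the right cluster/shard, and the group/version/resource values are illustrative):

    // Convert the typed object to unstructured before handing it to the
    // dynamic client; this is the extra boilerplate the typed client avoids.
    // Assumes imports of schema, runtime, unstructured, and metav1 from apimachinery.
    gvr := schema.GroupVersionResource{Group: "apis.kcp.dev", Version: "v1alpha1", Resource: "apiresourceschemas"}
    raw, err := runtime.DefaultUnstructuredConverter.ToUnstructured(cachedSchema)
    if err != nil {
    	return err
    }
    obj := &unstructured.Unstructured{Object: raw}
    _, err = dynamicClient.Resource(gvr).Update(ctx, obj, metav1.UpdateOptions{})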

Contributor:
OK - having two or three types here is fine. If we end up with fifteen, then not so much :)

I was also concerned we're re-writing tons and tons of boilerplate for each type when the logic to handle it is identical. Won't we need to use dynamic clients to handle replication claims?

Contributor:
> I was also concerned we're re-writing tons and tons of boilerplate for each type when the logic to handle it is identical

Usually creation and a spec update are different. Initially I captured it with a baseScenario idea but Stefan didn't like it (#2240 (comment)).

Maybe we could have a separate test for testing only the kcp client and then change the replication tests to be more dynamic. I think I could do that, wdyt?

Contributor:
Whichever you think is better.

Contributor:
Okay, I can prepare something and show it to you; we can always trash it if we don't like it.

Contributor:
This can be done in a separate PR; let's not block this PR on it.

…ent cache and global resources (cache server)

using indexers.AddIfNotPresentOrDie for consistency sake

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@fgiloux (Contributor, Author) commented Nov 28, 2022:

@ncdc @stevekuznetsov provided the change from the typed to the dynamic client in the test file is made in a separate PR, is there anything left preventing this PR from getting merged?

@stevekuznetsov (Contributor):
Nope, this looks good to me - @p0lyn0mial PTAL

/lgtm
/approve
/hold

The openshift-ci bot added the do-not-merge/hold label (a PR should not merge because someone has issued a /hold command) on Nov 28, 2022
The openshift-ci bot added the lgtm label (the PR is ready to be merged) on Nov 28, 2022
openshift-ci bot commented Nov 28, 2022:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

The openshift-ci bot added the approved label (the PR has been approved by an approver from all required OWNERS files) on Nov 28, 2022
@p0lyn0mial (Contributor):
/lgtm

@p0lyn0mial (Contributor):
/hold cancel

The openshift-ci bot removed the do-not-merge/hold label on Nov 28, 2022
The openshift-merge-robot merged commit ee2d2d1 into kcp-dev:main on Nov 28, 2022