
✨ Adding ClusterWorkspaceShard to the resources stored in the cache server #2381

Merged: 2 commits merged into kcp-dev:main on Nov 28, 2022

Conversation

@fgiloux (Contributor) commented Nov 18, 2022:

Summary

ClusterWorkspaceShards are created and stored in the root workspace. They contain the URLs to access the shards. This information is necessary to construct the URLs for accessing APIExport services, which get stored in the status of APIExportEndpointSlices.

APIExports and APIExportEndpointSlices can be created in any workspace on any shard, which is why ClusterWorkspaceShards need to be globally available.

The number of ClusterWorkspaceShards is also small and changes should be rare, so the number of writes and the cost should be low.
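
As a purely illustrative sketch of that flow, a controller could join a shard's BaseURL with an APIExport-specific path when filling in endpoint URLs; the helper name and the path layout below are assumptions for illustration, not taken from this PR:

    // endpointURLFor is a hypothetical helper: it appends an (assumed)
    // APIExport virtual-workspace path to the BaseURL advertised by a
    // replicated ClusterWorkspaceShard.
    // Requires "net/url" and "path" from the standard library.
    func endpointURLFor(shardBaseURL, workspacePath, exportName string) (string, error) {
    	u, err := url.Parse(shardBaseURL) // e.g. "https://base.kcp.test.dev"
    	if err != nil {
    		return "", err
    	}
    	// The "/services/apiexport/..." layout is an assumption for illustration only.
    	u.Path = path.Join(u.Path, "services", "apiexport", workspacePath, exportName)
    	return u.String(), nil
    }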

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>

Related issue(s)

#2332

@fgiloux requested a review from p0lyn0mial on November 18, 2022 10:46
The openshift-ci bot requested review from csams and s-urbaniak on November 18, 2022 10:46
@fgiloux force-pushed the shards-2-cacheserver branch from ef7f921 to cb49b09 on November 18, 2022 11:22
@fgiloux (Contributor, Author) commented Nov 18, 2022:

/retest

@p0lyn0mial (Contributor) left a comment:

Looks very good.
I'd like to get approval from @sttts before we merge it.

pkg/cache/server/bootstrap/bootstrap.go (outdated)
Name: resourceName,
},
Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
BaseURL: "https://base.kcp.test.dev",
Contributor:
I guess it is okay for now, it might break when this test is run on a kcp instance that is shared with other tests.

Member:
This is something worth avoiding up front. What about using GenerateName instead of Name?
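
For context, the suggestion boils down to something like the following in the fixture (a sketch; it assumes the usual metav1 and kcp tenancy v1alpha1 imports):

    shard := &tenancyv1alpha1.ClusterWorkspaceShard{
    	ObjectMeta: metav1.ObjectMeta{
    		// GenerateName instead of Name: the API server appends a random
    		// suffix, so tests sharing a kcp instance cannot collide on a
    		// fixed shard name.
    		GenerateName: "test-shard-",
    	},
    	Spec: tenancyv1alpha1.ClusterWorkspaceShardSpec{
    		BaseURL: "https://base.kcp.test.dev",
    	},
    }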

Contributor:
I think we need a real address or a way to disable this shard from scheduling. We might try to put something on that shard in a multi shard env.

Maybe for this particular test scenario we should actually use framework.PrivateKcpServer?

Contributor Author (@fgiloux):
My gut feeling is that we should build the tests and the business logic in a way that is not sensitive to that. If a node is not ready in Kubernetes, no new pod gets scheduled onto it. I am wondering whether we would need a similar mechanism in kcp for the shards at some point.
Nevertheless, we should forge the multi-shard tests so that placement is constrained to a shard which is known to be functional (or not, for negative tests). It is good to have as few explicit or implicit dependencies between the test cases as possible.
The alternative is to start an additional kcp server to have a "real address", which may make the tests slower. I would favor the other approach for this reason. What do you think?

Contributor:
NACK to private KCP in every case. Why don't we have shard scheduling control?

Contributor:
Would it be okay to add an annotation to the shard created by this test that would tell the scheduler not to consider the shard?

I think we would have to update the isValidShard function (https://github.com/kcp-dev/kcp/blob/main/pkg/reconciler/tenancy/clusterworkspace/clusterworkspace_reconcile_scheduling.go#L229) to look for the special annotation.
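
A rough sketch of that idea, with a made-up annotation key (the real isValidShard function may take different arguments):

    // Hypothetical annotation key, for illustration only.
    const unschedulableAnnotation = "experimental.tenancy.kcp.dev/unschedulable"

    // isSchedulable stands in for the kind of check isValidShard could grow:
    // shards carrying the annotation are ignored by workspace scheduling.
    func isSchedulable(shard *tenancyv1alpha1.ClusterWorkspaceShard) bool {
    	_, unschedulable := shard.Annotations[unschedulableAnnotation]
    	return !unschedulable
    }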

Contributor:
That might be a good approach in the near-term, but do we have any idea if we're going to add some sort of scheduling attributes to the shards for the long-term?

Contributor:
I think that by default we will schedule randomly, and we will allow specifying a selector for those that have opinions.
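
As a purely hypothetical illustration of that direction (nothing below existed in the API at the time of this PR):

    // candidateShards filters shards by an optional label selector; with no
    // selector every shard is a candidate and one is picked at random.
    // Uses labels from k8s.io/apimachinery/pkg/labels.
    func candidateShards(shards []*tenancyv1alpha1.ClusterWorkspaceShard, selector labels.Selector) []*tenancyv1alpha1.ClusterWorkspaceShard {
    	if selector == nil || selector.Empty() {
    		return shards
    	}
    	var out []*tenancyv1alpha1.ClusterWorkspaceShard
    	for _, s := range shards {
    		if selector.Matches(labels.Set(s.Labels)) {
    			out = append(out, s)
    		}
    	}
    	return out
    }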

test/e2e/reconciler/cache/replication_test.go (3 outdated review threads, resolved)
@fgiloux force-pushed the shards-2-cacheserver branch from cb49b09 to 5344ed2 on November 18, 2022 16:06
@fgiloux (Contributor, Author) commented Nov 18, 2022:

/retest

The openshift-merge-robot added the needs-rebase label (PR cannot be merged because it has merge conflicts with HEAD) on Nov 22, 2022
@fgiloux force-pushed the shards-2-cacheserver branch from 5344ed2 to 88c4b8f on November 22, 2022 12:57
The openshift-merge-robot removed the needs-rebase label on Nov 22, 2022
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@fgiloux force-pushed the shards-2-cacheserver branch from 88c4b8f to 93e69af on November 22, 2022 13:21
@fgiloux (Contributor, Author) commented Nov 22, 2022:

/retest

@stevekuznetsov (Contributor) left a comment:

Broadly LGTM

@@ -80,10 +82,19 @@ func NewController(
return nil, err
}

if err := cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().AddIndexers(cache.Indexers{
Contributor:
We have a helper for indexers.AddIfNotPresentOrDie - should we use it for these, and if not, why?

Contributor Author (@fgiloux):
I don't see any reason not to use it, but I will let @p0lyn0mial comment on it as it is just an addition following the existing pattern in the controller.

Contributor:
I didn't know we had indexers.AddIfNotPresentOrDie. I don't like its side effects:

  • it removes entries from the toAdd map
  • it panics on error; why can't it simply return an error?
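
For reference, the two patterns under discussion look roughly like this; the helper's signature is inferred from the comments here and may not match kcp's pkg/indexers exactly:

    // Direct registration, as in the diff above: returns an error the caller handles.
    if err := cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().AddIndexers(cache.Indexers{
    	byWorkspaceIndex: byWorkspaceIndexFunc, // placeholder index name and func
    }); err != nil {
    	return nil, err
    }

    // Helper variant (adopted in the follow-up commit): skips indexers that are
    // already registered and panics on failure instead of returning an error.
    indexers.AddIfNotPresentOrDie(
    	cacheKcpInformers.Tenancy().V1alpha1().ClusterWorkspaceShards().Informer().GetIndexer(),
    	cache.Indexers{
    		byWorkspaceIndex: byWorkspaceIndexFunc, // placeholder
    	},
    )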

@@ -445,8 +548,10 @@ func (b *replicateResourceScenario) getCachedResourceHelper(ctx context.Context,
return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIExports().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
case "APIResourceSchema":
return b.cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
case "ClusterWorkspaceShard":
return b.cacheKcpClusterClient.Cluster(cluster).TenancyV1alpha1().ClusterWorkspaceShards().Get(cacheclient.WithShardInContext(ctx, shard.New("root")), b.resourceName, metav1.GetOptions{})
Contributor:
broader question: why are we using strongly typed (as opposed to dynamic) clients and informers in here?

Contributor:
It leads to more readable code in general, and we also make sure that strongly typed clients/informers can be used with the cache server.

Contributor:
Are we doing special logic for each? Or the same? Convince me that we're not going to gain a case in this switch statement for every type ... ?

Contributor:
We won't, because caching is expensive :)

My main goal was to test the kcp client with the cache.
Plus the individual tests read well, e.g.

_, err := cacheKcpClusterClient.Cluster(cluster).ApisV1alpha1().APIResourceSchemas().Update(cacheclient.WithShardInContext(ctx, shard.New("root")), cachedSchema, metav1.UpdateOptions{})

If we were to use the dynamic client we would have to use unstructured objects.
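
For comparison, the unstructured route would look something like this (a sketch using the plain Kubernetes dynamic client; dynamicClient is a hypothetical dynamic.Interface already scoped to the right cluster/shard, and the group/version/resource values are illustrative):

    // Convert the typed object to unstructured before handing it to the
    // dynamic client; this is the extra boilerplate the typed client avoids.
    // Assumes imports of schema, runtime, unstructured, and metav1 from apimachinery.
    gvr := schema.GroupVersionResource{Group: "apis.kcp.dev", Version: "v1alpha1", Resource: "apiresourceschemas"}
    raw, err := runtime.DefaultUnstructuredConverter.ToUnstructured(cachedSchema)
    if err != nil {
    	return err
    }
    obj := &unstructured.Unstructured{Object: raw}
    _, err = dynamicClient.Resource(gvr).Update(ctx, obj, metav1.UpdateOptions{})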

Contributor:
OK - having two or three types here is fine. If we end up with fifteen, then not so much :)

I was also concerned we're re-writing tons and tons of boilerplate for each type when the logic to handle it is identical. Won't we need to use dynamic clients to handle replication claims?

Contributor:
> I was also concerned we're re-writing tons and tons of boilerplate for each type when the logic to handle it is identical

Usually creation and a spec update are different. Initially I captured it with a baseScenario idea but Stefan didn't like it (#2240 (comment)).

Maybe we could have a separate test for testing only the kcp client and then change the replication tests to be more dynamic. I think I could do that, wdyt?

Contributor:
Whichever you think is better.

Contributor:
Okay, I can prepare something and show it to you; we can always trash it if we don't like it.

Contributor:
This can be done in a separate PR; let's not block this PR on it.

…ent cache and global resources (cache server)

using indexers.AddIfNotPresentOrDie for consistency sake

Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@fgiloux (Contributor, Author) commented Nov 28, 2022:

@ncdc @stevekuznetsov provided the change from the typed to the dynamic client in the test file is made in a separate PR, is there anything left preventing this PR from getting merged?

@stevekuznetsov (Contributor):
Nope, this looks good to me - @p0lyn0mial PTAL

/lgtm
/approve
/hold

The openshift-ci bot added the do-not-merge/hold label (a PR should not merge because someone has issued a /hold command) on Nov 28, 2022
The openshift-ci bot added the lgtm label (the PR is ready to be merged) on Nov 28, 2022
openshift-ci bot commented Nov 28, 2022:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

The openshift-ci bot added the approved label (the PR has been approved by an approver from all required OWNERS files) on Nov 28, 2022
@p0lyn0mial (Contributor):
/lgtm

@p0lyn0mial (Contributor):
/hold cancel

The openshift-ci bot removed the do-not-merge/hold label on Nov 28, 2022
The openshift-merge-robot merged commit ee2d2d1 into kcp-dev:main on Nov 28, 2022