-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
📖 CAEP: Add support for infrastructure cluster resources to be managed externally #4135
📖 CAEP: Add support for infrastructure cluster resources to be managed externally #4135
Conversation
9fead58
to
b025329
Compare
Great!!! Looking forward to putting my Pentium back again in the game. btw: s/introudce/introduce/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Maybe a little more detail required. Just make sure if there's any new terms, refer to the glossary, and use personas already listed there as best suits.
|
||
### Security Model | ||
|
||
When unmanaged no additional privileges for a cloud provider need to be given to CAPI other than the required to manage machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to have a section that deals with single controller multitenancy, and in addition modify the multitenancy contract doc in #4074 within this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this would affect single controller multitenancy? In such a model privileges are given for each infraResource so this would be transparent right? can you elaborate a bit?
|
||
**Unmanaged** | ||
|
||
- The provider infraCluster controller will skip any infrastructure reconciliation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The provider infraCluster controller will skip any infrastructure reconciliation. | |
- The provider infraCluster controller will skip any infrastructure reconciliation with the exception of read-only queries to ensure the operation of the related cloud provider integration, e.g. required tags for working of the AWS Cloud Provider.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@randomvariable I know we discuss about this but would you mind refreshing my mind where specifically you see the need to read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there are some requirements for the cloud-controller-manager that the AWSCluster controller verifies at the moment. I think checking that tags have been set on the VPC correctly was one of them?
I think it's reasonable that if a InfraCluster controller normally would perform validation of the environment it has created as a requirement for parts of core K8s to function, then we should also do this validation on an unmanaged cluster.
Though, we should make sure the expectations of the environment are documented well so users can ensure their infrastructure provisioning can set the right tags etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloud-controller-manager that the AWSCluster controller verifies at the moment.
@JoelSpeed @randomvariable I see, we have a list of this somewhere atm?
Another thing I think we should go into more details is why the unmanaged field is useful based on our discussions - that we need information from the infraCluster object in order to inform the infraMachine controller what to do without a lot of repetition. Given our discussions, maybe walk through the example with the CAPA controller. A flow diagram may also be useful. |
b025329
to
b7677c7
Compare
@randomvariable You mean the info inferred from the infraCluster when managed as a reason to reuse the CR to support unmanaged? |
b7677c7
to
cca34bd
Compare
Open questions: @vincepri #4135 (comment) isn't this what the existing unmanaged VPC approach does /~https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/0987c8d9f16a4f891d05a622aa92b4592ad5c154/api/v1alpha3/types.go#L244 but we'd need to teach it to not reconcile sg and LBs, hence we'd use the externalManagedField for this, i.e the contract for an externalManaged AWSCluster is the vpc ID. Could you please provide specific details of what are you missing from the proposal in its current form? |
@enxebre I've outlined most of the thinking in #4135 (comment) — Yes there is going to be contract changes for the InfraCluster controllers, although those can be made generic based on a signal (annotation) that's common across all infrastructure providers. |
Hi folks, a few of us met today to discuss some details of this proposal, and wanted to provide a summary and outline next step. The proposal is focused only on getting the InfrastructureCluster bits to be externally managed, while keeping the infrastructure machine controller intact, at least for now. A few considerations / things to decide:
|
cca34bd
to
0649740
Compare
@vincepri @randomvariable @CecileRobertMichon @detiber @fabriziopandini do you have any additional feedback after latest changes pushed? Also can anyone clarify why couldn't we just reuse the existing unmanagedVPC and teach it to not reconcile the lb when the field is not set and SGs as suggested above? |
This point should probably be tackled in the load balancer proposal later, cc @randomvariable @detiber |
There is no way to predict what would happen in this scenario. | ||
The InfraCluster controller would start attempting to reconcile infrastructure that it did not create, and therefore, there may be assumptions it makes that mean it cannot manage this infrastructure. | ||
|
||
To prevent this, we will have to implement (in the InfraCluster webhook) a means to prevent users converting externally managed InfraClusters into managed InfraClusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding this as a requirement for the external management system up in the document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern with making this a requirement of the external management is that we will have to duplicate that across each external system right? If this is a core requirement of this contract and the check just needs to be, if this annotation is present, don't let it be changed, that feels like it could be part of the core validation?
I guess we would need to implement this for each provider anyway so maybe not much is lost 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I would prefer to lust this as a responsibility of the External providers, because I don't see how this can be centralised given that we are talking about InfraCluster objects and one of the of the goal of this KEP is to avoid changes to the existing infrastructure providers.
If instead you can come out with a solution that centralises this responsibility, +1 to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about a generic validating webhook that could be implemented to look at the metadata only (therefore could accept any type), this code could be provided centrally and generic, but providers or external infrastructure providers would need to set it up to look at the correct type
Ie all they would need to do is configure the ValidatingWebhookConfiguration correctly. Do you think something like this would work/be useful?
|
||
- Write a kubectl plugin that allows a user to mark their InfraCluster resources as ready | ||
|
||
- Add a secondary annotation to this contract that causes the provider InfraCluster controller to mark resources as ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that this requires all the infrastructure providers to reconcile the externally managed infra clusters, I'm +1 to remove this option from the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's value in keeping it so we know it's been discussed, but maybe we can expand on this later to explain why we don't want to go down this route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to keep to as a record, but please add a note reporting the outcome of the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer to not implement this option as it's basically additional complexity in ClusterAPI because of a (current) limitation of kubectl which will probably be resolved mid-term: [WIP] kubectl: support --subresource flag #99556
/kind proposal |
@enxebre are we waiting for any reviewer's lgtm? |
Action item: Add some text on the decision about how do we manage or integrate with load balancer proposal |
I've re-reviewed the load balancer proposal today and 'm not sure there is anything worth noting in this proposal with regards to it. My evaluation from #4135 (comment) still stands.
The only thing I found, was that the Load Balancer proposal should (I don't think it does yet) specify some mechanism for signalling to InfraCluster providers that they should not be creating a control plane LB as this is handled separately. However that manifests (either by some annotation or changing the InfraCluster contract to check the Cluster resource or whatever), it will mean updating the InfraCluster contract documentation, which this proposal already notes that external infra providers must fulfil, so I don't think it is worth calling out explicitly. @detiber I would appreciate if you could have a double check of my evaluation in case I've misinterpreted the LB proposal or missed some of the details @vincepri Was there some specific interaction you were concerned about that I might not have had in mind while reviewing the two proposals? I can double check or provide further context if there are specific concerns, but at the moment I don't see real overlap here given the ordering and precedence set out in the LB proposal |
Broadly agree with the approach. I think @JoelSpeed 's comments are correct here. +1 on proceeding with this. |
I think we're good to proceed, the explanation above seems in line with my assumptions, although it's good to have a sounding board. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/hold
1 week lazy-consensus
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vincepri The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Let's merge this and proceed with the implementation since lazy consensus week is over today and does not seem to be any objection. |
This PR drops the ExternalInfraCluster CRD in favour of AWSCluster. Originally we added support for externally managed infra in CAPA via the ExternalInfraCluster CRD kubernetes-sigs/cluster-api-provider-aws#2124 and we used that commit of CAPA in hypershift. Later on we decided to revert that approach upstream and reuse the existing ${infra}Cluster CRDs with an annotation to support externally managed infrastructure kubernetes-sigs/cluster-api#4135 This PR bring latest CAPI/CAPA with one additional patch on top kubernetes-sigs/cluster-api#4709 kubernetes-sigs/cluster-api-provider-aws#2453 to avoid running webhooks. As a follow up we need to rebuild the images from the main branch once those patches are merged or otherwise enable webhooks.
This PR drops the ExternalInfraCluster CRD in favour of AWSCluster. Originally we added support for externally managed infra in CAPA via the ExternalInfraCluster CRD kubernetes-sigs/cluster-api-provider-aws#2124 and we used that commit of CAPA in hypershift. Later on we decided to revert that approach upstream and reuse the existing ${infra}Cluster CRDs with an annotation to support externally managed infrastructure kubernetes-sigs/cluster-api#4135 This PR bring latest CAPI/CAPA with one additional patch on top kubernetes-sigs/cluster-api#4709 kubernetes-sigs/cluster-api-provider-aws#2453 to avoid running webhooks. As a follow up we need to rebuild the images from the main branch once those patches are merged or otherwise enable webhooks.
This PR drops the ExternalInfraCluster CRD in favour of AWSCluster. Originally we added support for externally managed infra in CAPA via the ExternalInfraCluster CRD kubernetes-sigs/cluster-api-provider-aws#2124 and we used that commit of CAPA in hypershift. Later on we decided to revert that approach upstream and reuse the existing ${infra}Cluster CRDs with an annotation to support externally managed infrastructure kubernetes-sigs/cluster-api#4135 This PR bring latest CAPI/CAPA. As a follow up we need to rebuild the images and consume them from quay.io/hypershift.
This PR drops the ExternalInfraCluster CRD in favour of AWSCluster. Originally we added support for externally managed infra in CAPA via the ExternalInfraCluster CRD kubernetes-sigs/cluster-api-provider-aws#2124 and we used that commit of CAPA in hypershift. Later on we decided to revert that approach upstream and reuse the existing ${infra}Cluster CRDs with an annotation to support externally managed infrastructure kubernetes-sigs/cluster-api#4135 This PR bring latest CAPI/CAPA. As a follow up we need to rebuild the images and consume them from quay.io/hypershift.
This PR drops the ExternalInfraCluster CRD in favour of AWSCluster. Originally we added support for externally managed infra in CAPA via the ExternalInfraCluster CRD kubernetes-sigs/cluster-api-provider-aws#2124 and we used that commit of CAPA in hypershift. Later on we decided to revert that approach upstream and reuse the existing ${infra}Cluster CRDs with an annotation to support externally managed infrastructure kubernetes-sigs/cluster-api#4135 This PR bring latest CAPI/CAPA. As a follow up we need to rebuild the images and consume them from quay.io/hypershift.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #4095