Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws_launch_template & config vars to enable options to specify pre-bootstrap commands and/or custom AMI for EKS nodes #2621

Closed
wants to merge 9 commits into from

Conversation

joneszc
Copy link
Contributor

@joneszc joneszc commented Aug 13, 2024

Reference Issues or PRs

Fixes #2603
Fixes #2604

What does this implement/fix?

Put a x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

Any other comments?

@viniciusdc
Copy link
Contributor

Hi @joneszc thanks for opening the PR, I will review it later today.

Comment on lines 43 to 60
user_data = base64encode(<<-EOF
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
%{ if var.node_prebootstrap_command != null }${var.node_prebootstrap_command}%{ endif }

%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami != null }--==MYBOUNDARY==%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami != null }Content-Type: text/x-shellscript; charset="us-ascii"%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami == null }%{ else }#!/bin/bash%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami == null }%{ else }set -ex%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami == null }%{ else }B64_CLUSTER_CA=${aws_eks_cluster.main.certificate_authority[0].data}%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami == null }%{ else }API_SERVER_URL=${aws_eks_cluster.main.endpoint}%{ endif }
%{ if var.node_prebootstrap_command != null && var.node_groups[count.index].custom_ami == null }%{ else }/etc/eks/bootstrap.sh ${aws_eks_cluster.main.name} --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL%{ endif }

--==MYBOUNDARY==--
EOF
Copy link
Contributor

@viniciusdc viniciusdc Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving this into /files as a .tpl file and reading it using templatefile(path, vars) , e.g :

base64encode(templatefile("${path.module}/files/user_data.tpl", {
    node_prebootstrap_command = var.node_prebootstrap_command
    node_groups = var.node_groups
    aws_eks_cluster = aws_eks_cluster.main
    count_index = count.index
    cust_ami_node_index = local.cust_ami_node_index
  }))

as this will become easier to maintain

Copy link
Contributor Author

@joneszc joneszc Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viniciusdc

I've implemented the suggested modification with updated main.tf and added files/user_data.tftpl

@@ -118,6 +118,7 @@ class AzureInputVars(schema.Base):
class AWSNodeGroupInputVars(schema.Base):
name: str
instance_type: str
custom_ami: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to follow the aws standard naming, let's use ami_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello @viniciusdc,
in this case, the custom_ami prompts the user to input an AMI Id, as opposed to the ami_type (e.g. "CUSTOM"). The corresponding aws var here is image_id. My impression of image_id is that it seems a bit ambiguous for nebari config purposes since it does not make it obvious that an ami is expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viniciusdc how about using ami_image_id ?

Copy link
Contributor

@viniciusdc viniciusdc Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joneszc I see; I was under the impression ami_type would support the ID of a custom AMI as well, but after your comment and re-reding the docs, that does not seem to be true (unfortunately). I agree with you that image_id would also be ambiguous, considering the overall shape of the nebari-config.

Comment on lines +73 to +74
ami_type = var.node_groups[count.index].custom_ami != null ? "CUSTOM" : (var.node_groups[count.index].gpu == true ? "AL2_x86_64_GPU" : "AL2_x86_64")
disk_size = var.node_prebootstrap_command == null && var.node_groups[count.index].custom_ami == null ? 50 : null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing wrong with the way it is right now, but it would be better if "AL2_x86_64_GPU" and "AL2_x86_64" were set as defaults in the python schema itself, so that we could clean this conditionals a bit

Copy link
Contributor Author

@joneszc joneszc Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viniciusdc
ami_type (i.e. AL2_x86_64_GPU/AL2_x86_64) is not fetched from nebari-config.yaml or otherwise processed in __init__.py in either the current state of Nebari or in the scope of the proposed PR. Adding ami_type to the python schema would require a separate update.

@viniciusdc
Copy link
Contributor

I am leaving this as future ref. not to be implemented in this PR, we could also have something like this:

class AMIConfig(BaseModel):
    type: str = "CUSTOM"
    id: Optional[str] = Field(default=None, description="AMI ID. Required if type is set to CUSTOM")

    @RootValidator
    def check_custom_ami_id(cls, values):
        ami_type = values.get('type')
        ami_id = values.get('id')
        if ami_type == "CUSTOM" and not ami_id:
            raise ValueError('AMI ID must be provided when type is CUSTOM.')
        return values

class AWSNodeGroupInputVars(BaseModel):
    name: str
    instance_type: str
    ami: Optional[AWSConfig] = None

The benefit would be that we can later extend this to include the AL2_x86_64_GPU/AL2_x86_64 to clean the conditionals logic in the provider itself. Like type: Literal["CUSTOM", "AL2_x86_64_GPU", "AL2_x86_64"]

@viniciusdc
Copy link
Contributor

viniciusdc commented Aug 20, 2024

@joneszc, So far, this looks good to me; I will do another final pass later this week so that we can include it as part of this month's release. We have this security alert above suggesting us to make the IMDS token a requirements for the launch_template metadata. I don't see any issues with enabling it as these templates are non-default, though I wonder if you could try that inclusion in your current test worspace

@viniciusdc
Copy link
Contributor

oh btw, the terraform fmt plugin on pre-commit is a bit pedantic with variable spaces, so make sure all variables have the same indentation level:

endpoint_private_access   = var.eks_endpoint_private_access
node_prebootstrap_command = var.node_prebootstrap_command
public_access_cidrs       = var.eks_public_access_cidrs
permissions_boundary      = var.permissions_boundary

@viniciusdc viniciusdc added the status: in review 👀 This PR is currently being reviewed by the team label Aug 20, 2024
@tylergraff
Copy link
Contributor

@viniciusdc is your review still in progress or is this work good to be merged?

@viniciusdc
Copy link
Contributor

Hi @tylergraff I will be looking into that today. It all looks good to me already, I just want to do another pass in the main logic for conditioning the template resource itself.

@tylergraff
Copy link
Contributor

Is the failing local-integration test case something my team needs to look into?

@viniciusdc
Copy link
Contributor

Hi @tylergraff

Is the failing local integration test case something my team needs to look into?
That's a separate issue that I am trying to fix in another PR #2337. There were some recent changes that we were not catching, and they are now interfering with new PRs—that's why this review process has been a bit slow on my part.

Copy link
Contributor

@viniciusdc viniciusdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joneszc, I did another pass today and deployed this last Friday to make sure terraform didn't complain about any missing vars. All looks good. I just have a final suggestion regarding the user_data contents

Comment on lines +9 to +14
%{ if split_user_data == true }Content-Type: text/x-shellscript; charset="us-ascii"%{ endif }
%{ if include_bootstrap_cmd == true }#!/bin/bash%{ endif }
%{ if include_bootstrap_cmd == true }set -ex%{ endif }
%{ if include_bootstrap_cmd == true }B64_CLUSTER_CA=${cluster_cert_authority}%{ endif }
%{ if include_bootstrap_cmd == true }API_SERVER_URL=${cluster_endpoint}%{ endif }
%{ if include_bootstrap_cmd == true }/etc/eks/bootstrap.sh ${cluster_name} --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL%{ endif }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these lines could be passed as part of the user's nore_prebootstrap_command instead of being present in the template itself. I am okay with providing the variables by default, but I would prefer that any user's logic be passed down via "overwriting" or variables. Maybe we could have in the schema a user_data_template wich could be a filepath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viniciusdc please implement the changes you recommend. Let me know if you do not have access to commit to the MetroStar fork.

@tylergraff
Copy link
Contributor

@viniciusdc should this PR be closed given that the changes will be made in #2668?

@tylergraff
Copy link
Contributor

Closing because the functionality in this PR is duplicated in #2668, and this PR now has conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: in review 👀 This PR is currently being reviewed by the team
Projects
None yet
3 participants