Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws-ecs-2 variants #3273

Merged
merged 4 commits into from
Jul 26, 2023
Merged

Conversation

arnaldo2792
Copy link
Contributor

@arnaldo2792 arnaldo2792 commented Jul 19, 2023

Issue number:

Closes #2164

Description of changes:

This adds the aws-ecs-2 variants family. Features included in these variants include:

  • Secure Boot enabled
  • Kernel 6.1
  • Use XFS for the data partition
  • Support for cgroups v2 by default

Testing done:

  • Internal ECS testing suite ran successfully for aws-ecs-2
  • Internal ECS testing suite ran successfully for aws-ecs-2-nvidia

I confirmed all the mentioned features are enabled:

aws-ecs-2

bash-5.1# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "24877a20",
    "pretty_name": "Bottlerocket OS 1.15.0 (aws-ecs-2)",
    "variant_id": "aws-ecs-2",
    "version_id": "1.15.0"
  }

### New kernel
bash-5.1# uname -r
6.1.29

### Secure Boot enabled
bash-5.1# journalctl -k | grep Secure
Jul 19 01:13:09 localhost kernel: Secure boot enabled

### cgroupsV2
bash-5.1# stat -fc %T /sys/fs/cgroup/
cgroup2fs

### Boot configs
[ssm-user@control]$ apiclient get settings.boot
{
  "settings": {
    "boot": {
      "init": {
        "systemd.log_level": [
          "info"
        ]
      },
      "kernel": {
        "console": [
          "tty0",
          "ttyS1,115200n8"
        ]
      }
    }
  }
}
[ssm-user@control]$ cat /proc/bootconfig
kernel.console = "tty0", "ttyS1,115200n8"
init.systemd.log_level = "info"


### XFS for data partition
bash-5.1# xfs_spaceman -c info /local
meta-data=/dev/nvme1n1p1         isize=512    agcount=41, agsize=130816 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=5242368, imaxpct=25
         =                       sunit=128    swidth=128 blks
naming   =version 2              bsize=16384  ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

aws-ecs-2-nvidia

bash-5.1# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "24877a20",
    "pretty_name": "Bottlerocket OS 1.15.0 (aws-ecs-2-nvidia)",
    "variant_id": "aws-ecs-2-nvidia",
    "version_id": "1.15.0"
  }
}

### New kernel
bash-5.1# uname -r
6.1.29

### Secure Boot enabled
bash-5.1# journalctl -k | grep Secure
Jul 19 16:38:26 localhost kernel: Secure boot enabled

### cgroupsV2
bash-5.1# stat -fc %T /sys/fs/cgroup/
cgroup2fs

### Boot configs
[ssm-user@control]$ apiclient get settings.boot
{
  "settings": {
    "boot": {
      "init": {
        "systemd.log_level": [
          "info"
        ]
      },
      "kernel": {
        "console": [
          "tty0",
          "ttyS1,115200n8"
        ]
      }
    }
  }
}
[ssm-user@control]$ cat /proc/bootconfig
kernel.console = "tty0", "ttyS1,115200n8"
init.systemd.log_level = "info"


### XFS for data partition
bash-5.1# xfs_spaceman -c info /local
meta-data=/dev/nvme1n1p1         isize=512    agcount=37, agsize=130816 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=4718080, imaxpct=25
         =                       sunit=128    swidth=128 blks
naming   =version 2              bsize=16384  ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

### NVIDIA driver working:
bash-5.1# docker exec 6443448d5e32 nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-69f2de54-cb1b-153e-bfdf-81bf722ad6ed)

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, just some minor issues pointed out.

sources/models/src/lib.rs Outdated Show resolved Hide resolved
sources/models/src/lib.rs Outdated Show resolved Hide resolved
sources/models/src/lib.rs Outdated Show resolved Hide resolved
@arnaldo2792
Copy link
Contributor Author

Forced push includes:

  • Fix references, and point to the folder instead of symlinks
  • Remove cgroupdriver=cgroupfs from docker config for NVIDIA

Copy link
Contributor

@yeazelm yeazelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also want to update /~https://github.com/bottlerocket-os/bottlerocket/blob/develop/QUICKSTART-ECS.md to discuss why one might choose aws-ecs-1 vs aws-ecs-2 and once we release we will want to update this guide to use latest. Otherwise, looks like a good start!

@arnaldo2792
Copy link
Contributor Author

(Forced push adds Boot Config APIs for both variants)

@arnaldo2792
Copy link
Contributor Author

We might also want to update /~https://github.com/bottlerocket-os/bottlerocket/blob/develop/QUICKSTART-ECS.md to discuss why one might choose aws-ecs-1 vs aws-ecs-2 and once we release we will want to update this guide to use latest. Otherwise, looks like a good start!

@yeazelm probably the QUICKSTART docs wouldn't be a good place to compare variants. AFAIK, we don't have any docs where we explicitly call out why users should favor specific variants. I do think I can update the docs to point to the aws-ecs-2 variant once it is released, to begin steering people to use it.

@yeazelm
Copy link
Contributor

yeazelm commented Jul 21, 2023

We might also want to update /~https://github.com/bottlerocket-os/bottlerocket/blob/develop/QUICKSTART-ECS.md to discuss why one might choose aws-ecs-1 vs aws-ecs-2 and once we release we will want to update this guide to use latest. Otherwise, looks like a good start!

@yeazelm probably the QUICKSTART docs wouldn't be a good place to compare variants. AFAIK, we don't have any docs where we explicitly call out why users should favor specific variants. I do think I can update the docs to point to the aws-ecs-2 variant once it is released, to begin steering people to use it.

I think at least calling out that we recommend aws-ecs-2 for new launches in the Quickstart would be good. Currently, variants are more or less obvious via naming for which one a user should choose. But for ECS, it might not be as clear that there is no "ECS version 2" orchestrator like one might expect when looking the differences between aws-k8s.1.26 and aws-k8s-1.27. I think we need to have the information captured somewhere and the Quickstart isn't really the right place, but I don't know if there is anywhere better to help a user through the decision for which variant to choose when faced with aws-ecs-1 vs aws-ecs-2. Maybe the right place is the website rather than this repo, but I wanted to address it now so we can have that documentation ready before these variants end up being published.

@stmcginnis
Copy link
Contributor

Maybe in the list of variants we should just go ahead and put something like "(deprecated)" next to the aws-ecs-1 variants so there is an indication that they can use it, but it will eventually be going away so they probably want to use the aws-ecs-2 variants.

But good point that we kind of need something explaining what the differences are between 1 and 2. Maybe something for the project website?

@arnaldo2792
Copy link
Contributor Author

@stockholmux, is there a place on the website where the difference among the different variants for an orchestrator is documented?

@stockholmux
Copy link
Member

@arnaldo2792 It's mostly automatic. As part of the build process, the scripts pick the all the variants in a given minor release.

However, ecs-1 -> ecs-2 is a special case since it's the sole variant for that orchestrator, there are probably some assumptions made in the writing. Would you mind throwing an issue on bottlerocket-os/bottlerocket-project-website so I can track this update?

@arnaldo2792
Copy link
Contributor Author

@stockholmux I created the issue: bottlerocket-os/bottlerocket-project-website#198

@yeazelm, it seems like there is already an automated way to describe what is included in each variant but has to be “adapted” for ECS variants (see discussion above). Does that ease your concern?

@yeazelm
Copy link
Contributor

yeazelm commented Jul 24, 2023

@yeazelm, it seems like there is already an automated way to describe what is included in each variant but has to be “adapted” for ECS variants (see discussion above). Does that ease your concern?

Yes, that works for me!

@stockholmux
Copy link
Member

stockholmux commented Jul 25, 2023

@arnaldo2792 With regards to the QUICKSTART-ECS.md on this repo, perhaps this is right time to transition to the website quickstart for ECS?

The website quickstart shouldn't matter if it's ecs-1 or ecs-2 (grabs the latest from SSM after switching one character). Otherwise, there seems to be only some missing info about regions and nvidia variants, both of which should probably exist on the website eventually, but they seem like a weird fit for a quickstart.

What I would propose is to cut down the QUICKSTART-ECS.md to the places where it provides specific, unique value and point users to bottlerocket.dev. This is similar to what I'm proposing for EKS in #3285, so it seems like good timing.

Does this light anyone's hair aflame?

The docker daemon knows how to detect the cgroups version used in the
host.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792
Copy link
Contributor Author

(rebase onto develop to include the changes suggested by @webern)

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792
Copy link
Contributor Author

(forced push includes changes to reference ../variants.rs instead of /dev/null)

@arnaldo2792
Copy link
Contributor Author

@arnaldo2792 With regards to the QUICKSTART-ECS.md on this repo, perhaps this is right time to transition to the website quickstart for ECS?

The website quickstart shouldn't matter if it's ecs-1 or ecs-2 (grabs the latest from SSM after switching one character). Otherwise, there seems to be only some missing info about regions and nvidia variants, both of which should probably exist on the website eventually, but they seem like a weird fit for a quickstart.

What I would propose is to cut down the QUICKSTART-ECS.md to the places where it provides specific, unique value and point users to bottlerocket.dev. This is similar to what I'm proposing for EKS in #3285, so it seems like good timing.

Does this light anyone's hair aflame?

I don't disagree, but I would like to keep that as a separate PR

@arnaldo2792 arnaldo2792 requested a review from webern July 26, 2023 17:24
Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Comment on lines +7 to +8
# Don't rebuild crate just because of changes to README.
exclude = ["README.md"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a bit inconsistent to add this to aws-ecs-2 but not aws-ecs-2-nvidia (or aws-ecs-1 as well). Doesn't actually matter since none of these have a README.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I saw that we have the same line for most of the variants. I can clean up all of them in a separate PR.

Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Able to build, and looks fine to me.

@arnaldo2792 arnaldo2792 merged commit 74e5471 into bottlerocket-os:develop Jul 26, 2023
@arnaldo2792 arnaldo2792 deleted the ecs-variants branch January 29, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add a new aws-ecs variant for kernel 6.1
6 participants