v1.26.0 ⚱️Tracking Issue #4253

Closed
6 tasks done
KCSesh opened this issue Oct 23, 2024 · 11 comments · Fixed by #4254
Labels
type/enhancement New feature or request

Comments

@KCSesh
Contributor

KCSesh commented Oct 23, 2024

This is a "living" issue used to track bug fixes, features, and things landing in the next release of Bottlerocket v1.26.0.

This is a convenient way for the maintainers and the community to have a bird's-eye view of what's happening in any given Bottlerocket release. You'll find things we're targeting for the release, target dates, and important maintenance tasks.

Anything here is subject to change: this isn't a guarantee that something will actually land in a given release, and this tracking issue is not all-encompassing. For a full comparison of what's new in this release, use the GitHub /compare feature and check the differences between the 1.24.x branch and develop.

Release captains 🧑‍✈️
@KCSesh @ginglis13 @rpkelly

Targeting 🎯

Maintenance 🔧

@rpkelly
Contributor

rpkelly commented Oct 24, 2024

The 1.26.0 release has passed validation and will be published shortly. The AMIs are public, and the SSM parameters for this release are either available already or will be shortly. Watch https://github.com/bottlerocket-os/bottlerocket/releases for the signal that the release is complete.

@Veronica4036
Contributor

Veronica4036 commented Oct 24, 2024

There seem to be some compatibility issues with the latest version of Bottlerocket when running Java-based applications.

#4260

@cavalcantigor

☝️ same for NodeJS based applications.

#4261

@patkinson01

And .NET apps. Python is ok :)

NAME                                          READY   STATUS             RESTARTS         AGE
hello-world-dotnet-primary-8489bf754d-9vmts   0/1     CrashLoopBackOff   2 (10s ago)      34s
hello-world-dotnet-primary-8489bf754d-hfj7w   0/1     CrashLoopBackOff   78 (25s ago)     6h20m
hello-world-java-primary-85c6c4f5b-mxs67      0/1     CrashLoopBackOff   77 (44s ago)     6h11m
hello-world-java-primary-85c6c4f5b-q4lt7      0/1     CrashLoopBackOff   72 (2m15s ago)   5h47m
hello-world-nodejs-primary-5d6d5fd7f8-m9x6b   0/1     CrashLoopBackOff   74 (118s ago)    5h57m
hello-world-nodejs-primary-5d6d5fd7f8-v7cc2   0/1     CrashLoopBackOff   77 (2m27s ago)   6h11m
hello-world-python-primary-5ddd8bd9dc-8wsl7   1/1     Running            0                6h21m
hello-world-python-primary-5ddd8bd9dc-lwm7v   1/1     Running            0                5h47m

@portswigger-tim

portswigger-tim commented Oct 24, 2024

This change seems... interesting... bottlerocket-os/bottlerocket-core-kit#158 - not sure it should have been "LGTMd 👍" - with very little context as to why it is a good thing and the potential side effects of such a change that will "block writable/executable memory for all services".

Note that this option is incompatible with programs and libraries that generate program code dynamically
at runtime, including JIT execution engines, executable stacks, and code "trampoline" feature of various
C compilers.

This option improves service security, as it makes harder for software exploits to change running
code dynamically. However, the protection can be circumvented, if the service can write to a filesystem,
which is not mounted with noexec (such as /dev/shm), or it can use memfd_create(). 

Seems quite clear that it'll cause havoc with a lot of things... (from: https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#:~:text=MemoryDenyWriteExecute)
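
For readers who want to see the restriction in action, here is a minimal Python sketch that attempts the kind of mapping a JIT engine typically needs: anonymous memory that is both writable and executable. It is an illustration only, not taken from the linked PR. It assumes Linux and CPython's standard `mmap` module, and the function name `can_map_writable_executable` is hypothetical. Under a filter like the one `MemoryDenyWriteExecute=` installs, the mapping would be expected to fail with a permission error; on an unrestricted host it normally succeeds.

```python
import mmap


def can_map_writable_executable(size: int = mmap.PAGESIZE) -> bool:
    """Return True if an anonymous writable+executable mapping is allowed.

    Hypothetical helper for illustration; it only probes the current process.
    """
    try:
        region = mmap.mmap(
            -1,  # no backing file: anonymous memory
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        # A seccomp filter or another security policy refused the mapping.
        return False
    region.close()
    return True


if __name__ == "__main__":
    print("W+X mapping allowed:", can_map_writable_executable())
```

Running this inside an affected pod and inside an unaffected one should give a quick signal, though a `False` result alone does not say which policy denied the mapping.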

@Amr-Aly

Amr-Aly commented Oct 24, 2024

We hit this issue with basically any language that runs on a VM, including JVM-based services.

@bcressey
Contributor

This change seems... interesting... bottlerocket-os/bottlerocket-core-kit#158 - not sure it should have been "LGTMd 👍" - with very little context as to why it is a good thing and the potential side effects of such a change that will "block writable/executable memory for all services".

That is a fair point. As one of the reviewers, I can say I didn't anticipate the interaction with the runc child processes launched by the containerd service via CRI that led to this disruption.

My mental model is that runc is very careful about setting up the container process. Often that involves resetting privileges like capabilities to a particular known state. It's easy to over-generalize from that to "in the course of setting up isolation, runc will correct anything in the environment that might unexpectedly affect the process". Which is sometimes the case but not in this case.

I wish I had flagged this in review. But failing that, and to allow for the occasional imperfect human review, I would have expected this to be caught by testing. That is the main takeaway for me: we need better coverage of JVM and NodeJS applications, or else better signal when those are failing.

@misterek

I'm slightly glomming onto this issue, but is this a good opportunity to discuss having some delayed mechanism for updating nodes? If I'm not mistaken, the "auto update" for both Karpenter and Bottlerocket Updater just always goes to the newest release. Is there any way we can look at having time-delayed releases? It would be nice to have dev updated immediately, QA after 3 days, and production after 7 days, to avoid these kinds of issues while still reducing the toil of manually updating these things.
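
A rough sketch of the staged-delay idea, in Python. Everything here is hypothetical: the `SOAK_PERIODS` table, the `is_release_eligible` function, and the example dates are made up for illustration, and neither Karpenter nor any Bottlerocket updater is claimed to expose such a setting. It only shows the gating rule: a release becomes eligible for an environment once it has aged past that environment's soak period.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical soak periods, using the numbers from the comment above.
SOAK_PERIODS = {
    "dev": timedelta(days=0),
    "qa": timedelta(days=3),
    "production": timedelta(days=7),
}


def is_release_eligible(released_at: datetime, environment: str, now: datetime) -> bool:
    """Return True once the release has aged past the environment's soak period."""
    return now - released_at >= SOAK_PERIODS[environment]


if __name__ == "__main__":
    released = datetime(2024, 10, 24, tzinfo=timezone.utc)  # illustrative date only
    now = released + timedelta(days=4)
    for env in SOAK_PERIODS:
        print(env, is_release_eligible(released, env, now))
```

With a rule like this, a regression found in the first few days would reach dev and possibly QA, but production nodes would stay on the previous release until the soak period had passed.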

@portswigger-tim

I wish I had flagged this in review. But failing that, and to allow for the occasional imperfect human review, I would have expected this to be caught by testing. That is the main takeaway for me: we need better coverage of JVM and NodeJS applications, or else better signal when those are failing.

Forgive my critical tone @bcressey - I'm not trying to throw blame or shame anyone. It shows that we all need to be a bit more vigilant on the changes in dependencies and surfacing them in the dependent software (a largely unsolved problem). Definitely not an issue that is unique to AWS, Bottlerocket or even OSS in general.

This sort of chain of changes that seem innocuous in themselves is how malicious actors will target "infrastructure systems" - a small change to systemd, a small change to containerd / runc, a small change to Bottlerocket and you've potentially got millions of compromised machines and 100s of millions of containers globally.

@cbgbt
Contributor

cbgbt commented Nov 1, 2024

... is this a good opportunity to discuss having some delayed mechanism for updating nodes? If I'm not mistaken, the "auto update" for both Karpenter and Bottlerocket Updater just always goes to the newest release...

I've opened an issue to track a change for this at the Bottlerocket level: #4273.

This should work for the "update agent" solutions, but we would need something like what's proposed in this issue on the Karpenter side.

@KCSesh
Contributor Author

KCSesh commented Nov 1, 2024

1.26.1 with the fix has been released, so I will close this issue.

@KCSesh KCSesh closed this as completed Nov 1, 2024
@KCSesh KCSesh unpinned this issue Nov 6, 2024