-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Restart buildkit after containerd when provisioning #461
Conversation
Seems like all of the errors on the tests are similar to this:
Wondering if its because of the way the |
Been seeing some of these failures across a few different PRs. Potentially not related to your changes. |
I reran the CI on #455, and it didn't hit these errors. I think its something related to this one |
The tests are failing here on this PR and on the finch-core repo. The similarity? Both are now running with synced containerd/buildkit worker UUIDs. In other words, the "correct" behavior appears to be that these tests fail. Looking into when/why this started happening. |
I've pulled your changes down locally, verified that the UUIDs are matching, but can't reproduce the e2e test failures locally. These tests have been failing on finch-core since e2e tests were added. The difference between finch and finch-core e2e tests is that finch-core e2e 1/ runs a slightly different Fedora VM and 2/ runs tests with
True, but for some reason the Fedora VM used in finch-core e2e tests doesn't require this buildkit systemd service ordering change to enforce that UUID match... see /~https://github.com/runfinch/finch-core/blob/eaeaa96443a633d49d51143373650a93ab83a467/e2e/fedora.yaml#L148-L150 Here's my results using /~https://github.com/runfinch/finch-core/blob/main/e2e/fedora.yaml:
The changes in this PR do what we expect, which is sync the UUIDs for buildkit and containerd. |
tl;dr for CI issues here is garbage collection labels not being set properly. I got to the bottom of it and opened an issue on nerdctl containerd/nerdctl#2372, but in reality this is a bug with buildkitv0.11.x . The bug has been patched in moby/buildkit#3972 which was included in buildkitv0.12.0 which was released just yesterday: /~https://github.com/moby/buildkit/releases/tag/v0.12.0. Opened up containerd/nerdctl#2374 to upgrade nerdctl's buildkit to v0.12.0. Let's keep this PR open until we have both a new nerdctl release with the upgrade and a new Lima release pointing to the new nerdctl release. |
Should be able to rebase and re-run CI now #521 has been merged with dependency upgrades |
Signed-off-by: Justin Alvarez <alvajus@amazon.com>
Signed-off-by: Justin Alvarez <alvajus@amazon.com>
19392e1
to
e36f90b
Compare
Just rebased 🤞 |
Benchmarking steps:
|
🤖 I have created a release *beep* *boop* --- ## [0.8.0](v0.7.0...v0.8.0) (2023-08-16) ### Features * adding config option for SOCI installation on VM ([#506](#506)) ([a2e077b](a2e077b)) ### Bug Fixes * configure aws creds in sync submodules/deps action ([#518](#518)) ([b67452e](b67452e)) * give pull request write permissions to sync job ([#520](#520)) ([55b5235](55b5235)) * give token write perms to sync-submodules ([#519](#519)) ([8b639ea](8b639ea)) * Mount /var/folders to finch vm ([#525](#525)) ([c97d2e9](c97d2e9)) * option to use installed lima for SOCI e2e tests ([#533](#533)) ([8b66659](8b66659)) * quote recursive calls to make ([#515](#515)) ([d603096](d603096)) * Restart buildkit after containerd when provisioning ([#461](#461)) ([fca1828](fca1828)) ### Build System or External Dependencies * **deps:** Bump github.com/docker/cli from 24.0.4+incompatible to 24.0.5+incompatible ([#495](#495)) ([e9e8617](e9e8617)) * **deps:** Bump github.com/docker/docker from 24.0.4+incompatible to 24.0.5+incompatible ([#497](#497)) ([6f1afbb](6f1afbb)) * **deps:** Bump github.com/lima-vm/lima from 0.16.0 to 0.17.2 ([#531](#531)) ([6e33d15](6e33d15)) * **deps:** Bump github.com/onsi/gomega from 1.27.8 to 1.27.10 ([#496](#496)) ([d08d102](d08d102)) * **deps:** Bump github.com/pkg/sftp from 1.13.5 to 1.13.6 ([#530](#530)) ([09b3846](09b3846)) * **deps:** Bump github.com/shirou/gopsutil/v3 from 3.23.6 to 3.23.7 ([#513](#513)) ([83bd718](83bd718)) * **deps:** Bump golang.org/x/tools from 0.11.0 to 0.11.1 ([#509](#509)) ([e826bcf](e826bcf)) * **deps:** Bump golang.org/x/tools from 0.11.1 to 0.12.0 ([#523](#523)) ([09d6514](09d6514)) * **deps:** Bump k8s.io/apimachinery from 0.27.3 to 0.27.4 ([#487](#487)) ([444bbc0](444bbc0)) * **deps:** Bump k8s.io/apimachinery from 0.27.4 to 0.28.0 ([#535](#535)) ([8df84cf](8df84cf)) * **deps:** Bump submodules and dependencies ([#521](#521)) ([1b3ad94](1b3ad94)) --- This PR was generated with [Release Please](/~https://github.com/googleapis/release-please). See [documentation](/~https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Issue #, if available: *Description of changes:* - Fixes e2e tests by restarting BuildKit when containerd is restarted (similar issue faced in runfinch/finch#461) - Uses a more up to date yaml file for testing (will automatically update the OS version too since that's included in the makefile) *Testing done:* - [x] I've reviewed the guidance in CONTRIBUTING.md #### License Acceptance By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Signed-off-by: Justin Alvarez <alvajus@amazon.com>
Issue #, if available: Fixes #412
Description of changes:
Testing done:
Local testing
I've reviewed the guidance in CONTRIBUTING.md
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.