Update NVIDIA bootloader state on every boot #8
Forgetting to mender commit scenario:
The PR works when the user mender commits after an update but we are stuck with misaligned partitions if the user forgets to mender commit and reboots. Please check the table above for details which were produced using this PR. |
Just tested However, cboot does not rollback to the other partition on executing
Maybe modifying redundant-boot-rollback-script-uboot to deliberately switch boot partitions for rollback in case of cboot might fix it? |
What happens in a rollback situation? Let's say you're on rootfs A to start with:
Since the device is already booting from slot A again, there's nothing for us to do as far as cboot is concerned, correct? It has already rolled back to the good slot. |
My original view on this was that manual installs are not the normal production use case, so if people shoot themselves in the foot, so be it. I changed my mind on this after having to clean up after other developers and QA engineers that ended up with mismatched devices who would either forget to What I did at work to deal with this (until I switched away from the combined cboot + U-Boot setup) was to have an In most cases the bootloader update payload wasn't critical to the install, so this generally worked OK, and saved me a bunch of grunt work cleaning up after folks. There's some risk that something (like a device tree change) wouldn't get picked up due to skipping the bootloader update payload step, but for my situation, at least, that was much less likely to be a problem. |
Thanks for the responses @madisongh
Agree this scenario is handled. What about this one though?
In the scenario above I think with these changes, if we don't support rollback we'll still be on rootfs B. If we supported |
My view on this is that it's too easy to mess up the u-boot configuration as it currently is and we should either have changes like those proposed in sighthoundinc#1 which automate the commit step (which so far haven't been resoundingly approved at https://hub.mender.io/t/auto-commit-for-standalone-mender-updates/2791/3 ) or we need documentation warning against using this configuration for anyone starting out with TX2, possibly with a warning we add in bitbake if you build |
meta-mender-tegra/recipes-bsp/tegra-binaries/redundant-boot-overrides/update-nvbootctrl.sh.in
I don't think the state machine is dependent on contacting the mender server to complete the commit; it only reports on its current state, and retries later if the server isn't reachable. If I'm wrong about that then yes, this would be an issue. Solving it reliably could be problematic, too, since we don't have visibility into the Mender client's current state. |
I tried to find something in mender documentation which states this definitively one way or another and couldn't find one. It also wasn't obvious to me from searching the mender client code. I see a comment from Eystein at https://news.ycombinator.com/item?id=13745959 which states
Which matches what I thought happened. That said I've never specifically tested this case before, I just always assumed that was the point of
Agree, and we probably don't need it. Just wanted to verify you didn't have some way this scenario would be handled. |
I agree it's non-obvious (like most Go code I run across, it seems), so my read of the state machine code could be wrong, but I didn't see a hard dependency on server communication in there for the commit transition. |
b1ebd59 to 5c75fb9
Looks good to me. Did you want to merge here first and then get upstream? I'd like to also include some automation scripts we are working on for testing, and test with some other machines before we upstream. |
Yeah, I thought I'd merge here, using this repo as a staging area where we can collect up anything that needs to go upstream, if that's OK. |
...ender-tegra/recipes-bsp/tegra-binaries/redundant-boot-overrides/update-nvbootctrl.service.in
…ifier With recent versions of L4T, the NVIDIA bootloader on platforms that support A/B redundancy will automatically decrement the priority level of the current boot slot before starting the operating system, expecting that the nv_update_verifier service will reset it if the OS boot is successful. If the system boots three times without resetting the priority, the bootloader will fail over to the alternate boot slot. For Mender, we do not use nv_update_verifier due to some undesirable side-effects, so we mask it from being run at startup. To prevent the failover from occurring, provide our own service to run the necessary commands to both mark the current boot successful and reset the slot priority. Signed-off-by: Matt Madison <matt@madison.systems>
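The failover behaviour described in this commit message can be illustrated with a small model. This is only a sketch of the logic as worded above, not NVIDIA's bootloader code: the starting priority of 3 is an assumption taken from the "three times" figure in the message, and the names `MAX_PRIORITY`, `boot_once`, and `simulate` are hypothetical.

```python
# Toy model of the A/B slot priority logic described in the commit message.
# Assumption: each slot starts at priority 3 and the bootloader fails over
# once the current slot's priority has been used up.
MAX_PRIORITY = 3


def boot_once(priorities, current, reset_on_success):
    """Model a single boot and return the index of the slot that booted."""
    # Bootloader side: switch slots if the current one has no priority left.
    if priorities[current] == 0:
        current = 1 - current
    # Bootloader side: decrement before handing over to the operating system.
    priorities[current] = max(0, priorities[current] - 1)
    # OS side: a service (nv_update_verifier, or the replacement service this
    # PR adds) marks the boot successful and restores the slot priority.
    if reset_on_success:
        priorities[current] = MAX_PRIORITY
    return current


def simulate(boots, reset_on_success):
    """Run several boots in a row, starting from slot 0."""
    priorities = [MAX_PRIORITY, MAX_PRIORITY]
    current = 0
    history = []
    for _ in range(boots):
        current = boot_once(priorities, current, reset_on_success)
        history.append(current)
    return history


if __name__ == "__main__":
    print("without reset:", simulate(5, reset_on_success=False))
    print("with reset:   ", simulate(5, reset_on_success=True))
```

In this model, a system that never resets the priority drifts to the alternate slot after its third boot, while one that resets on every successful boot stays on slot 0. That drift is the failure mode the new per-boot service is meant to prevent.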
Now that we update the NVIDIA bootloader's state on every boot, we no longer need the state scripts that were doing this after upgrades/rollbacks, so remove the scripts that were doing that and update others to remove the commands that were marking the current boot slot successful. Signed-off-by: Matt Madison <matt@madison.systems>
for libubootenv, which were missing the required '@' at the start of the Python expressions. Signed-off-by: Matt Madison <matt@madison.systems>
5c75fb9 to 36377eb
Regarding
and
Our devices (which run zeus) do suffer from the following problem:
Perhaps a rollback-reboot would have been triggered automatically if the device had run long enough. We are using the hosted mender solution and will update to dunfell soon. Will this PR change anything about this behaviour? Thank you! |
And based on this post you are using u-boot as your bootloader, is that correct? If so I don't think this PR is going to change anything about the behavior. Mender is still going to roll back your u-boot boot slot and you are going to be out of sync between boot slots used for rootfs (controlled by mender) and bootloader slot controlled by NVIDIA. I suggest migrating to cboot. With cboot as your bootloader this PR is going to avoid this scenario by only rolling back to the old boot slot if you boot 7 times without getting as far as the |
Hi @dwalkes thanks for your reply! Yes, we are using u-boot right now. We will look into migrating to a cboot-only solution, but I'm not sure if we want to risk such a major change now. Right now we just disallow update in case nvbootctrl and mender boot slots are out of sync. That was meant as a provisional quickfix, but I can already see it lasting longer than expected. I remember a comment from @madisongh (at the Yocto Project Summit, I think) that one has to take it upon him/herself to propagate the systemd-machineid from one slot to the other with a cboot-only solution. He said it can be made to work, though. But that's something we will definitely have to look into before doing such a switch. |
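The "disallow update when the slots are out of sync" guard mentioned above could look roughly like the following. This is a hedged sketch, not the commenter's actual implementation: the pairing of bootloader slot 0 with rootfs A and slot 1 with rootfs B is an assumption, and `SLOT_TO_ROOTFS`, `slots_aligned`, and `may_install_update` are hypothetical names. How the two inputs are obtained on a device (for example from `nvbootctrl get-current-slot` and from the Mender-side boot configuration) is left out.

```python
# Assumed pairing between NVIDIA bootloader slots and Mender rootfs parts.
SLOT_TO_ROOTFS = {0: "A", 1: "B"}


def slots_aligned(bootloader_slot, rootfs_label):
    """Return True when the bootloader slot and the rootfs part match."""
    expected = SLOT_TO_ROOTFS.get(bootloader_slot)
    if expected is None:
        raise ValueError(f"unknown bootloader slot: {bootloader_slot!r}")
    return expected == rootfs_label


def may_install_update(bootloader_slot, rootfs_label):
    """Refuse updates while the two slot indicators disagree."""
    return slots_aligned(bootloader_slot, rootfs_label)


if __name__ == "__main__":
    # Slot 0 with rootfs A is aligned; slot 0 with rootfs B is the mismatch
    # discussed in this thread.
    print(may_install_update(0, "A"))
    print(may_install_update(0, "B"))
```

A check like this only stops further updates from compounding a mismatch; it does not repair one that already exists.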
That's right. One way I've done this is to add a state script to copy |
Shame on me but I missed this in the yocto project talk, couldn't find it either in a quick review of the presentation. This seems like something we need to at minimum document but ideally have some way to fix in the community layer. I suppose a symlink to I don't think I understand why this is only important for cboot, it seems it would be related to uboot builds as well. |
IIRC, I mentioned it in passing either as a side comment or in response to a question.
I just checked, and the copy-after-install mechanism I mentioned is implemented in the state scripts in the community layer, if you have the
I don't think systemd would handle it being a symlink, but maybe a bind mount would work, if you mount /data in the initramfs before systemd starts. That doesn't work so well if you're using systemd in the initramfs, too, though.
With U-boot, the machine ID is persisted in a U-Boot variable and added to the kernel command line at boot. |
Great, and that should be set up by default here in either u-boot or cboot builds. I can see the feature that went into meta-mender at https://github.com/mendersoftware/meta-mender/pull/759/files, but it is only for u-boot, and I can see the logic here which makes this also work for cboot.
Got it, you'd think I'd remember that from mendersoftware@f6ced69 or OE4T/meta-tegra#200
This supposedly worked on warrior with mendersoftware#165 although I haven't tested it either. Maybe this is one reason it's not working since then. |
@madisongh what are your thoughts about either upstreaming this or getting it used for https://github.com/OE4T/tegra-demo-distro/blob/dunfell-l4t-r32.4.3/.gitmodules#L19 in the short term? I know you have other changes in the works for nvidia boot tools, so the scope could ultimately be bigger than just this change, but I think this change is important for test automation. Also I'm thinking the nvidia boot tools change you have in the works is probably not destined for dunfell. I've been able to somehow create boot slot mismatches with latest cboot testing I did today on the latest tegra-demo-distro dunfell + our custom changes so will need to go back over this during the coming week, starting with tegra-demo-distro, and I'm wondering what direction I should take. I'm thinking about starting with tegra-demo-distro + this patch because I think missing this patch will result in race conditions on boot that could cause failover in test automation. |
@dwalkes If you think they're good to go upstream, I can open a PR with these changes. Without something like these, you will run into the problem of the per-boot slot priority decrements triggering a slot switch after a couple of reboots on either TX2 or Xavier. I also have patches on this branch that I think improve the cboot+U-boot case for the TX2, but they could use more thorough testing. |
I may have confused myself somehow, since I'm not able to reproduce this with the latest here on tegra-demo-distro or our custom distro. In the meantime I've added a new PR for the test scripts I'm using to validate and attempt to reproduce at mendersoftware#207. When this is approved I'll send a pr to
Looks interesting and helpful, but I'm also wondering if we should just make a change like the one below to default to cboot in tegrademo-mender.conf:
Let me know if you have any suggestions about a better way to do this that might result in less prison time for crimes against yocto. |
On the one hand, I'd certainly recommend using cboot only on the TX2 for any product. On the other hand, I don't want to make it impossible to use U-Boot, if someone really needs to for some reason.
😃
|
Agreed, I think just making cboot the default for people who are getting started will be helpful.
Thanks for keeping me out of yocto jail, not sure why I didn't think to try that. Will include in a PR along with meta-mender-community dunfell after mendersoftware#207 |
When using TX2, use cboot as default bootloader, since this avoids the additional complexity of u-boot partition mismatch and mender commit. See discussion at [1]. OE4T/meta-mender-community#8 (comment) Signed-off-by: Dan Walkes <danwalkes@trellis-logic.com>
Since uboot slot alignment issues are tricky and unless you know what you are doing you probably want cboot as your bootloader for TX2/mender See discussion at OE4T/meta-mender-community#8 (comment) Signed-off-by: Dan Walkes <danwalkes@trellis-logic.com>
Since uboot slot alignment issues are tricky, unless you know what you are doing you probably want cboot as your bootloader for TX2/mender See discussion at OE4T/meta-mender-community#8 (comment) Signed-off-by: Dan Walkes <danwalkes@trellis-logic.com>
@manuel-wagesreither regarding upgrade from u-boot to c-boot on |