Is `tlb_data` critical system state? If not, should we clear it before saving state? #1257

rndmcnlly · 2025-02-13T17:50:58Z

In the current V86 design, the software model of the memory management unit's TLB, namely `tlb_data`, is not included in snapshots. Accordingly, it makes sense that the CPU's `set_state` process calls `full_clear_tlb`.

Even though the TLB contents aren't supposed to be part of the architectural state of an x86 machine, you can write code that observes it. To make V86 a more faithful model of real x86 processors, should we also include TLB state (at least the known-valid entries)?

(Super quick sketch of observing TLB state: In supervisor mode, write to a page table entry but don't flush the TLB. A moment later, write to an address impacted by your page table update. Whether your write lands in one part of memory or the other depends on whether someone else flushed the TLB in the meantime, which can depend on whether it contained certain state or not, conditionally triggering a page walk, etc.)
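To make the sketch above concrete, here is a toy model of the effect. It is not v86 code: `ToyMmu`, `runGuest`, and the single-level page table are assumptions made up for illustration. It only shows that the same guest write can land in a different frame depending on whether the TLB was cleared in between.

```javascript
// Toy model, not v86 code: a one-level page table with a software TLB in front of it.
class ToyMmu {
    constructor() {
        this.pageTable = new Map(); // virtual page number -> physical frame number
        this.tlb = new Map();       // cached translations (may be stale)
        this.frames = new Map();    // physical frame number -> last value written
    }

    // Translate via the TLB; fall back to a page walk on a miss.
    translate(vpn) {
        if (!this.tlb.has(vpn)) {
            this.tlb.set(vpn, this.pageTable.get(vpn));
        }
        return this.tlb.get(vpn);
    }

    write(vpn, value) {
        this.frames.set(this.translate(vpn), value);
    }

    // Stand-in for a full TLB flush, such as the one a state restore performs.
    clearTlb() {
        this.tlb.clear();
    }
}

function runGuest(flushInBetween) {
    const mmu = new ToyMmu();
    mmu.pageTable.set(0, 100);
    mmu.write(0, "first");               // caches vpn 0 -> frame 100 in the TLB
    mmu.pageTable.set(0, 200);           // guest edits the PTE but does not flush
    if (flushInBetween) mmu.clearTlb();  // e.g. a save/restore happened here
    mmu.write(0, "second");
    return mmu.frames;
}

const withoutFlush = runGuest(false);
const withFlush = runGuest(true);
console.log(withoutFlush.get(100), withoutFlush.get(200)); // second undefined
console.log(withFlush.get(100), withFlush.get(200));       // first second
```

In the run without a flush, the stale translation sends the second write to the old frame; with the flush, the page walk picks up the new mapping.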

Simple answer: No, we shouldn't try to faithfully reproduce behavior that was undefined anyway. V86 doesn't claim to be a model of any specific x86 processor, so it is fine if its behavior in architecturally undefined state is weirdly sensitive to whether you saved/loaded state recently.

We could make V86 more self-consistent by also calling `full_clear_tlb` in `get_state`. The state you are in after restoring a snapshot would then match the state you were in when you saved it, but this would have the counter-intuitive effect that `get_state` actually changes the state of the system (by destroying state that is architecturally invisible but technically observable anyway).

Simple answer: No, it is too weird that get_state would change anything at all. This choice optimizes for the convenience of the V86 maintainers at the cost of convenience for 32-bit kernel-level developers who might be using V86 as a testbed for debugging kernels or practicing TLB timing attacks.

Context: I'm looking into capturing execution state trajectories (state over time in a reproducible way). One approach is to try to make execution deterministic and capture only the environment stimulus events that drive execution down one path or another. This seems like a massive engineering task that might ultimately not be possible without unacceptable performance impacts (see QEMU's record/replay approach). Another approach is to simply take whole-system snapshots at whatever time granularity the application demands. The second approach is much simpler (if you ignore how we're going to compress and save all that data without hurting interactivity), but it could be observable in executions as the effect of the TLB being much more forgetful.

I recommend sticking to our current design (change nothing).

copy · 2025-02-18T12:08:47Z

Maintainer

> Even though the TLB contents aren't supposed to be part of the architectural state of an x86 machine, you can write code that observes it. To make V86 a more faithful model of real x86 processors, should we also include TLB state (at least the known-valid entries)?

> Simple answer: No, we shouldn't try to faithfully reproduce behavior that was undefined anyway. V86 doesn't claim to be a model of any specific x86 processor, so it is fine if its behavior in architecturally undefined state is weirdly sensitive to whether you saved/loaded state recently.

Originally I tried to squeeze every bit out of state images, but I think now it's probably better to implement the "obvious" thing and not cause a bunch of page faults after state restores. Currently, the TLB is limited to 10000 entries anyway, so it wouldn't increase the size much (10000 * 8 bytes = 80,000 bytes, about 78 KiB, and it would probably compress very well).
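The size estimate can be checked with a small sketch. The layout here is an assumption for illustration only: a flat `Int32Array` indexed by virtual page number plus a list of at most 10000 valid indices, stored in the snapshot as (index, entry) pairs of 4 bytes each. The names `packTlb`, `unpackTlb`, and `validPages` are hypothetical, not v86's.

```javascript
// Assumed layout (hypothetical): tlbData is indexed by virtual page number,
// and validPages lists the indices that currently hold a live entry.
function packTlb(tlbData, validPages) {
    const packed = new Int32Array(validPages.length * 2);
    validPages.forEach((page, i) => {
        packed[2 * i] = page;               // 4 bytes: which slot
        packed[2 * i + 1] = tlbData[page];  // 4 bytes: the cached entry
    });
    return packed;
}

// Write the saved (index, entry) pairs back into a cleared table.
function unpackTlb(packed, tlbData) {
    for (let i = 0; i < packed.length; i += 2) {
        tlbData[packed[i]] = packed[i + 1];
    }
}

const MAX_VALID_ENTRIES = 10000;
const tlbData = new Int32Array(0x100000); // one slot per 4 KiB page of a 32-bit address space
const validPages = [];
for (let i = 0; i < MAX_VALID_ENTRIES; i++) {
    const page = i * 7;            // arbitrary distinct pages for the demo
    tlbData[page] = page + 1;      // arbitrary non-zero entry value
    validPages.push(page);
}

const packed = packTlb(tlbData, validPages);
console.log(packed.byteLength);         // 80000
console.log(packed.byteLength / 1024);  // 78.125

const restored = new Int32Array(0x100000);
unpackTlb(packed, restored);
```

Storing only the valid entries keeps the snapshot cost proportional to the 10000-entry cap rather than to the full table.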

To summarise, yes I wouldn't mind changing this.

> Simple answer: No, it is too weird that get_state would change anything at all. This choice optimizes for the convenience of the V86 maintainers at the cost of convenience for 32-bit kernel-level developers who might be using V86 as a testbed for debugging kernels or practicing TLB timing attacks.

I agree with this: `get_state` shouldn't change anything. Developers who are interested in more consistency between state images may call `full_clear_tlb` before `get_state` themselves, but this should probably be documented.
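A minimal sketch of that opt-in pattern follows. `ToyCpu` is a hypothetical stand-in, not v86's CPU object; only the method names `full_clear_tlb`, `get_state`, and `set_state` come from this discussion, and how they are actually reached from user code is not shown here. The helper `save_state_consistent` is likewise made up for illustration.

```javascript
// Hypothetical stand-in for an emulator CPU; only the method names come from the discussion.
class ToyCpu {
    constructor() {
        this.registers = { eip: 0 };
        this.tlb = new Map();
    }

    full_clear_tlb() {
        this.tlb.clear();
    }

    // Mirrors the current design: the TLB is not part of the snapshot,
    // and get_state itself does not modify anything.
    get_state() {
        return JSON.stringify(this.registers);
    }

    // Restoring clears the TLB, since the snapshot carries no TLB contents.
    set_state(state) {
        this.registers = JSON.parse(state);
        this.full_clear_tlb();
    }
}

// Opt-in helper: the caller, not get_state, decides to pay for the flush.
function save_state_consistent(cpu) {
    cpu.full_clear_tlb();
    return cpu.get_state();
}

const cpu = new ToyCpu();
cpu.tlb.set(0x1000, 0x5000);
const snapshot = save_state_consistent(cpu);
const tlbSizeAtSave = cpu.tlb.size;
cpu.set_state(snapshot);
console.log(tlbSizeAtSave === cpu.tlb.size); // true
```

With the explicit flush, the TLB is empty both right after saving and right after restoring, so execution continues from the same starting point in either case.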

> Context: I'm looking into capturing execution state trajectories (state over time in a reproducible way). One approach is to try to make execution deterministic and capture only the environment stimulus events that drive execution down one path or another. This seems like a massive engineering task that might ultimately not be possible without unacceptable performance impacts (see QEMU's record/replay approach). Another approach is to simply take whole-system snapshots at whatever time granularity the application demands. The second approach is much simpler (if you ignore how we're going to compress and save all that data without hurting interactivity), but it could be observable in executions as the effect of the TLB being much more forgetful.

This is also interesting for me. FWIW, I still have an old patch for v86 that enables replaying (and verifying) binary traces from qemu. It does so by disabling all of v86's hardware except the CPU (with the jit disabled), replaying the interrupts and port IO from qemu, and comparing register state with qemu's after every basic block.

I haven't done so yet, but I would like v86 to also output such a trace file, so we can, for example, compare v86's jitted execution with its non-jitted execution.
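As a rough illustration of the comparison step, the sketch below finds the first basic block at which two traces disagree. The trace format (one plain object of register values per basic block) and the function `firstDivergence` are assumptions for this example; they do not describe the actual binary trace format or the patch mentioned above.

```javascript
// Hypothetical trace format: one record per basic block, holding the register
// state after that block has executed.
function firstDivergence(traceA, traceB) {
    const length = Math.min(traceA.length, traceB.length);
    for (let i = 0; i < length; i++) {
        for (const reg of Object.keys(traceA[i])) {
            if (traceA[i][reg] !== traceB[i][reg]) {
                return { block: i, reg, a: traceA[i][reg], b: traceB[i][reg] };
            }
        }
    }
    // No register mismatch; report a length mismatch if one trace ended early.
    return traceA.length === traceB.length ? null : { block: length, reg: null };
}

const interpreted = [
    { eip: 0x1000, eax: 1 },
    { eip: 0x1010, eax: 2 },
    { eip: 0x1024, eax: 3 },
];
const jitted = [
    { eip: 0x1000, eax: 1 },
    { eip: 0x1010, eax: 7 },
    { eip: 0x1024, eax: 3 },
];

console.log(firstDivergence(interpreted, jitted));      // { block: 1, reg: 'eax', a: 2, b: 7 }
console.log(firstDivergence(interpreted, interpreted)); // null
```

Stopping at the first mismatch matters because every later block is likely to differ once the register state has diverged.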
