This document is a list of bugs that were uncovered during the implementation and deployment of arithmetic overflow checking.
This list is restricted solely to legitimate bugs. Cases
where the overflow was benign (e.g. the computed value is
unused), transient (e.g. the computed wrapped value is
guaranteed to be brought back into the original range, such as
in unsigned - 1 + provably_positive
), or silly (random
non-functional code in the tests or documentation) are not
included in the list.
However, extremely rare or obscure corner cases are considered legitimate bugs. (We begin with such a case.)
if one calls the iter.idx(index)
with index <= amt
,
then it calls the wrapped inner iterstor with a wrapped
around value. The contract for idx
does say that it
does need to handle out-of-bounds inputs, so this
appeared benign at first, but there is the corner case
of an iterator that actually covers the whole range
of indices, which would then return Some(_)
here when
(pnkfelix thinks) None
should be expected.
reference: rust-lang/rust#22532 (comment)
fn ns
was converting a tick count t
to nanoseconds
via the computation t * 1_000_000_000 / frequency()
;
but the multiplication there can overflow, thus losing
the high-order bits.
Full disclosure: This bug was known prior to landing arithmetic overflow checks, and filed as:
Despite being filed, it was left unfixed for months, despite the fact that the overflow would start occurring after 2 hours of machine uptime, according to:
pnkfelix included it on this list because having arithmetic overflow forces such bugs to be fixed in some manner rather than ignored.
The runtime startup uses a fairly loose computation to determine the stack extent to pass to record_os_managed_stack_bounds (which sets up guard pages and fault handlers to deal with call stack over- or underflows).
In this case, the arithmetic involved was actually overflowing, in this calculation:
let top_plus_20k = my_stack_top + 20000;
pnkfelix assumes that in practice this would lead to us attempting to install a guard page starting from some random location, rather than the actual desired address range. While the lack of a guard page in the right spot is probably of no consequence here (assuming that the OS is already going to stop us from actually attempting to write to stack locations resulting from overflow if that ever occurs), attempting to install a guard page on a random unrelated address range seems completely bogus. pnkfelix only observed this bug when building a 32-bit Rust on a 64-bit Linux host via cross-compilation.
So, probably qualifies a rare bug. reference:
rust-lang/rust#22532 (comment)
UPDATE: In hindsight, one might argue this should be reclassified as a transient overflow, because the whole computation in context is:
let my_stack_bottom =
my_stack_top + 20000 - OS_DEFAULT_STACK_ESTIMATE;
where OS_DEFAULT_STACK_ESTIMATE
is a large value
(> 1mb).
However, my claim is that this code is playing guessing games; do we really know that the stack is sufficiently large that the computation above does not underflow?
So pnkfelix is going to leave it on this list, at least for now. (pnkfelix subsequently changed the code to use saturated arithmetic in both cases, though obviously that could be tweaked a bit.)
There is an explanatory story here:
In short, one of our tests was quite weak and not
actually checking the computed values. But
arithmetic-overflow checking immediately pointed
out an attempt to reserve a ridiculous amount
of space within a Vec
. (This was on an experimental
branch of the codebase where we would fill with
a series of 0xC1
bytes when a value was dropped, rather
than filling with 0x00
bytes.)
It is actually quite likely that this test would still have failed without the arithmetic overflow checking, but it probably would have been much harder to diagnose since the panic would have happened at some arbitrary point later in the control flow.
Experts in the syntax system probably already knew about this bug, but overflow checking forced everyone to be reminded of it, repeatedly, in:
See in particular:
rust-lang/rust#23115 (comment)
We have not actually fixed issue #23480 at the time of this writing. Instead we put in a workaround:
The implementation of #[bench]
was not prepared to deal
with an overflow in the iteration count.
The arithmetic overflow checking forced that to fail on a benchmark like
#[bench]
fn foo(b: &mut test::Bencher) {}
rather than return bogus results.
The iteration count is stored in a u64
, so overflow sounds
unlikely -- even though n
is being multiplied by 2 on every
increment, so it sounds plausible that we could get to very large
numbers indeed, the fact that we have to actually run that many
iterations is what gives me pause.
At this point I am assuming that the compiler is doing something
to optimize away the looping when the benchmark code is empty:
keep in mind that the #[bench]
code above is not calling the
black_box
method of b
. Maybe we should be requring that?
Anyway, the chosen fix was to just return the results so far before hitting the overflow:
This seems like a combination of a typo (or "think-o" perhaps)
doing lo - hi
instead of hi - lo
, but also a fencepost error.
rust-lang/rust#33864 (comment)
/~https://github.com/rust-lang/rust/pull/33864/commits/0ca9bf394006fe635a4a76ca6fa78a70633666f0
This code passed in an expected_len
rather than max_len
, a logic
error, and the arithmetic checking code caught it. (I'm not quite
clear on whether there is a unit test that also could have caught it
independently? It sounds like if such a test exists, it was not part
of the test suite...)
Apparently our C ABI code for x86_64 was determining where an argument needed to be passed in memory or in a register via some logic that computes the number of available registers via cumulative subtraction from a running total, and compares the number available to how many registers we need. If available is less than the the number needed, we know its passed in memory.
The problem was that it originally used an unsized integer type for the accumulated number available, and thus if underflowed, we would stop indicating that those values needed to be passed via memory.
(Definitely seems like a legit and subtle bug to me.)
This was a simple logic error that produced bogus values for the accumulated error count. The underflow checker caught it, our compile-fail/ test suite at some point started hitting it (but only on debug builds, which at that time were not tested by bors/homu I guess).
I am not an expert on the CSS specification, but my intuition is that this is a legitimate bug (which was uncovered by the overflow checker):