Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port of ariels draft change to add initialization checking spec #51

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
268 changes: 225 additions & 43 deletions 0000-nonlexical-lifetimes.md
Original file line number Diff line number Diff line change
Expand Up @@ -1592,7 +1592,50 @@ our example, this would be the case if the type of `list` had been
reported by the next portion of the borrowck, described in the next
section.

### Borrow checker phase 2: reporting errors
### Borrow Checker phase 2: computing initialization

In addition to propagating borrows, the borrow-checker makes sure that uninitialized values can never be seen.

To ensure that, we also compute whether each lvalue is potentionally and/or definitely initialized at each point in the CFG.

We compute that using another standard dataflow computation. Of course, there might be an infinite amount of lvalues (e.g. consider a linked list using `Box`), but the computation can be done in finite time because lvalues form a tree, and the transfer function always affects an **lvalue subtree** at once (where an lvalue `cv` is in the subtree generated by the lvalue `lv` if `lv` is a **prefix** of `cv`).

The transfer function is as follows:

- at function entry, the function arguments are definitely initialized, and all other locals are definitely uninitialized
- after a consuming use of the lvalue `lv`, if the type of `lv` does not implement `Copy`, all lvalues in the lvalue subtree of `lv` are definitely uninitialized
- after a MIR `drop(lv)` or `storagedead lv` statement, all lvalues in the lvalue subtree of `lv` are definitely uninitialized
- after an `lv = <rvalue>` statement or an `lv <- <rvalue>` (drop+replace) statement, all lvalues in the lvalue subtree of `lv` are definitely initialized
- after a non-panic return from an `lv = call <fn>(<args>)` statement, all lvalues in the lvalue subtree of `lv` are definitely initialized

NOTE: it is possible for all the fields of a struct or tuple to be initialized without the struct/tuple itself being initialized, for example:
```Rust
let x: (u32, u32);
x.0 = 4;
x.1 = 5;

println!("{} {}", x.0, x.1); // ok

println!("{:?}", x); //~ ERROR `x` is not initialized

x = (x.0, x.1);
println!("{:?}", x); // ok
```

I'm not sure of a nice way to fix this while maintaining that structs with 0 fields can be uninitialized. XXX: PLEASE TRY TO FIND SOMETHING.

However, in a valid program, if a value behind a pointer or enum is initialized, the pointer or enum discriminant itself is initialized too. This follows because the contents of a pointer/enum can only be written to while the pointer is initialized.

XXX: formalize interaction of match & enums to make this correct.

To ensure that the compiler can always keep track of which values are initialized, a consuming use of a non-`Copy` lvalue `lv` cause an error if `lv` has a prefix of one of these forms:
1. A dereference of an `&`-reference or `*`-pointer.
2. An projection of an element from an array (`[T; n]`), except through an array pattern (e.g. `let x = a[4];` is an error, but `let [.., x] = a;` is OK), as the index might not be known at compile-time.
3. Any projection of an element from a slice (`[T]`). This might be relaxed to allow slice patterns if someone figures out how to implement them.
4. A field projection of an ADT with a destructor (both field accesses like `foo.x` and patterns, including enum patterns like `E::A(data)`), as it is not obvious when the destructor should be run.


### Borrow checker phase 3: reporting errors

At this point, we have computed which loans are in scope at each
point. Next, we traverse the MIR and identify actions that are illegal
Expand Down Expand Up @@ -1630,7 +1673,7 @@ There are a few interesting cases to keep in mind:
extend the text here to cover `Box`, though some questions arise
around the handling of drop (see the section on drops for details).

**Accessing an lvalue LV.** When accessing an lvalue LV, there are two
**Accessing an lvalue LV.** For checking borrows, when accessing an lvalue LV, there are two
axes to consider:

- The access can be SHALLOW or DEEP:
Expand All @@ -1657,57 +1700,196 @@ similar: it writes to the shallow content of `y`, but then -- via the
new name `x` -- we can access all other content accessible through
`y`.

The pseudocode for deciding when an access is legal looks like this:
The pseudocode for deciding when an access is valid looks like this:

```
fn access_legal(lvalue, is_shallow, is_read) {
let relevant_borrows = select_relevant_borrows(lvalue, is_shallow);
fn access_valid(lvalue, is_shallow, is_read) {
check_initialized(lvalue, is_shallow);
check_loans(lvalue, is_shallow, is_read);
check_permission(lvalue, is_read);
check_immutable_binding(lvalue, is_read);
}

fn check_permission(lvalue, is_read) {
for each prefix of lvalue {
if prefix ~ `*base` &&
!is_read &&
base is an immutable reference or pointer
{
// can't mutate through an immutable reference
report_error();
}
}
}

fn check_immutable_binding(lvalue, is_read) {
for each shallow prefix of lvalue {
if is_possibly_initialized(lvalue) &&
!is_read &&
prefix is an immutable binding
{
// can't mutate a non-mutable binding twice
report_error();
}
}
}

fn check_initialized(lvalue, is_shallow) {
if is_shallow {
for each prefix of lvalue {
if prefix ~ `*base` && !is_definitely_initialized(base) {
// can't write through an uninitialized pointer
report_error();
}
}
} else {
for each child in lvalue_tree(lvalue) {
if !is_definitely_initialized(lvalue) {
// can't use an uninitialized value
report_error();
}
}
}
}

fn loan_overlaps_access(lvalue, is_shallow, loan) {
if is_disjoint(lvalue, loan.lvalue) {
// the loan is necessarily disjoint from our accessed lvalue,
// it can't affect our access.
return false;
}

if is_untracked(loan.lvalue) {
// Rust already assumes that everything behind an immutable
// reference is already borrowed, so knowing that a loan
// exists doesn't add any restrictions.
return false;
}

if is_shallow && is_subobject_behind_deref(lvalue, loan.lvalue) {
// This is a shallow access, and the loan is behind a
// dereference of this lvalue - after this access is over,
// the loan will be "orphaned".
return false;
}

// The loan can potentially interact with this access.
return true;
}

fn check_loans(lvalue, is_shallow, is_read) {
for loan in active_loans {
// If the other loan doesn't conflict with this access, it is ok.
if !loan_overlaps_access(lvalue, is_shallow, loan) { continue; }

for borrow in relevant_borrows {
// shared borrows like `&x` still permit reads from `x` (but not writes)
if is_read && borrow.is_read { continue; }
if is_read && loan.is_read { continue; }

// otherwise, report an error, because we have an access
// that conflicts with an in-scope borrow
// that conflicts with an in-scope loan
report_error();
}
}
```

As you can see, it works in two steps. First, we enumerate a set of
in-scope borrows that are relevant to `lvalue` -- this set is affected
by whether this is a "shallow" or "deep" action, as will be described
shortly. Then, for each such borrow, we check if it conflicts with the
action (i.e.,, if at least one of them is potentially writing), and,
if so, we report an error.

For **shallow** accesses to the path `lvalue`, we consider borrows relevant
if they meet one of the following criteria:

- there is a loan for the path `lvalue`;
- so: writing a path like `a.b.c` is illegal if `a.b.c` is borrowed
- there is a loan for some prefix of the path `lvalue`;
- so: writing a path like `a.b.c` is illegal if `a` or `a.b` is borrowed
- `lvalue` is a **shallow prefix** of the loan path
- shallow prefixes are found by stripping away fields, but stop at
any dereference
- so: writing a path like `a` is illegal if `a.b` is borrowed
- but: writing `a` is legal if `*a` is borrowed, whether or not `a`
is a shared or mutable reference

For **deep** accesses to the path `lvalue`, we consider borrows relevant
if they meet one of the following criteria:

- there is a loan for the path `lvalue`;
- so: reading a path like `a.b.c` is illegal if `a.b.c` is mutably borrowed
- there is a loan for some prefix of the path `lvalue`;
- so: reading a path like `a.b.c` is illegal if `a` or `a.b` is mutably borrowed
- `lvalue` is a **supporting prefix** of the loan path
- supporting prefixes were defined earlier
- so: reading a path like `a` is illegal if `a.b` is mutably
borrowed, but -- in contrast with shallow accesses -- reading `a` is also
illegal if `*a` is mutably borrowed

There are a few conditions that need to be satisfied in order for an access to be valid:

The simplest is permission checking - it is an error to write through an immutable path. This is basicallly a "type-checking" constraint, but is included in the borrow checker.

Similarly, immutable binding checking: writes whose lvalue's **shallow prefix** contains a immutable binding (`let x;`, as opposed to `let mut x;`) are an error if the local can't possibly be initialized when the accesses occur (because deep writes require the data to be *definitely* initialized, they can't actually be valid if their lvalue's shallow prefix contains an immutable binding).

Next up is definite initialization checking:
- Every pointer being dereferenced in the lvalue needs to be definitely initialized - e.g., if the lvalue is `(*(*x).y).z` then `*x` and `*(*x).y` need to be definitely initialized. This is required to avoid dereferencing wild pointers.
- For deep accesses, the lvalue being accessed and every subcomponent of it must also be initialized.

Note that there can never be definitely initialized data behind an uninitialized pointer. This follows from the rules - moving or ending the storage of a pointer deinitializes all data behind it, and data can't be written behind an uninitialized pointer. This means that for deep accesses, we do not need to check that pointers to the checking that the lvalue being accessed is initialized implies

The final condition is loan checking: all loans must be compatible with this access (as the start of a borrow is an access, this also implies that 2 conflicting borrow can't exist).

It might be more natural to specify that *no loan might conflict with the access*, but this is a soundness criterion, so it is better to work positively.

A loan is compatible with an access if either:
1. the loan is shared and the access is a read (LC-permission-compatibile).
2. the loan is not in scope during the access. Note that the loan created by a borrow is always compatible with the access that starts the borrow (LC-temporally-separated).
3. the loan's lvalue has the dereference of a raw or immutable pointer as a prefix (LC-untracked).
- e.g. a loan of `*(*x).y` where `x` is an immutable reference.
4. the loan's lvalue is disjoint from the accessed lvalue according to the rules below (LC-spatially-disjoint)
5. the access is shallow, and the accessed lvalue is a dereference-subobject of the loan's lvalue (LC-subobject)
- e.g. a loan of `x` is compatible with a shallow access to `(*x.0).y`
- or a loan of `x[i]` is compatible with a shallow access to `(*x[j].0).y
- XXX: there should be a better way of writing this rule, but use the DISJOINT-OR-SUBOBJECT rule below.

Disjointness and subobject are determined through the following rules (I should really write an explanation):

```
l1, l2 locals
l1 != l2
----------------
l1 DISJOINT l2 (DISJOINT-LOCAL)

l1, l2 locals
----------------
l1 DISJOINT-OR-EQ l2 (DISJOINT-OR-EQ-BASE)

p,q reference or pointer
p DISJOINT-OR-EQ q
----------------
*p DISJOINT-OR-EQ *q (DISJOINT-OR-EQ-PTR)

p, q struct or enum
p DISJOINT-OR-EQ q
----------------
p.field1 DISJOINT-OR-EQ q.field2 (DISJOINT-OR-EQ-FIELD)

p, q array or slice
p DISJOINT-OR-EQ q
----------------
p[i] DISJOINT-OR-EQ q[j] (DISJOINT-OR-EQ-ARRAY)

p, q array or slice
p DISJOINT-OR-EQ q
i != j
----------------
p[i] DISJOINT q[j] (DISJOINT-ARRAY)

p, q array or slice
p DISJOINT-OR-EQ q
i not in j1..j2
----------------
p[i] DISJOINT q[j1..j2] (DISJOINT-ARRAY)
p[j1..j2] DISJOINT q[i] (DISJOINT-ARRAY)

p struct or enum
q struct or enum
p DISJOINT-OR-EQ q
field1 != field2
----------------
p.field1 DISJOINT q.field2 (DISJOINT-FIELD)

p DISJOINT q
p PREFIX OF p1
q PREFIX OF q1
----------------
p1 DISJOINT q1 (DISJOINT-PREFIX)

p DISJOINT-OR-EQ q
q PREFIX OF q1
*q1 PREFIX OF q2
----------------
q2 DISJOINT-OR-DEREFERENCE-SUBOBJECT q (DISJOINT-OR-SUBOBJECT)

// note: no rules for unions other than DISJOINT-PREFIX, so
// if we see a union we "get stuck".
//
// note2: nested slice patterns are collapsed before we get
// here, so
// let [a, [b, ..]..] = X;
// is equivalent to
// let [a, b, ..] = X;
```

Note that in the absence of unions, every loan and lvalue are either provably disjoint-or-equal, or potentially overlapping.

**Dropping an lvalue LV.** Dropping an lvalue can be treated as a DEEP
WRITE, like a move, but this is overly conservative. The rules here
are under active development, see
Expand Down