Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Begin documenting initialization checking #2174

Closed
wants to merge 4 commits into from
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
139 changes: 119 additions & 20 deletions text/2094-nll.md
Original file line number Diff line number Diff line change
Expand Up @@ -1592,7 +1592,49 @@ our example, this would be the case if the type of `list` had been
reported by the next portion of the borrowck, described in the next
section.

### Borrow checker phase 2: reporting errors
### Borrow Checker phase 2: computing initialization

In addition to propagating borrows, the borrow-checker makes sure that uninitialized values can never be seen.

To ensure that, we also compute whether each lvalue is potentionally and/or definitely initialized at each point in the CFG.

We compute that using another standard dataflow computation. Of course, there might be an infinite amount of lvalues (e.g. consider a linked list using `Box`), but the computation can be done in finite time because lvalues form a tree, and the transfer function always affects an **lvalue subtree** at once (where an lvalue `cv` is in the subtree generated by the lvalue `lv` if `lv` is a **prefix** of `cv`).

The transfer function is as follows:

- at function entry, the function arguments are definitely initialized, and all other locals are definitely uninitialized
- after a consuming use of the lvalue `lv`, if the type of `lv` does not implement `Copy`, all lvalues in the lvalue subtree of `lv` are definitely uninitialized
- after a `drop(lv)` statement, all lvalues in the lvalue subtree of `lv` are definitely uninitialized
- after an `lv = <rvalue>` statement or an `lv <- <rvalue>` (drop+replace) statement, all lvalues in the lvalue subtree of `lv` are definitely initialized
- after a non-panic return from an `lv = call <fn>(<args>)` statement, all lvalues in the lvalue subtree of `lv` are definitely initialized

NOTE: it is possible for all the fields of a struct or tuple to be initialized without the struct/tuple itself being initialized, for example:
```Rust
let x: (u32, u32);
x.0 = 4;
x.1 = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current borrow checker, we do not accept this program. Are there corner cases that I am overlooking here?

Presuming there are not, I think that the current borrow checker basically enforces a rule like "Assigning to an lvalue LV requires that all prefixes of LV are definitely initialized", right?

Maybe we should just adopt this rule for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a difference from the current borrow checker. I see no problem with this rule.


println!("{} {}", x.0, x.1); // ok

println!("{:?}", x); //~ ERROR `x` is not initialized

x = (x.0, x.1);
println!("{:?}", x); // ok
```

I'm not sure of a nice way to fix this while maintaining that structs with 0 fields can be uninitialized. XXX: PLEASE TRY TO FIND SOMETHING.

However, in a valid program, if a value behind a pointer or enum is initialized, the pointer or enum discriminant itself is initialized too. This follows because the contents of a pointer/enum can only be written to while the pointer is initialized.

XXX: formalize interaction of match & enums to make this correct.

To ensure that the compiler can always keep track of which values are initialized, a consuming use of a non-`Copy` lvalue `lv` cause an error if `lv` has a prefix of one of these forms:
1. A dereference of an `&`-reference or `*`-pointer.
2. An projection of an element from an array (`[T; n]`), except through an array pattern (e.g. `let x = a[4];` is an error, but `let [.., x] = a;` is OK), as the index might not be known at compile-time.
3. Any projection of an element from a slice (`[T]`). This might be relaxed to allow slice patterns if someone figures out how to implement them.
4. A field projection of an ADT with a destructor (both field accesses like `foo.x` and patterns, including enum patterns like `E::A(data)`), as it is not obvious when the destructor should be run.

### Borrow checker phase 3: reporting errors

At this point, we have computed which loans are in scope at each
point. Next, we traverse the MIR and identify actions that are illegal
Expand Down Expand Up @@ -1630,8 +1672,7 @@ There are a few interesting cases to keep in mind:
extend the text here to cover `Box`, though some questions arise
around the handling of drop (see the section on drops for details).

**Accessing an lvalue LV.** When accessing an lvalue LV, there are two
axes to consider:
**Accessing an lvalue LV.** For checking borrows, when accessing an lvalue LV, there are two axes to consider:

- The access can be SHALLOW or DEEP:
- A *shallow* access means that the immediate fields reached at LV
Expand Down Expand Up @@ -1661,6 +1702,55 @@ The pseudocode for deciding when an access is legal looks like this:

```
fn access_legal(lvalue, is_shallow, is_read) {
check_initialized(lvalue, is_shallow);
check_borrows(lvalue, is_shallow, is_read);
check_permission(lvalue, is_read);
check_immutable_binding(lvalue, is_read);
}

fn check_permission(lvalue, is_read) {
for each prefix of lvalue {
if prefix ~ `*base` &&
!is_read &&
base is an immutable reference or pointer
{
// can't mutate through an immutable reference
report_error();
}
}
}

fn check_immutable_binding(lvalue, is_read) {
for each shallow prefix of lvalue {
if is_possibly_initialized(lvalue) &&
!is_read &&
prefix is an immutable binding
{
// can't re-mutate a non-mutable binding
report_error();
}
}
}

fn check_initialized(lvalue, is_shallow) {
if is_shallow {
for each prefix of lvalue {
if prefix ~ `*base` && !is_definitely_initialized(base) {
// can't write through an uninitialized pointer
report_error();
}
}
} else {
for each child in lvalue_tree(lvalue) {
if !is_definitely_initialized(lvalue) {
// can't use an uninitialized value
report_error();
}
}
}
}

fn check_borrows(lvalue, is_shallow, is_read) {
let relevant_borrows = select_relevant_borrows(lvalue, is_shallow);

for borrow in relevant_borrows {
Expand All @@ -1674,40 +1764,49 @@ fn access_legal(lvalue, is_shallow, is_read) {
}
```

As you can see, it works in two steps. First, we enumerate a set of
in-scope borrows that are relevant to `lvalue` -- this set is affected
by whether this is a "shallow" or "deep" action, as will be described
shortly. Then, for each such borrow, we check if it conflicts with the
action (i.e.,, if at least one of them is potentially writing), and,
As you can see, the checking of borrows works in two steps. First, we
enumerate a set of in-scope borrows that are relevant to `lvalue` -- this
set is affected by whether this is a "shallow" or "deep" action, as will
be described shortly. Then, for each such borrow, we check if it conflicts
with the action (i.e.,, if at least one of them is potentially writing), and,
if so, we report an error.

For **shallow** accesses to the path `lvalue`, we consider borrows relevant
if they meet one of the following criteria:

- there is a loan for the path `lvalue`;
- so: writing a path like `a.b.c` is illegal if `a.b.c` is borrowed
- there is a loan for some prefix of the path `lvalue`;
- so: writing a path like `a.b.c` is illegal if `a` or `a.b` is borrowed
- `lvalue` is a **shallow prefix** of the loan path
1. the loan path is a prefix of the path `lvalue`;
- so: writing to a path like `a.b.c` is illegal if `a`, `a.b`, or `a.b.c` itself is mutably borrowed
2. the loan path is, or has a **shallow prefix** `<base>.<field1>` that accesses a field of a union, and `lvalue` has a prefix of the form `<base>.<field2>` for a *different* field with the same base union.
- so: if `a.b` is a union with distinct fields `x` and `y`, shallowly accessing a path `a.b.x.c` is illegal if `a.b.y` or `a.b.y.s` is borrowed. There `<base>` is `a.b`, `<field1>` is `y` and `<field2>` is `x`.
- but: unless `a.b.x` is *also* a union, it is legal to access the `a.b.x.c` is *legal* if `a.b.x.d` is borrowed, because the same union field is used in both borrows.
- the prefix of `lvalue` can be an arbitrary prefix - if `a.b.y.s` is borrowed, then it is illegal to shallowly access `*(*a.b.x.c).t`.
3. `lvalue` is a **shallow prefix** of the loan path
- shallow prefixes are found by stripping away fields, but stop at
any dereference
- so: writing a path like `a` is illegal if `a.b` is borrowed
- so: writing to a path like `a` is illegal if `a.b` is borrowed
- but: writing `a` is legal if `*a` is borrowed, whether or not `a`
is a shared or mutable reference

In addition, if `lvalue` has a prefix that is a dereference `*base_lv`, then the base pointer `base_lv` must be definitely initialized (inductively, checking the shallow prefix should be equivalent)
- so: writing to a path like `(*a).x` is legal only if `a` is initialized, and still legal if `(*a).x` itself is not initialized.

For **deep** accesses to the path `lvalue`, we consider borrows relevant
if they meet one of the following criteria:

- there is a loan for the path `lvalue`;
- so: reading a path like `a.b.c` is illegal if `a.b.c` is mutably borrowed
- there is a loan for some prefix of the path `lvalue`;
- so: reading a path like `a.b.c` is illegal if `a` or `a.b` is mutably borrowed
- `lvalue` is a **supporting prefix** of the loan path
1. the loan path is a prefix of the path `lvalue`;
- so: reading a path like `a.b.c` is illegal if `a`, `a.b` or `a.b.c` is mutably borrowed
2. the loan path is, or has a **supporting prefix** `<base>.<field1>` that accesses a field of a union, and `lvalue` has a prefix of the form `<base>.<field2>` for a *different* field with the same base union.
- so: if `a.b` is a union with distinct fields `x` and `y`, deeply accessing a path `a.b.x.c` is illegal if `a.b.y`, `a.b.y.s`, or (in contrast with shallow accesses, and as long as both dereferences are of a `&mut T`) `*(*a.b.y.s).t` is borrowed. There `<base>` is `a.b`, `<field1>` is `y` and `<field2>` is `x`.
3. `lvalue` is a **supporting prefix** of the loan path
- supporting prefixes were defined earlier
- so: reading a path like `a` is illegal if `a.b` is mutably
borrowed, but -- in contrast with shallow accesses -- reading `a` is also
illegal if `*a` is mutably borrowed


And the accessed data and all children in its lvalue tree must be definitely initialized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence fragment supposed to be connected to the "3." bullet point immediately above, or something higher up?


In addition, mutable accesses whose **shallow prefix** contains a immutable binding (`let x;`, as opposed to `let mut x;`) are only valid if the local can't possibly be initialized when the accesses occur (because deep writes require the data to be *definitely* initialized, that means that deep writes whose shallow prefix contains a immutable binding can never be valid).

**Dropping an lvalue LV.** Dropping an lvalue can be treated as a DEEP
WRITE, like a move, but this is overly conservative. The rules here
are under active development, see
Expand Down