Merge pull request rust-lang#28 from nikomatsakis/master
add query + incremental section and restructure a bit
nikomatsakis authored Jan 29, 2018
2 parents 3b4fab4 + bf77592 commit 688d1b0
Showing 6 changed files with 478 additions and 15 deletions.
11 changes: 7 additions & 4 deletions src/SUMMARY.md
@@ -5,16 +5,19 @@
- [Using the compiler testing framework](./running-tests.md)
- [Walkthrough: a typical contribution](./walkthrough.md)
- [High-level overview of the compiler source](./high-level-overview.md)
- [Queries: demand-driven compilation](./query.md)
- [Incremental compilation](./incremental-compilation.md)
- [The parser](./the-parser.md)
- [Macro expansion](./macro-expansion.md)
- [Name resolution](./name-resolution.md)
- [HIR lowering](./hir-lowering.md)
- [The HIR (High-level IR)](./hir.md)
- [The `ty` module: representing types](./ty.md)
- [Type inference](./type-inference.md)
- [Trait resolution](./trait-resolution.md)
- [Type checking](./type-checking.md)
- [MIR construction](./mir-construction.md)
- [MIR borrowck](./mir-borrowck.md)
- [MIR optimizations](./mir-optimizations.md)
- [The MIR (Mid-level IR)](./mir.md)
- [MIR construction](./mir-construction.md)
- [MIR borrowck](./mir-borrowck.md)
- [MIR optimizations](./mir-optimizations.md)
- [trans: generating LLVM IR](./trans.md)
- [Glossary](./glossary.md)
19 changes: 10 additions & 9 deletions src/glossary.md
@@ -9,23 +9,24 @@ AST | the abstract syntax tree produced by the syntax crate
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
HIR | the High-level IR, created by lowering and desugaring the AST. See `librustc/hir`.
HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
'gcx | the lifetime of the global arena (see `librustc/ty`).
'gcx | the lifetime of the global arena ([see more](ty.html))
generics | the set of generic type parameters defined on a type or item
ICE | internal compiler error. When the compiler crashes.
infcx | the inference context (see `librustc/infer`)
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans. Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is found in `src/librustc_mir`.
obligation | something that must be proven by the trait system; see `librustc/traits`.
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
local crate | the crate currently being compiled.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
query | perhaps some sub-computation during compilation; see `librustc/maps`.
provider | the function that executes a query; see `librustc/maps`.
query | perhaps some sub-computation during compilation ([see more](query.html))
provider | the function that executes a query ([see more](query.html))
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
tcx | the "typing context", main data structure of the compiler (see `librustc/ty`).
tcx | the "typing context", main data structure of the compiler ([see more](ty.html))
'tcx | the lifetime of the currently active inference context ([see more](ty.html))
trans | the code to translate MIR into LLVM IR.
trait reference | a trait and values for its type parameters (see `librustc/ty`).
ty | the internal representation of a type (see `librustc/ty`).
trait reference | a trait and values for its type parameters ([see more](ty.html)).
ty | the internal representation of a type ([see more](ty.html)).
4 changes: 2 additions & 2 deletions src/hir-lowering.md → src/hir.md
@@ -1,4 +1,4 @@
# HIR lowering
# The HIR

The HIR -- "High-level IR" -- is the primary IR used in most of
rustc. It is a desugared version of the "abstract syntax tree" (AST)
@@ -116,4 +116,4 @@ associated with an **owner**, which is typically some kind of item
(e.g., a `fn()` or `const`), but could also be a closure expression
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
associated with a given def-id (`maybe_body_owned_by()`) or to find
the owner of a body (`body_owner_def_id()`).
the owner of a body (`body_owner_def_id()`).
139 changes: 139 additions & 0 deletions src/incremental-compilation.md
@@ -0,0 +1,139 @@
# Incremental compilation

The incremental compilation scheme is, in essence, a surprisingly
simple extension to the overall query system. We'll start by describing
a slightly simplified variant of the real thing, the "basic algorithm", and then describe
some possible improvements.

## The basic algorithm

The basic algorithm is
called the **red-green** algorithm[^salsa]. The high-level idea is
that, after each run of the compiler, we save the results of all
the queries that we executed, as well as the **query DAG**. The
**query DAG** is a [DAG] that indicates which queries executed which
other queries. So, for example, there would be an edge from a query Q1
to another query Q2 if computing Q1 required computing Q2 (note that
because queries cannot depend on themselves, this results in a DAG and
not a general graph).

[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
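As a rough illustration of the data involved, the query DAG can be pictured as a map from each query to the list of queries it executed. This is only a sketch under stated assumptions: `QueryId`, `QueryDag`, `record_read`, and `reads` are hypothetical names chosen for this example, not the compiler's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical query key; the real compiler uses a richer identifier.
type QueryId = &'static str;

/// A toy query DAG: for each query, the queries it executed, in order.
#[derive(Default)]
struct QueryDag {
    reads: HashMap<QueryId, Vec<QueryId>>,
}

impl QueryDag {
    /// Record an edge Q1 -> Q2, meaning "computing Q1 required computing Q2".
    fn record_read(&mut self, q1: QueryId, q2: QueryId) {
        let reads = self.reads.entry(q1).or_default();
        // Keep only the first occurrence so the original order is preserved.
        if !reads.contains(&q2) {
            reads.push(q2);
        }
    }

    /// The queries that `q` executed, in the order they were first read.
    fn reads(&self, q: QueryId) -> &[QueryId] {
        self.reads.get(&q).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Recording `Q1 -> Q2` and then `Q1 -> Q3` yields `reads("Q1")` equal to `["Q2", "Q3"]`, while a query that read nothing has an empty list.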

On the next run of the compiler, then, we can sometimes reuse these
query results to avoid re-executing a query. We do this by assigning
every query a **color**:

- If a query is colored **red**, that means that its result during
  this compilation has **changed** from the previous compilation.
- If a query is colored **green**, that means that its result is
  the **same** as the previous compilation.

There are two key insights here:

- First, if all the inputs to query Q are colored green, then the
  query Q **must** result in the same value as last time and hence
  need not be re-executed (or else the compiler is not deterministic).
- Second, even if some inputs to a query change, it may be that it
  **still** produces the same result as the previous compilation. In
  particular, the query may only use part of its input.
  - Therefore, after executing a query, we always check whether it
    produced the same result as the previous time. **If it did,** we
    can still mark the query as green, and hence avoid re-executing
    dependent queries.
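The second insight can be sketched in a few lines. The example below is a minimal illustration, not the compiler's implementation: `fingerprint`, `is_long`, and `color_after_execution` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hashing the compiler really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// Hash a query result (hypothetical helper, an assumption of this sketch).
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A toy query that only uses part of its input: the length, not the contents.
fn is_long(input: &str) -> bool {
    input.len() > 3
}

/// Color a query that was just re-executed by comparing the hash of its new
/// result with the hash saved by the previous compilation (if there was one).
fn color_after_execution<T: Hash>(previous: Option<u64>, new_result: &T) -> Color {
    match previous {
        Some(old) if old == fingerprint(new_result) => Color::Green,
        _ => Color::Red,
    }
}
```

If the input changes from `"hello"` to `"world!"`, `is_long` still returns `true`, so the query is colored green even though its input is red, and anything that depends only on `is_long` does not need to be re-executed.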

### The try-mark-green algorithm

The core of incremental compilation is an algorithm called
"try-mark-green". It has the job of determining the color of a given
query Q (which must not yet have been executed). In cases where Q has
red inputs, determining Q's color may involve re-executing Q so that
we can compare its output; but if all of Q's inputs are green, then we
can determine that Q must be green without re-executing it or inspecting
its value whatsoever. In the compiler, this allows us to avoid
deserializing the result from disk when we don't need it, and -- in
fact -- enables us to sometimes skip *serializing* the result as well
(see the section on improvements below).

Try-mark-green works as follows:

- First check whether the query Q was executed during the previous
  compilation.
  - If not, we can just re-execute the query as normal, and assign it the
    color of red.
- If yes, then we load the queries that Q depends on: the `reads(Q)` vector
  from the query DAG. The "reads" is the set of queries that Q executed during
  its execution.
  - For each query R in `reads(Q)`, we recursively demand the color
    of R using try-mark-green.
    - Note: it is important that we visit each node in `reads(Q)` in the same order
      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
    - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
      - We re-execute Q and compare the hash of its result to the hash of the result
        from the previous compilation.
      - If the hash has not changed, we can mark Q as **green** and return.
      - If the hash has changed, we mark Q as **red** and return.
    - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
      we can color Q as **green** and return.
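The steps above can be sketched as a small recursive function. This is a simplified model under several assumptions, not the code in the compiler: `Session`, `PrevNode`, and `execute` are hypothetical names, the colors of the inputs are assumed to be known up front and seeded into `colors`, and re-executing a query is reduced to a function that returns the hash of the new result.

```rust
use std::collections::HashMap;

/// Hypothetical query key; the real compiler uses a richer identifier.
type QueryId = &'static str;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// What the previous compilation saved for one query.
struct PrevNode {
    /// `reads(Q)`, in the order the reads originally occurred.
    reads: Vec<QueryId>,
    /// Hash of the result Q produced last time.
    result_hash: u64,
}

struct Session {
    prev: HashMap<QueryId, PrevNode>,
    /// Colors decided so far; seeded with the colors of the inputs (assumption).
    colors: HashMap<QueryId, Color>,
    /// Stand-in for re-executing a query; returns the hash of its new result.
    execute: fn(QueryId) -> u64,
}

impl Session {
    fn new(
        prev: HashMap<QueryId, PrevNode>,
        inputs: &[(QueryId, Color)],
        execute: fn(QueryId) -> u64,
    ) -> Self {
        Session { prev, colors: inputs.iter().copied().collect(), execute }
    }

    fn try_mark_green(&mut self, q: QueryId) -> Color {
        if let Some(&color) = self.colors.get(&q) {
            return color; // already decided during this session
        }
        // Copy out the saved data so `self` is free for the recursive calls.
        let saved = self.prev.get(&q).map(|n| (n.reads.clone(), n.result_hash));
        let color = match saved {
            // Q did not exist last time: execute it and color it red.
            None => {
                (self.execute)(q);
                Color::Red
            }
            Some((reads, old_hash)) => {
                // Visit reads in their original order; `any` stops at the first red one.
                let any_red = reads.into_iter().any(|r| self.try_mark_green(r) == Color::Red);
                if !any_red {
                    Color::Green // all reads green: no need to execute Q
                } else if (self.execute)(q) == old_hash {
                    Color::Green // Q was dirty but produced the same result
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q, color);
        color
    }
}
```

Note that the green path never calls `execute`: when every query in `reads(Q)` is green, Q is colored without being run and without its saved value being looked at.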

<a name="dag"></a>

### The query DAG

The query DAG code is stored in
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
by instrumenting the query execution.

One key point is that the query DAG also tracks ordering; that is, for
each query Q, we not only track the queries that Q reads, we track the
**order** in which they were read. This allows try-mark-green to walk
those queries back in the same order. This is important because once a subquery comes back as red,
we can no longer be sure that Q will continue along the same path as before.
That is, imagine a query like this:

```rust,ignore
fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}
```

Now imagine that in the first compilation, `main_query` starts by
executing `subquery1`, and this returns true. In that case, the next
query `main_query` executes will be `subquery2`, and `subquery3` will
not be executed at all.

But now imagine that in the **next** compilation, the input has
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
execute. If try-mark-green were to visit `reads(main_query)` out of order,
however, it might visit `subquery2` before `subquery1`, and hence execute it.
This can lead to ICEs and other problems in the compiler.
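To make the scenario concrete, here is a runnable variant of `main_query` in which a stand-in for `tcx` logs every subquery in the order it runs. It is only a sketch: the `Tcx` struct, its `subquery1_result` field, the return values of the subqueries, and `reads_of_main_query` are assumptions made up for this example.

```rust
/// Hypothetical stand-in for `tcx` that logs each subquery as it is executed.
struct Tcx {
    /// What `subquery1` returns in this compilation.
    subquery1_result: bool,
    /// The reads of `main_query`, in execution order.
    reads: Vec<&'static str>,
}

impl Tcx {
    fn subquery1(&mut self) -> bool {
        self.reads.push("subquery1");
        self.subquery1_result
    }

    fn subquery2(&mut self) -> u32 {
        self.reads.push("subquery2");
        2
    }

    fn subquery3(&mut self) -> u32 {
        self.reads.push("subquery3");
        3
    }
}

fn main_query(tcx: &mut Tcx) -> u32 {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

/// Run `main_query` once and return the ordered list of reads it performed.
fn reads_of_main_query(subquery1_result: bool) -> Vec<&'static str> {
    let mut tcx = Tcx { subquery1_result, reads: Vec::new() };
    main_query(&mut tcx);
    tcx.reads
}
```

When `subquery1` returns `true`, the recorded reads are `subquery1` followed by `subquery2`; when it returns `false`, they are `subquery1` followed by `subquery3`. Walking the saved reads in order means `subquery1` is always examined first, so a red `subquery1` is discovered before `subquery2` is ever touched.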

[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph

## Improvements to the basic algorithm

In the description of the basic algorithm, we said that at the end of
compilation we would save the results of all the queries that were
performed. In practice, this can be quite wasteful -- many of those
results are very cheap to recompute, and serializing + deserializing
them is not a particular win. What we would do instead is save
**the hashes** of all the subqueries that we performed. Then, in select cases,
we **also** save the results.

This is why the incremental algorithm separates computing the
**color** of a node, which often does not require its value, from
computing the **result** of a node. Computing the result is done via a simple algorithm
like so:

- Check if a saved result for Q is available. If so, compute the color of Q.
  If Q is green, deserialize and return the saved result.
- Otherwise, execute Q.
  - We can then compare the hash of the result and color Q as green if
    it did not change.
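A minimal sketch of this result computation follows. It rests on several assumptions: `Saved`, `hash_of`, and `result_of` are hypothetical names, results are modeled as plain strings, `color_of` stands in for try-mark-green, and `DefaultHasher` stands in for the compiler's real hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical query key; the real compiler uses a richer identifier.
type QueryId = &'static str;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

/// Hash a result (an assumption of this sketch, not the compiler's hashing).
fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical state saved by the previous compilation.
struct Saved {
    /// Result hashes, saved for every query.
    hashes: HashMap<QueryId, u64>,
    /// Full results, saved only in select cases.
    results: HashMap<QueryId, String>,
}

/// Compute the result of `q` together with its color.
/// `color_of` stands in for try-mark-green; `execute` runs the query.
fn result_of(
    saved: &Saved,
    q: QueryId,
    color_of: impl Fn(QueryId) -> Color,
    execute: impl Fn(QueryId) -> String,
) -> (String, Color) {
    // A saved result is only usable if Q turns out to be green.
    if let Some(result) = saved.results.get(&q) {
        if color_of(q) == Color::Green {
            return (result.clone(), Color::Green); // "deserialize" and return
        }
    }
    // Otherwise execute Q, then color it by comparing against the saved hash.
    let result = execute(q);
    let color = if saved.hashes.get(&q) == Some(&hash_of(&result)) {
        Color::Green
    } else {
        Color::Red
    };
    (result, color)
}
```

In this model a cheap query can have only its hash saved: it is re-executed on demand, and it can still come out green if the recomputed result hashes to the saved value.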

# Footnotes

[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
6 changes: 6 additions & 0 deletions src/mir.md
@@ -0,0 +1,6 @@
# The MIR (Mid-level IR)

TODO

Defined in the `src/librustc/mir/` module, but much of the code that
manipulates it is found in `src/librustc_mir`.