Skip to content

Porting wiki

Jeremy Tan Jie Rui edited this page Jul 16, 2023 · 73 revisions

Porting status

The current state of play:

  1. mathport more or less works.
    • Lots of warts, but it gives an output that is much easier to clean up than to translate from scratch.
    • Please report errors in the output of mathport. We would like to make it better, but some things are hard!
  2. many tactics have been ported to mathlib4 (e.g. ring, linarith, norm_num) but many are still missing.
  3. Every file has been ported.
  4. We need lots of people helping on the port!

This document explains how to get involved with the porting process.

Porting procedure

High level

  1. Decide the file you wish to port; you may wish to use the port status page.
  2. Run the start_port.sh from Mathlib4 master; this copies the file from mathlib3port, runs various automated fixes and sets up a branch for you to work on.
  3. Run lake exe cache get and lake build <library-name> and review the file
  4. Publish the new branch and open a PR in Github (this automates tracking for porting status)
    • Initially tag the branch with mathlib-port and WIP
    • If you take it as far as you easily can, remove WIP and tag with help wanted
  5. Perform the various fixes noted below, pushing commits back to Github often
  6. When you are done, remove any help-wanted or WIP labels, and tag the PR with awaiting-review. Done is defined as:
    • Builds and lints cleanly using CI automation (note that some lints can be automated via fix-lints.py
    • Names follow Mathlib4 conventions
    • Non-trival changes documented via --porting note: <note>
    • Documentation updated to address name changes (note that some documentation changes can be automated via scripts/fix-comments.py)
  7. Address any review comments
  8. 🎉

Details

Note there is a video showing the whole process on a very simple file. It is slightly out of date; the instructions below should be regarded as canonical.

  1. If you're a mathlib maintainer, please remember to regularly check #queue4 for mathlib4 PRs which are ready for review/merge.
  2. If you do not already have permissions to the mathlib4 repo, ask for it on Zulip, and include your Github handle.
  3. The first task is to decide what you want to port.
    • A bot regularly posts about progress on various targets. You can also find some somewhat-out-of-date graphs here:
    • Clicking on one of the links will show you a graph of all the files leading up to that target.
      • Green nodes have already been ported.
      • Dark blue nodes have an open PR; you can click on the node to open the PR.
      • Light blue nodes are ready to be ported.
    • If you are an experienced porter, you are encouraged to prioritise helping out with any open PRs that might be "stuck".
    • For everyone, the light blue nodes that appear on these graphs are the next highest priorities.
    • Before starting to port such a node, check the port status page to check there are no comments about this file that should be considered first.
    • You can also use the port status page to find files that need porting (whether or not they are needed for any of the near-term targets). This page shows the same results as you'd get from running scripts/port_status.py in the math mathlib directory.
  4. Once you've selected a file X/Y/Z.lean to work on, you can get started.
  5. Make sure you are in the root directory of the mathlib4 repository, and run ./scripts/start_port.sh Mathlib/X/Y/Z.lean. This will automatically do the following:
    • Creates a new branch port/X.Y.Z.
    • Downloads the corresponding file from the mathlib3port repository.
    • Commits it as-is. This allows reviewers to see the diff between mathport and the manually ported version, making it easy to check aligns and similar statements.
    • Adds the new file as import to Mathlib.lean.
    • Replaces all occurences of import Mathbin. with import Mathlib., running effectively
    find Mathlib -name "*.lean" | env LC_ALL=C sort | sed 's/\.lean//;s,/,.,g;s/^/import /' > Mathlib.lean
    
    • Creates a second commit with these changes.
    • Note that on OS X, the built-in version of sed does not work with the start_port.sh script. The fix is to install the GNU version (e.g. via brew install gsed) and ensure it – rather than the OS version – is the default for sed (e.g. by following the path instructions provided by brew).
  6. Already at this point you should open a PR on the mathlib4 repository. The title should be feat: port X, and you can label it as mathlib-port, and also WIP. We open the PR early so others know you are working on the file.
  7. If you want to record free-form comments about a file, please add them to the port comments.
  8. Perform a bunch of mechanical updates on the file (i.e. things that in principle mathport could do, but doesn't yet), see below.
  9. Try to get everything compiling again.
    • A helpful workflow is to go from red error to red error by pressing F8 in VScode, ignoring orange linting errors; then, once all or most red errors are gone, run scripts/fix-lints.py on the file to fix many of the remaining orange errors automatically.
    • Leave -- Porting note: ... liberally: anytime you leave something out that was in mathlib3, there should be a porting note.
    • Use -- Porting note: TODO ... for things that must be fixed before the PR can be merged.
    • Fix documentation changes via scripts/fix-comments.py
  10. It's sometimes useful to have the original mathlib file open in a separate window, so you can typecheck, discover where instances are coming from, etc. Some potential obstacles include:
    • Discovering that an import Y.lean which you thought was sufficiently well ported already in fact isn't. (Either patch it up, or go back to step 1 with X = Y.)
    • Finding that a tactic T used in a proof is still missing. (Usually for now you'll just work around it by using simpler tactics, but ambitious porters are invited to give up on porting X, and instead port T! Also check the open PR's—tactic T might be effectively ported, and just awaiting review.)
    • Encountering a surprising change in behavior in either a ported tactic or Lean core. (Ask on Zulip, create an issue, or hesitantly work around it.)
    • Make sure imports are accurate (even down to casing) - Mathport is not super great at these.
  11. If you get stuck on something, please add the label help-wanted, but also post on Zulip asking for assistance.
  12. If everything works, please remove any help-wanted or WIP labels, and add the label awaiting-review.

Implicit in this proposed scenario there is no "flag day" for all of mathlib: we just work our way up from the bottom.

In the optimistic scenario, the problems in step 10 become less and less frequent, and mathport gets some tweaks that reduce the effort required in step 8, and it becomes possible to do multiple files, or whole directories or areas in one go!

For any file that isn't already a "complete port" but has been manually partly ported, we take "starting over" as the default route.

This might involve:

  • move the existing X.lean to X2.lean
  • take X.lean from mathlib3port, and get it working as X.lean
  • import X.lean in X2.lean
  • delete everything in X2.lean that by now has already been defined in X.lean
  • find homes for or discard anything that remains (judgement required here!)
  • get everything downstream working again

You can also port from Lean 3 core, as well as from mathlib3. If you find something is missing that was in Lean 3 core, please put it in the Mathlib/Init/ directory. There is a repository lean3port which contains the output of mathport running on Lean 3 core, analogous to mathlib3port.

Some common fixes

  • change import Mathbin to import Mathlib
  • Recommendation: temporarily add the line set_option autoImplicit false -- **TODO** delete this later to the top of the file; this stops Lean from incorrectly assuming that certain incorrectly ported terms are just new variables which the user chose not to declare.
  • Add explicit imports to resolve missing dependencies where transitive imports have changed (often by identifying what the mathlib equivalent resolves to & what the mathlib4 equivalent is)
  • If a file imports another file from the tactic or meta folder: look at the status of which tactics that still need to be ported in this list. Tactics, attributes and commands that are not on this list should be working in Lean 4. If the tactics mentioned in the imports are already ported, you can import the corresponding mathlib4 files (the file name might have changed a bit). If not, check the open PR's on mathlib4 with the "meta" label—sometimes tactics are effectively ported and functional, but are not yet merged as they await review. If you find it there, feel free to merge that tactic's feature branch into your branch, and your PR will become dependent on that branch. (Note that the tactic porting tracking issue is not always up to date, so it's worth taking a look just in case.) If that fails too, you can replace the 'No' on /~https://github.com/leanprover-community/mathlib/wiki/mathlib4-port-status by 'Waiting on tactic X' or something.
  • when mathport suggests that you add in the name of an #align statement, figure out why it wants that and leave a porting note. Common reasons are:
    • a change in the order of universe variables, typeclass arguments or implicit arguments
    • some other defeq change, due to a change deep down in core or std.
  • Review the names assigned by mathport for adherence to naming conventions (see Aligning Names below) and fix and align names manually.
  • Ensure the names in documentation are consistent with the updated names. Once you are done with #aligns, ./scripts/fix-comments.py can fix many of these automatically, so save this step to late in the porting process.
  • Missing attributes:
    • For refl:
      import Mathlib.Tactic.Relation.Rfl
      
      @[refl]
      theorem foo : 1 = 1 := rfl
      
    • trans: for now use import Mathlib.Mathport.Attributes.
    • protect_proj: add protected before each field
    • If you run into other attributes, hopefully you will find them noted as needing porting in Mathlib/Mathport/Syntax.lean (which you can freely import if needed, I guess), or they should already be implemented somewhere, and searching for register.*attr should help you locate it.
  • See also the Lean 4 documentation about differences with Lean 3.
  • When declaring a structure, if a field needs to be put on a new line, then each fields need to be on a new line, or at least the first after the with keyword must be on a new line. If you hit the error "expected '}'", it's probably this.
  • For LawfulMonad bind_pure_comp_eq_map is now bind_pure_comp and there are more laws that need to be proved. LawfulMonad.mk' defaults many of these. As an example translation:
    instance : LawfulMonad (PFun α) where
      bind_pure_comp_eq_map β γ f x := funext fun a => Part.bind_some_eq_map _ _
      id_map β f := by funext a <;> dsimp [Functor.map, PFun.map] <;> cases f a <;> rfl
      pure_bind β γ x f := funext fun a => Part.bind_some.{u_1, u_2} _ (f x)
      bind_assoc β γ δ f g k := funext fun a => (f a).bind_assoc (fun b => g b a) fun b => k b a
    
    goes to
    instance : LawfulMonad (PFun α) := LawfulMonad.mk'
      (bind_pure_comp := fun f x => funext fun a => Part.bind_some_eq_map _ _)
      (id_map := fun f => by funext a ; dsimp [Functor.map, PFun.map] ; cases f a; rfl)
      (pure_bind := fun x f => funext fun a => Part.bind_some _ (f x))
      (bind_assoc := fun f g k => funext fun a => (f a).bind_assoc (fun b => g b a) fun b => k b a)
    
    (Note how implicit arguments could be omitted in Lean 4)
  • The command include and omit are gone in Lean4, you can safely delete the corresponding lines. Lean should be able to do the right thing by default.
  • Lean 4 names instances differently from Lean 3, in ways that can lead to exponential explosions in instance names. It is best to look up the instance name in Lean 3 and name instances explicitly. If you come across a reference to an instance name in another file where the instance is not explicitly named, rather than change the sensible name to instLongNameAnotherLongNameStillAnotherLongName, it is best to name the upstream instance in the other file explicitly in the PR.
  • The simplifier behaves differently, in ways that are still not fully documented. Some changes that might be needed:
    • add the lemma iff_self to a failing simp only [...], especially if the goal state looks like this might be relevant (P ↔ P): the Lean 3 simplifier did simp onlys a bit differently and that simplification was included by default. Zulip discussion, Lean 4 PR which would change this
    • a "non-confluent" simp is not guaranteed to behave the same way between Lean 3 and Lean 4. You can squeeze the Lean 3 simp to a simp only and copy it over, or you can identify the different path taken in Lean 4 by squeezing both simps and then omit the bad lemma which starts this path with simp [..., -bad_lemma]. Zulip discussion
    • If you get an error message containing "the following variables have been introduced by the implicit lambda feature", it may be that the Lean 3 code contained a proof term with a lambda introducing an implicit argument and this confused mathport. For instance you could have in Lean 3 example := ∀ {n : ℕ}, ... := λ n, bar which becomes in Lean 4 example := ∀ {n : Nat}, ... := bar. See the section about implicit lambdas in the Lean 4 manual.
  • If you get a simp can prove this lint, check if by simp can close the original goal and consider removing the simp attribute to address the lint. Put porting comments to indicate the change. If the original proof is rfl, check if by dsimp can close the goal. If not, do not remove the simp, instead add a nolint simpNF (and again, document the decision via porting notes). In some cases, the path to simp can prove this is very complicated or nonsensical - in such cases ask for guidance on the Zulip. Templates for porting notes:
    • "-- @[simp] -- Porting note: simp can prove this" -- if proof is not rfl
    • "-- @[simp] -- Porting note: dsimp can prove this" -- if proof is rfl
    • "@[simp, nolint simpNF] -- Porting note: dsimp cannot prove this" -- if proof is rfl
  • If you get simpNF lint that the left hand side can be simplified, it is often best to create a new simp lemma with the simplified LHS and remove the simp attribute from the original lemma.
  • The notation fun x y => z should be replaced by fun x y ↦ z. This change is not recommended, because it makes review more difficult and it can be done automatically later.
  • bit0 and bit1 yield deprecation warnings. You should still port lemmas that involve them, but use set_option linter.deprecated false within a section or set_option linter.deprecated false in before a single result.
  • nthLe has been deprecated for get but has a significant API change. If it's easy and targeted to refactor to get, add a porting note, and leave a deprecated version of the old lemma. If not, use set_option linter.deprecated false as for bit0/bit1 and add a porting note to refactor post-port
  • Long strings that include an explicit \n can have the explicit \n removed, and instead be broken up by actual newlines as necessary to satisfy the linter. (Single newlines in strings are ignored.)
    • \n\n should likewise be replaced with two actual newlines, as this signals a real line break in the generated documentation.
  • Sometimes a "failed to show termination" error in a recursive definition can be resolved just by specifying the type on the right hand side of a match (=>).

Workarounds for temporary issues

Equiv to ↑(RelIso.toRelEmbedding e).toEmbedding

See: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/CoeFun/near/317644350

have : ↑(RelIso.toRelEmbedding e).toEmbedding = FunLike.coe e := rfl can often be used to simplify.

HPow and coersion

When you get coersion problems of the form r ^ ↑n : ℝ instead of r ^ n : ℝ (for n : ℕ), you can temporarily add the following at the top of the file:

local macro_rules | `($x ^ $y)   => `(HPow.hPow $x $y)

(see zulip discussion)

Linting

You can fix many linting errors automatically by running scripts/fix-lints.py on the file.

Run lints by adding #lint at the bottom of the file. Remove the #lint directive before checking in.

If the file you are importing don't give you access to the #lint command then you need to add

import Std.Tactic.Lint.Frontend
import Std.Tactic.Lint.Misc

Note that we do not expect new documentation to be added during the porting process. Writing #lint at the bottom of a file will complain about missing documentation, but this can be ignored. The documentation linter has been disabled in CI, for purposes of the port. Adding documentation will be done separately, so that it can be done with a focus on writing good documentation.

In Lean 4, notation can also be documented. This can usually be done using the @[inherit_doc] attribute, which will copy the documentation from another declaration. This attribute takes a declaration name as an optional parameter in case Lean doesn't figure out which declaration should give its documentation to the notation. In case @[inherit_doc] is not sufficient, please leave a -- Porting note: TODO add documentation.

Aligning names

In Mathlib.Mathport.Rename there are a bunch of commands of the form #align lean3_name lean4Name, but you can place them anywhere. These instruct mathport to adjust its naming on a case-by-case basis, and we need to be using it a lot more than we are. Note that mathport#187 removes the \_x's in exchange for now quite often running into severe type errors and having to stub stuff out. Basically, it will now assume that any name clashes are intentional and will use the clashed name even if it's not type correct (instead of making up \_x names), but that means that some definitions that actually do need to be different, like CoeT or Stream, need to be manually marked for realignment.

But even beyond this, whenever mathport gets a name wrong you want to use an #align directive to change the name to the desired thing, even if you have already fixed the definition, because that influences all future references of that definition.

If Lean complains it does not know about the #align command then you need to import Mathlib.Mathport.Rename.

Note that #align does not automatically include namespaces.

Naming convention

(note: These are work in progress)

We use the following conventions

  1. Terms of Props (e.g. proofs, theorem names) are in snake_case.
  2. Props and Types (or Sort) (inductive types, structures, classes) are in UpperCamelCase.
  3. Functions are named the same way as their return values (e.g. a function of type A → B → C is named as though it is a term of type C).
  4. All other terms of Types (basically anything else) are in lowerCamelCase.
  5. When something named with UpperCamelCase is part of something named with snake_case, it is referenced in lowerCamelCase.
  6. Acronyms like LE are written upper-/lowercase as a group, depending on what the first character would be.
  7. Rules 1-6 apply to fields of a structure or constructors of an inductive type in the same way.

There are some rare counter-exceptions to preserve local naming symmetry: e.g., we use Ne rather than NE to follow the example of Eq; outParam has a Sort output but is not UpperCamelCase; etc. Any such counter-exceptions should be discussed on Zulip.

Note that if a theorem contains a lowerCamelCase component with multiple words as in rule 4, mathport will usually get the name wrong and it has to be #aligned by hand.

Examples

-- follows rule 2
structure OneHom (M : Type _) (N : Type _) [One M] [One N] where
  toFun : M → N -- follows rule 4 via rule 3 and rule 7
  map_one' : toFun 1 = 1 -- follows rule 1 via rule 7

-- follows rule 2 via rule 3
class CoeIsOneHom [One M] [One N] : Prop where
  coe_one : (↑(1 : M) : N) = 1 -- follows rule 1 via rule 6

-- follows rule 1 via rule 3
theorem map_one [OneHomClass F M N] (f : F) : f 1 = 1 := sorry

-- follows rules 1 and 5
theorem MonoidHom.toOneHom_injective [MulOneClass M] [MulOneClass N] :
  Function.Injective (MonoidHom.toOneHom : (M →* N) → OneHom M N) := sorry
-- manual align is needed due to `lowerCamelCase` with several words inside `snake_case`
#align monoid_hom.to_one_hom_injective MonoidHom.toOneHom_injective

-- follows rule 2
class HPow (α : Type u) (β : Type v) (γ : Type w) where
  hPow : α → β → γ -- follows rule 3 via rule 6; note that rule 5 does not apply

-- follows rules 2 and 6
class LT (α : Type u) where
  lt : α → α → Prop -- follows rule 4 and 6

-- follows rules 1 and 6
theorem gt_iff_lt [LT α] {a b : α} : a > b ↔ b < a := sorry

-- follows rule 2; `Ne` is an exception to rule 6
class NeZero : Prop := sorry

-- follows rules 1 and 5
theorem neZero_iff {R : Type _} [Zero R] {n : R} : NeZero n ↔ n ≠ 0 := sorry
-- manual align is needed due to `lowerCamelCase` with several words inside `snake_case`
#align ne_zero_iff neZero_iff

Further conventions

irreducible_def

There is a difference between irreducible_def and @[irreducible] def, namely that irreducible_def applies to both the kernel and the elaborator, whereas @[irreducible] def only applies to the elaborator. Moreover, irreducible_def foo creates a definitional lemma for unfolding called foo_def, so one should use rw [foo_def] instead of rw [foo] or unfold foo. If you are in a situation where you see Wrapper.value✝ wrapped✝ somewhere in your goal, it is likely the result of an irreducible_def which has been unfolded improperly (i.e., not using foo_def).

Type Classes

Where Mathlib3 used has_foo Mathlib 4 will use Foo.

Examples

  • Lean 3: has_mul, Lean 4: Mul

Where Mathlib 3 used is_foo, Mathlib 4 will use IsFoo

Note, however that Core LawfulFoo names follow a different convention (which may be changed in the future).

  • Lean 3: is_lawful_monad, Lean 4: LawfulMonad

See the discussion starting here: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/ENnReal/near/312549431

Categories

  • Categories are suffixed with Cat when a name clash would otherwise occur, e.g. GroupCat.

Coercions

In Lean 4 coercions are elaborated directly, so given

variable (f : α ≃ β) (a : α)

the application f a is syntactically equal to f.toFun a.

UPD: It seems that we have some magic that prevents Lean from unfolding past FunLike.coe, so for types with FunLike instance we have a Lean3-style behavior (thus we can have generic lemmas like map_mul). If you define a CoeFun instance instead, then it unfolds to toFun.

Similarly

variable (g : α → Prop) (x : Subtype g)

the coercion (x : α) is elaborated as x.val.

There are several different typeclasses for coercions, the most common ones being Coe, CoeFun and CoeTail. Usually, mathlib3's coe corresponds to Coe, coe_fn to CoeFun and coeTC to CoeTail. In most cases, replacing coe with Coe.coe (or one of the other CoeFoo classes) will work.

Consequences for porting

This means that we can get rid of lemmas that used to do the simplification f.to_fun x to ⇑f x as these lemmas are now syntactic tautologies. This lemma for Subtype is not needed anymore in mathlib4:

theorem val_eq_coe {x : Subtype p} : x.1 = ↑x :=
  rfl

Another consequence is that certain instances become unnecessary:

instance coe_trans [Zero M] [Coe R S] [CoeTail S M] [h : NeZero (r : M)] : NeZero ((r : S) : M) :=
  ⟨h.out⟩

If the coercion function appears explicitly in a theorem, then it needs to be replaced by the function that is used in the coercion. So for example

theorem cast_injective : injective (coe : ℕ → R) := sorry

becomes

theorem cast_injective : injective (Nat.cast : ℕ → R) := sorry

Coercion tag

To define coercions, there exists now the coe attribute to automatically create the delaborator and the norm_cast attribute. A complete definition of a coercion might look like this:

@[coe] def cast [AddGroupWithOne R] : ℤ → R := AddGroupWithOne.intCast

instance [AddGroupWithOne R] : CoeTail ℤ R where coe := cast

Known tactic issues

change ... with .../change ... at ...

  • change e defined in Mathlib.Tactic.Basic as an alias for show e, but the at form and change .. with .. are both defined without implementation in lean 4 core. These are just missing an implementation (and they should either be PR'd to lean 4 or moved to mathlib4).

In the meantime, sometimes dsimp works, sometimes an equivalent rw, sometimes a revert/change/intro, sometimes change … = _.

For change <foo> at h, replace h: <foo> := h works.

to_additive issues

  • to_additive attribute doesn't correctly translate some names. Just like in Lean 3, it has issues with flipping variable order in pow -> smul and zpow -> zsmul. In such cases, use @[to_additive my_additive_theorem] before the multiplicative theorem.
  • When applied to a structure: to_additive doesn't automatically translate projections to extended structures other than the first, in case that the extended structures have a field in common. In this case, manually add these tags for all projections other than the first. Example: attribute [to_additive AddCommGroup.toAddCommMonoid] CommGroup.toCommMonoid. You can easily see which projections are translated with attribute [to_additive? ...] ....
  • to_additive might not reliably deal with numerals or other operations on fixed types, like Nat. In this case, make a Zulip thread tagging @fpvandoorn. As a workaround, you can first manually state the additive version, and then the multiplicative version tagged with @[to_additive].
  • Docstrings appearing as strings after to_additive can be shortened to satisfy the linter by removing any explicit \n's that appear and inserting actual newlines as necessary, as single newlines are ignored in strings. \n\n should be replaced with two newlines.

Tips and tricks

Auto-implicits

Lean 4 has automatic generation for implicit types, so that the following code is valid without explicitly mentioning α or β:

variable (f : α ≃ β)

When porting this may cause errors due to mathlib3port not aligning names correctly, so it can help to turn auto-implicits off with

set_option autoImplicit false

Finding instances

#synth is useful for discovering if and where instances are available (e.g. as #synth Semiring Nat)

Reviewing and merging PRs

All mathlib maintainers are encouraged to regularly view the #queue4, and to

  • ask for changes (please change awaiting-review to awaiting-author),
  • bors merge, or
  • bors d+