Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Prepare an ADR on type annotations #162

Closed
konnov opened this issue Jul 10, 2020 · 56 comments
Closed

[FEATURE] Prepare an ADR on type annotations #162

konnov opened this issue Jul 10, 2020 · 56 comments
Assignees
Labels
doc Documentation FTC-Snowcat Feature: Fully-functional type checker Snowcat new New issue to be triaged.

Comments

@konnov
Copy link
Collaborator

konnov commented Jul 10, 2020

The current approach to type annotations in Apalache is quite hacky, for several reasons:

  • we did not want to write own parser for types and introduced a convention, which is okayish but not natural
  • we wanted to wrap problematic expressions such as {} and <<>> instead of declaring the types of variables, constants, operators, and bound variables.

With the new type inference by Jure @Kukovec (to be integrated), we have to find a good way to integrate type information into TLA+ code without touching SANY (the TLA+ parser). Keeping type information on a side, e.g., in a separate file, is not a nice solution.

I see the following approach. We write our own syntax for the type system (actually, we have one already, just no parser yet) and annotate constants, variables, operators and bound names as in the following example:

EXTENDS Sequences
CONSTANTS Range
Range_type == "set(z)"

VARIABLES list
list_type == "seq(z)"

Mem(e, es) ==
  LET Mem_type == "<a, seq(a)> => Bool" IN
  \E i \in DOMAIN es:
    LET i_type == "Int" IN
    e = es[i]

Init ==
  LET Init_type == "<> => Bool" IN
  list = <<>>

Next ==
  LET Next_type == "<> => Bool" IN
  \E e \in Range:
    LET e_type == "set(z)" IN
    list' = Append(list, e)

It is again "convention-over-configuration". Note that types are specified as strings. What I like about it is that operators are easy to write and they are available almost everywhere. However, we should think about type annotations for set comprehensions.

@konnov konnov added new New issue to be triaged. doc Documentation labels Jul 10, 2020
@konnov konnov added this to the tool-tlatypes milestone Jul 10, 2020
@lemmy
Copy link
Contributor

lemmy commented Jul 10, 2020

Just a general question: Why is touching SANY a no-go?

@konnov
Copy link
Collaborator Author

konnov commented Jul 10, 2020

We are still in the exploration phase and it would be a bad idea to change the syntax all the time. As far as I understand, Leslie also wants to keep the syntax minimalistic (kind of).

@lemmy
Copy link
Contributor

lemmy commented Jul 10, 2020

(I assume that the ratio of type annotations to the actual behavior spec is constant in the size of the spec)

The examples above don't really look minimalistic from a reader's perspective. Our eyes haven't been trained to filter types in TLA+ specs, but the type annotations make it more difficult to understand the true meaning of the spec. Perhaps, this is Leslie's main reason to keep the language minimalistic. As a tool developer, I'm worried that types-as-strings will make it hard for IDEs or end up having poor ergonomics. By the way, how does this work with ASSUME and the adopted best practice of TypeOK invariants? Will it be the user's responsibility to keep them in sync?

@konnov
Copy link
Collaborator Author

konnov commented Jul 10, 2020

The purpose of these type annotations is not to have them written by the user. Though it should be possible. The goal is to have transparent type information (maybe inferred by different tools) right in the spec. Once we have integrated our new type inference, every tool developer will be able to use them, without extending the TLA+ parser or using an external library. The current approach to developing all analyses in a black box makes it harder to integrate tools at all levels. On second thought, the approach presented here is not that different from Pluscal, where the whole code is written in a huge comment.

Think of these types like types in python. If you really want to write the type annotations, you can. But, ideally, it should be done by a type inference engine.

I think there is really a difference between TypeOK, which is a simple invariant on variable values, and types that are needed to do any kind of static analysis, translation, etc. Perhaps, these latter types do not add much value to the understanding of a specification, but they are absolutely essential for tooling.

@ebuchman
Copy link

So tools would be expected to auto update the specs with inferred types? And/or does the user need to provide some hints?

Why not follow the Pluscal convention and have these types in comments?

And could you give an example of how these type annotations would differ from a TypeOk ?

@konnov
Copy link
Collaborator Author

konnov commented Jul 12, 2020

I gave it a bit of thought. Ideally, the authors don't have to write type annotations at all, like they (almost never) do in OCaml or Isabelle. So the tools should infer the types, as they need them. However, there are three reasons why, I believe, the types must be transparent:

  1. It is much easier for the user to understand which types are assigned to TLA+ expressions by inspecting an annotated TLA+ file.
  2. If somebody is not happy with our type inference algorithm, they can write one of their own and produce a compatible TLA+ file.
  3. Our tools are still buggy. For instance, I know exactly how to fix the problem in [BUG] Out of order assignments #157 with a user annotation (in that case, we are talking about assignments, not types). But instead, we have to wait until the bug in the assignment analyzer is fixed. As a user, I am extremely annoyed by the inability to hack away the problem.

If we are talking about an ideal scenario, I would like to have something like Java annotations or LLVM intrinsics, that is, a way to store metadata right in the code, but without introducing additional code constructs, as metadata depends on the tool. I suspect that it is quite a bit of work on the SANY parser to introduce Java-like annotations. @lemmy, what is your opinion?

As for annotations in the comments, there are two problems:

  1. If we are talking about annotations in one large comment, there is no way to refer to an individual operator/variable name. We solve this problem in Apalache by renaming operators and variables, so every name is unique. But obviously this is too much to ask from the user.
  2. I do not see a simple way of writing an annotation preprocessor similar to javadoc, as modules can be instantiated. So the preprocessor has to understand the semantics of INSTANCE, in order to propagate annotations in the instances.

So the case of Pluscal is simple. You take one language in a large comment and translate it into one TLA+ spec. Here we are talking about type annotations for individual variables, operators, bound variables, etc.

Finally, TypeOK is an arbitrary TLA+ predicate. Even we assume that TypeOK looks like A_1 /\ \dots /\ A_n, where every A_i has the form x_i \in S_i, there is still plenty of freedom on how one writes S_i. For instance, if I write x \in 1..10, is 1..10 a type? Some people say "yes", some people say "no". Some people say, it is a refinement type {i \in Int | 1 <= i <= 10}. If we don't like to go in undecidable reasoning about types (and I don't) and only want to map expressions to simple types, we better write types as terms. For instance, we have this grammar in the OOPSLA paper:

τ ::= Name | Bool | Int | τ → τ | Set[τ] | Seq[τ] | τ ∗ ... ∗ τ | [h_1: τ, ..., h_k: τ]

(It does not work for operators, but let's ignore that fact.) Then, I would like to have an annotation x_type = "Int". Of course, I can write x \in Int in TypeOK, and that looks fine. But then I like to give a function type from a set of integers to a set of integers, that is, f_type = "Set[Int] → Set[Int]". In TypeOK, it is getting ugly: f \in [SUBSET Int -> SUBSET Int]. I also suspect that this TypeOK also becomes useless for TLC, as it would have to deal with SUBSET Int. So in practice, one would write something like f \in [SUBSET Proc -> SUBSET Proc], as the spec most likely talks only about integers from a constant set Proc. The type annotations in Apalache are actually so weird because I could not find a nice and concise way of expressing the types that derive from τ just by using TLA+ values.

To sum up, TypeOK is a useful predicate and it is a good practice to add one. But TypeOK inevitably talks about types as sets of values (basically, you have no other choice in TLA+). For the analysis tools, especially the symbolic tools, we care less about the precise sets of values, but we need to know variable sorts (integers, Booleans, sets of integers, sequences of sets of integers).

I guess this is actually the beginning of the ADR. Sorry if the text sounds a bit too aggressive or patronizing. I just had all these things for too long in my head. Now I am happy to have them in this issue :-)

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

EXTENDS Sequences
CONSTANTS Range
(*                     ^@type("set(z)") *)

VARIABLES list
(*                  ^@type("seq(z)") *)

Mem(e, es) ==
(*
^@type("<a, seq(a)> => Bool") *)
  \E i, j \in DOMAIN es:
(*   ^@type("Int")
         ^@type("Int") *)
    e = es[i]

Init ==
(*
^@type("<> => Bool") *)
  list = <<>>

Next ==
(*@
^@type("<> => Bool") @*)
  \E e \in Range:
(*@^@type("set(z)") @*) 
    list' = Append(list, e)

@konnov konnov closed this as completed Jul 13, 2020
@konnov konnov reopened this Jul 13, 2020
@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

That looks completely different from what I expected, but the idea here is that we can introduce annotations graphically right in the comments. Somehow, github preview has a different idea of spaces than the final formatter.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

Another solution proposed by Leslie Lamport:

  THEOREM \A a : \A e \in a, es \in Seq(a) : Mem(e, es) \in BOOLEAN

Using the (little-used) TLA+ method of referring to subformulas, this could also be used to provide type declarations for expressions inside definitions. These type theorems would also be useful in writing proofs.

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

Assuming I understand you correctly, I think it's a good idea to defer the how of the visualization representation of types to IDEs. E.g. the Toolbox could show the type annotations on-demand in mouse hovers and basic text editors can show them in-lined. Pretty-printed specs would simply exclude all type annotations.

As to writing types when necessary, Leslie's idea of using theorems and the method of referring to subformulas would probably require IDEs to provide refactoring support for (positional) references because - contrary to TLAPS proofs - the spec is unfinished/unstable when a user writes type annotations. On the other hand, it's probably good advice to use labeled references and rely on positional ones only for "glassbox" specs.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

I like the approach of using positioned annotations in a comment below an expression. It also gives us a way to introduce all kinds of annotations, not only type annotations. Leslie's idea is interesting, as the theorem could be also proven in TLAPS, if one really wants to do that. But then the types will be far away from the place where they are used.

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

About types being far away: The drawback could be alleviated by IDEs showing them in hovers/overlays and commands that navigate the cursor back and forth (similar to the PlusCal <> TLA+ mapping commands).

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

As a vim user, I am a bit skeptical about putting too much trust on IDEs :-)

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

It's 2020 and vim still can do something like overlays/hovers/...? :-p Teasing aside, TLA+ has several features (PCal mapping, TLAPS annotations, profiling annotations), which only exist in the Toolbox.

Btw. I intentionally wrote IDEs and not "Toolbox" because all of this should be implemented via e.g. LSP in an IDE agnostic way.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

Btw. I intentionally wrote IDEs and not "Toolbox" because all of this should be implemented via e.g. LSP in an IDE agnostic way.

Do you think it makes sense to have TLA+ on the LSP list?

Btw, there are two kinds of annotations: the user annotations (rare but possible) and output annotations. I see how LSP would work for output annotations. Leslie's approach works for both. The approach with comments also works for both, though the formatting is a bit of pain.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

I guess, vim's approach is to use folding/unfolding, which works well with comments :)

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

LSP looks interesting. Maybe we do not need type annotations tightly integrated into the source code.

@shonfeder, you mentioned LSP in other contexts several times. What is your take on that?

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

Taking a step back, I wonder if it is really helpful to distinguish between user and output type annotations where they are shown. Won't users just care for what the type inference engine makes out of user-provided annotations? In other words, why not only ever show output annotations regardless from where the type inference engine sourced its information. Taking another step back, I would probably prefer to read the result of type inference as TLA+ expressions instead of a new syntax (what's <a, seq(a)> => Bool>?).

(*   ^@type("Int")
         ^@type("Int") *)

vs

(*  i \in Int
         j \in Int *)

What LSP list are you referring to (note that there exists no LSP for TLA+ yet)? LSP is not a proposal on its own, but just a suggestion on how to implement any proposal in a way that is sufficiently IDE agnostic to allow for making IDE features part of a proposal.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

What LSP list are you referring to (note that there exists no LSP for TLA+ yet)?

That is what I meant. TLA+ is not on that list ;-)

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

<a, seq(a)> => Bool

This is an operator type. It takes two arguments (a type variable a and a sequence of a elements) and returns a Boolean.

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

It looked at implementing an LSP for TLA+ last summer and there are many use cases for when an LSP would be useful. Perhaps, types are the tipping point to finally implement an LSP.

@konnov
Copy link
Collaborator Author

konnov commented Jul 13, 2020

Taking another step back, I would probably prefer to read the result of type inference as TLA+ expressions instead of a new syntax

I understand that wish, but there are two problems: (1) typically type grammars are more concise than the types-as-sets syntax, (2) it is not clear how to nicely express some types, e.g., the type of a function from Int to Int, or an operator over an Int and a high-level operator returning a Boolean.

@lemmy
Copy link
Contributor

lemmy commented Jul 13, 2020

<a, seq(a)> => Bool

This is an operator type. It takes two arguments (a type variable a and a sequence of a elements) and returns a Boolean.

What I meant is that it is new syntax that increases the mental burden, which, perhaps, is fine if I only have to read it rarely but probably an obstacle if I have to type it (especially infrequently).

@konnov
Copy link
Collaborator Author

konnov commented Jul 20, 2020

For the operators, we could also use the inverse notation, as suggested by @shonfeder, e.g.:

A(x, y) == "(Int, Int) => Int" :>
  x + y

This is actually quite similar to Isabelle/HOL. We just use :> instead of where. (We could even think of writing inductive definitions by using CASE.)

@shonfeder
Copy link
Contributor

shonfeder commented Jul 21, 2020

Good catch re: typing records! fwiw, the string expressions inside a special operator -- as you're showing here -- look pretty good to me.

@andrey-kuprianov
Copy link
Contributor

A(x, y) == "(Int, Int) => Int" :>
  x + y

As a potential user writing type annotations I kind of like that approach -- it looks pretty neat :)

@konnov
Copy link
Collaborator Author

konnov commented Jul 29, 2020

I just realized that SANY rejects assumptions over state variables, so we cannot write e.g. ASSUME(x <: "Str") for a VARIABLE x. This kills yet another neat solution :(

@konnov
Copy link
Collaborator Author

konnov commented Jul 29, 2020

It looks like type annotations in comments are the most stable solution after all. That is, points 1.3 and 2.3 in ADR1 #179. Alternatively, we can use my hack, that is, a combination of 1.3 and 2.4, but I agree that it is more of a hack.

@shonfeder
Copy link
Contributor

Oh well... at least that helps make the decision easier!

@konnov
Copy link
Collaborator Author

konnov commented Jul 30, 2020

It's not easy to extract annotations from the comments unless we really integrate with the SANY lexer. So we might still consider the combination of <:, :> and a type axiom, similar to TypeOK:

TypeAxiom ==
  /\ RM <: "Set(RM)"
  /\ rmState <: "RM -> Str"
  /\ tmState <: "Str"
  /\ tmPrepared <: "Set(RM)"
  /\ msgs <: "Set([type: Str, rm: RM])"

In this example, RM is a global type variable, which can be instantiated with any type. I think this solution is conceptually much simpler than writing an annotation preprocessor.

@lemmy
Copy link
Contributor

lemmy commented Jul 30, 2020

TypeAxiom looks similar to what an earlier prototype of a symbolic variant of TLC ("BTLC") intended to use: /~https://github.com/tlaplus/tlaplus/blob/ab14a33e39c78e4c88e81b664b9a8c916b943cab/tlatools/org.lamport.tlatools/src/tlc2/tool/impl/ModelConfig.java#L63-L64

@konnov
Copy link
Collaborator Author

konnov commented Jul 30, 2020

I was actually quite surprised that one can only write ASSUME over constants. This is in contrast with the assume/assert in model checkers.

@lemmy
Copy link
Contributor

lemmy commented Jul 30, 2020

TLC!Assert

@konnov
Copy link
Collaborator Author

konnov commented Jul 30, 2020

Right. Normally, one would restrict values with assume and check them with assert, e.g., Boogie has it. TLC has assert, but TLA+'s ASSUME is quite restricted.

@konnov
Copy link
Collaborator Author

konnov commented Jul 31, 2020

I have renamed the previous document to 001rfc-types.md, as it contains all available options.

The actual ADR docs/adr/002adr-types.md contains a concrete proposal on the type grammar and type annotations. See PR #179.

I already love these type annotations, even without having any type checker for them :-)

@konnov
Copy link
Collaborator Author

konnov commented Aug 2, 2020

Leslie Lamport proposes to use the proofs syntax for the higher-order operators as well:

THEOREM BarType ==
  ASSUME NEW G(_,_),
         \A x \in Int, y \in STRING : G(x,y) \in Int
         PROVE  Bar(G) \in BOOLEAN

Similar to that, we can write a theorem about the type of Foo:

THEOREM FooType ==
  \A a \in Int: \A b \in STRING: Foo(a, b) \in Int

This looks like a great way of integrating the output of type inference into a TLA+ specification. Theorems like that could be used for the manual proofs with TLAPS.

@konnov
Copy link
Collaborator Author

konnov commented Aug 2, 2020

If we like to write types interactively, that is, in the process of spec writing, ADR-002 gives us a concise way to do that.

@konnov
Copy link
Collaborator Author

konnov commented Aug 5, 2020

Here is a solution to type annotations suggested by Leslie Lamport. The method SemanticNode.getPreComments in SANY returns the comments that are immediately preceding an AST node. Since SANY keeps the comments, we can extract a type annotation from the comments.

This is an interesting approach and it probably increases usability in comparison to what we have in ADR-002. For instance, the user could write:

---- MODULE TypedExample ----
VARIABLE
  \* type: Set(Int)
  S

\* type: (Int) => Bool
Mem(e) == \E x \in S: x = e

\* type: () => Bool
Init ==
  S = {}
...

The syntax in the comments is quite minimalistic and still easy to parse. Interestingly, SANY is keeping comments as an array, so it should be quite easy to extract the above annotations from the comments.

@konnov konnov modified the milestones: v0.10.0-tool-typeinfer, v0.9.0-tool-typecheck Sep 23, 2020
@konnov
Copy link
Collaborator Author

konnov commented Sep 23, 2020

The ADR has not changed much since the discussion. I find it amazing that the type checker is working exactly as we discussed in this issue. There are two-level of annotations: (1) AssumeType and :> (soon to be pushed), and (2) in-comment annotations (to be implemented). The annotations of the first kind are needed for the internal tooling, whereas the annotations of the second kind are needed for better UX.

@konnov
Copy link
Collaborator Author

konnov commented Sep 30, 2020

Just found that :> is also defined by the TLC module. We need a different operator in Typing.tla, before we move to in-comment annotations.

@konnov
Copy link
Collaborator Author

konnov commented Sep 30, 2020

As a temporary solution, I will change :> to ## in the type checker.

@konnov
Copy link
Collaborator Author

konnov commented Dec 6, 2020

The ADR has not been changing for several months. Closing the issue.

@konnov konnov closed this as completed Dec 6, 2020
@konnov konnov added the FTC-Snowcat Feature: Fully-functional type checker Snowcat label Dec 6, 2020
@konnov konnov modified the milestones: v0.9.0-tool-typecheck, backlog2020 Dec 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation FTC-Snowcat Feature: Fully-functional type checker Snowcat new New issue to be triaged.
Projects
None yet
Development

No branches or pull requests

6 participants