Grammar tests #2234
Comments
How many negative parsing tests do we have? I'm guessing not many... but we
I also would assume that coverage of failure cases in the parser is not great, so we could kill two birds with one stone.
Speaking of coverage, today I realized that #690 is unblocked. It might be possible for us to measure test coverage.
ANTLR is a possibility, but I think it is willing to handle a lot more grammar ambiguity, and it tends to blur lexing and parsing rules. I want us to remain in the classical regular-lexing + LL(1)-parsing space. I picked the EBNF in the manual for compatibility with LLnextgen, http://os.ghalkes.nl/LLnextgen/; I got partway into wiring up the rules for extracting and testing the grammar but didn't finish in time for 0.1 and haven't come back to it yet. Lexical rules I figured we could feed to quex, http://quex.sourceforge.net/, but other possibilities exist. It just seems like the current leader in the space we're interested in.
We can use the fuzzer to find arbitrary numbers of random samples to feed to both parsers.
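A minimal sketch of that differential-testing idea, assuming each parser can be wrapped as a function that takes source text and returns True on accept and False on reject. The names `find_disagreements`, `reference_parser`, and `candidate_parser` are hypothetical stand-ins, not part of rustc or of any fuzzer; the two toy parsers only check bracket balance so that the example runs on its own.

```python
from typing import Callable, Iterable, List, Tuple

Parser = Callable[[str], bool]  # hypothetical interface: True means "accepted"


def find_disagreements(
    samples: Iterable[str], parser_a: Parser, parser_b: Parser
) -> List[Tuple[str, bool, bool]]:
    """Return every sample on which the two parsers give different verdicts."""
    disagreements = []
    for src in samples:
        a, b = parser_a(src), parser_b(src)
        if a != b:
            disagreements.append((src, a, b))
    return disagreements


def reference_parser(src: str) -> bool:
    """Toy stand-in: accept only if (), [] and {} are balanced and nested."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in src:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def candidate_parser(src: str) -> bool:
    """Toy stand-in with a deliberate bug: it only counts brackets."""
    return all(src.count(o) == src.count(c) for o, c in ("()", "[]", "{}"))
```

In a real setup the sample list would come from the fuzzer and the two callables would wrap the compiler's parser and the generated one; any non-empty result is a candidate for a new negative or positive parsing test.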
good idea...
nominating for well-defined
Still relevant
I had a look at this issue and started working on a parser using Flex and LLnextgen (https://github.com/fhahn/rust-grammer). Right now, the parser supports only a tiny tiny bit of the Rust grammar, but I wanted to make sure my approach is valid before continuing. One main difference from the grammar specification in the documentation is that flex uses regular expressions for token definitions, not EBNF, so I started converting the EBNF from section 3 "Lexical structure" to regular expressions for flex.
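To illustrate the kind of conversion described above, here is a small sketch using Python's `re` module rather than flex itself. The EBNF rules in the comments are simplified, ASCII-only examples written for this illustration (the manual's real rules, for instance for Unicode identifiers, are more involved), and the token names are hypothetical.

```python
import re

# Illustrative EBNF (simplified, not copied from the manual):
#   dec_digit : '0' | '1' | ... | '9' ;
#   dec_lit   : dec_digit [ dec_digit | '_' ] * ;
#   ident     : [ letter | '_' ] [ letter | dec_digit | '_' ] * ;
TOKEN_PATTERNS = [
    ("DEC_LIT", r"[0-9][0-9_]*"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS", r"[ \t\r\n]+"),
]

# One alternation with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(src):
    """Split src into (token_name, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}: {src[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

One difference to keep in mind: flex picks the longest match among all rules, while a Python alternation takes the first alternative that matches, so rule order matters more in this sketch than it would in a flex specification.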
Note that the grammar in the manual is highly likely to be incorrect (which is presumably exactly what this issue is aiming to address).
Note that someone already completed a grammar months ago; it just never got used for anything yet. No idea where to find it though.
https://github.com/jbclements/rust-antlr To the best of his knowledge, it was correct at the time. On Sun, Sep 15, 2013 at 9:20 AM, Marvin Löbel notifications@github.com wrote:
I found an extract_grammer.py script in So far, I stumbled over one part of the grammar where I think the productions in the
But I think ignoring comments should be done in the lexer, so this wouldn't be a problem for the Rust grammar being LL(1).
@fhahn Again, the grammar fragments in the manual are useless, what should actually be done is
But thanks for showing interest in this work. :)
What's the preferred parser generator?
@fhahn use whatever you're comfortable with, is my suggestion.
I'm still very keen to make this happen for 1.0. There is existing infrastructure in the tree for testing the manual's grammar with LLnextgen.
I agree, this would be very valuable. Especially when we get automated testing working.
I'm working on updating rust-antlr.
I've made some good progress on an LLnextgen-capable grammar at https://github.com/bleibig/rust-grammar. There's still a ways to go, but once it's done it shouldn't be hard to integrate the grammar into the manual and have the grammar tests work with that.
Nominating. I want to have confidence in our grammar.
Leaving as P-high. We really would like to have a formal definition of our grammar and have it tested, but we do not think it should be a blocker for 1.0 at this time. Cc'ing @cmr since he is working on grammar stuff.
@cmr did you end up updating rust-antlr? I wanted to have some sort of IDE support for Rust, and having an existing ANTLR grammar will make that happen a lot faster.
@Arcnor actively working on it.
Was going to move to the RFC repo, but let's see how #21452 shakes out.
This adds a new lexer/parser combo for the entire Rust language that can be generated with flex and bison, taken from my project at https://github.com/bleibig/rust-grammar. There is also a testing script that runs the generated parser with all *.rs files in the repository (except for tests in compile-fail or ones that are marked as "ignore-test" or "ignore-lexer-test"). If you have flex and bison installed, you can run these tests using the new "check-grammar" make target. This does not depend on or interact with the existing testing code in the grammar, which only provides and tests a lexer specification. OS X users should take note that the version of bison that comes with the Xcode toolchain (2.3) is too old to work with this grammar; they need to download and install version 3.0 or later. The parser builds up an S-expression-based AST, which can be displayed by giving the "-v" argument to parser-lalr (normally it only gives output on error). It is only a rough approximation of what is parsed and doesn't capture every detail and nuance of the program. Hopefully this should be sufficient for issue #2234, or at least a good starting point.
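The file-selection rule described above could be sketched roughly as follows. This is not the testing script from that pull request; `should_test` and `IGNORE_MARKERS` are hypothetical names, and the sketch assumes that "compile-fail" appears as a directory component of the path and that the ignore markers can be detected by a plain substring search of the file's text.

```python
from pathlib import PurePosixPath

# Marker strings named in the comment above; matching them by substring
# anywhere in the file is an assumption of this sketch.
IGNORE_MARKERS = ("ignore-test", "ignore-lexer-test")


def should_test(path: str, text: str) -> bool:
    """Decide whether a source file should be fed to the generated parser."""
    p = PurePosixPath(path)
    if p.suffix != ".rs":
        return False  # only Rust sources are parsed
    if "compile-fail" in p.parts:
        return False  # these tests are expected not to compile
    return not any(marker in text for marker in IGNORE_MARKERS)
```

Keeping the decision a pure function of the path and the file contents makes the rule easy to check in isolation, separate from walking the repository and invoking the parser.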
#21452 added
It would be nice to have confidence about what sort of grammar Rust has. One possible way we could do that: