Skip to content

Latest commit

 

History

History
109 lines (82 loc) · 2.59 KB

common.md

File metadata and controls

109 lines (82 loc) · 2.59 KB

Describing common syntactic structures

This chapter of the documentation shows how to implement common grammar patterns like lists & repetitions in a way that's most efficient for a LALR(1) parser like Dissect.

List of 1 or more Foos

$this('Foo+')
    ->is('Foo+', 'Foo')
    ->call(function ($list, $foo) {
        $list[] = $foo;

        return $list;
    })

    ->is('Foo')
    ->call(function ($foo) {
        return [$foo];
    });

With some practice, it's very easy to see how this works: when the parser recognizes the first Foo, it reduces it to a single-item array and for each following Foo, it just pushes it onto the array.

Note that Foo+ is just a rule name, it could be equally well called Foos, ListOfFoo or anything else you feel like.

List of 0 or more Foos

$this('Foo*')
    ->is('Foo*', 'Foo')
    ->call(function ($list, $foo) {
        $list[] = $foo;

        return $list;
    })

    ->is(/* empty */)
    ->call(function () {
        return [];
    });

This works pretty much the same like the previous example, the only difference being that we allow Foo* to match nothing.

A comma separated list

The first example of this chapter is trivial to modify to include commas between the Foos. Just change the second line to:

$this('Foo+')
    ->is('Foo+', ',', 'Foo')
...

The second example, however, cannot be modified so easily. We cannot just put a comma in the first alternative:

$this('Foo*')
    ->is('Foo*', ',', 'Foo')
...

since that would allow the list to start with a comma:

, Foo , Foo , Foo

Instead, we say that a "list of zero or more Foos separated by commas" is actually "a list of one or more Foos separated by commas or nothing at all". So our rule now becomes:

$this('Foo*')
    ->is('Foo+')

    ->is(/* empty */)
    ->call(function () {
        return [];
    });

$this('Foo+')
    ->is('Foo+', ',', 'Foo')
...

Expressions

A grammar for very basic mathematical expressions is described in the chapter on parsing. It would require extensive modifications to allow for other operators, function calls, unary operators, ternary operator(s), but there's a lot of grammars for practical programming languages on the internet that you can take inspiration from.

For starters, take a look at this grammar for PHP itself.