Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array shapes support #30

Merged
merged 5 commits into from
Jun 7, 2019
Merged

Conversation

arnaud-lb
Copy link
Contributor

@arnaud-lb arnaud-lb commented Jun 3, 2019

Adds support for parsing array shapes (object-like arrays).

Examples:

array{'foo': int, "bar": string}
array{0: int, 1?: int}
array{int, int}
array{foo: int, bar: string}

Keys can be single quoted strings, double quoted strings, integers, or identifiers (represented as ConstExprStringNode, ConstExprIntegerNode, IdentifierTypeNode).

They can be marked optional with a ? before the key and the :.

They can be omitted entirely.

Values can be any type, including nullable types or nested array shapes.

@arnaud-lb
Copy link
Contributor Author

Related: phpstan/phpstan#2173

@ondrejmirtes
Copy link
Member

I love it! 😍 But looks like you've forgot to add those new classes in. The build (after my CS fix) fails on:

1) Warning
The data provider specified for PHPStan\PhpDocParser\Parser\TypeParserTest::testParse is invalid.
Class 'PHPStan\PhpDocParser\Ast\Type\ArrayShapeNode' not found

BTW: The optional keys in PHPStan are right now modeled with a union of two constant arrays - with the keys and without the keys. Unfortunately, this leads to some wrongly interpreted code. We can improve that later for example by introducing optional keys to ConstantArrayType but that brings more work. I'd continue represent them with unions of two arrays right now.

@@ -53,6 +55,9 @@ private function parseAtomic(TokenIterator $tokens): Ast\Type\TypeNode

} elseif ($tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_SQUARE_BRACKET)) {
$type = $this->tryParseArray($tokens, $type);

} elseif ($type->name === 'array' && $tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_CURLY_BRACKET)) {
$type = $this->parseArrayShape($tokens, $type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be maybe tryParseArrayShape, so that /** @return array {curly description} */ is not suddenly invalid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with tryParse is that it silently ignores errors. An array shape specification will be silently ignored if it has a minor typo in it.

Alternatively, if we don't allow spaces between array and {, this would prevent the problem you mention. We could do that by adding an array{ token. What do you think ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with tryParse is that it silently ignores errors.

That's not a problem of tryParse but inherent problem of phpDoc which is designed to be human friendly and not computer friendly (same problem as markdown).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without tryParse it's a BC break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what do you think of not allowing a whitespace between array and { ? This would prevent the BC break, while also avoiding silenced errors (which are not user friendly).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/Parser/TypeParser.php Outdated Show resolved Hide resolved
@arnaud-lb arnaud-lb force-pushed the array-shapes branch 2 times, most recently from f93ce08 to d5c7122 Compare June 4, 2019 07:31
src/Parser/TypeParser.php Show resolved Hide resolved
@@ -53,6 +55,9 @@ private function parseAtomic(TokenIterator $tokens): Ast\Type\TypeNode

} elseif ($tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_SQUARE_BRACKET)) {
$type = $this->tryParseArray($tokens, $type);

} elseif ($type->name === 'array' && $tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_CURLY_BRACKET)) {
$type = $this->parseArrayShape($tokens, $type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with tryParse is that it silently ignores errors.

That's not a problem of tryParse but inherent problem of phpDoc which is designed to be human friendly and not computer friendly (same problem as markdown).

@@ -53,6 +55,9 @@ private function parseAtomic(TokenIterator $tokens): Ast\Type\TypeNode

} elseif ($tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_SQUARE_BRACKET)) {
$type = $this->tryParseArray($tokens, $type);

} elseif ($type->name === 'array' && $tokens->isCurrentTokenType(Lexer::TOKEN_OPEN_CURLY_BRACKET)) {
$type = $this->parseArrayShape($tokens, $type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without tryParse it's a BC break.

src/Parser/TypeParser.php Outdated Show resolved Hide resolved
src/Parser/TypeParser.php Outdated Show resolved Hide resolved
src/Ast/Type/ArrayItemNode.php Outdated Show resolved Hide resolved
src/Ast/Type/ArrayItemNode.php Outdated Show resolved Hide resolved
src/Ast/Type/ArrayShapeNode.php Show resolved Hide resolved
src/Ast/Type/ArrayShapeNode.php Outdated Show resolved Hide resolved
doc/grammars/type.abnf Outdated Show resolved Hide resolved
@JanTvrdik
Copy link
Collaborator

I wonder if we should make array{ byte sequence a new token instead of having array and { as two tokens. Mostly depends on whether allowing array {string, int} is desired or not.

Also it should be noted that allowing quoted string (which psalm does not allow) introduces grammar inconsistency (because it creates a place which looks like it expects constants expression but identifiers are not treated as constants) so I'm not sure it's a good idea.

@JanTvrdik
Copy link
Collaborator

cc @muglug

@arnaud-lb
Copy link
Contributor Author

arnaud-lb commented Jun 4, 2019

I wonder if we should make array{ byte sequence a new token instead of having array and { as two tokens. Mostly depends on whether allowing array {string, int} is desired or not.

I'm in favor of not allowing whitespaces between array and { (by introducing a new array{ token, or by taking profit of the fact that the tokeniser reports whitespace tokens).

Pros:

  • We can report syntax errors in a array{ definition without BC break
  • No silent errors

Cons:

  • We can not write array { (with a whitespace)

Also it should be noted that allowing quoted string (which psalm does not allow) introduces grammar inconsistency (because it creates a place which looks like it expects constants expression but identifiers are not treated as constants) so I'm not sure it's a good idea.

While looking at Psalm I though it was supported, but you are right: it's not. In Pslam, array{'a': int} will expect an array with the key '\'a\'', and not 'a'. I think that there is a risk of generating confusing errors here: expected array{'a': int}, array{a: int} provided.

So I would be in favor of either allowing quoted strings (so that array{'a': int} expects a key 'a'), or to explicitly disallow them (such that array{'a': int} is a syntax error).

@ondrejmirtes
Copy link
Member

Thank you both for improving this!

@arnaud-lb I don't see a testcase for something wrong inside the array{...} so that we know you're not using tryParse....

@arnaud-lb
Copy link
Contributor Author

Added :)

More about keys:

  • If we allow only identifiers, we limit the set of keys we can express with this syntax to the keys that are valid identifiers. Psalm would support our syntax entirely.
  • If we allow identifiers and quoted strings at the same time, there are grammar inconsistencies, as noted by @JanTvrdik.
  • If we allow only quoted strings, the grammar inconsistencies disappears, and we can express all possible keys, however we lose any compatibility with Psalm on array shapes.
  • Psalm allows anything as key, and treats the whole thing as a string: if there are quotes, they are part of the key's value. All characters appear to be allowed. I think that this is dangerous, as this prevents any backwards-compatible evolution of this syntax in the future. This is also confusing when keys appears to be quoted strings, but are not really.

Thoughts ?

@JanTvrdik
Copy link
Collaborator

JanTvrdik commented Jun 4, 2019

If we allow only identifiers, we limit the set of keys we can express with this syntax to the keys that are valid identifiers.

I think this is the best option for now. Edit: identifiers + integers

@muglug
Copy link

muglug commented Jun 4, 2019

Array key escaping is an open issue vimeo/psalm#1518

Psalm should support it, I just haven’t got around to it yet

@ondrejmirtes
Copy link
Member

We should have a test that verifies that quoted string as a key parsing fails :) Otherwise it's ready to merge, right, everyone?

@arnaud-lb
Copy link
Contributor Author

👍 added a test

@ondrejmirtes
Copy link
Member

Thank you very much! I'll start poking it with ConstantArrayType right away :)

@ondrejmirtes ondrejmirtes merged commit 8c4ef2a into phpstan:master Jun 7, 2019
@ondrejmirtes
Copy link
Member

And done: phpstan/phpstan@c4ed3c7 Feel free to review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants