Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric separators #20324

Merged
merged 11 commits into from
Dec 9, 2017
4 changes: 4 additions & 0 deletions src/compiler/diagnosticMessages.json
Original file line number Diff line number Diff line change
Expand Up @@ -3407,6 +3407,10 @@
"category": "Message",
"code": 6187
},
"Numeric seperators are not allowed at the end of a literal.": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separators

"category": "Error",
"code": 6188
},
"Variable '{0}' implicitly has an '{1}' type.": {
"category": "Error",
"code": 7005
Expand Down
91 changes: 84 additions & 7 deletions src/compiler/scanner.ts
Original file line number Diff line number Diff line change
Expand Up @@ -858,28 +858,76 @@ namespace ts {
}
}

function scanNumberFragment(): string {
let start = pos;
let allowSeperator = false;
let result: string;
while (true) {
const ch = text.charCodeAt(pos);
if (allowSeperator && ch === CharacterCodes._) {
allowSeperator = false;
tokenFlags |= TokenFlags.ContainsSeparator;
result = (result || "") + text.substring(start, pos);
pos++;
start = pos;
continue;
}
if (isDigit(ch)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the similarities between this, scanHexDigit, and scanBinaryOrOctalDigits perhaps we should unify (at least in part) these three functions by having scanNumberFragment take a callback for isDigit and then calling it as scanNumberFragment(isDigit), scanNumberFragment(isHexDigit), scanNumberFragment(isBinaryDigit), and scanNumberFragment(isOctalDigit).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanBinaryOrOctalDigits and scanHexDigit also do arithmetic to calculate the actual numeric value, however (rather than relying on the host being able to parse the literal correctly). I would also need to pass thru another callback to retain that behavior, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanHexDigit also has to deal with validating that there are a certain number of digits and have an option to not recognize separators, for when its used while handling escape sequences. IMO, while the general structure is similar, they're distinct enough to not warrant a shared base.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can pass in a base of type 2 | 8 | 10 | 16 like in scanBinaryOrOctalDigits and special case hex characters for base 16?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth keeping focus on performance when parsing numbers, even at a cost of flexibility. Too hot a code path.

allowSeperator = true;
pos++;
continue;
}
break;
}
if (text.charCodeAt(pos - 1) === CharacterCodes._) {
pos--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this exact set of three lines multiple times in this PR. Would it make sense to provide an optional argument to error to override the position it uses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add a look-ahead and check whether the subsequent character could continue the literal? That way you don't need to rewind at the end.

error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1);
pos++;
}
return result = (result || "") + text.substring(start, pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely assign result to at least "", so why not just initialize result above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you don't need to reassign result here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely assign result to at least "", so why not just initialize result above?

result is only assigned in the presence of a separator.

Also, you don't need to reassign result here.

Done.

}

function scanNumber(): string {
const start = pos;
while (isDigit(text.charCodeAt(pos))) pos++;
const mainFragment = scanNumberFragment();
let decimalFragment: string;
let scientificFragment: string;
if (text.charCodeAt(pos) === CharacterCodes.dot) {
pos++;
while (isDigit(text.charCodeAt(pos))) pos++;
decimalFragment = scanNumberFragment();
}
let end = pos;
if (text.charCodeAt(pos) === CharacterCodes.E || text.charCodeAt(pos) === CharacterCodes.e) {
pos++;
tokenFlags |= TokenFlags.Scientific;
if (text.charCodeAt(pos) === CharacterCodes.plus || text.charCodeAt(pos) === CharacterCodes.minus) pos++;
if (isDigit(text.charCodeAt(pos))) {
pos++;
while (isDigit(text.charCodeAt(pos))) pos++;
const preNumericPart = pos;
const finalFragment = scanNumberFragment();
if (!finalFragment) {
error(Diagnostics.Digit_expected);
}
else {
scientificFragment = text.substring(end, preNumericPart) + finalFragment;
end = pos;
}
}
if (tokenFlags & TokenFlags.ContainsSeparator) {
if (decimalFragment && scientificFragment) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just

let result = mainFragment;
if (decimalFragment) {
    result += "." + decimalFragment;
}
if (scientificFragment) {
    result += "." + scientificFragment;
}
return "" + +mainFragment;

return "" + +(mainFragment + "." + decimalFragment + scientificFragment);
}
else if (decimalFragment) {
return "" + +(mainFragment + "." + decimalFragment);
}
else if (scientificFragment) {
return "" + +(mainFragment + scientificFragment);
}
else {
error(Diagnostics.Digit_expected);
return "" + +mainFragment;
}
}
return "" + +(text.substring(start, end));
else {
return "" + +(text.substring(start, end)); // No need to use all the fragments; no _ removal needed
}
}

function scanOctalDigits(): number {
Expand Down Expand Up @@ -909,8 +957,16 @@ namespace ts {
function scanHexDigits(minCount: number, scanAsManyAsPossible: boolean): number {
let digits = 0;
let value = 0;
let seperatorAllowed = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separatorAllowed

while (digits < minCount || scanAsManyAsPossible) {
const ch = text.charCodeAt(pos);
if (seperatorAllowed && ch === CharacterCodes._) {
seperatorAllowed = false;
tokenFlags |= TokenFlags.ContainsSeparator;
pos++;
continue;
}
seperatorAllowed = true;
if (ch >= CharacterCodes._0 && ch <= CharacterCodes._9) {
value = value * 16 + ch - CharacterCodes._0;
}
Expand All @@ -929,6 +985,11 @@ namespace ts {
if (digits < minCount) {
value = -1;
}
if (text.charCodeAt(pos - 1) === CharacterCodes._) {
pos--;
error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1);
pos++;
}
return value;
}

Expand Down Expand Up @@ -1218,8 +1279,17 @@ namespace ts {
// For counting number of digits; Valid binaryIntegerLiteral must have at least one binary digit following B or b.
// Similarly valid octalIntegerLiteral must have at least one octal digit following o or O.
let numberOfDigits = 0;
let seperatorAllowed = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separator

while (true) {
const ch = text.charCodeAt(pos);
// Numeric seperators are allowed anywhere within a numeric literal, except not at the beginning, or following another seperator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separator

if (seperatorAllowed && ch === CharacterCodes._) {
seperatorAllowed = false;
tokenFlags |= TokenFlags.ContainsSeparator;
pos++;
continue;
}
seperatorAllowed = true;
const valueOfCh = ch - CharacterCodes._0;
if (!isDigit(ch) || valueOfCh >= base) {
break;
Expand All @@ -1232,6 +1302,13 @@ namespace ts {
if (numberOfDigits === 0) {
return -1;
}
if (text.charCodeAt(pos - 1) === CharacterCodes._) {
// Literal ends with underscore - not allowed
pos--;
error(Diagnostics.Numeric_seperators_are_not_allowed_at_the_end_of_a_literal, 1);
pos++; // Consume character anyway to reduce followon errors from a single erroneous trailing `_`, like `Cannot find name '_'`
return value;
}
return value;
}

Expand Down
3 changes: 2 additions & 1 deletion src/compiler/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1490,8 +1490,9 @@ namespace ts {
HexSpecifier = 1 << 6, // e.g. `0x00000000`
BinarySpecifier = 1 << 7, // e.g. `0b0110010000000000`
OctalSpecifier = 1 << 8, // e.g. `0o777`
ContainsSeparator = 1 << 9, // e.g. `0b1100_0101`
BinaryOrOctalSpecifier = BinarySpecifier | OctalSpecifier,
NumericLiteralFlags = Scientific | Octal | HexSpecifier | BinarySpecifier | OctalSpecifier
NumericLiteralFlags = Scientific | Octal | HexSpecifier | BinarySpecifier | OctalSpecifier | ContainsSeparator
}

export interface NumericLiteral extends LiteralExpression {
Expand Down
2 changes: 1 addition & 1 deletion src/compiler/utilities.ts
Original file line number Diff line number Diff line change
Expand Up @@ -345,7 +345,7 @@ namespace ts {
export function getLiteralText(node: LiteralLikeNode, sourceFile: SourceFile) {
// If we don't need to downlevel and we can reach the original source text using
// the node's parent reference, then simply get the text as it was originally written.
if (!nodeIsSynthesized(node) && node.parent) {
if (!nodeIsSynthesized(node) && node.parent && !(isNumericLiteral(node) && node.numericLiteralFlags & TokenFlags.ContainsSeparator)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should emit them as is for ESNext i would say..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acctually ES2018 and later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an es2018 target now or wait until the ES2018 spec is official?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhegazy said offline that we should open an issue to add an ES2018 target, since there are some lib additions to make in addition to adding an output mode that preserves these separators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #20342 to track.

return getSourceTextOfNodeFromSourceFile(sourceFile, node);
}

Expand Down
12 changes: 12 additions & 0 deletions tests/baselines/reference/parser.numericSeparators.binary.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
//// [parser.numericSeparators.binary.ts]
0b00_11;
0B0_1;
0b1100_0011;
0B0_11_0101;


//// [parser.numericSeparators.binary.js]
3;
1;
195;
53;
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/parser.numericSeparators.binary.ts ===
0b00_11;
No type information for this code.0B0_1;
No type information for this code.0b1100_0011;
No type information for this code.0B0_11_0101;
No type information for this code.
No type information for this code.
13 changes: 13 additions & 0 deletions tests/baselines/reference/parser.numericSeparators.binary.types
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/parser.numericSeparators.binary.ts ===
0b00_11;
>0b00_11 : 3

0B0_1;
>0B0_1 : 1

0b1100_0011;
>0b1100_0011 : 195

0B0_11_0101;
>0B0_11_0101 : 53

Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/1.ts(1,5): error TS6188: Numeric seperators are not allowed at the end of a literal.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/2.ts(1,3): error TS1177: Binary digit expected.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/2.ts(1,3): error TS2304: Cannot find name '_110'.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts(1,2): error TS6188: Numeric seperators are not allowed at the end of a literal.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts(1,3): error TS1005: ';' expected.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts(1,3): error TS2304: Cannot find name 'B0101'.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts(1,5): error TS6188: Numeric seperators are not allowed at the end of a literal.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts(1,6): error TS1005: ';' expected.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts(1,6): error TS2304: Cannot find name '_11'.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts(1,12): error TS6188: Numeric seperators are not allowed at the end of a literal.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts(1,13): error TS1005: ';' expected.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts(1,13): error TS2304: Cannot find name '_'.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/6.ts(1,3): error TS1177: Binary digit expected.
tests/cases/conformance/parser/ecmascriptnext/numericSeparators/6.ts(1,3): error TS2304: Cannot find name '___0111010_0101_1'.


==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/1.ts (1 errors) ====
0b00_
~
!!! error TS6188: Numeric seperators are not allowed at the end of a literal.

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/2.ts (2 errors) ====
0b_110

!!! error TS1177: Binary digit expected.
~~~~
!!! error TS2304: Cannot find name '_110'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider just parsing out the rest of the number rather than just stop? Then we'd avoid the excess Cannot find name errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed grammar says the production is effectively Digit Sep Digit, so we'd be much more lenient than is needed. Which seems fine; I don't think 10__01 can be treated as anything other than a number with too many separators.


==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts (3 errors) ====
0_B0101
~
!!! error TS6188: Numeric seperators are not allowed at the end of a literal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message isn't clear considering the context. Perhaps it should be Numeric separators are not allowed here.

~~~~~
!!! error TS1005: ';' expected.
~~~~~
!!! error TS2304: Cannot find name 'B0101'.

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts (3 errors) ====
0b01__11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, after we report the numeric separator error, we may want to continue to parse out the rest of the number to avoid excessive unrelated errors, though I'd like to get @mhegazy's or @DanielRosenwasser's opinion on that.

~
!!! error TS6188: Numeric seperators are not allowed at the end of a literal.
~~~
!!! error TS1005: ';' expected.
~~~
!!! error TS2304: Cannot find name '_11'.

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts (3 errors) ====
0B0110_0110__
~
!!! error TS6188: Numeric seperators are not allowed at the end of a literal.
~
!!! error TS1005: ';' expected.
~
!!! error TS2304: Cannot find name '_'.

==== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/6.ts (2 errors) ====
0b___0111010_0101_1

!!! error TS1177: Binary digit expected.
~~~~~~~~~~~~~~~~~
!!! error TS2304: Cannot find name '___0111010_0101_1'.

Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
//// [tests/cases/conformance/parser/ecmascriptnext/numericSeparators/parser.numericSeparators.binaryNegative.ts] ////

//// [1.ts]
0b00_

//// [2.ts]
0b_110

//// [3.ts]
0_B0101

//// [4.ts]
0b01__11

//// [5.ts]
0B0110_0110__

//// [6.ts]
0b___0111010_0101_1


//// [1.js]
0;
//// [2.js]
0;
_110;
//// [3.js]
0;
B0101;
//// [4.js]
1;
_11;
//// [5.js]
102;
_;
//// [6.js]
0;
___0111010_0101_1;
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/1.ts ===
0b00_
No type information for this code.
No type information for this code.=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/2.ts ===
0b_110
No type information for this code.
No type information for this code.=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts ===
0_B0101
No type information for this code.
No type information for this code.=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts ===
0b01__11
No type information for this code.
No type information for this code.=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts ===
0B0110_0110__
No type information for this code.
No type information for this code.=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/6.ts ===
0b___0111010_0101_1
No type information for this code.
No type information for this code.
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/1.ts ===
0b00_
>0b00_ : 0

=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/2.ts ===
0b_110
>0b : 0
>_110 : any

=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/3.ts ===
0_B0101
>0_ : 0
>B0101 : any

=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/4.ts ===
0b01__11
>0b01_ : 1
>_11 : any

=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/5.ts ===
0B0110_0110__
>0B0110_0110_ : 102
>_ : any

=== tests/cases/conformance/parser/ecmascriptnext/numericSeparators/6.ts ===
0b___0111010_0101_1
>0b : 0
>___0111010_0101_1 : any

32 changes: 32 additions & 0 deletions tests/baselines/reference/parser.numericSeparators.decimal.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
//// [parser.numericSeparators.decimal.ts]
1_000_000_000
1.1_00_01
1e1_0
1e+1_0
1e-1_0
1.1e10_0
1.1e+10_0
1.1e-10_0
12_34_56
1_22_333
1_2.3_4
1_2.3_4e5_6
1_2.3_4e+5_6
1_2.3_4e-5_6


//// [parser.numericSeparators.decimal.js]
1000000000;
1.10001;
10000000000;
10000000000;
1e-10;
1.1e+100;
1.1e+100;
1.1e-100;
123456;
122333;
12.34;
1.234e+57;
1.234e+57;
1.234e-55;
Loading