More precise parseInt and toString #86

thejoshwolfe · 2017-09-21T23:25:49Z

BigInt seems to be mimicking the parseInt and toString semantics of Number, but I don't understand why so much implementation-defined behavior is allowed for BigInt. For Number there are practical issues with loss of precision and exponential notation, but these are not concerns for BigInt.

This particular excerpt, from the portion of the parseInt spec text that applies to BigInt, is concerning:

However, if R is 10 and Z contains more than 20 significant digits, every significant digit after the 20th may be replaced by a 0 digit, at the option of the implementation; and if R is not 2, 4, 8, 10, 16, or 32, then mathInt may be an implementation-dependent approximation to the mathematical integer value that is represented by Z in radix-R notation.

It seems to defeat the purpose of arbitrary-precision BigInt if implementations can discard precision during parseInt.
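To make that concern concrete, the sketch below models what the quoted latitude would allow. zeroAfter20 is a hypothetical helper written only for this illustration (it is not from the spec or any engine). It keeps the first 20 significant digits of a plain decimal string and zeroes the rest, and its result is compared against the exact value that BigInt() produces for the same string.

```javascript
// Hypothetical helper modeling the quoted latitude: keep the first 20
// significant digits and replace every later digit with 0.
// Assumes `digits` is a plain decimal string with no sign and no leading zeros.
function zeroAfter20(digits) {
  if (digits.length <= 20) return digits;
  return digits.slice(0, 20) + "0".repeat(digits.length - 20);
}

const input = "123456789012345678901234567890"; // 30 significant digits
const exact = BigInt(input);                   // exact value
const truncated = BigInt(zeroAfter20(input));  // value the latitude would permit

console.log(exact === truncated); // false
console.log(exact - truncated);   // 1234567890n
```

The difference is exactly the ten discarded low-order digits, which is the precision an arbitrary-precision type is supposed to keep.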

And here's a concerning excerpt regarding toString (from the part of the algorithm that applies when the radix is not 10):

The precise algorithm is implementation-dependent, however the algorithm should be a generalization of that specified in 3.1.4.1.

Why are we not precisely defining an abstract algorithm?
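An exact, fully specified algorithm does not need to be complicated. As a sketch (not proposed spec text), repeated division by the radix produces the digits exactly for any radix from 2 to 36. The helper name bigIntToRadixString is hypothetical; the loop at the end cross-checks it against the engine's BigInt.prototype.toString.

```javascript
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: exact radix-R rendering of a BigInt by repeated
// division. Emits lowercase digits and a leading "-" for negative values.
function bigIntToRadixString(n, radix) {
  if (!Number.isInteger(radix) || radix < 2 || radix > 36) {
    throw new RangeError("radix must be an integer from 2 to 36");
  }
  if (n === 0n) return "0";
  const negative = n < 0n;
  let m = negative ? -n : n;
  const r = BigInt(radix);
  const out = [];
  while (m > 0n) {
    out.push(DIGITS[Number(m % r)]); // least significant digit first
    m /= r;                          // BigInt division truncates toward zero
  }
  if (negative) out.push("-");
  return out.reverse().join("");
}

const n = -(2n ** 100n) + 12345n;
for (const radix of [2, 3, 7, 10, 16, 36]) {
  console.log(radix, bigIntToRadixString(n, radix) === n.toString(radix)); // each line ends in true
}
```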

littledan · 2017-09-22T13:57:13Z

That's a very good point. There is a related source of ambiguity in https://tc39.github.io/ecma262/#sec-runtime-semantics-mv-s

Otherwise, the rounded value must be the Number value for the MV (in the sense defined in 6.1.6), unless the literal includes a StrUnsignedDecimalLiteral and the literal has more than 20 significant digits, in which case the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position.

I think it'd be a good idea to make a uniform decision and fix all of these, for both Number and BigInt. I don't think I'll be able to do so by the September meeting, though.

For BigInt in particular, there should never be such a 20-digit limitation. The exact answer should be given regardless of the number of digits and the radix. I guess that's a separate spec text fix which will be easier than the Number one.
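The asymmetry is easy to see with the standard Number() and BigInt() conversions: a Number has to round somewhere because a double carries only 53 bits of significand, whereas a BigInt can hold the exact value of any digit string.

```javascript
// 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
const text = "9007199254740993";

console.log(Number(text));                    // 9007199254740992 (rounded)
console.log(BigInt(text));                    // 9007199254740993n (exact)
console.log(BigInt(text) === 2n ** 53n + 1n); // true
```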

ajklein · 2017-10-10T20:55:56Z

Not sure if this is directly related, but parseInt also has some funny behavior with respect to radix prefixes: it allows "0x" prefixes if radix is either not passed or 16, but disallows it (or rather, returns 0) otherwise. It's not clear to me that this legacy behavior is something we want to bring forward, unless the intention of this particular method is to match the behavior of the legacy parseInt.
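For reference, the legacy prefix handling described above can be observed with the global parseInt on Number (standard behavior, shown only to illustrate the point):

```javascript
// The "0x"/"0X" prefix is stripped only when the radix is omitted (or 0) or is 16.
console.log(parseInt("0x1f"));     // 31
console.log(parseInt("0x1f", 0));  // 31 (radix 0 is treated as absent)
console.log(parseInt("0x1f", 16)); // 31
console.log(parseInt("0x1f", 10)); // 0  (scanning stops at "x" after "0")
console.log(parseInt("0x1f", 8));  // 0
```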

littledan · 2017-10-26T14:12:53Z

@ajklein Thanks for raising that (rather separate) question. I don't see much harm coming from that stripPrefix behavior, but maybe it'd make sense to clean it up. @cxielarko, you implemented this functionality and tests for it; do you have an opinion here?

The previous specification for BigInt.parseInt was defined by referencing Number.parseInt and making small changes to it. This patch inlines the definition for clarity and removes some unnecessary implementation-defined parts. Addresses part of #86

littledan · 2017-10-26T15:53:03Z

I pushed a patch to fix @thejoshwolfe's issue, but the issue from @ajklein may draw different points of view, so I'll do that as a pull request to get more feedback.

As long as we're cleaning up parseInt, we could also do a few more things beyond what @ajklein is suggesting:

- When the radix is invalid, throw a RangeError rather than a SyntaxError. (I believe @cxielarko suggested this in the past, but somehow I didn't understand it at the time.)
- Throw an exception if there is more non-whitespace text after the recognized numeric part, rather than ignoring it.

I'll put together all three of these changes into a PR and make a matching test262 PR.

@cxielarko

- Throw a RangeError rather than a SyntaxError when an out-of-bounds radix is passed in. (Thanks, @cxielarko)
- Throw a syntax error with "0x" prefixes. (Thanks, @ajklein)
- Throw an exception if there is more non-whitespace text after the recognized numeric part, rather than ignoring it.

Addresses #86
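A minimal sketch of those three stricter rules as a userland function. strictParseBigInt is a hypothetical name, and its handling of signs, surrounding whitespace, and letter case are assumptions of this sketch rather than details taken from the PR.

```javascript
// Hypothetical sketch of the stricter semantics listed above; not the PR text.
// Assumptions: one optional leading "+" or "-", surrounding whitespace ignored,
// ASCII letters accepted in either case.
function strictParseBigInt(str, radix = 10) {
  if (!Number.isInteger(radix) || radix < 2 || radix > 36) {
    throw new RangeError("radix out of range"); // RangeError, not SyntaxError
  }
  let s = String(str).trim();
  let negative = false;
  if (s[0] === "+" || s[0] === "-") {
    negative = s[0] === "-";
    s = s.slice(1);
  }
  if (s.length === 0) throw new SyntaxError("no digits");
  const r = BigInt(radix);
  let val = 0n;
  for (const c of s) {
    // Value of a single character as a base-36 digit (either case); NaN otherwise.
    const d = parseInt(c, 36);
    // Rejects trailing text such as "12abc" in radix 10, and the "x" of a
    // "0x" prefix (x is only a digit for radix 34 and above).
    if (!(d < radix)) throw new SyntaxError(`invalid digit ${JSON.stringify(c)}`);
    val = val * r + BigInt(d);
  }
  return negative ? -val : val;
}

console.log(strictParseBigInt("ff", 16)); // 255n
console.log(strictParseBigInt("  -42 ")); // -42n
for (const [s, r] of [["12abc", 10], ["0x1f", 16], ["10", 37]]) {
  try {
    strictParseBigInt(s, r);
  } catch (e) {
    console.log(e.name); // SyntaxError, SyntaxError, RangeError
  }
}
```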

For proposed changes from tc39/proposal-bigint#86 in the PR at tc39/proposal-bigint#97

littledan · 2017-11-29T02:13:46Z

We settled on removing the parseInt function at the November 2017 TC39 meeting.

jakobkummerow · 2017-11-29T03:37:34Z

What?! How is one supposed to parse BigInts with custom radixes? Or is that simply not a supported use case any more?

bakkot · 2017-11-29T05:29:26Z

@jakobkummerow

How is one supposed to parse BigInts with custom radixes?

```javascript
const alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'.split('');
function parseBigInt(str, radix = 10) {
  if (radix < 2 || radix > alphabet.length || Math.floor(radix) !== radix) {
    throw new RangeError('radix out of range');
  }
  // BigInt and Number operands can't be mixed in arithmetic, so convert once.
  const bigRadix = BigInt(radix);
  let val = 0n;
  for (const c of ('' + str).split('')) {
    const index = alphabet.indexOf(c);
    if (index < 0 || index >= radix) {
      throw new RangeError('character out of range');
    }
    // Shift the accumulated value one digit left and add the new digit.
    val = val * bigRadix + BigInt(index);
  }
  return val;
}
```

(Modulo casing, _ separators, etc.)

Something like that can be added to the language later, of course, but I don't think it's that big of a deal to leave it out initially, given that it's doable in userland.

littledan · 2017-11-29T05:42:39Z

The idea would be to leave it for user libraries. This proposal already leaves several other things to user libraries, such as bitcasting Numbers to 64-bit ints, writing and reading arbitrarily large BigInts from ArrayBuffers, and bit instructions such as popcount, leftmost/rightmost set/clear bit, etc. All of those could be implemented by user code or be faster as built-ins, but are left out in the interest of minimalism; this proposal could join the club.

jakobkummerow · 2017-11-30T20:09:59Z

given that it's doable in userland.

Of course it is; with TypedArrays and | 0 arithmetic you can implement the entire BigInt proposal (except for the overloaded operators) in userland.

In a quick benchmark based on our unit tests' inputs, @bakkot's polyfill (with an added radix = BigInt(radix) to make it work) is 20-30x slower than the native version. I am surprised that the committee is now demanding that we delete that native implementation.

IMHO it's weird to have .toString(radix) but no reverse function.
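Part of that overhead comes from doing one BigInt multiply-add per character. One common userland mitigation, sketched here with no benchmark numbers claimed, is to accumulate digits in ordinary Number arithmetic while the chunk still fits below Number.MAX_SAFE_INTEGER and fold only each completed chunk into the BigInt. parseBigIntChunked is a hypothetical name, and the sketch assumes unsigned input.

```javascript
// Hypothetical sketch: one BigInt multiply-add per chunk of digits instead of
// per digit. Assumes an unsigned string of digits valid in `radix` (2..36).
function parseBigIntChunked(str, radix = 10) {
  if (!Number.isInteger(radix) || radix < 2 || radix > 36) {
    throw new RangeError("radix out of range");
  }
  if (str.length === 0) throw new SyntaxError("no digits");
  // Largest k with radix**k <= Number.MAX_SAFE_INTEGER, so a k-digit chunk
  // is computed exactly with ordinary Number arithmetic.
  let k = 1;
  while (radix ** (k + 1) <= Number.MAX_SAFE_INTEGER) k++;
  const scale = BigInt(radix) ** BigInt(k);
  let val = 0n;
  let chunk = 0; // Number accumulator for the current chunk
  let width = 0; // digits in the current chunk
  for (const c of str) {
    const d = parseInt(c, 36); // single-character digit value; NaN if invalid
    if (!(d < radix)) throw new SyntaxError(`invalid digit ${JSON.stringify(c)}`);
    chunk = chunk * radix + d;
    if (++width === k) {
      val = val * scale + BigInt(chunk);
      chunk = 0;
      width = 0;
    }
  }
  // Fold in the final partial chunk, if any.
  if (width > 0) val = val * BigInt(radix) ** BigInt(width) + BigInt(chunk);
  return val;
}

console.log(parseBigIntChunked("123456789012345678901234567890")); // 123456789012345678901234567890n
```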

thejoshwolfe · 2017-11-30T20:52:29Z

You can get the native implementation to do the parsing for the 4 common radixes with the BigInt() constructor:

```javascript
function parseBigInt(str, radix = 10) {
  switch (radix) {
    case 2:  return BigInt("0b" + str);
    case 8:  return BigInt("0o" + str);
    case 10: return BigInt(str);
    case 16: return BigInt("0x" + str);
  }
  // fallback to a userland implementation...
}
```

This neglects lots of corner cases like empty input and leading whitespace.
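A sketch of how the delegating approach could cover a few of those corner cases: surrounding whitespace is trimmed, a leading sign is applied after parsing (BigInt() itself rejects a sign combined with a 0b/0o/0x prefix), and empty input is rejected rather than silently becoming 0n as BigInt("") does. parseBigIntNative is a hypothetical name, and only radixes 2, 8, 10 and 16 are handled.

```javascript
// Hypothetical variant of the delegating sketch that handles a few corner cases.
const PREFIXES = new Map([[2, "0b"], [8, "0o"], [16, "0x"]]);

function parseBigIntNative(str, radix = 10) {
  let s = String(str).trim();
  let negative = false;
  if (s[0] === "+" || s[0] === "-") {
    negative = s[0] === "-";
    s = s.slice(1);
  }
  if (s.length === 0) throw new SyntaxError("no digits");
  let value;
  if (radix === 10) {
    // Without this check, BigInt() would also accept prefixed input like "0x10".
    if (!/^[0-9]+$/.test(s)) throw new SyntaxError("invalid decimal digits");
    value = BigInt(s);
  } else if (PREFIXES.has(radix)) {
    value = BigInt(PREFIXES.get(radix) + s); // throws SyntaxError on bad digits
  } else {
    throw new RangeError("only radixes 2, 8, 10 and 16 are handled here");
  }
  return negative ? -value : value;
}

console.log(parseBigIntNative("  -ff ", 16)); // -255n
console.log(parseBigIntNative("101", 2));     // 5n
```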

Python has a similar asymmetry between parsing and stringification, but amusingly it's the reverse situation. int() provides parsing with any radix, and str(), hex(), oct(), and bin() provide stringification for the 4 common radixes.

littledan · 2017-12-02T15:51:36Z

IIRC @ajklein was pushing more for removal for now; maybe you could clarify.

ajklein · 2017-12-04T16:03:54Z

I discussed this offline with @jakobkummerow, filling him in on the discussion around a hypothetical fromString method at TC39. Given the context of that conversation, and his agreement that the use-case for arbitrary parsing-in-radix of BigInts is unclear, my impression was that he didn't strictly object to parseInt's removal. It seems his biggest complaint is the lack of symmetry with toString.

jakobkummerow · 2017-12-04T22:18:28Z

Yes, I think there should either be both toString(radix) and fromString(..., radix) or neither of them (by which I mean that toString() should not accept a radix parameter then). I don't feel strongly about what the latter function is called; "fromString" is a nice way to break free from the legacy behavior associated with the "parseInt" name.

(I find none of "it can be polyfilled in userland", "in the interest of minimalism", or "the use-case is unclear" particularly convincing arguments, because they could be used to shoot down the entire proposal.)

littledan · 2017-12-05T04:09:20Z

OK, the committee didn't seem necessarily opposed to adding fromString, except that it might be weird to have fromString on BigInt and not Number. Would it be OK to add a matching Number.fromString in this proposal? Cc @ljharb

ljharb · 2017-12-05T05:57:31Z

I think it would be excellent to add both; that would address any consistency argument against adding only one.

littledan · 2017-12-06T18:20:18Z

After discussing further with @jakobkummerow and @ajklein offline, we decided to stick with the November 2017 committee decision and pursue BigInt.fromString together with Number.fromString as a follow-on proposal, and remove Decimal.parseInt from this proposal.

ljharb · 2017-12-06T18:23:05Z

Decimal, or BigInt?

littledan · 2017-12-06T18:30:23Z

BigInt (I edited the comment above); sorry, silly typo. If we have a follow-on Decimal proposal, it will follow the pattern established by this fromString proposal.

After some discussion about various edge cases in BigInt.parseInt, TC39 decided in the November 2017 meeting to remove this feature in favor of pursuing a follow-on proposal to add Number.fromString and BigInt.fromString, as new, cleaner functions. Closes #86

littledan mentioned this issue Oct 26, 2017

Normative: Clean up BigInt.parseInt semantics #97

Closed

littledan added a commit to littledan/test262 that referenced this issue Oct 26, 2017

BigInt: Test semantics changes for parseInt

1154abc


littledan mentioned this issue Oct 26, 2017

BigInt: Test semantics changes for parseInt tc39/test262#1321

Closed

littledan mentioned this issue Dec 6, 2017

Normative: Remove BigInt.parseInt #101

Merged

littledan closed this as completed in #101 Dec 6, 2017

bakkot mentioned this issue Jun 6, 2018

Parsing non-base10 strings, like parseBigInt(string, radix)? #155

Closed
