Skip to content
This repository has been archived by the owner on Oct 10, 2019. It is now read-only.

More precise parseInt and toString #86

Closed
thejoshwolfe opened this issue Sep 21, 2017 · 18 comments
Closed

More precise parseInt and toString #86

thejoshwolfe opened this issue Sep 21, 2017 · 18 comments

Comments

@thejoshwolfe
Copy link
Contributor

BigInt seems to be mimicking the parseInt and toString semantics of Number, but I don't understand why so much implementation-defined behavior is allowed for BigInt. For Number there are practical issues with loss of precision and exponential notation, but these are not concerns for BigInt.

This particular excerpt is concerning (from here relevant to BigInt through here):

However, if R is 10 and Z contains more than 20 significant digits, every significant digit after the 20th may be replaced by a 0 digit, at the option of the implementation; and if R is not 2, 4, 8, 10, 16, or 32, then mathInt may be an implementation-dependent approximation to the mathematical integer value that is represented by Z in radix-R notation.

It seems to defeat the purpose of arbitrary-precision BigInt if implementations can discard precision during parseInt.

And here's a concerning exceprt regarding toString (from here when radix is not 10):

The precise algorithm is implementation-dependent, however the algorithm should be a generalization of that specified in 3.1.4.1.

Why are we not precisely defining an abstract algorithm?

@littledan
Copy link
Member

That's a very good point. There is a related source of ambiguity in https://tc39.github.io/ecma262/#sec-runtime-semantics-mv-s

Otherwise, the rounded value must be the Number value for the MV (in the sense defined in 6.1.6), unless the literal includes a StrUnsignedDecimalLiteral and the literal has more than 20 significant digits, in which case the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position.

I think it'd be a good idea to make a uniform decision and fix all of these, for both Number and BigInt. I don't think I'll be able to do so by the September meeting, though.

For BigInt in particular, there should never be such a 20-digit limitation. The exact answer should be given regardless of the number of digits and the radix. I guess that's a separate spec text fix which will be easier than the Number one.

@ajklein
Copy link

ajklein commented Oct 10, 2017

Not sure if this is directly related, but parseInt also has some funny behavior with respect to radix prefixes: it allows "0x" prefixes if radix is either not passed or 16, but disallows it (or rather, returns 0) otherwise. It's not clear to me that this legacy behavior is something we want to bring forward, unless the intention of this particular method is to match the behavior of the legacy parseInt.

@littledan
Copy link
Member

@ajklein Thanks for raising that (rather separate) question. I don't see a big harm coming from that stripPrefix behavior, but maybe it'd make sense to clean it up. @cxielarko , you implemented this functionality and tests for it, do you have an opinion here?

littledan added a commit that referenced this issue Oct 26, 2017
The previous specification for BigInt.parseInt was based on referencing
and making small changes to BigInt.parseInt. This patch inlines the definition
for more clarity, and to remove some unnecessary implementation-defined
parts.

Addresses part of #86
littledan added a commit that referenced this issue Oct 26, 2017
The previous specification for BigInt.parseInt was based on referencing
and making small changes to BigInt.parseInt. This patch inlines the definition
for more clarity, and to remove some unnecessary implementation-defined
parts.

Addresses part of #86
@littledan
Copy link
Member

I pushed a patch to fix @thejoshwolfe 's issue, but the issue from @ajklein might have different points of view, so I'll do that as a pull request to get more feedback.

As long as we're cleaning up parseInt, we could also do a few more things beyond what @ajklein is suggesting:

  • When the radix is invalid, throw a RangeError rather than a SyntaxError. (I believe @cxielarko suggested this in the past, but somehow I didn't understand at the time.)
  • Throw an exception if there is more non-whitespace text after the recognized numeric part, rather than ignoring it.

I'll put together all three of these changes into a PR and make a matching test262 PR.

littledan added a commit that referenced this issue Oct 26, 2017
- Throw a RangeError rather than a SyntaxError when an out-of-bounds
  radix is passed in. (Thanks, @cxielarko)
- Throw a syntax error with "0x" prefixes. (Thanks, @ajklein)
- Throw an exception if there is more non-whitespace text after the
  recognized numeric part, rather than ignoring it.

Addresses #86
littledan added a commit to littledan/test262 that referenced this issue Oct 26, 2017
@littledan
Copy link
Member

We settled on removing the parseInt function at the November 2017 TC39 meeting.

@jakobkummerow
Copy link
Collaborator

What?! How is one supposed to parse BigInts with custom radixes? Or is that simply not a supported use case any more?

@bakkot
Copy link
Contributor

bakkot commented Nov 29, 2017

@jakobkummerow

How is one supposed to parse BigInts with custom radixes?

const alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'.split('');
function parseBigInt(str, radix = 10) {
  if (radix < 2 || radix > alphabet.length || Math.floor(radix) !== radix) {
    throw new RangeError('radix out of range');
  }
  let val = 0n;
  for (const c of ('' + str).split('')) {
    const index = alphabet.indexOf(c);
    if (index < 0 || index >= radix) {
      throw new RangeError('character out of range');
    }
    val = val * radix + BigInt(index);
  }
  return val;
}

(Modulo casing, _ separators, etc)

Something like that can be added to the language later, of course, but I don't think it's that big of a deal to leave it out initially, given that it's doable in userland.

@littledan
Copy link
Member

The idea would be to leave it for user libraries. This proposal already leaves several other things to user libraries, such as bitcasting Numbers to 64-bit ints, writing and reading arbitrarily large BigInts from ArrayBuffers, and bit instructions such as popcount, leftmost/rightmost set/clear bit, etc. All of those could be implemented by user code or be faster as built-ins, but are left out in the interest of minimalism; this proposal could join the club.

@jakobkummerow
Copy link
Collaborator

given that it's doable in userland.

Of course it is; with TypedArrays and | 0 arithmetic you can implement the entire BigInt proposal (except for the overloaded operators) in userland.

In a quick benchmark based on our unit test's inputs, @bakkot 's polyfill (with an added radix = BigInt(radix) to make it work) is 20-30x slower than the native version. I am surprised that the committee is now demanding that we delete that native implementation.

IMHO it's weird to have .toString(radix) but no reverse function.

@thejoshwolfe
Copy link
Contributor Author

thejoshwolfe commented Nov 30, 2017

You can get the native implementation to do the parsing for the 4 common radixes with the BigInt() constructor:

function parseBigInt(str, radix = 10) {
  switch (radix) {
    case 2:  return BigInt("0b" + str);
    case 8:  return BigInt("0o" + str);
    case 10: return BigInt(str);
    case 16: return BigInt("0x" + str);
  }
  // fallback to a userland implementation...
}

This neglects lots of corner cases like empty input and leading whitespace.

Python has a similar asymmetry between parsing and stringification, but amusingly it's the reverse situation. int() provides parsing with any radix, and str(), hex(), oct(), and bin() provide stringification for the 4 common radixes.

@littledan
Copy link
Member

IIRC @ajklein was pushing more for removal for now; maybe you could clarify.

@ajklein
Copy link

ajklein commented Dec 4, 2017

I discussed this offline with @jakobkummerow, filling him in on the discussion around a hypothetical fromString method at TC39. Given the context of that conversation, and his agreement that the use-case for arbitrary parsing-in-radix of BigInts is unclear, my impression was that he didn't strictly object to parseInt's removal. It seems his biggest complaint is the lack of symmetry with toString.

@jakobkummerow
Copy link
Collaborator

Yes, I think there should either be both toString(radix) and fromString(..., radix) or neither of them (by which I mean that toString() should not accept a radix parameter then). I don't feel strongly about how the latter function is called; "fromString" is a nice way to break free from the legacy behavior associated with the "parseInt" name.

(I find none of "it can be polyfilled in userland", "in the interest of minimalism", or "the use-case is unclear" particularly convincing arguments, because they could be used to shoot down the entire proposal.)

@littledan
Copy link
Member

OK, the committee didn't seem necessarily opposed to adding fromString, except that it might be weird to have fromString on BigInt and not Number. Would it be OK to add a matching Number.fromString in this proposal? Cc @ljharb

@ljharb
Copy link
Member

ljharb commented Dec 5, 2017

I think it would be excellent to add both; and would address any consistency argument from only adding one.

@littledan
Copy link
Member

littledan commented Dec 6, 2017

After discussing further with the @jakobkummerow and @ajklein offline, we concluded to stick with the November 2017 committee decision and pursue BigInt.fromString together with Number.fromString as a follow-on proposal, and remove Decimal.parseInt from this proposal.

@ljharb
Copy link
Member

ljharb commented Dec 6, 2017

Decimal, or BigInt?

@littledan
Copy link
Member

BigInt (edited the comment above)--sorry, silly typo. If we have a follow-on Decimal proposal, it will follow the pattern established by this fromString proposal.

littledan added a commit that referenced this issue Dec 6, 2017
After some discussion about various edge cases in BigInt.parseInt,
TC39 decided in the November 2017 meeting to remove this feature
in favor of pursuing a follow-on proposal to add
Number.fromString and BigInt.fromString, as new, cleaner functions.

Closes #86
littledan added a commit that referenced this issue Dec 6, 2017
After some discussion about various edge cases in BigInt.parseInt,
TC39 decided in the November 2017 meeting to remove this feature
in favor of pursuing a follow-on proposal to add
Number.fromString and BigInt.fromString, as new, cleaner functions.

Closes #86
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants