Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format/Parse binary from/to BigInteger #85392

Conversation

lateapexearlyspeed
Copy link
Contributor

@lateapexearlyspeed lateapexearlyspeed commented Apr 26, 2023

Fix #83619
Hi, currently PR is just in draft stage and in developing, will mark it ready for review when ready, thanks.
Would need final binary format definition firstly.

This is marked as ready for review now, thanks.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Apr 26, 2023
@ghost
Copy link

ghost commented Apr 26, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix #83619
Hi, currently PR is just in draft stage and in developing, will mark it ready for review when ready, thanks.

Author: lateapexearlyspeed
Assignees: -
Labels:

area-System.Numerics

Milestone: -

}

// each byte is typically eight chars
var sb = new ValueStringBuilder(stackalloc char[Math.Min(charCount, 512)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

var sb = (charCount <= 512)
   ? new ValueStringBuilder(stackalloc char[charCount])
   : new ValueStringBuilder(charCount);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, didn't know this pattern can mark sb as 'unsafe to return' for stackalloc before, thanks !

@@ -1002,6 +1096,103 @@ internal static char ParseFormatSpecifier(ReadOnlySpan<char> format, out int dig
}
}

private static string? FormatBigIntegerToBin(bool targetSpan, BigInteger value, int digits, Span<char> destination, out int charsWritten, out bool spanSuccess)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike existing FormatBigIntegerToHex(), this FormatBigIntegerToBin() is implemented by calculating required char length before format, this can:

  • avoids from extending ValueStringBuilder's capacity during append char;
  • for targetSpan flow, formatted chars can write to wanted destination span directly rather than allocate buffer in ValueStringBuilder and copy to destination at the end

Please give advice if not proper, thanks !

@@ -440,11 +440,11 @@ private static unsafe bool ParseNumber(ref char* str, char* strEnd, NumberStyles
int digEnd = 0;
while (true)
{
if (char.IsAsciiDigit(ch) || (((options & NumberStyles.AllowHexSpecifier) != 0) && char.IsBetween((char)(ch | 0x20), 'a', 'f')))
if ((options & NumberStyles.AllowBinarySpecifier) == 0 ? char.IsAsciiDigit(ch) || ((options & NumberStyles.AllowHexSpecifier) != 0 && char.IsBetween((char)(ch | 0x20), 'a', 'f')) : char.IsBetween(ch, '0', '1'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that last bit of the ternary be better as : (char == '0' || char == '1) instead of the char.IsBetween(ch, '0', '1')? Seems we're really driving toward optimal performance :)

Copy link
Contributor Author

@lateapexearlyspeed lateapexearlyspeed Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also struggled on these 2, but selected to use char.IsBetween() based on its implementation because it can always just use one check but need 2 arithmetic operation..

Np, changed to (char == '0' || char == '1) now, thanks.

{
state |= StateDigits;

if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && ((options & NumberStyles.AllowHexSpecifier) != 0)))
if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && (options & (NumberStyles.AllowHexSpecifier | NumberStyles.AllowBinarySpecifier)) != 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we hoist this options test and the ones on line 443 to outside the loop so they are only bool flags by the time we're inside the loop?

Perhaps insert around line 440...

const bool allowBinary = (options & NumberStyles.AllowBinarySpecifier) != 0 ;
const bool allowHex = (options & NumberStyles.AllowHexSpecifier) != 0 ;
const bool allowLeadingZero = allowBinary || allowHex;

Then inside the loop, at line 443 looks like

if (allowBinary
    ? (char == '0' || char == '1)
    : char.IsAsciiDigit(ch) || (allowHex && char.IsBetween((char)(ch | 0x20), 'a', 'f')))

and line 447 becomes

if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && allowLeadingZero))

The same thing could be done for options tested elsewhere inside the loop if performance testing makes that wise...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure what's going on with the bigNumber && here. This flag is set to indicate that we're backed by a StringBuilder, not that we're parsing a particular kind of number...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love all the new tests :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we hoist this options test

Got your point. As you found all the existing options tests code were inside whole loop, and during run time some option tests may happen only when data is met, so not sure whether we should hoist 'options' test which will turn to be called once but always happen. That is why I did not change this convention :)

I am also not sure what's going on with the bigNumber && here.

I am also not sure, so that I just added Binary stuff behind it like Hex :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting it apart for readability is generally helpful. The logic is getting quite verbose to have on a single line IMO.

For the primitive types, we have it split apart into some early checks that go to TryParseBinaryInteger*Style: https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Number.Parsing.cs,437

The check for "IsValidChar" can then be specialized by use of the generics and made a lot cheaper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding Fixed. After introducing Hex/Bin parser types, here simplifies not only prev line 447, but also prev line 443 which involved hex/binary checks previously. Please check, thanks.

{
public static bool IsValidChar(char c) => char.IsAsciiDigit(c);

public static bool IsHexBinary() => false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IsHexOrBinary

-or- maybe IsHexOrBinaryParser and rename IDigitValidator to IDigitParser, to more closely match the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term I'd like to see this move even closer to the IHexOrBinaryParser logic we have in corelib, but that can be done in a future PR and isn't necessary as part of this.

Having the names and general logic mostly start to line up helps with that, however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@lateapexearlyspeed
Copy link
Contributor Author

@tannergooding all comments are fixed, please check.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It would be good to get a secondary review from someone. CC. @stephentoub since you've helped on similar bits for the corelib side (let me know if you're busy and I can find someone else).

@@ -511,6 +515,112 @@ private static ParsingStatus HexNumberToBigInteger(ref BigNumberBuffer number, o
}
}

private static ParsingStatus BinNumberToBigInteger(ref BigNumberBuffer number, out BigInteger result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we expand "Bin" to "Binary" in the name? I realize this is likely an attempt to match the conciseness of "Hex", but "Bin" and "Big" are so close that this keeps making me do a double-take to know which it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed method names, also for other 'Bin' related ones.

// each byte is typically eight chars
sb = charsIncludeDigits > 512
? new ValueStringBuilder(charsIncludeDigits)
: new ValueStringBuilder(stackalloc char[512].Slice(0, charsIncludeDigits));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why slice it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is unnecessary, removed.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@lateapexearlyspeed
Copy link
Contributor Author

Hi @tannergooding I have also addressed @stephentoub 's comments.

@lateapexearlyspeed
Copy link
Contributor Author

Hi @tannergooding any chance to have a look at status of this approved PR ? thanks !

@tannergooding tannergooding merged commit 71d6476 into dotnet:main Oct 12, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[API Proposal]: Support binary format specifier 'b' and 'B' (standard numeric format strings)
4 participants