-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add low-level APIs for converting integers to strings #46281
Conversation
…r bound. For example, 99 would be assume to have decimal length 3 since it has the same bit length as 127.
CC @rust-lang/libs |
src/libcore/num/mod.rs
Outdated
@@ -1315,6 +1420,98 @@ macro_rules! uint_impl { | |||
from_str_radix(src, radix) | |||
} | |||
|
|||
/// Write the representation in a given base in a pre-allocated buffer. | |||
/// | |||
/// Digits are a subset (depending on `radix`) of `0-9A-F`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0-9A-Z
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
In general I'm in favor of exposing low-level APIs like that to be able to reuse the optimized libstd implementation as building block for tasks that can't or don't want to use
seems worrying. I can see the desire for the different behavior, but if these APIs are supposed to be the "high level of control" equivalent of I'm also wondering whether it could be worthwhile to wait for const generics for the |
For what it’s worth, I believe that the footgun is that the formatting traits use the two’s complement representation in the first place. I trace the git history of this code and found no evidence that this isn’t accidental (such as discussion of this behavior or why it is desirable). I’d propose changing that behavior, but it’s been I think of of these new APIs less as the low-level equivalent of formatting traits, and more as the inverse of As to making the radix a type parameter instead of a runtime parameter, we already have a |
I fully agree, but if it's here to stay, internal consistency also matters.
I didn't consider this. A valid perspective, but these new APIs are also solving the same problem as
Good point. |
Slipped through assignment... r? @dtolnay |
/// assert_eq!(i16::min_value().to_str(&mut [0; i16::MAX_STR_LEN]), "-32768") | ||
/// ``` | ||
#[unstable(feature = "int_to_str", issue = /* FIXME */ "0")] | ||
pub fn to_str(self, buffer: &mut [u8]) -> &mut str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not thrilled about how an inherent method to_str
(which is almost never what you want) competes in rustdoc searches and autocomplete with a trait method to_string
(which is almost always what you want). Would it make sense to name this in a scarier way?
It would be neat to explore how I am sympathetic to these being the inverse of println!("{}", <i16>::from_str_radix("-ff", 16).unwrap());
println!("{}", <i16>::from_str_radix("ffff", 16).unwrap_err()); Overall I would prefer to have these bake in a high-quality integer formatting library on crates.io first. Let's run it by the rest of the team in case someone wants to justify why these need to be exposed by std. @rfcbot fcp close |
Team member @dtolnay has proposed to close this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
I don’t mind renaming
Do we really want something like
Hasn’t that already happened with https://crates.io/crates/itoa ? We could also keep only the |
ping @BurntSushi @Kimundi @aturon @sfackler waiting for your ticky boxes here! |
@dtolnay, FCP to close immediately after stating some concerns feels premature. What do you think of not keeping the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback Simon! I see what you mean. I apologize for not expressing my perspective clearly on the first try. It isn't so much that I had a few concerns -- it is that I have a general sense that it would be more productive to iterate on this feature in a library than adding it here now.
Even restricted to just the base-10 method, it feels like there is enough design space that we shouldn't hope to get it right on the first try. Having it bake in a library is better for everyone involved.
- A library would be immediately available on the stable channel, so we can migrate code that currently uses
itoa
or similar to collect feedback. I would be happy to migrate serde_json as long as the new one is at least as fast. - A library can be versioned by Cargo so we avoid breaking nightly users as we settle on the best approach.
This is similar to the path that failure
is following. For something like this, I believe bringing an established high-quality library to the table can be a faster path to stability in the standard library than iterating on #[unstable] stuff.
As an example of there being more design space than I think we have explored, here is a totally different approach that brings many advantages. Compare:
let mut buf = unsafe { mem::uninitialized::<[u8; i16::MAX_STR_LEN]>() };
let s = i.write_to(&mut buf);
// wrapper around uninitialized buffer of the right size
let mut buf = NumberBuffer::new();
let s = buf.stringify(i);
Advantages
- Easier to use correctly.
- Impossible to use incorrectly.
- No more awkward MAX_STR_LEN constant.
- No bounds check that your &[u8] is big enough.
- No ambiguity about whether the &[u8] needs to be big enough to hold any integer, or just your integer.
- No ambiguity about alignment of the result within the buffer.
- Cannot panic.
- Safe code gets to use an uninitialized buffer.
- Googleable type name.
- Forward compatible with controlling the base in a way that allocates a buffer of exactly the right size for your base using a const fn,
struct NumberBuffer<const RADIX: usize = 10>
. - No conflict with
to_string
autocomplete on the primitives. - The &mut is provided by autoref.
- No ambiguity about whether the method stomps on the front of your buffer as scratch space.
Disadvantages
- No longer exposes the ability to write right-aligned into an existing buffer. Unclear if this is ever valuable in practice.
- Other things, I'm sure.
As maintainer of the |
I don't think I am going to have bandwidth for this in the near term. It would be better if someone else can drive this and changes are not blocked on me, so I think releasing a different library would make more sense. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
☔ The latest upstream changes (presumably #46922) made this pull request unmergeable. Please resolve the merge conflicts. |
I will go ahead and close this. Thanks for the PR and discussion! Again I am not opposed to adding a low-level number to string API in the standard library but I believe it will be more productive to iterate on this in a crate. The |
This adds the following new APIs on primitive integer types, unstable under the
int_to_str
feature gate:They are the “mirror” APIs of
from_str
andfrom_str_radix
.The
&mut [u8] -> &mut str
pattern is the same aschar::encode_utf8
: take a (possibly stack-allocated) buffer, write into it, and return a slice of it. It is maximally flexible in terms of allocation/copying strategy.to_str
is equivalent to the existing impls of thefmt::Display
trait (it uses the same optimized algorithm) but does not have the overhead of creating afmt::Formatter
and otherwise invoking the string formatting machinery. This overhead was deemed high enough to make a crate dedicated to avoiding it: /~https://github.com/dtolnay/itoa/It always requires a buffer of at least
MAX_STR_LEN
bytes, even for values that would fit in a smaller buffer. This is to avoid adding bound checks to the algorithm. Alternatives are to add bound checks, or add them only forto_str
and notDisplay
with more (internal) generics magic.to_lowercase_str_radix
andto_lowercase_str_radix
are similar to the existing impls of thefmt::LowerHex
,UpperHex
,Octal
, andBinary
traits; but they support more bases and behave differently for negative integers: they use a minus sign followed by the representation of the mathematically opposite integer, whereas the formatting traits use the two’s complement. For example:There’s two separate methods for upper v.s. lower case to avoid a
bool
parameter that wouldn’t be self-explanatory at call sites, or a dedicated enum that seems like more new API surface than is warranted. An alternative might be to pick one of upper or lower case and only provide that, but that seems arbitrary since neither is obviously superior IMO.