Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hexadecimal integers with fmt::Debug, including within larger types #2226

Merged
merged 4 commits into from
Feb 27, 2018

Conversation

SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Nov 24, 2017

Add support for formatting integers as hexadecimal with the fmt::Debug trait,
including when they occur within larger types.

println!("{:02X?}", b"AZaz\0")
[41, 5A, 61, 7A, 00]

Rendered

```rust
println!("{:02X?}", b"AZaz\0")
```
```
[41, 5A, 61, 7A, 00]
```
@fstirlitz
Copy link

It seems like the output of current Debug implementations is (usually) valid Rust syntax that can be copy-pasted into Rust code as-is (the problem of fully qualifying paths aside). I think it's a useful property worth preserving; omitting the 0x prefix as proposed here breaks it.

Re 'Unresolved questions': given that you're asking about other possible formatting styles, adding an 'ASCII literal' style for u8 and [u8] might be considered as well, which would basically render as the output of std::ascii::escape_default wrapped in Rust literal syntax. It'd be very useful when dealing with ASCII-compatible bytestrings (compare Python 3 repr() on the bytes type). In one of my side projects, I have a wrapper type defined specifically for that purpose. I'm not sure how to fit that into the proposal as written, though.

@SimonSapin
Copy link
Contributor Author

Debug outputting valid Rust syntax is far from universal, though. And {:?} is still there unchanged if you want that. I went with not including 0x because it becomes redundant quickly in for example a slice that contains many integers, but I’m not supper attached to that point.

That unresolved question is specifically about other existing formatting traits like std::fmt::Pointer, not every single kind of formatting one might think of. ASCII-compatible formatting would be useful, but it’s out of scope for this RFC and should be proposed separately.

@eddyb
Copy link
Member

eddyb commented Nov 25, 2017

FWIW #x prints the 0x, so we could have support for it.

@SimonSapin
Copy link
Contributor Author

The problem here is that # has different meanings depending on which formatting trait is used: add a 0x or 0o or 0b prefix for UpperHex/LowerHex/Octal/Binary, and for Debug it causes "pretty" formatting with newlines and indentation.

So we need to decide whether # in {:#X?} means either or both.

@ExpHP
Copy link

ExpHP commented Nov 26, 2017

What makes hexadecimal so special? I've wanted this kind of argument forwarding for floats since forever, considering that the default format is prone to generate things like 0.0000000000000000000000000000000000011102230246251565

@SimonSapin
Copy link
Contributor Author

Hexadecimal is special because one octet (which is a fairly fundamental unit) is exactly 2 hex digits, so it is a compact way to visualize bit-packed data. Before this RFC, hexadecimal and "Default over Display" are mutually exclusive.

I’m not sure what’s your point about floats. If you want to use fixed-precision formatting for floats within larger structures, the formatting string syntax already allows you to specify both a precision and ? for Debug. Forwarding the precision and other flags in slices and derived impls is fixed by rust-lang/rust#46233, independently of this RFC.

@ExpHP
Copy link

ExpHP commented Nov 26, 2017

Does #46233 enable the use of {Upper,Lower}Exp?

@SimonSapin
Copy link
Contributor Author

Ah ok, sorry I misunderstood your previous comment. I think that fits in the “Other formatting types/traits?” unresolved question. I don’t have an opinion on it.

@ExpHP
Copy link

ExpHP commented Nov 27, 2017

Okay. Even if this does not include such capability, I think my primary concern more than anything else is to make sure that this does not preclude future extensions such as (strawman syntax) {:10.4e?}. On the surface it appears that adding e | E to radix in the grammar is possible, though exposing it through the Formatter API would require another function.

Personally, I'm still holding out for some kind of "general number formatting" option like %g in printf (which switches between exponential and plain formatting based on the value), and I think it'd be unfortunate for the Formatter API to end up with a whole bunch of entirely unorthogonal, hacky functions just because these features were all added piecemeal:

    // Added in e.g. Rust 1.25
    fn number_radix(&self) -> u32;
    fn number_uppercase(&self) -> bool;

    // Added in e.g. Rust 1.27
    // true for {:e?} and {:E?}
    fn number_exponential(&self) -> bool;

    // Added in e.g. Rust 1.33
    // true for {:g?} and {:G?}
    // never simultaneously true with number_exponential()
    fn number_general(&self) -> bool;

@SimonSapin
Copy link
Contributor Author

Right, the Formatter API is the part I’m least certain about. Suggestions welcome.

@aturon
Copy link
Member

aturon commented Nov 28, 2017

cc @rust-lang/libs, highfive failed to assign a reviewer here.

@ia0
Copy link

ia0 commented Nov 29, 2017

If it can help, one possible alternative would be to have both the decimal and hexadecimal representations. The idea is that Debug is meant for debugging, so we don't really know in advance what we want to see. If there is a debugging level, it could be used to control the amount of displayed information.

println!("{:?}", b"AZaz\0");
println!("{:#?}", b"AZaz\0");
[65 /* 0x41 */, 90 /* 0x5a */, 97 /* 0x61 */, 122 /* 0x7a */, 0 /* 0x0 */]
[
    65 /* 0x41 */,
    90 /* 0x5a */,
    97 /* 0x61 */,
    122 /* 0x7a */,
    0 /* 0x00 */
]

The size used for the hexadecimal comments could be inferred from the type (u8 -> 2, u16 -> 4, etc.). It could be minimal without # (like {:?} -> 0x0) and full with # (like {:#?} -> 0x0000 for u16).

Advantages:

  • The API stays the same.
  • The output is still a valid Rust value.
  • If a nested structure would require both decimal and hexadecimal representation, we don't need to commit for only one of them. For example a struct containing 2 integers: one which should be displayed as decimal (because it's a length) and one which should be displayed as hexadecimal (because it's a set of flags).

Disadvantages:

  • We change the current Debug output (this could be controlled with some debugging level maybe if there is one or some environnement variable), although the Rust value stays the same (only comments are added).
  • The output may be too verbose if we don't have a way to control the verbosity from outside.

I have no opinion about which way to go, but just felt that such an alternative could be considered before committing to a decision (in particular if we have a way to control the debug verbosity).

@SimonSapin
Copy link
Contributor Author

[65 /* 0x41 */, 90 /* 0x5a */, 97 /* 0x61 */, 122 /* 0x7a */, 0 /* 0x0 */]

This example is a very small slice. This formatting would get overly verbose very quickly.

this could be controlled with some debugging level maybe if there is one or some environnement variable

Such control belongs inside the formatting string. It should be written once by the person that wrote for example the unit test, not up to every user not to forget to set some boilerplate environment variable.

@ia0
Copy link

ia0 commented Nov 30, 2017

Yes, ideally the debug verbosity level would be in the format string, although that would change the format string API. If it's fine to go that way, then the remaining differences would be:

  • Pro: The output is still a valid Rust value (this disappears if we use 0x).
  • Pro: Heterogeneous structures can be printed correctly (see below).
  • Con: If we know we only need hexadecimal, the output is more verbose.
#[derive(Debug)]
struct Foo {
    len: u32,
    flags: u32,
}

println!("{:x?}", Foo { len: 100, flags: 0x20 | 0x4 });    
println!("{:#x?}", Foo { len: 100, flags: 0x20 | 0x4 });    

would give

Foo { len: 100 /* 0x64 */, flags: 36 /* 0x24 */ }
Foo {
    len: 100 /* 0x00000064 */,
    flags: 36 /* 0x00000024 */
}

instead of

Foo { len: 0x64, flags: 0x24 }
Foo {
    len: 0x64,
    flags: 0x24
}

I guess it depends on the use-case. For slices, it seems better to choose only one of decimal or hexadecimal since the values are homogeneous. But for more complex values, enforcing the user to choose one seems too constraining.

@SimonSapin
Copy link
Contributor Author

ideally the debug verbosity level would be in the format string, although that would change the format string API

This RFC is proposing to extend the formatting string syntax and Formatter API.

And to be honest, using comment syntax in formatting output seems really unusual and not appropriate for x in formatting strings which has prior meaning of "hexadecimal" and nothing more. You may be interested in proposing new syntax for dual-formatting, but that seems out of scope for RFC.

@ia0
Copy link

ia0 commented Nov 30, 2017

I am not sure this is unusual when you are debugging. For example some debuggers comment the assembly with the decimal, hexadecimal, character, or string value based on heuristics:
https://monosource.gitbooks.io/radare2-explorations/content/tut2/tut2_strcpy.png
https://www.hex-rays.com/products/ida/debugger/win_ce/14_failed_bpt.gif
Here, when I saw that the wish is to use Debug to print hexadecimal when it makes sense, I thought about this existing usage in debugging environments.

Is the issue with heterogeneous values (see Foo above) a concern? Because it looks like the current proposal is tailored for slice-like (homogeneous) values. Maybe the current answer for heterogeneous values is simply to implement Debug instead of deriving it.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Dec 6, 2017
@aturon
Copy link
Member

aturon commented Jan 11, 2018

cc @sfackler @alexcrichton, can one of you take a look at this RFC?

@alexcrichton
Copy link
Member

This seems like a pretty slick RFC, thanks again for writing it up @SimonSapin!

I'm mostly eyeing this from the perspective of "what do we need to stabilize to get the maximal benefit from this". It'd all for sure start out as unstable, but one part I like about this RFC is that there's a very tiny sliver I think we'll need to stabilize to actually get benefits, namely the new syntax in format strings.

I'm not so sure about the Formatter APIs as well but I think they don't need to be stable to actually benefit from this. Almost everything uses #[derive(Debug)] which would "do the right thing" by default without any change from today's code (thanks to rust-lang/rust#46233 I believe).

Along those lines I sort of mostly see this RFC as "let's add some more syntax and then here's some details of how to implement it in libstd", where for now the biggest thing to debate I think is the syntax and the mechanics that go along with it.

I personally really like that you don't need to add to #[derive] annotations to get the benefit of this RFC, everything that uses #[derive(Debug)] automatically inherits the functionality. In that sense the syntax here feels "natural ish" to me as well (the format string syntax is already sort of a mishmash here and there). Along these lines though, @SimonSapin do you know if Python (where I believe we originally cribbed the syntax from) has a similar feature to this? Perhaps any experience we can draw from?

Overall I'd be willing to merge, but would just want a gut check against other languages (especially Python) to see if this is already a feature somewhere else that we can draw experience from.

@SimonSapin
Copy link
Contributor Author

Python classes can have separate __repr__ and __str__ methods whose role roughly map to our Debug v.s. Display. But in the new-style formatting strings (the ones with {} as opposed to %) they’re not represented as a :? formatting "type" but as !r and !s "conversions" that cause repr() or str() to be called on the value before formatting it. So we already diverge there.

As to hexadecimal formatting, Python also has :x and :X but it seems to be only supported on integers. Trying to use it for example on a list of integers raises an error.

@alexcrichton
Copy link
Member

Oh ok so just to confirm, at least with Python, if you have a custom composite like:

struct Foo {
    a: i64,
}

There's no automatic way to format that with hex vs decimal?

@quodlibetor
Copy link

No, there's a magic method you can implement that allows you to customize formatting:

class MyHex:
    def __init__(self, val):
        self.val = val
    def __format__(self, spec):
        if spec == 'x':
            return hex(self.val)
        if spec == '':
            return str(self.val)
        return 'unknown spec'
>>> h = MyHex(89)
>>> '{:x}'.format(h)
'0x59'

>>> '{}'.format(h)
'89'

>>> '{:jk}'.format(h)
'unknown spec'

You get the raw string, so you would need to implement all the custom hex
display per-field yourself.

In general Python doesn't have anything nearly as fancy as Debug, similar
things are got from inheriting from different classes (namedtuple, the new
dataclasses coming in 3.7, a few third-party libraries.)

@ExpHP
Copy link

ExpHP commented Jan 17, 2018

There's no automatic way to format that with hex vs decimal?

There's no way to automatically format that, period.

  • The closest thing to Debug in Python is self.__repr__(), which takes no arguments.
  • The closest thing to Display in Python is self.__str__(), which takes no arguments.
  • The default implementation of __str__ is to call __repr__. (and thanks to this, it seems to me the vast majority of python coders do not appear to understand the difference. Good __repr__ definitions are rare...)
  • The default implementation of __repr__ prints something like
<my_module.Klass object at 0x7f934cc33c88>

so basically you need to do what DebugStruct does yourself. And that's just to print it without format flags. To get format flags, you can indeed override __format__ and have __repr__ delegate to it, but overall, __format__ definitions are spectacularly rare in the wild.

@alexcrichton
Copy link
Member

Ok thanks for the info @quodlibetor and @ExpHP! In that case sounds like there's not a lot of prior art on this specifically. As a result since it seems reasonable to me I'll....

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 18, 2018

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jan 18, 2018
[summary]: #summary

Add support for formatting integers as hexadecimal with the `fmt::Debug` trait,
including when the occur within larger types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/the/they

For example, an RGBA color encoded in `u32` with 8 bits per channel is easier to understand
when shown as `00CC44FF` than `13387007`.

The `std::fmt::UpperHex` and `std::fmt::LowerHex` provide hexadecimal formatting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The std::fmt::UpperHex and std::fmt::LowerHex traits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@rfcbot
Copy link
Collaborator

rfcbot commented Feb 16, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

1 similar comment
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 16, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Feb 16, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 26, 2018

The final comment period is now complete.

@alexcrichton
Copy link
Member

Ok! I'll now merge this with a tracking issue

@alexcrichton alexcrichton merged commit 16cfef8 into rust-lang:master Feb 27, 2018
@quodlibetor
Copy link

The correct rendered link (the target of the original comment is deleted) is now: /~https://github.com/rust-lang/rfcs/blob/master/text/2226-fmt-debug-hex.md

@Centril
Copy link
Contributor

Centril commented Mar 1, 2018

@quodlibetor Fixed, thanks!

@SimonSapin
Copy link
Contributor Author

rust-lang/rust#48978 implements this RFC, but without the new Formatter public API (for reasons explained there).

SimonSapin added a commit to SimonSapin/rust that referenced this pull request Mar 13, 2018
This can be used for integers within a larger types which implements Debug
(possibly through derive) but not fmt::UpperHex or fmt::LowerHex.

```rust
assert!(format!("{:02x?}", b"Foo\0") == "[46, 6f, 6f, 00]");
assert!(format!("{:02X?}", b"Foo\0") == "[46, 6F, 6F, 00]");
```

RFC: rust-lang/rfcs#2226
bors added a commit to rust-lang/rust that referenced this pull request Mar 19, 2018
Add hexadecimal formatting of integers with fmt::Debug

This can be used for integers within a larger types which implements Debug (possibly through derive) but not fmt::UpperHex or fmt::LowerHex.

```rust
assert!(format!("{:02x?}", b"Foo\0") == "[46, 6f, 6f, 00]");
assert!(format!("{:02X?}", b"Foo\0") == "[46, 6F, 6F, 00]");
```

RFC: rust-lang/rfcs#2226

The new formatting string syntax (`x?` and `X?`) is insta-stable in this PR because I don’t know how to change a built-in proc macro’s behavior based of a feature gate. I can look into adding that, but I also strongly suspect that keeping this feature unstable for a time period would not be useful as possibly no-one would use it during that time.

This PR does not add the new (public) `fmt::Formatter` proposed in the API because:

* There was some skepticism on response to this part of the RFC
* It is not possible to implement as-is without larger changes to `fmt`, because `Formatter` at the moment has no easy way to tell apart for example `Octal` from `Binary`: it only has a function pointer for the relevant `fmt()` method.

If some integer-like type outside of `std` want to implement this behavior, another RFC will likely need to propose a different public API for `Formatter`.
@Centril Centril added A-debugging Debugging related proposals & ideas A-primitive Primitive types related proposals & ideas labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debugging Debugging related proposals & ideas A-primitive Primitive types related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.