Improved docs for CStr, CString, OsStr, OsString #44855
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
Oops, long lines... will fix. I'll also clarify that CStr/CString are bags of zero-terminated bytes, and UTF-8 only happens when making a string out of them.
src/libstd/ffi/c_str.rs
Outdated
/// This type serves the primary purpose of being able to safely generate a
/// C-compatible string from a Rust byte slice or vector. An instance of this
/// This type serves the purpose of being able to safely generate a
/// C-compatible UTF-8 string from a Rust byte slice or vector. An instance of this
This isn't guaranteed by `CStr`.
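To illustrate the point that C strings are nul-terminated bytes rather than UTF-8 text, here is a minimal sketch. The function name `c_string_vs_utf8` is hypothetical and not part of the PR; it only uses `CString::new` and `std::str::from_utf8` from the standard library.

```rust
use std::ffi::CString;

/// Hypothetical helper: reports (is a valid C string, is valid UTF-8)
/// for the same bytes. The two properties are independent.
fn c_string_vs_utf8(bytes: &[u8]) -> (bool, bool) {
    (
        // `CString::new` only rejects interior nul bytes.
        CString::new(bytes).is_ok(),
        // UTF-8 validity is a separate check.
        std::str::from_utf8(bytes).is_ok(),
    )
}
```

Bytes such as `0xFF 0xFE` are accepted as a C string but are not UTF-8, while `"ab\0cd"` is valid UTF-8 but is rejected as a C string.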
src/libstd/ffi/mod.rs
Outdated
@@ -8,7 +8,145 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Utilities related to FFI bindings.
//! This module provides utilities to handle C-like strings. It is
I think that this is a bit misleading because `OsString` isn't a C string on Windows.
A better way to describe it might be "to handle data across non-Rust interfaces, like other programming languages and the underlying operating system"
src/libstd/ffi/mod.rs
Outdated
//! borrowed slices of strings with the [`str`] primitive. Both are
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
//! i.e. if you look at the bytes that make up the string, there may
//! be a `0` among them. Both `String` and `str` know their length;
nit: the '0' here makes it look like you're referring to a zero digit, not a literal zero. Perhaps use `'\0'` instead?
another nit: I'd word "know their length" as "store their length explicitly" because technically we "know" the length of a C-string, but it's not computed in O(1) time.
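A small sketch of the two properties discussed in this thread: a Rust `str` may hold a `'\0'` byte in the middle, and its length is stored rather than computed by scanning. The helper name `len_and_has_nul` is an illustrative assumption.

```rust
/// Hypothetical helper: returns the stored byte length of `s` and
/// whether any of its bytes is a nul (`'\0'`).
fn len_and_has_nul(s: &str) -> (usize, bool) {
    // `len()` reads the length stored in the slice; it does not scan.
    (s.len(), s.as_bytes().contains(&0))
}
```

For `"ab\0cd"` this reports a length of 5 with a nul present; the nul does not end the string.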
src/libstd/ffi/mod.rs
Outdated
//!
//! C strings are different from Rust strings:
//!
//! * **Encodings** - C strings may have different encodings. If
I think that "encoding" here is a bit inaccessible to people who are unfamiliar with how string encoding works. I'd say introduce it with "Rust strings are UTF-8, but C strings may use other encodings. If you're using a string from C, you may have to check its encoding explicitly, rather than just assuming that it's UTF-8 like you can in Rust."
src/libstd/ffi/mod.rs
Outdated
//! you are bringing in strings from C APIs, you should check what
//! encoding you are getting. Rust strings are always UTF-8.
//!
//! * **Character width** - C strings may use "normal" or "wide"
"Width" here may be what C uses, but it's again misleading because Unicode has its own specific definition of width. I'd say "size" instead. Instead of using "normal" and "wide," I'd just say directly that C uses two types, `char` (clarifying that this is different from Rust's type) and `wchar_t`, which are different sizes.
You can also clarify that `wchar_t` is referred to by "wide character" but that this doesn't actually reflect the Unicode width, but the size of the character in bytes.
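The size difference between C's `char` and Rust's `char` can be checked directly. This is only a sketch: the function name `char_sizes` is hypothetical, and it covers `c_char` alone, since the standard library alias used here does not address `wchar_t`, whose size the C standard leaves to the platform.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Hypothetical helper: sizes in bytes of C's `char` (as `c_char`)
/// and of Rust's `char`, which holds a Unicode scalar value.
fn char_sizes() -> (usize, usize) {
    (size_of::<c_char>(), size_of::<char>())
}
```

On the platforms Rust supports, `c_char` is one byte while Rust's `char` is four, so the two types are not interchangeable.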
src/libstd/ffi/mod.rs
Outdated
//! '[Unicode code point]'.
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the
Again, I'd use `'\0'` here instead of '0'.
src/libstd/ffi/mod.rs
Outdated
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e. they have a `0` character at the
//! end. The length of a string buffer is not known *a priori*;
No need to use Latin; just say that it isn't stored, but has to be calculated. IMHO we should keep language simple if possible to be more accessible to non-native speakers.
src/libstd/ffi/mod.rs
Outdated
//! `wcslen()` for `wchar_t`-based ones. Those functions return the
//! number of characters in the string excluding the nul terminator,
//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.
I'd also note in here somewhere that Rust's way of doing it means that you can easily access a string's length, whereas there's an implicit cost to it in C. This also may carry over to `CStr` if its implementation changes.
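The cost difference can be sketched as follows. `scan_c_len` is a hypothetical stand-in for what C's `strlen()` has to do (walk the buffer until the terminator); it is not the libc implementation. `rust_len` simply reads the length that a `&str` already carries.

```rust
/// Hypothetical sketch of a `strlen()`-style scan: counts the bytes
/// before the first nul. Returns `None` if there is no terminator.
fn scan_c_len(buf: &[u8]) -> Option<usize> {
    // Linear in the length of the string.
    buf.iter().position(|&b| b == 0)
}

/// A Rust `&str` stores its length, so no scan is needed.
fn rust_len(s: &str) -> usize {
    s.len()
}
```

For the buffer `b"hello\0"` the scan yields 5, while the buffer itself is `len+1`, that is 6, bytes long.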
src/libstd/ffi/mod.rs
Outdated
//! so the buffer length is really `len+1` characters. Rust strings
//! don't have a nul terminator, and they always know their length.
//!
//! * **No nul characters in the middle of the string** - When C
I'd word this as "Internal NULs" as a more succinct version
src/libstd/ffi/mod.rs
Outdated
//! strings have a nul terminator character, this usually means that
//! they cannot have nul characters in the middle — a nul character
//! would essentially truncate the string. Rust strings *can* have
//! nul characters in the middle, since they don't use nul
Rather than "don't use nul terminators," it's clearer to say "because NUL doesn't have to mark the end of the string in Rust"
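The truncation effect described in the diff can be sketched with `CStr::from_bytes_until_nul`, which, as far as I know, is available from Rust 1.69 onward. The helper name `as_c_sees_it` is an assumption made for this illustration.

```rust
use std::ffi::CStr;

/// Hypothetical helper: returns the bytes a C-style reader would
/// treat as the string, i.e. everything before the first nul.
/// Returns `None` if the buffer contains no nul at all.
fn as_c_sees_it(buf: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_until_nul(buf).ok().map(|c| c.to_bytes())
}
```

Given `b"ab\0cd\0"`, only `ab` is seen; the bytes after the first nul are ignored, which is why NUL has to be excluded from the middle of C strings.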
src/libstd/ffi/mod.rs
Outdated
//! # Representations of non-Rust strings
//!
//! [`CString`] and [`CStr`] are useful when you need to transfer
//! UTF-8 strings to and from C, respectively:
I'd expand this to languages with a C ABI like Python, etc. People should know that a `CStr` might be necessary when interacting with other languages too.
src/libstd/ffi/mod.rs
Outdated
//! UTF-8 strings to and from C, respectively:
//!
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
//! UTF-8 string: it is valid UTF-8, it is nul-terminated, and has no
Not always valid UTF-8.
src/libstd/ffi/mod.rs
Outdated
//!
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
"just" seems out of place here; I'd remove it.
src/libstd/ffi/mod.rs
Outdated
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw `*const u8` that you got from
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
//! array of bytes; the UTF-8 validation step only happens when you
"the UTF-8 validation step" is only just mentioned here so I'd just make a separate sentence describing how that works instead, along the lines of "once you have a `CStr`, you can convert it to a Rust `str` if it's valid UTF-8, or lossily convert it by adding replacement characters"
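The two conversions mentioned in this comment can be sketched as below. The function name `strict_and_lossy` is hypothetical; `CStr::to_str` and `CStr::to_string_lossy` are the standard library methods.

```rust
use std::ffi::CStr;

/// Hypothetical helper: converts nul-terminated bytes both ways.
/// The input must end in exactly one nul byte, with none before it.
fn strict_and_lossy(bytes_with_nul: &[u8]) -> (Option<String>, String) {
    let c_str = CStr::from_bytes_with_nul(bytes_with_nul)
        .expect("input must have exactly one nul, at the end");
    // Strict: succeeds only if the bytes are valid UTF-8.
    let strict = c_str.to_str().ok().map(String::from);
    // Lossy: invalid sequences become U+FFFD replacement characters.
    let lossy = c_str.to_string_lossy().into_owned();
    (strict, lossy)
}
```

With the Latin-1 bytes `b"caf\xE9\0"`, the strict conversion fails while the lossy one yields `caf` followed by a replacement character.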
src/libstd/ffi/mod.rs
Outdated
//! request to convert it to a `&str`.
//!
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
//! strings to and from operating system calls. If you need Rust
A lot of programmers may not know what system calls are; I'd probably word this as "the operating system itself."
It may also make sense to include examples where this happens, like in opening files and running external commands.
I feel like the "If you need Rust strings out of them [...]" section is kind of redundant and wordy. I'd probably just say that conversions between `OsStr` and `str` work very similarly to `CStr` and leave it at that.
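For comparison with the `CStr` case, a minimal sketch of the `OsStr` conversions. The helper names `from_rust` and `os_to_strings` are assumptions for illustration; the example stays with UTF-8 input because building a non-UTF-8 `OsString` requires platform-specific extension traits.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: any Rust string converts into an `OsString`.
fn from_rust(s: &str) -> OsString {
    OsString::from(s)
}

/// Hypothetical helper: the strict conversion returns `None` when the
/// contents are not valid Unicode; the lossy one always yields text.
fn os_to_strings(name: &OsStr) -> (Option<&str>, String) {
    (name.to_str(), name.to_string_lossy().into_owned())
}
```

One difference from `CStr::to_str` is that `OsStr::to_str` returns an `Option` rather than a `Result`.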
Great work! I've interacted a lot with
I've integrated the changes per your comments. How's it look now? :)
Looks good to me! Again, great work! :)
Thank you!
Poke @aturon — this is now ready for your masterful reviewing skills!
Actually, @aturon wasn't available last week and is on PTO this week, so let's try....
This is fantastic, thank you so much!
I have a few little formatting nits, but after that, let's get this merged!
src/libstd/ffi/c_str.rs
Outdated
@@ -149,8 +209,13 @@ pub struct CStr {
}

/// An error returned from [`CString::new`] to indicate that a nul byte was found
/// in the vector provided.
/// in the vector provided. While Rust strings may contain nul bytes in the middle,
/// C strings can't, as that byte would effectively truncate the string.
Could we change this up a bit? We try to have a summary sentence first, then the rest of it. This one has a long summary, and repeats itself since you added the information below. How about:
/// An error indicating that an interior nul byte was found.
///
/// While Rust strings may contain nul bytes in the middle, C strings can't, as that byte would effectively
/// truncate the string.
///
/// This `struct`....
with the correct wrapping, I just guessed here. What do you think?
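As a sketch of the error this doc comment describes, `CString::new` returns a `NulError` whose `nul_position()` gives the index of the offending byte. The wrapper name `first_interior_nul` is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns the position of the interior nul that
/// makes `data` unusable as a C string, or `None` if there is none.
fn first_interior_nul(data: &[u8]) -> Option<usize> {
    CString::new(data).err().map(|e| e.nul_position())
}
```

For `b"ab\0cd"` the reported position is 2, the index where a C reader would stop.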
src/libstd/ffi/c_str.rs
Outdated
/// that a nul byte was found too early in the slice provided, or one
/// wasn't found at all for the nul terminator. The slice used to
/// create a `CStr` must have one and only one nul byte at the end of
/// the slice.
Same thing here; don't repeat where it came from, make sure to have a short summary, some space, and then a longer description.
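The rule in this doc comment (one and only one nul byte, at the end of the slice) can be sketched with `CStr::from_bytes_with_nul`. The helper name `is_valid_c_slice` is an assumption for illustration.

```rust
use std::ffi::CStr;

/// Hypothetical helper: true only if `bytes` has exactly one nul byte
/// and that byte is the last one in the slice.
fn is_valid_c_slice(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```

Both failure cases named in the comment are rejected: a nul found too early (`b"a\0bc\0"`) and a missing terminator (`b"abc"`).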
src/libstd/ffi/c_str.rs
Outdated
/// UTF-8 error was encountered during the conversion. `CString` is
/// just a wrapper over a buffer of bytes with a nul terminator;
/// [`into_string`][`CString::into_string`] performs UTF-8 validation
/// and may return this error.
and here
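A sketch of how the error from `CString::into_string` might be handled: the `IntoStringError` hands the original `CString` back through `into_cstring()`, so the bytes are not lost. The function name `c_to_string` is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: converts to a `String` if the contents are
/// valid UTF-8; otherwise returns the original bytes (without the nul).
fn c_to_string(c: CString) -> Result<String, Vec<u8>> {
    c.into_string()
        // On failure, recover the CString from the error.
        .map_err(|e| e.into_cstring().into_bytes())
}
```

A `CString` built from the single byte `0xFF` is a perfectly good C string, yet it takes the error path here because the validation step rejects it.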
src/libstd/ffi/c_str.rs
Outdated
/// underlying bytes to construct a new string, ensuring that
/// there is a trailing 0 byte. This trailing 0 byte will be
/// appended by this method; the provided data should *not*
/// contain any 0 bytes in it.
this isn't a method; could you say "function" instead?
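The behavior this doc comment describes can be observed through `into_bytes_with_nul`. This is a minimal sketch; the wrapper name `to_c_bytes` is an assumption.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: returns the bytes of `data` followed by the
/// nul terminator that `CString::new` appends.
fn to_c_bytes(data: &str) -> Result<Vec<u8>, NulError> {
    // The caller passes data without a terminator; the function adds it.
    Ok(CString::new(data)?.into_bytes_with_nul())
}
```

Passing `"hi"` produces the three bytes `h`, `i`, `\0`; passing data that already contains a 0 byte is an error rather than a second terminator.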
src/libstd/ffi/mod.rs
Outdated
@@ -8,7 +8,156 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Utilities related to FFI bindings.
//! This module provides utilities to handle data across non-Rust
I'd keep this short summary, but with a blank line after it, so that it stands alone as the summary. That is:
//! Utilities related to FFI bindings.
//!
//! This module provides utilities....
src/libstd/ffi/mod.rs
Outdated
//! C strings are different from Rust strings:
//!
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
//! other encodings. If you are using a string from C, you should
one space after a period, not two please!
src/libstd/ffi/mod.rs
Outdated
//! characters; please **note** that C's `char` is different from Rust's.
//! The C standard leaves the actual sizes of those types open to
//! interpretation, but defines different APIs for strings made up of
//! each character type. Rust strings are always UTF-8, so different
and here, and everywhere 😄
… the beginning Per #44855 (comment) and subsequent ones.
Thanks! @bors: r+ rollup
📌 Commit 5fb8e3d has been approved by
…eklabnik Improved docs for CStr, CString, OsStr, OsString This expands the documentation for those structs and their corresponding traits, per rust-lang#29354
rust-lang/rust#44855 should be attributed to @federicomenaquintero