Tracking Issue for const_is_char_boundary #131516

Open
1 of 3 tasks
zachs18 opened this issue Oct 10, 2024 · 2 comments · May be fixed by #134016
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@zachs18
Contributor

zachs18 commented Oct 10, 2024

Feature gate: #![feature(const_is_char_boundary)]

This is a tracking issue for using str::is_char_boundary in const contexts. It allows checking, during const evaluation, whether the byte at position index is the first byte of a UTF-8 code point sequence or the end of the string.

Public API

// core::str
impl str {
    pub const fn is_char_boundary(&self, index: usize) -> bool;
}
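
As a rough illustration of what the predicate reports, the sketch below lists every char boundary of a string using the already-stable runtime method. The helper name `boundaries` is made up for this example and is not part of the API. The commented-out const item shows how the call would presumably look under the feature gate.

```rust
/// Hypothetical helper (not part of core): collects every index that
/// `str::is_char_boundary` accepts, including 0 and `s.len()`.
fn boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

// "aé" is 3 bytes long: 'a' takes 1 byte, 'é' takes 2 bytes (0xC3 0xA9).
// `boundaries("aé")` therefore skips index 2, which falls inside 'é'.
//
// With `#![feature(const_is_char_boundary)]` on nightly, the same call is
// expected to be usable in a const item, for example:
// const INSIDE_E_ACUTE: bool = "aé".is_char_boundary(2);
```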

Steps / History

Unresolved Questions

  • None yet.

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@zachs18 zachs18 added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 10, 2024
@okaneco
Contributor

okaneco commented Oct 10, 2024

I was looking at this as well. It opens the door to making str::split_at* and the functions behind the unstable round_char_boundary feature const (the latter need their iterator code to be rewritten).

I don't believe the check for index == 0 is necessary, so one possible implementation is as follows.

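pub const fn is_char_boundary(&self, index: usize) -> bool {
    if index < self.as_bytes().len() {
        // In bounds: any byte that is not a UTF-8 continuation byte is a boundary.
        // This covers index == 0 too, since a str never starts with a continuation byte.
        self.as_bytes()[index].is_utf8_char_boundary()
    } else {
        // Out of bounds: only the end of the string counts as a boundary.
        index == self.len()
    }
}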
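
The method above relies on `u8::is_utf8_char_boundary`, which is private to core, so it cannot be tried outside the standard library as written. The following is a minimal free-function sketch of the same logic that compiles on its own; the names `is_utf8_char_boundary` (as a free function) and `is_char_boundary_sketch` are placeholders for this example, and the byte test is assumed to match what core does.

```rust
/// Stand-in for core's private `u8::is_utf8_char_boundary` (assumption:
/// same behavior). True for every byte that is not a UTF-8 continuation
/// byte, i.e. every byte outside 0x80..=0xBF.
const fn is_utf8_char_boundary(b: u8) -> bool {
    (b as i8) >= -0x40
}

/// Free-function version of the proposed implementation (hypothetical name).
const fn is_char_boundary_sketch(s: &str, index: usize) -> bool {
    let bytes = s.as_bytes();
    if index < bytes.len() {
        is_utf8_char_boundary(bytes[index])
    } else {
        // Past the last byte: only the end of the string is a boundary.
        index == bytes.len()
    }
}

// Checked during const evaluation; a failing assertion is a compile error.
const _: () = {
    assert!(is_char_boundary_sketch("aé", 0)); // start, no special case needed
    assert!(is_char_boundary_sketch("aé", 1)); // first byte of 'é'
    assert!(!is_char_boundary_sketch("aé", 2)); // second byte of 'é'
    assert!(is_char_boundary_sketch("aé", 3)); // end of the string
    assert!(!is_char_boundary_sketch("aé", 4)); // out of range
    assert!(is_char_boundary_sketch("", 0)); // empty string: 0 == len
};
```

The empty-string case shows why no separate `index == 0` check is needed: for a non-empty string, index 0 is in bounds and the first byte is never a continuation byte; for an empty string, 0 equals the length.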

@zachs18
Contributor Author

zachs18 commented Oct 10, 2024

Yeah, this was motivated by making str::split_at* const (cc #131518, which #131520 also implements). I looked at {floor,ceil}_char_boundary but did not want to mess with them at the moment since the rest were relatively simple changes.
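
For context on the {floor,ceil}_char_boundary point, here is one way a const-compatible rounding-down function could be written with a plain `while` loop instead of iterators. This is only a sketch under assumptions: the name `floor_char_boundary_sketch` is hypothetical, and it is not the standard library's code.

```rust
/// Hypothetical const-friendly take on "round down to a char boundary".
/// Returns the largest boundary that is <= `index`, clamped to `s.len()`.
const fn floor_char_boundary_sketch(s: &str, index: usize) -> usize {
    let bytes = s.as_bytes();
    if index >= bytes.len() {
        return bytes.len();
    }
    let mut i = index;
    // Step back over UTF-8 continuation bytes (0x80..=0xBF). The loop cannot
    // underflow because byte 0 of a valid str is never a continuation byte.
    while (bytes[i] as i8) < -0x40 {
        i -= 1;
    }
    i
}

// Usable in a const item because the body avoids iterators and closures.
const FLOORED: usize = floor_char_boundary_sketch("aé", 2);
```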

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Oct 29, 2024
Mark `str::is_char_boundary` and `str::split_at*` unstably `const`.

Tracking issues: rust-lang#131516, rust-lang#131518

First commit implements `const_is_char_boundary`, second commit implements `const_str_split_at` (which depends on `const_is_char_boundary`)

~~I used `const_eval_select` for `is_char_boundary` since there is a comment about optimizations that would theoretically not happen with the simple `const`-compatible version (since `slice::get` is not `const`ifiable) cc rust-lang#84751. I have not checked if this code difference is still required for the optimization, so it might not be worth the code complication, but 🤷.~~

This changes `str::split_at_checked` to use a new private helper function `split_at_unchecked` (copied from `split_at_mut_unchecked`) that does pointer stuff instead of `get_unchecked`, since that is not currently `const`ifiable due to using the `SliceIndex` trait.
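
To make the pointer-based approach more concrete, here is a rough standalone sketch of a const `split_at_checked` built on a boundary check and raw-pointer slicing. All function names are hypothetical, and the code is an approximation of the idea described in the commit message, not the actual code from the PR.

```rust
/// Boundary check written so it works in const contexts (hypothetical name).
const fn is_boundary_sketch(s: &str, index: usize) -> bool {
    let bytes = s.as_bytes();
    if index < bytes.len() {
        (bytes[index] as i8) >= -0x40 // not a UTF-8 continuation byte
    } else {
        index == bytes.len()
    }
}

/// Splits via raw pointers instead of `get_unchecked`, which goes through
/// the `SliceIndex` trait.
///
/// # Safety
/// `mid` must be a char boundary of `s` (which implies `mid <= s.len()`).
const unsafe fn split_at_unchecked_sketch(s: &str, mid: usize) -> (&str, &str) {
    let len = s.len();
    let ptr = s.as_ptr();
    // SAFETY: the caller guarantees `mid` is a char boundary, so both ranges
    // are in bounds and each half is valid UTF-8.
    unsafe {
        (
            core::str::from_utf8_unchecked(core::slice::from_raw_parts(ptr, mid)),
            core::str::from_utf8_unchecked(core::slice::from_raw_parts(ptr.add(mid), len - mid)),
        )
    }
}

/// Checked wrapper: returns `None` when `mid` is not a char boundary.
const fn split_at_checked_sketch(s: &str, mid: usize) -> Option<(&str, &str)> {
    if is_boundary_sketch(s, mid) {
        // SAFETY: `mid` was just verified to be a char boundary.
        Some(unsafe { split_at_unchecked_sketch(s, mid) })
    } else {
        None
    }
}

// The split happens during const evaluation.
const HALVES: Option<(&str, &str)> = split_at_checked_sketch("aé", 1);
```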
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 29, 2024
Rollup merge of rust-lang#131520 - zachs18:const-str-split, r=Noratrieb
