ACP: implement char_slice for &str #481
Comments
I would argue that if you're trying to slice by indexes of chars, you're designing your code wrong, so this method is intentionally left out and you should strongly consider using byte indexes instead or use something like splitting into grapheme clusters (which are actually what you should be using if you're trying to split it into user visible characters, though there are still caveats surrounding special characters like "ffi" (which is only 1 |
@programmerjake I'm writing a database that provides string functions. Database users typically slice strings by char indices (e.g., SUBSTR). I know that a Unicode character is not always "one intuitive human-readable character," and that this may be why such a function lives in third-party crates rather than in std (the current state). So I opened this issue to gather feedback from other users.
Are there some references or implementations to refer to? |
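#![feature(iter_advance_by)]
// Nightly-only: `Iterator::advance_by` is unstable.
fn char_slice(a: &str, begin: usize, end: usize) -> &str {
    let mut chars = a.chars();
    // Skip the first `begin` chars; the remainder starts at the slice's first char.
    chars.advance_by(begin).expect("begin index in range");
    let slice_1 = chars.as_str();
    // Skip the chars inside the slice; the remainder starts just past its last char.
    chars.advance_by(end - begin).expect("end index in range");
    let slice_2 = chars.as_str();
    // The difference in byte lengths is the byte length of the requested slice.
    &slice_1[..slice_1.len() - slice_2.len()]
}
fn main() {
    assert_eq!(char_slice("零1二3四5六7八9十", 2, 7), "二3四5六");
} |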
ok, so SQL was mis-designed (though in their defense they probably designed it back when everyone thought unicode characters were the one true character, like Win32's
reference: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries |
A stable version that almost works:
fn char_slice(a: &str, begin: usize, end: usize) -> &str {
    let (begin, _) = a.char_indices().nth(begin).unwrap();
    let (end, _) = a.char_indices().nth(end).unwrap();
    &a[begin..end]
}
fn main() {
    assert_eq!(char_slice("零1二3四5六7八9十", 2, 7), "二3四5六");
}
Though that really makes me want a |
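For readers wondering why the version above only "almost" works: presumably the gap is that `char_indices()` never yields the one-past-the-end offset, so `nth(end)` returns `None` (and `unwrap` panics) whenever `end` equals the number of chars. A minimal sketch of one way to close that gap on stable Rust follows. The name `char_slice_checked` and the choice to return `Option` are illustrative assumptions, not something proposed in this thread.

```rust
use std::iter::once;

/// Hypothetical helper (not a std API): slice `s` by char positions `begin..end`.
/// Returns `None` when the range is reversed or out of bounds.
fn char_slice_checked(s: &str, begin: usize, end: usize) -> Option<&str> {
    if begin > end {
        return None;
    }
    // Byte offsets of every char boundary, plus the one-past-the-end offset.
    let mut boundaries = s.char_indices().map(|(i, _)| i).chain(once(s.len()));
    let start = boundaries.nth(begin)?;
    // `nth(begin)` consumed `begin + 1` offsets, so the end boundary is
    // `end - begin - 1` items further along the same iterator.
    let stop = if end == begin {
        start
    } else {
        boundaries.nth(end - begin - 1)?
    };
    Some(&s[start..stop])
}
```

With this sketch, `char_slice_checked("零1二3四5六7八9十", 9, 11)` yields `Some("9十")`, a call that makes the `unwrap` version above panic. It also walks the string only once instead of twice.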
A stable version that should work:
fn remove_leading_chars(s: &str, count: usize) -> &str {
    // `nth(n)` consumes `n + 1` chars, so pass `count - 1`; a count of 0 removes nothing.
    count.checked_sub(1).map_or(s, |n| {
        let mut chars = s.chars();
        chars.nth(n).expect("index out of range");
        chars.as_str()
    })
}
fn char_slice(s: &str, begin: usize, end: usize) -> &str {
    let slice_1 = remove_leading_chars(s, begin);
    let slice_2 = remove_leading_chars(slice_1, end - begin);
    &slice_1[..slice_1.len() - slice_2.len()]
}
imo we need to stabilize |
While I would like to see that stabilized, I don't think that's the reason I'm wanting it. The problem is the existing It's like we allowed (Not this issue's problem, though.) |
Proposal
Problem statement
Although &str has chars and char_indices, obtaining a substring by char positions is still wordy.
Motivating examples or use cases
Solution sketch
Implement a char_slice method for &str:
See also Alternatives below for other possible APIs. I feel that we can discuss the details of the implementation on a PR once we agree on the overall direction.
Alternatives
It may be more intuitive to use the slice syntax "my lovely string"[lo..hi], but that is already taken by byte-level slicing.
It would also be possible to add a wrapper type like struct CharStr<'a>(&'a str) and implement the slice syntax on the new type, but I'm not sure whether that is idiomatic Rust.
There are also third-party crates like stringslice, but IMO it's a bit over-generic and less maintainable. Given that this is an essential part of string manipulation, perhaps we can add it to std.
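To make the wrapper-type alternative concrete, here is a rough sketch of what indexing `CharStr` by char positions might look like. Only the struct declaration comes from the text above. The `Index<Range<usize>>` impl, its panicking behaviour, and the `byte_offset` closure are assumptions made for illustration.

```rust
use std::iter::once;
use std::ops::{Index, Range};

/// Wrapper from the Alternatives section; everything below it is an assumption.
struct CharStr<'a>(&'a str);

impl<'a> Index<Range<usize>> for CharStr<'a> {
    type Output = str;

    /// Panics when the range is reversed or out of bounds, mirroring byte slicing.
    fn index(&self, range: Range<usize>) -> &str {
        assert!(range.start <= range.end, "char range start exceeds end");
        let s = self.0;
        // Map a char position to its byte offset; a position equal to the
        // char count maps to `s.len()`.
        let byte_offset = |pos: usize| {
            s.char_indices()
                .map(|(i, _)| i)
                .chain(once(s.len()))
                .nth(pos)
                .expect("char index out of range")
        };
        &s[byte_offset(range.start)..byte_offset(range.end)]
    }
}
```

Given `let w = CharStr("my lovely string");`, the expression `&w[3..9]` evaluates to `"lovely"`. Each lookup is O(n) in the string length, which plain `[lo..hi]` syntax would hide from the caller. That hidden cost may be one reason a named method is preferable to overloading the slice syntax.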
Links and related work