Cargo build error: required because of the requirements on the impl of Pattern<'a> for regex::Regex
#834
-
Hi! Could I ask for some help with my first Rust project? Basically I'm trying to split a string on a regex, keeping the separators.

src/main.rs:

struct RegexInclusiveSplit<'a> {
text: &'a str,
indices_iter: &'a str::MatchIndices<'a, Regex>,
last_item_consumed: (usize, &'a str), // Consumed from indices_iter
next_token_index: usize,
}
/* Some more code...
impl RegexInclusiveSplit<'_> {
fn new(&mut self, text: &str, pattern: Regex) -> Self {
...
impl<'a> Iterator for RegexInclusiveSplit<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item> {
...
*/
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(input_text, token_delimiter_re) {
println!("{}", token);
}
}

rust-toolchain.toml:

[toolchain]
channel = "nightly"

Cargo.toml:

[package]
name = "rust-markov-strings"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies.regex]
version = "*"
features = ["std", "pattern"]
[features]
default = ["regex/pattern"]

However, I get the compilation error mentioned in the title ("required because of the requirements on the impl of Pattern<'a> for regex::Regex").
So I'm guessing I did something wrong in my setup somewhere. I'd appreciate any feedback you can provide!
-
Could you please provide enough information to reproduce your error? The code you gave here has sections of it commented out and otherwise appears incomplete. Separately, you might consider explaining the higher-level problem you're trying to solve here. Using unstable nightly features feels like a bad idea to me for your first project.
-
Well hang on. In the SO post you linked, the code there is quite a bit different from the code you posted here. So the fact that it worked for the user on SO doesn't mean anything here, right?

All right, so the code you gave has oodles of compilation errors beyond the one we're looking at. One of the main problems here is that the compiler error you're getting is inscrutable. I don't even understand it. So my next step is to go back to first principles. The problem has something to do with MatchIndices and the fact that Regex isn't working in MatchIndices<'a, Regex>. If you look at its definition, its type parameter P must implement the Pattern trait, which is unstable.
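To make that concrete, here is a rough sketch of the API shape involved. The signature below is paraphrased rather than quoted (the exact Pattern definition has changed across Rust versions), and the snippet assumes some text: &str and re: Regex are already in scope:

// Paraphrased, not a verbatim copy of std:
//
//     impl str {
//         pub fn match_indices<'a, P: Pattern<'a>>(&'a self, pat: P) -> MatchIndices<'a, P>;
//     }
//
// Pattern<'a> is unstable, and the regex crate provides an impl of it for &Regex
// behind its optional "pattern" feature, which in turn requires nightly. So the
// call that type-checks uses P = &Regex:
let mut it = text.match_indices(&re);
let first: Option<(usize, &str)> = it.next(); // byte offset plus the matched text

With that in mind, here is a version of your code that I was able to get to compile (still nightly-only, with the pattern feature enabled):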
use std::{fs, str};
use regex::Regex;
struct RegexInclusiveSplit<'t, 'r> {
text: &'t str,
indices_iter: str::MatchIndices<'t, &'r Regex>,
last_item_consumed: Option<(usize, &'t str)>, // Consumed from indices_iter
next_token_index: usize,
}
impl<'t, 'r> RegexInclusiveSplit<'t, 'r> {
fn new(text: &'t str, pattern: &'r Regex) -> Self {
let mut indices_iter = text.match_indices(pattern);
let last_item_consumed = indices_iter.next();
RegexInclusiveSplit {
text,
indices_iter,
last_item_consumed,
next_token_index: 0,
}
}
}
impl<'t, 'r> Iterator for RegexInclusiveSplit<'t, 'r> {
type Item = &'t str;
fn next(&mut self) -> Option<Self::Item> {
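// Either we are sitting exactly on the next delimiter match (emit it and pull
// the following match), or there is plain text between the current position
// and that match (emit that text first). Once the match iterator is
// exhausted, we are done.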
if let Some((match_index, content)) = self.last_item_consumed {
if match_index == self.next_token_index {
self.next_token_index += content.len();
self.last_item_consumed = self.indices_iter.next();
return Some(content);
} else if match_index > self.next_token_index {
let retval = Some(&self.text[self.next_token_index..match_index]);
self.next_token_index = match_index;
return retval;
} else {
panic!("Should be unreachable");
}
} else {
None
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

Popping up a level, what went wrong here? I think the first thing that went wrong is that you're trying to use unstable features while learning the language. This seems like a recipe for disaster to me. IMO, that SO answer has led you astray. There is absolutely nothing you can do with the unstable Pattern integration here that you can't do with the regex crate's stable API.

OK, so how can we jettison the unstable feature here? Well first, I picked some input:
The above program outputs:
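As a small, hypothetical illustration (this input is made up for this example, not the actual input or output referred to above), driving the same iterator over a tiny inline string shows the shape of what comes out:

// Same iterator as above, just run over an inline string instead of the file.
let re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new("dead channel.\n", &re) {
    println!("{:?}", token);
}
// Prints:
// "dead"
// " "
// "channel"
// ".\n"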
So how would I re-write your program without using unstable features in a way that produces the same output and otherwise preserves your iterator structure? I think this is what I'd do:

use std::{fs, str};
use regex::{self, Regex};
struct RegexInclusiveSplit<'r, 't> {
text: &'t str,
it: regex::Matches<'r, 't>,
last: Option<regex::Match<'t>>,
reported: usize,
}
impl<'r, 't> RegexInclusiveSplit<'r, 't> {
fn new(text: &'t str, re: &'r Regex) -> RegexInclusiveSplit<'r, 't> {
let it = re.find_iter(text);
RegexInclusiveSplit { text, it, last: None, reported: 0 }
}
}
impl<'r, 't> Iterator for RegexInclusiveSplit<'r, 't> {
type Item = &'t str;
fn next(&mut self) -> Option<&'t str> {
if let Some(last) = self.last.take() {
self.reported = last.end();
Some(last.as_str())
} else {
let token = match self.it.next() {
Some(token) => token,
None if self.reported >= self.text.len() => return None,
None => {
let remaining = &self.text[self.reported..];
self.reported = self.text.len();
return Some(remaining);
}
};
// This handles the case when a token starts immediately after
// another token ends.
if self.reported == token.start() {
self.reported = token.end();
Some(token.as_str())
} else {
// Report the text between the last thing reported and the
// start of this token, and then stuff the token away to report
// on the next iteration.
let report = &self.text[self.reported..token.start()];
self.reported = token.start();
self.last = Some(token);
Some(report)
}
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

With that said, I believe there is a simpler approach here. The idea is to write a regex that matches all text, and use capturing groups to determine whether a delimiter was matched or a token was matched. This way, you aren't trying to track a bunch of state. Here's my crack at that:

use std::{fs, str};
use regex::{self, Regex};
struct RegexInclusiveSplit<'r, 't> {
it: regex::CaptureMatches<'r, 't>,
}
impl<'r, 't> RegexInclusiveSplit<'r, 't> {
fn new(text: &'t str, re: &'r Regex) -> RegexInclusiveSplit<'r, 't> {
let it = re.captures_iter(text);
RegexInclusiveSplit { it }
}
}
impl<'r, 't> Iterator for RegexInclusiveSplit<'r, 't> {
type Item = &'t str;
fn next(&mut self) -> Option<&'t str> {
let caps = self.it.next()?;
if let Some(m) = caps.name("delimiter") {
Some(m.as_str())
} else {
// This unwrap is okay, because if 'delimiter' does not match,
// then it must follow that 'token' matches if the overall regex
// matches.
Some(caps.name("token").unwrap().as_str())
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(
r"(?P<delimiter>([\.,:]?( |\t|\n)+)|--)|(?P<token>\w+)",
).unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

So, a lot simpler. The downside is that you also have to write a regex for matching your tokens, which may or may not be feasible.
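As a quick, hypothetical sanity check (the toy input is mine, not something from the discussion; it assumes the captures-based definitions above are in scope), both rewrites produce the same token stream here, and concatenating the pieces reproduces the input:

// Using the captures-based version; the earlier find_iter version yields the
// same sequence for this particular input.
let re = Regex::new(
    r"(?P<delimiter>([\.,:]?( |\t|\n)+)|--)|(?P<token>\w+)",
).unwrap();
let tokens: Vec<&str> = RegexInclusiveSplit::new("dead channel.\n", &re).collect();
assert_eq!(tokens, vec!["dead", " ", "channel", ".\n"]);
// Lossless for this input: joining the pieces gives back the original text.
assert_eq!(tokens.concat(), "dead channel.\n");

One concrete instance of that caveat: with \w+ as the token pattern, the apostrophe in a word like "don't" matches neither capture group, so it gets silently skipped; the delimiter and token patterns together need to cover every character you want to keep.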