Cargo build error: required because of the requirements on the impl of Pattern<'a> for regex::Regex
#834
-
Hi! Could I ask for some help with my first Rust project? Basically I'm trying to split a string on a regex, keeping the separators.

src/main.rs:

struct RegexInclusiveSplit<'a> {
text: &'a str,
indices_iter: &'a str::MatchIndices<'a, Regex>,
last_item_consumed: (usize, &'a str), // Consumed from indices_iter
next_token_index: usize,
}
/* Some more code...
impl RegexInclusiveSplit<'_> {
fn new(&mut self, text: &str, pattern: Regex) -> Self {
...
impl<'a> Iterator for RegexInclusiveSplit<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item> {
...
*/
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(input_text, token_delimiter_re) {
println!("{}", token);
}
}

rust-toolchain.toml:

[toolchain]
channel = "nightly"

Cargo.toml:

[package]
name = "rust-markov-strings"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies.regex]
version = "*"
features = ["std", "pattern"]
[features]
default = ["regex/pattern"]

However, I get the compilation error mentioned in the title ("required because of the requirements on the impl of Pattern<'a> for regex::Regex").
So I'm guessing I did something wrong in my setup somewhere. I'd appreciate any feedback you can provide!
-
Could you please provide enough information to reproduce your error? The code you gave here has sections of it commented out and otherwise appears incomplete. Separately, you might consider explaining the higher-level problem you're trying to solve here. Using unstable nightly features feels like a bad idea to me for your first project.
-
Well hang on. In the SO post you linked, the code there is quite a bit different from the code you posted here. So the fact that it worked for the user on SO doesn't mean anything here, right?

All right, so the code you gave has oodles of compilation errors beyond the one we're looking at. One of the main problems here is that the compiler error you're getting is inscrutable. I don't even understand it. So my next step is to go back to first principles. The problem has something to do with MatchIndices and the fact that Regex isn't working in MatchIndices<'a, Regex>. If you look at its definition, its type parameter P must implement the Pattern trait, which is unstable.
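To make that concrete, here is a rough sketch of the API shape involved. The signature below is paraphrased rather than quoted (the exact Pattern definition has changed across Rust versions), and the snippet assumes some text: &str and re: Regex are already in scope:

// Paraphrased, not a verbatim copy of std:
//
//     impl str {
//         pub fn match_indices<'a, P: Pattern<'a>>(&'a self, pat: P) -> MatchIndices<'a, P>;
//     }
//
// Pattern<'a> is unstable, and the regex crate provides an impl of it for &Regex
// behind its optional "pattern" feature, which in turn requires nightly. So the
// call that type-checks uses P = &Regex:
let mut it = text.match_indices(&re);
let first: Option<(usize, &str)> = it.next(); // byte offset plus the matched text

With that in mind, here is a version of your code that I was able to get to compile (still nightly-only, with the pattern feature enabled):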
use std::{fs, str};
use regex::Regex;
struct RegexInclusiveSplit<'t, 'r> {
text: &'t str,
indices_iter: str::MatchIndices<'t, &'r Regex>,
last_item_consumed: Option<(usize, &'t str)>, // Consumed from indices_iter
next_token_index: usize,
}
impl<'t, 'r> RegexInclusiveSplit<'t, 'r> {
fn new(text: &'t str, pattern: &'r Regex) -> Self {
let mut indices_iter = text.match_indices(pattern);
let last_item_consumed = indices_iter.next();
RegexInclusiveSplit {
text,
indices_iter,
last_item_consumed,
next_token_index: 0,
}
}
}
impl<'t, 'r> Iterator for RegexInclusiveSplit<'t, 'r> {
type Item = &'t str;
fn next(&mut self) -> Option<Self::Item> {
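// Either we are sitting exactly on the next delimiter match (emit it and pull
// the following match), or there is plain text between the current position
// and that match (emit that text first). Once the match iterator is
// exhausted, we are done.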
if let Some((match_index, content)) = self.last_item_consumed {
if match_index == self.next_token_index {
self.next_token_index += content.len();
self.last_item_consumed = self.indices_iter.next();
return Some(content);
} else if match_index > self.next_token_index {
let retval = Some(&self.text[self.next_token_index..match_index]);
self.next_token_index = match_index;
return retval;
} else {
panic!("Should be unreachable");
}
} else {
None
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

Popping up a level, what went wrong here? I think the first thing that went wrong is that you're trying to use unstable features while learning the language. This seems like a recipe for disaster to me. IMO, that SO answer has led you astray. There is absolutely nothing you can do with the unstable Pattern integration here that you can't do with the regex crate's stable API.

OK, so how can we jettison the unstable feature here? Well first, I picked some input:
The above program outputs:
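As a small, hypothetical illustration (this input is made up for this example, not the actual input or output referred to above), driving the same iterator over a tiny inline string shows the shape of what comes out:

// Same iterator as above, just run over an inline string instead of the file.
let re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new("dead channel.\n", &re) {
    println!("{:?}", token);
}
// Prints:
// "dead"
// " "
// "channel"
// ".\n"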
So how would I re-write your program without using unstable features in a way that produces the same output and otherwise preserves your iterator structure? I think this is what I'd do:

use std::{fs, str};
use regex::{self, Regex};
struct RegexInclusiveSplit<'r, 't> {
text: &'t str,
it: regex::Matches<'r, 't>,
last: Option<regex::Match<'t>>,
reported: usize,
}
impl<'r, 't> RegexInclusiveSplit<'r, 't> {
fn new(text: &'t str, re: &'r Regex) -> RegexInclusiveSplit<'r, 't> {
let it = re.find_iter(text);
RegexInclusiveSplit { text, it, last: None, reported: 0 }
}
}
impl<'r, 't> Iterator for RegexInclusiveSplit<'r, 't> {
type Item = &'t str;
fn next(&mut self) -> Option<&'t str> {
if let Some(last) = self.last.take() {
self.reported = last.end();
Some(last.as_str())
} else {
let token = match self.it.next() {
Some(token) => token,
None if self.reported >= self.text.len() => return None,
None => {
let remaining = &self.text[self.reported..];
self.reported = self.text.len();
return Some(remaining);
}
};
// This handles the case when a token starts immediately after
// another token ends.
if self.reported == token.start() {
self.reported = token.end();
Some(token.as_str())
} else {
// Report the text between the last thing reported and the
// start of this token, and then stuff the token away to report
// on the next iteration.
let report = &self.text[self.reported..token.start()];
self.reported = token.start();
self.last = Some(token);
Some(report)
}
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(r"(([\.,:]?( |\t|\n)+)|--)").unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

With that said, I believe there is a simpler approach here. The idea is to write a regex that matches all text, and use capturing groups to determine whether a delimiter was matched or a token was matched. This way, you aren't trying to track a bunch of state. Here's my crack at that:

use std::{fs, str};
use regex::{self, Regex};
struct RegexInclusiveSplit<'r, 't> {
it: regex::CaptureMatches<'r, 't>,
}
impl<'r, 't> RegexInclusiveSplit<'r, 't> {
fn new(text: &'t str, re: &'r Regex) -> RegexInclusiveSplit<'r, 't> {
let it = re.captures_iter(text);
RegexInclusiveSplit { it }
}
}
impl<'r, 't> Iterator for RegexInclusiveSplit<'r, 't> {
type Item = &'t str;
fn next(&mut self) -> Option<&'t str> {
let caps = self.it.next()?;
if let Some(m) = caps.name("delimiter") {
Some(m.as_str())
} else {
// This unwrap is okay, because if 'delimiter' does not match,
// then it must follow that 'token' matches if the overall regex
// matches.
Some(caps.name("token").unwrap().as_str())
}
}
}
fn main() {
let input_text = fs::read_to_string("neuromancer.txt").unwrap();
let token_delimiter_re = Regex::new(
r"(?P<delimiter>([\.,:]?( |\t|\n)+)|--)|(?P<token>\w+)",
).unwrap();
for token in RegexInclusiveSplit::new(&input_text, &token_delimiter_re) {
println!("{:?}", token);
}
}

So, a lot simpler. The downside is that you also have to write a regex for matching your tokens, which may or may not be feasible.
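As a quick, hypothetical sanity check (the toy input is mine, not something from the discussion; it assumes the captures-based definitions above are in scope), both rewrites produce the same token stream here, and concatenating the pieces reproduces the input:

// Using the captures-based version; the earlier find_iter version yields the
// same sequence for this particular input.
let re = Regex::new(
    r"(?P<delimiter>([\.,:]?( |\t|\n)+)|--)|(?P<token>\w+)",
).unwrap();
let tokens: Vec<&str> = RegexInclusiveSplit::new("dead channel.\n", &re).collect();
assert_eq!(tokens, vec!["dead", " ", "channel", ".\n"]);
// Lossless for this input: joining the pieces gives back the original text.
assert_eq!(tokens.concat(), "dead channel.\n");

One concrete instance of that caveat: with \w+ as the token pattern, the apostrophe in a word like "don't" matches neither capture group, so it gets silently skipped; the delimiter and token patterns together need to cover every character you want to keep.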