Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would you be open to a feature to allow passing regex captures onto handlers #19

Open
8176135 opened this issue Jul 17, 2019 · 14 comments · May be fixed by #20
Open

Would you be open to a feature to allow passing regex captures onto handlers #19

8176135 opened this issue Jul 17, 2019 · 14 comments · May be fixed by #20

Comments

@8176135
Copy link

8176135 commented Jul 17, 2019

I think it would be pretty nice to be able to do regex captures and have the arguments passed onto the handler. This would need to be on a separate struct from Route since captures cost more performance than just matches.
E.g.

RouterBuilder::new()
        .add(RouteWithCaptures::get("/hello/(?P<name>[^/]*)/?").using(request_handler))
...
fn request_handler(a: Request<Body>, args: HashMap<String,String>) {
...
}

This could also be a stepping stone for the

Todo: add the ability to distinguish requests by query parameters.

Would you be open to a PR implementing this?

@8176135
Copy link
Author

8176135 commented Jul 17, 2019

Well actually I just tried to do it, while the concept is not difficult (just collect the regex captures into a hashmap, then pass it into the handler), unless there is some generic trick I can't think of, it requires duplicating a lot of code if we want to have it in a separate type.

@marad
Copy link
Owner

marad commented Jul 23, 2019

Sure I'd be open to this contribution :)
In fact I planned doing this long time ago, but I didn't have enough Rust skills back then, and now my interests shifted, so... yeah PR would be great! Thanks!

@akatechis
Copy link

Hi all,
Is there a PR currently being developed? I decided to take a crack at it, and have something basic working. It's not very great, very basic, but gets the job done, and it's roughly at the level that you would expect a hyper router to be, eg. it doesn't try to do too many clever things with types, parsing, validation, etc, letting the user of the library handle that on their own.

fn greeting_for_name (input: Vec<String>) -> String {
    return if input[1].len() == 0 {
        "Hello, stranger!".to_string()
    } else {
        format!("Hello, {}!", input[1])
    }
}

fn greet_handler(_: Request<Body>, input: Vec<String>) -> Response<Body> {
    let greeting = greeting_for_name(input);
    Response::builder()
        .header(CONTENT_LENGTH, greeting.len() as u64)
        .header(CONTENT_TYPE, "text/plain")
        .body(Body::from(greeting))
        .expect("Failed to construct the response")
}

fn router_service() -> Result<RouterService, std::io::Error> {
    let router = RouterBuilder::new()
        .add(Route::get("/hello").using(request_handler))
        .add(Route::from(Method::PATCH, "/world").using(request_handler))
        .add(Route::get("/greet/to(.*)").using(greet_handler))
        .build();

    Ok(RouterService::new(router))
}

akatechis@f33abf1

@8176135
Copy link
Author

8176135 commented Jul 26, 2019

@akatechis I have been working a PR, but I haven't managed to get code duplication down to a point I like yet.

The problem as I mentioned above is that I didn't want to slot in the captures into the current structs, since parsing captures aren't free, especially if we are going to collect all the captures into a new Vec/Hashmap. Though I'm not sure if I am preoptimizing here.

@marad
Copy link
Owner

marad commented Jul 26, 2019

@akatechis Thanks for also looking into that!

@akatechis @8176135 Do you guys have your codes somwhere, so I can look at them? Maybe I could suggest something, or maybe we could work something out together?

@8176135
Copy link
Author

8176135 commented Jul 26, 2019

My current commit is here: 8176135/hyper-router@51db765

I tried to move as much stuff into traits as possible, so that there is minimal code duplication between the structs, not sure if this is the correct method though.

Still working on getting the router working for both, feel like I have to duplicate and trait that too.

Though if someone didn't put any captures into their regex, I'm not sure how much the performance is going to decrease if we just look for it anyway, so I might be over optimizing here. (Actually I am going to benchmark that right now)

@8176135
Copy link
Author

8176135 commented Jul 26, 2019

After some crude benchmarking, it seems like capture then creating a Vec runs about 6 times slower than is_match, even when there isn't a capture group in the regex. (without creating a Vec is about 3 times slower, but we have to create the Vec or Hashmap anyway)

Test I used:

fn main() {
	let regex = regex::Regex::new("https://a.*b[^assw]+").unwrap();
	for _ in 0..10000 {
		let a = regex.is_match("https://");
		let a = regex.is_match("https://asdsadwdwb");
		let a = regex.is_match("https://asdsadwdwbs");
		let a = regex.is_match("https://asdsadwdwbddd");
		let a = regex.is_match("https://asdsadwdwbddds");
		let b = regex.is_match("bttps://a");
	}

	let start = std::time::Instant::now();

	for _ in 0..10000 {
		let a = regex.captures("https://").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwb").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbs").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbddd").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbddds").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let b = regex.captures("bttps://a").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
	}

	dbg!(std::time::Instant::now().duration_since(start));


	let start = std::time::Instant::now();

	for _ in 0..10000 {
		let a = regex.is_match("https://");
		let a = regex.is_match("https://asdsadwdwb");
		let a = regex.is_match("https://asdsadwdwbs");
		let a = regex.is_match("https://asdsadwdwbddd");
		let a = regex.is_match("https://asdsadwdwbddds");
		let b = regex.is_match("bttps://a");
	}

	dbg!(std::time::Instant::now().duration_since(start));
}

@marad
Copy link
Owner

marad commented Jul 26, 2019

Hmm, looking at this I started wondering if regex paths is what hyper router needs. What is the basic need that we are trying to fulfill?

I suspect that the most important thing is to be able to capture some parts of the URL. Maybe it would be faster to create a custom parser for URLs like this:

/orders/{id}/status

I imagine this would be significantly faster since it does not require regex matching, and I think it would address most needs regarding URLs. Also would be easier to use since... there would be no RegExes :)

What are some real-life use cases for using the regex power to match URLs?

@akatechis
Copy link

I dont think there's much of a use case for regex specifically, I think I was just using regex because it was the shortest path to get param capture working. Alternatives are node.js libraries like express/connect which use /orders/:id/status and python's flask which uses /orders/<id>/status.

@akatechis
Copy link

I hope this isn't too off-topic, but I sat down to try out some different kinds of APIs to describe router params and I was wondering if the following might be even easier to use. It's closer to a RESTful interface, so I don't know if the concept of a Resource might be a good abstraction:

fn router_service() -> Result<RouterService, std::io::Error> {
    let router = RouterBuilder::new()
        .add(Resource::new("/user/:id")
            .get(request_handler)
            .post(request_handler))
        .build();

    Ok(RouterService::new(router))
}

Internally, the Resource struct could have Option<Handler> for each of the methods, rather than a Vec, so we can avoid the extra iteration through the list of handlers for each method.

@marad
Copy link
Owner

marad commented Jul 27, 2019

@akatechis I'd rather not use REST nomenclature since there is nothing RESTful about the hyper-router and I like it that way.

I think that this is outside of the scope of this ticket. Changing the basic API that everyone uses should be considered very carefully.

As for the exact form of the path variables the most natural for me is with using {id} since I come from Java world. But I think the express way (:id)is even simpler to parse.

@akatechis
Copy link

Agreed on both issues. I'll try preparing something sometime today :)

@akatechis akatechis linked a pull request Jul 28, 2019 that will close this issue
@akatechis
Copy link

I've opened PR #20 for this.

@marad
Copy link
Owner

marad commented Jul 30, 2019

Thanks @akatechis! I'll take a look at this PR during the week or even weekend (depends on workload)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants