-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match absolute and relative paths with and without leading slash #57
Conversation
db7ae9b
to
642dcf9
Compare
} | ||
} | ||
Err(_) => { | ||
let url_leading = format!("/{}", uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit ugly, I'm open to better ideas. Without this, path/to/file.css
will not be matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. I've seen this issue of the missing leading slash many times, and (to me) it makes sense to handle it in this way.
src/lib.rs
Outdated
const HTML: &str = r#"<!DOCTYPE html> | ||
<html> | ||
<body> | ||
<!-- 3 valid CSS paths --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there additional valid or invalid test cases we should be testing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's add a query argument to a css/js path.
<link href="http://example.com/example.css?version=abc123" rel="stylesheet" />
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent suggestion, I added a couple of different examples with query parameters. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but I called out one thing to consider below (just in case it's relevant).
if parsed_host == user.base_url.host_str().unwrap() { | ||
// The URI host matches the base_url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only note here is that I suspect this logic may result in a change in behavior when the base_url
includes a path. I don't know if that is a supported usage necessarily, but these tests look like it may be?
Consider a case where base_url
is something like http://example.com/my/base/
and you have two assets:
- A:
http://example.com/foo.js
- B:
http://example.com/my/base/bar.js
I think the prior behavior would have matched case B but not case A because it included the base_url
's path in the regexp.
However, the new behavior here would match both case A and case B because the Host
portion matches and the base_url
's path is not checked.
Personally, I think this is reasonable and don't consider it an issue / don't want to complicate things further, just highlighting this in case there could be an expectation that only resources under the base URL's path are loaded. :)
Disclaimer: I have only visually inspected this and not tested or verified this thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mixed as if this should be fixed and/or is a bug. I could see this being a problem in a multi-site installation, where multiple sites are served from different sub-directories. But in this case, any common files included from the base installation etc likely should be included in a load test. Is there another use-case where this may happen, where it would generally be considered invalid to include the files?
I'm going to merge as-is for now, and we can revisit this if it proves problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a problem and agree with your assessment, just noting that it's a change. :)
http
crate to validate uris scraped from page