URLs containing no host and a path starting with ./ or ../ are not serialized idempotently #601
Comments
Should this be filed in https://github.com/whatwg/url/issues ?
Maybe! I haven't spent a lot of time verifying that the spec doesn't handle it yet.
Possibly related: I noticed today that "../" segments are normalized in a way that seems strange to me, i.e.:
I haven't checked the spec; maybe this is correct behavior. Personally, I would expect a parse error for the "seems wrong" cases. Edit: I checked the spec, and the behavior appears to be "correct". I need a way to access the "raw" path without any such normalization. Filed #602.
Probably a dupe of #459. Anne pointed me to whatwg/url#505.
whatwg/url#505 just landed, with tests in web-platform-tests/wpt#25113. @valenting has some patches in https://github.com/valenting/rust-url/tree/non-special-idempotent. I will see about rebasing these onto current master.
I've pulled in the new tests and rebased @valenting's patch on top of current master. Unfortunately the patch increases the number of failed tests by 1 rather than improving the situation, so this will still need some more work. Draft PR here: #629. Any help is appreciated!
Yes, thank you @qsantos
Hi, it occurs when mutating the path of a non-"cannot-be-a-base" URL without an authority. (note that

use url::Url;
#[test]
fn test_can_be_a_base_with_set_path() {
    let mut url = Url::parse("web+demo:/").unwrap();
    assert!(!url.cannot_be_a_base());
    // Set path to "//not-a-host" using `set_path`
    url.set_path("//not-a-host");
    // PASSES (path is correctly set):
    assert_eq!(url.path(), "//not-a-host");
    // PASSES (has segments):
    let segments: Vec<_> = url.path_segments().expect("should have path segments").collect();
    // PASSES:
    assert_eq!(segments, vec!["", "not-a-host"]);
    // **FAILS**
    // EXPECTED: "web+demo:/.//not-a-host"
    // ACTUAL: "web+demo://not-a-host"
    assert_eq!(url.as_str(), "web+demo:/.//not-a-host");
}

#[test]
fn test_can_be_a_base_with_path_segments_mut() {
    let mut url = Url::parse("web+demo:/").unwrap();
    assert!(!url.cannot_be_a_base());
    // Set path to "//not-a-host" using `path_segments_mut`
    url.path_segments_mut()
        .expect("should have path segments")
        .push("") /* NOTE: any number of push("") here appears to make no difference (all ignored) */
        .push("not-a-host");
    // **FAILS**
    // EXPECTED: "web+demo:/.//not-a-host"
    // ACTUAL: "web+demo:/not-a-host"
    assert_eq!(url.as_str(), "web+demo:/.//not-a-host");
    // **FAILS**
    // EXPECTED: "//not-a-host"
    // ACTUAL: "/not-a-host"
    assert_eq!(url.path(), "//not-a-host");
    // PASSES (has segments):
    let segments: Vec<_> = url.path_segments().expect("should have path segments").collect();
    // **FAILS**
    // EXPECTED: ["", "not-a-host"]
    // ACTUAL: ["not-a-host"]
    assert_eq!(segments, vec!["", "not-a-host"]);
}

(Sorry for the multiple failing assertions in the same test case; I just thought it was clearer to communicate that way.)

Have I misread the spec?
I am following the rules in the URL Serializing section, but is this different from URL Writing? Should URL Writing be followed when mutating the path? Possibly, if you follow URL Writing, it would disallow setting this type of path? (I'm not super clear on this.)
But it specifies that URL-path-segment strings may be empty strings, so it's a little ambiguous whether "/" followed by an empty path segment followed by "/" is allowed. The fact that this kind of path is explicitly mentioned in URL Serializing makes it seem like this should be allowed. I also looked at the specification of the URL API pathname setter. It says to parse the value using the "basic URL parser" starting in the "path start state", which I think should respect empty path segments. That makes it seem like "//not-a-host" should be a valid path to set, and that the resultant URL should have "//not-a-host" as its path.

What do other implementations do?
I have not tested widely, just the JavaScript URL class in a few runtimes I have handy:

var url = new URL("web+demo:/");
url.pathname = "//not-a-host";
console.log(url.href)
// Node v20.17.0 gives: 'web+demo:/.//not-a-host'
// Chromium 129.0.6668.100 gives: 'web+demo:/'

Pushing empty path segment
As well as the issue of not serializing correctly, there is possibly another issue with
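The expectation in the tests above can be sketched without the url crate. This is a minimal, hypothetical model, not rust-url's implementation: `path_segments` and `serialize_hostless` are names invented here, and the model ignores queries, fragments, and percent-encoding. It assumes the URL Standard's serializer rule as I understand it: when a URL has no host and its path has more than one segment with an empty first segment, "/." is emitted before the path.

```rust
/// Split an absolute path into its segments, keeping empty segments.
/// (Hypothetical helper for illustration only.)
fn path_segments(path: &str) -> Vec<&str> {
    path.strip_prefix('/').unwrap_or(path).split('/').collect()
}

/// Serialize `scheme:` plus a path for a URL that has no host.
/// Assumption: if the path has more than one segment and the first one is
/// empty, "/." is written first so the result cannot be read as `scheme://host`.
fn serialize_hostless(scheme: &str, segments: &[&str]) -> String {
    let mut out = format!("{}:", scheme);
    if segments.len() > 1 && segments[0].is_empty() {
        out.push_str("/.");
    }
    for seg in segments {
        out.push('/');
        out.push_str(seg);
    }
    out
}
```

Under these assumptions, the path "//not-a-host" splits into an empty segment followed by "not-a-host", and serializing those segments with the scheme "web+demo" gives the string the failing assertions expect.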
@x11x Please file a new issue! It's easy to overlook new comments in old ones.
Thanks for your reply jdm.
Parsing a URL like scheme:/.//path internally removes the ./ in the serialization (per the spec), so parsing the serialized URL of scheme://path yields a URL that is parsed differently than the original one (i.e. path is treated as a host now, instead of part of the URL's path).

Cases to consider:
scheme:/.//path
scheme:/..//path
scheme:/./..//path
(any leading mixed collection of ./ and ../)

My reading of the spec is that this might be a shortcoming that needs to be addressed there, rather than just an implementation bug. I can envision a couple of possible solutions:
./ or ../ if it's the first component of a path, avoiding the issue entirely
/ when the previous segment is empty (resulting in scheme:/path)
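The round-trip problem described above can be illustrated with a small, self-contained sketch. It is a simplified model rather than the url crate's parser: `remove_dot_segments`, `naive_serialize`, and `reparses_with_host` are hypothetical names, percent-encoded dots are ignored, and "has a host" is approximated by checking whether the text after the scheme starts with "//".

```rust
/// Simplified dot-segment handling for an absolute path: "." is dropped and
/// ".." removes the previous segment (if any). A trailing "." or ".." leaves
/// an empty last segment. (Hypothetical helper, illustration only.)
fn remove_dot_segments(path: &str) -> Vec<&str> {
    let raw: Vec<&str> = path.strip_prefix('/').unwrap_or(path).split('/').collect();
    let last = raw.len() - 1; // `split` always yields at least one item
    let mut out: Vec<&str> = Vec::new();
    for (i, seg) in raw.iter().enumerate() {
        match *seg {
            "." => {
                if i == last {
                    out.push("");
                }
            }
            ".." => {
                out.pop();
                if i == last {
                    out.push("");
                }
            }
            other => out.push(other),
        }
    }
    out
}

/// Serialize without any special handling of a leading empty segment.
fn naive_serialize(scheme: &str, segments: &[&str]) -> String {
    let mut out = format!("{}:", scheme);
    for seg in segments {
        out.push('/');
        out.push_str(seg);
    }
    out
}

/// Rough check: would a parser see an authority ("//") right after the scheme?
fn reparses_with_host(serialized: &str) -> bool {
    match serialized.split_once(':') {
        Some((_, rest)) => rest.starts_with("//"),
        None => false,
    }
}
```

In this model, all three cases listed above reduce to an empty segment followed by "path", and the naive serialization of that path is a string in which "path" sits where a host would be expected.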