feat: add `join` method to `Url` class #1378
Open
Meetesh-Saini wants to merge 5 commits into pydantic:main from Meetesh-Saini:dev-url-join
Commits
a7d9351 feat: add `join` method to `Url` class (Meetesh-Saini)
7ef57ba Merge branch 'pydantic:main' into dev-url-join (Meetesh-Saini)
e8bd322 refactor(url): Update URL join function and add corresponding tests (Meetesh-Saini)
8b70975 refactor: update url `join` method implementation and function signature (Meetesh-Saini)
6a4fa06 Merge branch 'pydantic:main' into dev-url-join (Meetesh-Saini)
@@ -9,6 +9,10 @@

from ..conftest import Err, PyAndJson

SIMPLE_BASE = 'http://a/b/c/d'
QUERY_BASE = 'http://a/b/c/d;p?q'
QUERY_FRAGMENT_BASE = 'http://a/b/c/d;p?q#f'


def test_url_ok(py_and_json: PyAndJson):
    v = py_and_json(core_schema.url_schema())

@@ -1305,3 +1309,150 @@ def test_url_build() -> None:
    )
    assert url == Url('postgresql://testuser:[email protected]:5432/database?sslmode=require#test')
    assert str(url) == 'postgresql://testuser:[email protected]:5432/database?sslmode=require#test'


@pytest.mark.parametrize(
    'base_url,join_path,expected_with_slash,expected_without_slash',
    [
        # Tests are based on the URL specification from https://url.spec.whatwg.org/
        # Joining empty path with or without trailing slash should not affect the base url.
        ('http://example.com/', '', 'http://example.com/', 'http://example.com/'),
        ('svn://pathtorepo/dir1', 'dir2', 'svn://pathtorepo/dir2/', 'svn://pathtorepo/dir2'),
        ('svn+ssh://pathtorepo/dir1', 'dir2', 'svn+ssh://pathtorepo/dir2/', 'svn+ssh://pathtorepo/dir2'),
        ('ws://a/b', 'g', 'ws://a/g/', 'ws://a/g'),
        ('wss://a/b', 'g', 'wss://a/g/', 'wss://a/g'),
        ('http://a/b/c/de', ';x', 'http://a/b/c/;x/', 'http://a/b/c/;x'),
        # Non-RFC-defined tests, covering variations of base and trailing
        # slashes
        ('http://a/b/c/d/e/', '../../f/g/', 'http://a/b/c/f/g/', 'http://a/b/c/f/g/'),
        ('http://a/b/c/d/e', '../../f/g/', 'http://a/b/f/g/', 'http://a/b/f/g/'),
        ('http://a/b/c/d/e/', '/../../f/g/', 'http://a/f/g/', 'http://a/f/g/'),
        ('http://a/b/c/d/e', '/../../f/g/', 'http://a/f/g/', 'http://a/f/g/'),
        ('http://a/b/c/d/e/', '../../f/g', 'http://a/b/c/f/g/', 'http://a/b/c/f/g'),
        ('http://a/b/', '../../f/g/', 'http://a/f/g/', 'http://a/f/g/'),
        (SIMPLE_BASE, 'g:h', 'g:h/', 'g:h'),
        (SIMPLE_BASE, 'g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        (SIMPLE_BASE, './g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        (SIMPLE_BASE, 'g/', 'http://a/b/c/g/', 'http://a/b/c/g/'),
        (SIMPLE_BASE, '/g', 'http://a/g/', 'http://a/g'),
        (SIMPLE_BASE, '//g', 'http://g/', 'http://g/'),
        (SIMPLE_BASE, '?y', 'http://a/b/c/d?y', 'http://a/b/c/d?y'),
        (SIMPLE_BASE, 'g?y', 'http://a/b/c/g?y', 'http://a/b/c/g?y'),
        (SIMPLE_BASE, 'g?y/./x', 'http://a/b/c/g?y/./x', 'http://a/b/c/g?y/./x'),
        (SIMPLE_BASE, '.', 'http://a/b/c/', 'http://a/b/c/'),
        (SIMPLE_BASE, './', 'http://a/b/c/', 'http://a/b/c/'),
        (SIMPLE_BASE, '..', 'http://a/b/', 'http://a/b/'),
        (SIMPLE_BASE, '../', 'http://a/b/', 'http://a/b/'),
        (SIMPLE_BASE, '../g', 'http://a/b/g/', 'http://a/b/g'),
        (SIMPLE_BASE, '../..', 'http://a/', 'http://a/'),
        (SIMPLE_BASE, '../../g', 'http://a/g/', 'http://a/g'),
        (SIMPLE_BASE, './../g', 'http://a/b/g/', 'http://a/b/g'),
        (SIMPLE_BASE, './g/.', 'http://a/b/c/g/', 'http://a/b/c/g/'),
        (SIMPLE_BASE, 'g/./h', 'http://a/b/c/g/h/', 'http://a/b/c/g/h'),
        (SIMPLE_BASE, 'g/../h', 'http://a/b/c/h/', 'http://a/b/c/h'),
        (SIMPLE_BASE, 'http:g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        (SIMPLE_BASE, 'http:g?y', 'http://a/b/c/g?y', 'http://a/b/c/g?y'),
        (SIMPLE_BASE, 'http:g?y/./x', 'http://a/b/c/g?y/./x', 'http://a/b/c/g?y/./x'),
        (SIMPLE_BASE + '/', 'foo', SIMPLE_BASE + '/foo/', SIMPLE_BASE + '/foo'),
        (QUERY_BASE, '?y', 'http://a/b/c/d;p?y', 'http://a/b/c/d;p?y'),
        (QUERY_BASE, ';x', 'http://a/b/c/;x/', 'http://a/b/c/;x'),
        (QUERY_BASE, 'g:h', 'g:h/', 'g:h'),
        (QUERY_BASE, 'g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        (QUERY_BASE, './g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        (QUERY_BASE, 'g/', 'http://a/b/c/g/', 'http://a/b/c/g/'),
        (QUERY_BASE, '/g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, '//g', 'http://g/', 'http://g/'),
        (QUERY_BASE, '?y', 'http://a/b/c/d;p?y', 'http://a/b/c/d;p?y'),
        (QUERY_BASE, 'g?y', 'http://a/b/c/g?y', 'http://a/b/c/g?y'),
        (QUERY_BASE, '#s', 'http://a/b/c/d;p?q#s', 'http://a/b/c/d;p?q#s'),
        (QUERY_BASE, 'g#s', 'http://a/b/c/g#s', 'http://a/b/c/g#s'),
        (QUERY_BASE, 'g?y#s', 'http://a/b/c/g?y#s', 'http://a/b/c/g?y#s'),
        (QUERY_BASE, ';x', 'http://a/b/c/;x/', 'http://a/b/c/;x'),
        (QUERY_BASE, 'g;x', 'http://a/b/c/g;x/', 'http://a/b/c/g;x'),
        (QUERY_BASE, 'g;x?y#s', 'http://a/b/c/g;x?y#s', 'http://a/b/c/g;x?y#s'),
        (QUERY_BASE, '', 'http://a/b/c/d;p?q', 'http://a/b/c/d;p?q'),
        (QUERY_BASE, '.', 'http://a/b/c/', 'http://a/b/c/'),
        (QUERY_BASE, './', 'http://a/b/c/', 'http://a/b/c/'),
        (QUERY_BASE, '..', 'http://a/b/', 'http://a/b/'),
        (QUERY_BASE, '../', 'http://a/b/', 'http://a/b/'),
        (QUERY_BASE, '../g', 'http://a/b/g/', 'http://a/b/g'),
        (QUERY_BASE, '../..', 'http://a/', 'http://a/'),
        (QUERY_BASE, '../../', 'http://a/', 'http://a/'),
        (QUERY_BASE, '../../g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, '../../../g', 'http://a/g/', 'http://a/g'),
        # Abnormal Examples
        (QUERY_BASE, '../../../g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, '../../../../g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, '/./g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, '/../g', 'http://a/g/', 'http://a/g'),
        (QUERY_BASE, 'g.', 'http://a/b/c/g./', 'http://a/b/c/g.'),
        (QUERY_BASE, '.g', 'http://a/b/c/.g/', 'http://a/b/c/.g'),
        (QUERY_BASE, 'g..', 'http://a/b/c/g../', 'http://a/b/c/g..'),
        (QUERY_BASE, '..g', 'http://a/b/c/..g/', 'http://a/b/c/..g'),
        (QUERY_BASE, './../g', 'http://a/b/g/', 'http://a/b/g'),
        (QUERY_BASE, './g/.', 'http://a/b/c/g/', 'http://a/b/c/g/'),
        (QUERY_BASE, 'g/./h', 'http://a/b/c/g/h/', 'http://a/b/c/g/h'),
        (QUERY_BASE, 'g/../h', 'http://a/b/c/h/', 'http://a/b/c/h'),
        (QUERY_BASE, 'g;x=1/./y', 'http://a/b/c/g;x=1/y/', 'http://a/b/c/g;x=1/y'),
        (QUERY_BASE, 'g;x=1/../y', 'http://a/b/c/y/', 'http://a/b/c/y'),
        (QUERY_BASE, 'g?y/./x', 'http://a/b/c/g?y/./x', 'http://a/b/c/g?y/./x'),
        (QUERY_BASE, 'g?y/../x', 'http://a/b/c/g?y/../x', 'http://a/b/c/g?y/../x'),
        (QUERY_BASE, 'g#s/./x', 'http://a/b/c/g#s/./x', 'http://a/b/c/g#s/./x'),
        (QUERY_BASE, 'g#s/../x', 'http://a/b/c/g#s/../x', 'http://a/b/c/g#s/../x'),
        (QUERY_BASE, 'http:g', 'http://a/b/c/g/', 'http://a/b/c/g'),
        # Test with empty (but defined) components.
        (QUERY_FRAGMENT_BASE, '', 'http://a/b/c/d;p?q', 'http://a/b/c/d;p?q'),
        (QUERY_FRAGMENT_BASE, '#', 'http://a/b/c/d;p?q#', 'http://a/b/c/d;p?q#'),
        (QUERY_FRAGMENT_BASE, '#z', 'http://a/b/c/d;p?q#z', 'http://a/b/c/d;p?q#z'),
        (QUERY_FRAGMENT_BASE, '?', 'http://a/b/c/d;p?', 'http://a/b/c/d;p?'),
        (QUERY_FRAGMENT_BASE, '?#z', 'http://a/b/c/d;p?#z', 'http://a/b/c/d;p?#z'),
        (QUERY_FRAGMENT_BASE, '?y', 'http://a/b/c/d;p?y', 'http://a/b/c/d;p?y'),
        (QUERY_FRAGMENT_BASE, ';', 'http://a/b/c/;/', 'http://a/b/c/;'),
        (QUERY_FRAGMENT_BASE, ';?y', 'http://a/b/c/;?y', 'http://a/b/c/;?y'),
        (QUERY_FRAGMENT_BASE, ';#z', 'http://a/b/c/;#z', 'http://a/b/c/;#z'),
        (QUERY_FRAGMENT_BASE, ';x', 'http://a/b/c/;x/', 'http://a/b/c/;x'),
        (QUERY_FRAGMENT_BASE, '/w', 'http://a/w/', 'http://a/w'),
        (QUERY_FRAGMENT_BASE, '//;x', 'http://;x/', 'http://;x/'),
        (QUERY_FRAGMENT_BASE, '//v', 'http://v/', 'http://v/'),
        # For backward compatibility with RFC1630, the scheme name is allowed
        # to be present in a relative reference if it is the same as the base
        # URI scheme.
        (QUERY_FRAGMENT_BASE, 'http:', 'http://a/b/c/d;p?q', 'http://a/b/c/d;p?q'),
        (QUERY_FRAGMENT_BASE, 'http:#', 'http://a/b/c/d;p?q#', 'http://a/b/c/d;p?q#'),
        (QUERY_FRAGMENT_BASE, 'http:#z', 'http://a/b/c/d;p?q#z', 'http://a/b/c/d;p?q#z'),
        (QUERY_FRAGMENT_BASE, 'http:?', 'http://a/b/c/d;p?', 'http://a/b/c/d;p?'),
        (QUERY_FRAGMENT_BASE, 'http:?#z', 'http://a/b/c/d;p?#z', 'http://a/b/c/d;p?#z'),
        (QUERY_FRAGMENT_BASE, 'http:?y', 'http://a/b/c/d;p?y', 'http://a/b/c/d;p?y'),
        (QUERY_FRAGMENT_BASE, 'http:;', 'http://a/b/c/;/', 'http://a/b/c/;'),
        (QUERY_FRAGMENT_BASE, 'http:;?y', 'http://a/b/c/;?y', 'http://a/b/c/;?y'),
        (QUERY_FRAGMENT_BASE, 'http:;#z', 'http://a/b/c/;#z', 'http://a/b/c/;#z'),
        (QUERY_FRAGMENT_BASE, 'http:;x', 'http://a/b/c/;x/', 'http://a/b/c/;x'),
        (QUERY_FRAGMENT_BASE, 'http:/w', 'http://a/w/', 'http://a/w'),
        (QUERY_FRAGMENT_BASE, 'http://;x', 'http://;x/', 'http://;x/'),
        (QUERY_FRAGMENT_BASE, 'http:///w', 'http://w/', 'http://w/'),
        (QUERY_FRAGMENT_BASE, 'http://v', 'http://v/', 'http://v/'),
        # Different scheme is not ignored.
        (QUERY_FRAGMENT_BASE, 'https:;', 'https://;/', 'https://;/'),
        (QUERY_FRAGMENT_BASE, 'https:;x', 'https://;x/', 'https://;x/'),
    ],
)
def test_url_join(base_url, join_path, expected_with_slash, expected_without_slash) -> None:
    """Tests are based on
    https://github.com/python/cpython/blob/3a0e7f57628466aedcaaf6c5ff7c8224f5155a2c/Lib/test/test_urlparse.py
    and the URL specification from https://url.spec.whatwg.org/
    """
    url = Url(base_url)
    assert str(url.join(join_path, append_trailing_slash=True)) == expected_with_slash
    assert str(url.join(join_path, append_trailing_slash=False)) == expected_without_slash


def test_url_join_operators() -> None:
    url = Url('http://a/b/c/d')
    assert str(url / 'e' / 'f') == 'http://a/b/c/e/f/'
    assert str(url / 'e' // 'f') == 'http://a/b/c/e/f'
    assert str(url // 'e' // 'f') == 'http://a/b/c/f'
    assert str(url / 'e' / '?x=1') == 'http://a/b/c/e/?x=1'
    assert str(url / 'e' / '?x=1' / '#y') == 'http://a/b/c/e/?x=1#y'
    assert str(url / 'e' / '?x=1' // '#y') == 'http://a/b/c/e/?x=1#y'
    assert str(url / 'e' // '?x=1' / '#y') == 'http://a/b/c/e/?x=1#y'
    assert str(url // 'e' / '?x=1' / '#y') == 'http://a/b/c/e?x=1#y'

Ok, sorry I missed these in the last round of review. I think the difference between the `/` and `//` operators here is subtle and hard to document.

I think it would be better to have just `/`, and make it match the default of `append_trailing_slash=False`. This will also simplify testing, I think.

Okay, `__floordiv__` can be removed, but I feel that `__truediv__` should have `append_trailing_slash=True` because this overloaded operator would likely be used to join multiple paths in shorter code. This behaviour would feel familiar to Python users, as it resembles `pathlib`'s path joining.

For example, with `append_trailing_slash=False` it would instead result in `http://a/d` and `file:///home/user/pop`, which I think is not what the user would expect.

I chose to add `__floordiv__` too because it would simplify adding files at the end.
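The chaining effect discussed in this comment can be sketched with the standard library's `urllib.parse.urljoin`, used here only as a stand-in for `Url.join` (an assumption: the PR's method is backed by rust-url and may differ in edge cases). The helper `chain` and its `trailing_slash` flag are hypothetical names for illustration, and the slash handling is deliberately naive, so it is only meaningful for plain path segments.

```python
from functools import reduce
from urllib.parse import urljoin


def chain(base: str, *parts: str, trailing_slash: bool) -> str:
    """Hypothetical helper: join parts one at a time, like repeated `/` calls."""

    def step(url: str, part: str) -> str:
        joined = urljoin(url, part)
        # Naive emulation of append_trailing_slash=True (path-only inputs).
        if trailing_slash and not joined.endswith('/'):
            joined += '/'
        return joined

    return reduce(step, parts, base)


# Without a trailing slash, each join replaces the last path segment.
without_slash = chain('http://a/', 'b', 'c', 'd', trailing_slash=False)
# With a trailing slash, each join appends a new segment.
with_slash = chain('http://a/', 'b', 'c', 'd', trailing_slash=True)

print(without_slash)  # http://a/d
print(with_slash)     # http://a/b/c/d/
```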

Oh, I see. Yikes, there are so many subtleties here!

It seems to me that our `.join()` method really works like `urllib.parse.urljoin` when it comes to semantics, which differ from pathlib's.

Given these are inconsistent, I think we should perhaps back away from trying to have pathlib-like semantics at all.

Would you be open to the idea of dropping the operators from the PR completely, so we can get `.join()` merged? We could then open a `pydantic` issue to discuss the design of the operators and move forward with an implementation when there's consensus?
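As a rough illustration of the inconsistency mentioned in this comment, the sketch below compares `urllib.parse.urljoin` with `pathlib.PurePosixPath` on the same inputs. The example URLs and paths are chosen here for illustration and are not taken from the review thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Relative segment: urljoin replaces the last segment, pathlib appends.
url_relative = urljoin('http://a/b/c/d', 'e')
path_relative = str(PurePosixPath('/b/c/d') / 'e')

# Parent reference: urljoin resolves '..', pathlib keeps it verbatim.
url_parent = urljoin('http://a/b/c/d', '../e')
path_parent = str(PurePosixPath('/b/c/d') / '../e')

# Absolute segment: both discard the existing path.
url_absolute = urljoin('http://a/b/c/d', '/e')
path_absolute = str(PurePosixPath('/b/c/d') / '/e')

print(url_relative, path_relative)  # http://a/b/c/e /b/c/d/e
print(url_parent, path_parent)      # http://a/b/e /b/c/d/../e
print(url_absolute, path_absolute)  # http://a/e /e
```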

Alternatively, we could also have `joinpath()`, which works like `pathlib` and doesn't accept query strings or fragments as the whole input?

And then we could have the `/` operator work like `joinpath`? 🤔
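One possible reading of this proposal, sketched on plain strings with the standard library. Everything here is an assumption for discussion: the function name `joinpath`, its signature, and the choice to raise `ValueError` on query or fragment input are hypothetical and are not part of the PR.

```python
from urllib.parse import urlsplit, urlunsplit


def joinpath(url: str, *segments: str) -> str:
    """Hypothetical pathlib-style join: append segments, keep query and fragment."""
    parts = urlsplit(url)
    path = parts.path
    for segment in segments:
        if '?' in segment or '#' in segment:
            raise ValueError(f'query strings and fragments are not accepted: {segment!r}')
        if segment.startswith('/'):
            # An absolute segment replaces the path, as pathlib does.
            path = segment
        else:
            path = path.rstrip('/') + '/' + segment
    return urlunsplit(parts._replace(path=path))


print(joinpath('http://a/b/c/d', 'e', 'f'))  # http://a/b/c/d/e/f
print(joinpath('http://a/b?x=1#y', 'c'))     # http://a/b/c?x=1#y
```

Unlike `.join()`, this variant never drops the last segment of the base, which is the pathlib-like behaviour the comment asks about.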

`joinpath()` would certainly make things cleaner. Should I implement `joinpath()` in this PR, or should we drop the operators for now and discuss it in the issues instead?

Great question. I think I'd prefer we just had `.join()` here and worried about `.joinpath()` and the operators later. That said, there's potentially a desire to agree on a sketch of the follow-ups here. @pydantic/oss - any ideas?

I think without comment from anyone else, let's just do `.join()` here and then follow up with an issue in the main `pydantic` repo where we can discuss `.joinpath()` and operators.

I think it would be great to have more time to discuss the semantics (does it need to match `urllib`? What about other libraries like `furl`? Should we double check with the current RFCs? We should also check what was said in this discussion).

Really sorry for the late response. The main URL joining part is handled by rust-url's join method, which implements the WHATWG URL spec. So the `/` operator semantics are defined at the library level, which in our case means Python libraries like `urllib` and `furl`. While `furl` provides a `/` operator, `urllib` does not. `urllib.parse.urljoin` and `furl.furl.join` follow RFC 3986 to resolve the new URL. `furl` uses the `/` operator only for adding a path to a URL, like `pathlib.Path`.

I think the `furl` approach is good. I should not have written my function signature as `signature=(path, append_trailing_slash=false)` but rather as `signature=(url, append_trailing_slash=false)`, because the argument can be a relative or absolute URL, not just a path.

IMO, we can have `Url.join()` for URL joining, similar to `furl.furl.join`, and `Url.__truediv__` for the sole purpose of adding a path without a trailing slash, just like `furl.furl.__truediv__`.
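The point about the argument being a URL reference rather than just a path can be illustrated with `urllib.parse.urljoin`, again only as a stand-in for `Url.join` (an assumption: rust-url follows the WHATWG spec and may serialise some results differently, for example by adding a trailing slash after a bare host). The example inputs are chosen here for illustration.

```python
from urllib.parse import urljoin

base = 'http://a/b/c/d'

# A relative reference is resolved against the base path.
relative = urljoin(base, '../g')
# An absolute URL with a different scheme replaces the base entirely.
absolute = urljoin(base, 'https://example.com/x')
# A network-path reference keeps only the base scheme.
network_path = urljoin(base, '//g')

print(relative)      # http://a/b/g
print(absolute)      # https://example.com/x
print(network_path)  # http://g
```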