Percent encoding `|` in paths #3479
nathaniel-daniel asked this question in Potential Issue
- OS: Windows 11
- `python --version`: Python 3.12.8
- httpx version: 0.28.1
I believe `|` should be percent-encoded in paths, which is not currently the case. If I'm understanding RFC 3986 correctly, path characters are `pchar`, which can be `unreserved`, `pct-encoded`, `sub-delims`, `":"`, or `"@"`:

- `unreserved` can be `ALPHA`, `DIGIT`, `"-"`, `"."`, `"_"`, or `"~"`.
- `pct-encoded` is a percent-encoding sequence (`%` followed by two hex digits).
- `sub-delims` can be `"!"`, `"$"`, `"&"`, `"'"`, `"("`, `")"`, `"*"`, `"+"`, `","`, `";"`, or `"="`.

The `|` character appears nowhere in this set, which means it has to be percent-encoded.
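The `pchar` rule above can be checked mechanically. This is a minimal sketch written from the RFC 3986 grammar, not code from httpx or any other library; `PCHAR_LITERALS` and `is_valid_path_segment` are hypothetical names used only for illustration.

```python
import string

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986: sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = set("!$&'()*+,;=")
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# pct-encoded is a three-character "%" HEXDIG HEXDIG sequence, so it is
# handled separately in the function below rather than stored here.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | {":", "@"}


def is_valid_path_segment(segment: str) -> bool:
    """Hypothetical helper: True if `segment` consists only of pchar."""
    i = 0
    while i < len(segment):
        ch = segment[i]
        if ch == "%":
            # pct-encoded: "%" must be followed by exactly two hex digits
            hex_part = segment[i + 1 : i + 3]
            if len(hex_part) != 2 or not all(c in string.hexdigits for c in hex_part):
                return False
            i += 3
        elif ch in PCHAR_LITERALS:
            i += 1
        else:
            # Any other literal character (such as "|") is not allowed
            return False
    return True


print("|" in PCHAR_LITERALS)           # False
print(is_valid_path_segment("a|b"))    # False
print(is_valid_path_segment("a%7Cb"))  # True
```

Under this reading, a path segment containing a literal `|` is not valid, while the percent-encoded form `%7C` is.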
`httpx` seems to call its internal `urlparse` function to process URLs, so I tested that function directly. It normally percent-encodes characters as needed, such as spaces.
However, this does not happen for `|`.
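The observed behavior is consistent with `|` being in the safe set used for paths. The sketch below uses the standard library's `urllib.parse.quote` as a stand-in for httpx's own internal quoting (they are not the same function) together with the `PATH_SAFE` value quoted later in this post, so treat it as an illustration of the mechanism rather than a reproduction of httpx.

```python
from urllib.parse import quote

# PATH_SAFE as quoted in this post for httpx 0.28.1.
PATH_SAFE = "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~"

# Stand-in only: quote() leaves characters listed in `safe` untouched and
# percent-encodes everything else outside its always-safe set.
print(quote("/ ", safe=PATH_SAFE))  # /%20  (space is not in PATH_SAFE)
print(quote("/|", safe=PATH_SAFE))  # /|    ("|" is in PATH_SAFE)
```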
In Firefox and Google Chrome, `|` is percent-encoded.
In the `requests` library, `|` is also percent-encoded.
The `rfc3986` library also percent-encodes `|`.
Using `urllib` itself, `|` also seems to be percent-encoded in path components, which returns `'/%7C'`.
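A minimal way to see the `urllib` behavior, assuming the call in question is `urllib.parse.quote` with its default `safe="/"`:

```python
from urllib.parse import quote

# quote() keeps ASCII letters, digits, "_.-~" and the characters in `safe`
# (default "/") as-is; "|" is in neither group, so it becomes %7C.
print(repr(quote("/|")))  # '/%7C'
```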
I'm fairly certain that I've interpreted this RFC correctly, and I think `|` should be excluded from the `PATH_SAFE` set here. Its current value is:

`"!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~"`

Potential fix: nathaniel-daniel@a2f327f