-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode handling in header location #110
Comments
If possible, can you submit a PR which addresses your issue? |
#111 was already submitted, but closed by the submitter as not backwards compatible. I don't think there is a backwards compatible way to implement what is being requested (automatically escape location), as it would break cases where valid, already percent-escaped location are passed. I think the current expectation that the location provided is a valid URL/path is reasonable. |
Okay, so the solution is to close this issue without any fix? i.e. working as expected? |
Possibly we should update the documentation to mention this, if it isn't already mentioned (I haven't checked). I just closed a similar issue in Roda filed by the same submitter with a documentation fix. |
That would be possible to detect is it's already encoded or not to avoid re-encoding.
I don't think it is, because the user can provide a URL-encoded string that will be decoded on the fly by something on the way (web browser, reverse proxy, http server) to the application server that will receive URL decoded string. |
I had experienced a web app made with roda where the user provided a URL with the unicode URL-encoded and got this error while the http server was webrick and nt error while the http server was puma. It's clearly possible to support Unicode. I don't know if the best way is workarounding (encoding/escaping) URI, using something more modern like addressable or patching URI upstream. |
Trying to detect valid encoding and reencoding if not valid is prone to security issues. For example:
In general it's a bad idea for library code to make guesses as to whether to encode. It should always work in the same way. It's simpler and backwards compatible to assume the location is always already properly encoded. While we could add an option to toggle the behavior, I think it's better to document the expected behavior. Users can and should make sure the URL they are passing as the location is a valid URL. In the example you are providing, you appear to be directly passing user (attacker) provided input as the location, without any validation. This seems like a very risky security practice, and not something that we should make changes to accommodate. You should be taking the URL the user is providing and appropriately validate it (e.g. construct a valid URL from it), before using it as the location. If you are getting a user-provided value that you want to escape to use in part of the URL, you probably want to use |
I can't disagree with that re-encoding or detection is bad. Yet nowadays any modern software should handle Unicode properly. The issue here is upstream because ruby/uri supports only old RFC 2396 and RFC 3986 and not RFC 3987 and RFC 6570. So I see two options left:
Abstract of IRI
Overview and Motivation of IRI
|
I'm glad we can agree on that.
There is no way to handle Unicode properly with URIs. You need to use IRIs for that. If Ruby shipped with a library that supported IRIs, then I think we would be open to supporting IRIs in webrick (in that we could handle an IRI object and use it, not that we would treat string input as an IRI).
You can use header['location'] = Addressable::URI.parse(your_location).normalize Note that Addressable has the security problems I discussed: # Escapes
Addressable::URI.parse("http://app/foo/%").normalize.to_s
# => "http://app/foo/%25"
# Doesn't escape
Addressable::URI.parse("http://app/foo/%25").normalize.to_s
# => "http://app/foo/%25"
# Escapes some characters and not others
Addressable::URI.parse("http://app/foo/%25\u1000").normalize.to_s
#=> "http://app/foo/%25%E1%80%80" I'm not sure whether is possible to use Addressable in a secure manner, where it always treats input as decoded and encodes it (where the second path would give you a path of
I don't think URI should handle this, beyond the handling it already has: URI::DEFAULT_PARSER.escape("/foo/%25\u1000")
# =>"/foo/%2525%E1%80%80" Possibly Ruby could consider adding an IRI library. But such a library needs to be carefully designed, so it never has to guess as whether something should or should not be encoded. Maybe for partial IRI support in URI, we could consider a |
I wanted to bring that up, I have no definitive opinion about which stage is should be handled (dev using the web framework, web framework (like roda) or http server like webrick). I looked at why it is working in puma and it seems to be because they use a regexp (and that Ruby regexp are unicode aware) and not URI module. |
webrick doesn't handle Unicode in HTTP location header, eg. redirection to an URL like
http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg
.The following code is responsible:
webrick/lib/webrick/httpresponse.rb
Line 320 in 5510186
This is because methods such as
URI.parse
or hereURI.merge
only handles ASCII.So URL or fragments should be escaped first, with
CGI.escape
for URL component andURI::Parser.new.escape
for full URLs.Examples in https://github.com/noraj/ctf-party/blob/master/lib/ctf_party/cgi.rb.
cf. https://stackoverflow.com/questions/46849219/ruby-uriinvalidurierror-uri-must-be-ascii-only/75487328
patched code:
The text was updated successfully, but these errors were encountered: