Skip to content

Avoid panics in NgxListIterator, rename to_str to as_str where appropriate #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

pchickey
Copy link
Contributor

@pchickey pchickey commented Jul 25, 2025

Proposed changes

ngx::http::NgxListIterator: ngx_str_t items may not be str:

The ngx_str_items in the header name and value are often untrusted
input, and may not have utf-8 contents. The use of ngx_str_t::to_str
in this iterator will panic when the contents are not utf-8. So, instead
of yielding a pair of strs here, yield a pair of byte slices.

ngx_str_t::to_str should not unwrap, and should be called as_str

In Rust, the convention is that to_ methods take ownership of self
(e.g. fn to_string(self) -> Self) whereas as_ methods borrow self
(e.g. fn as_str(&self) -> &str).

In general, we should avoid exposing methods that panic when its
reasonable to return a Result instead. This particular method was used,
likely without considering that it may panic or validating that the
contents are utf-8, in ngx::http::NgxListIterator's Iterator impl,
which made it very easy to panic based on untrusted input.

NgxStr::to_str renamed as_str

Same reasoning as above. This method already had the Result<&str, Utf8Error>
return type that I changed ngx_str_t::to_str to have.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have written my commit messages in the Conventional Commits format.
  • I have read the CONTRIBUTING doc
  • I have added tests (when possible) that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

pchickey added 3 commits July 25, 2025 16:01
The ngx_str_items in the header name and value are often untrusted
input, and may not have utf-8 contents. The use of ngx_str_t::to_str
in this iterator will panic when the contents are not utf-8. So, instead
of yielding a pair of strs here, yield a pair of byte slices.
In Rust, the convention is that `to_` methods take ownership of self
(e.g. fn to_string(self) -> Self) whereas `as_` methods borrow self
(e.g. fn as_str(&self) -> &str).

In general, we should avoid exposing methods that panic when its
reasonable to return a Result instead. This particular method was used,
likely without considering that it may panic or validating that the
contents are utf-8, in ngx::http::NgxListIterator's Iterator impl,
which made it very easy to panic based on untrusted input.
In Rust, the convention is that `to_` methods take ownership of self
(e.g. fn to_string(self) -> Self) whereas `as_` methods borrow self
(e.g. fn as_str(&self) -> &str).
@pchickey pchickey force-pushed the pch/str_conversions branch from 67c967a to 479f08c Compare July 25, 2025 23:04
@pchickey
Copy link
Contributor Author

The NgxListIterator bug was discovered by @xeioex so I'll tag him for review

@@ -478,7 +478,7 @@ impl<'a> Iterator for NgxListIterator<'a> {
}
let header = &part.arr[self.i];
self.i += 1;
Some((header.key.to_str(), header.value.to_str()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather return a pair of NgxStr here, as these now provide Display, PartialEq and necessary conversions.

@xeioex
Copy link
Contributor

xeioex commented Jul 25, 2025

@pchickey

The ngx_str_items in the header name and value are often untrusted
input, and may not have utf-8 contents. The use of ngx_str_t::to_str
in this iterator will panic when the contents are not utf-8. So, instead
of yielding a pair of strs here, yield a pair of byte slices.

I believe we need more time to think for this Header iterator interface as there are a lot of things to consider.
For example I am looking into hyper:

Represents an HTTP header field value.

In practice, HTTP header field values are usually valid ASCII. However, the HTTP spec allows for a header value to contain opaque bytes as well. In this case, the header field value is not able to be represented as a string.

To handle this, the HeaderValue is useable as a type and can be compared with strings and implements Debug. A to_str fn is provided that returns an Err if the header value contains non visible ascii characters.

Also looking at RFC9110

  field-value    = *field-content
  field-content  = field-vchar
                   [ 1*( SP / HTAB / field-vchar ) field-vchar ]
  field-vchar    = VCHAR / obs-text
  obs-text       = %x80-FF
  
  
A field value does not include leading or trailing whitespace. When a specific version of HTTP allows such whitespace to appear in a message, a field parsing implementation MUST exclude such whitespace prior to evaluating the field value.

Field values are usually constrained to the range of US-ASCII characters [[USASCII](https://www.rfc-editor.org/rfc/rfc9110.html#USASCII)]. Fields needing a greater range of characters can use an encoding, such as the one defined in [[RFC8187](https://www.rfc-editor.org/rfc/rfc9110.html#RFC8187)]. Historically, HTTP allowed field content with text in the ISO-8859-1 charset [[ISO-8859-1](https://www.rfc-editor.org/rfc/rfc9110.html#ISO-8859-1)], supporting other charsets only through use of [[RFC2047](https://www.rfc-editor.org/rfc/rfc9110.html#RFC2047)] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. A recipient SHOULD treat other allowed octets in field content (i.e., [obs-text](https://www.rfc-editor.org/rfc/rfc9110.html#fields.values)) as opaque data.

Field values containing CR, LF, or NUL characters are invalid and dangerous, due to the varying ways that implementations might parse and interpret those characters; a recipient of CR, LF, or NUL within a field value MUST either reject the message or replace each of those characters with SP before further processing or forwarding of that message. Field values containing other CTL characters are also invalid; however, recipients MAY retain such characters for the sake of robustness when they appear within a safe context (e.g., an application-specific quoted string that will not be processed by any downstream HTTP parser).

Treating the headers as bytes simplifies things for us in ngx-rust, but I am not sure it is very convenient for users.

Comment on lines +301 to +302
if let Ok(value) = std::str::from_utf8(value) {
headers.insert(http::header::HOST, value.parse().unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Ok(value) = std::str::from_utf8(value) {
headers.insert(http::header::HOST, value.parse().unwrap());
if let Ok(value) = http::HeaderValue::from_bytes(value) {
headers.insert(http::header::HOST, value);

@@ -282,7 +282,7 @@ http_request_handler!(awssigv4_header_handler, |request: &mut Request| {
}

let datetime = chrono::Utc::now();
let uri = match request.unparsed_uri().to_str() {
let uri = match request.unparsed_uri().as_str() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToStr trait seems pretty standard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at std, it seems to provide as_str for infallible conversions with guaranteed UTF-8 and to_str for maybe-not-utf types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants