Commit

Tweak discussion of unicode; link sections together.
mnot committed Feb 12, 2024
1 parent 8cbfc3c commit 902ff45
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions draft-ietf-httpbis-sfbis.md
@@ -229,7 +229,7 @@ For example, field instance might contain a syntactically valid Date ({{date}}),

# Structured Data Types {#types}

-This section defines the abstract types for Structured Fields, and summarizes how those types are serialized into textual HTTP fields.
+This section provides an overview of the abstract types that Structured Fields use, and gives a brief description and examples of how each of those types are serialized into textual HTTP fields. {{text}} specifies the details of how they are parsed from and serialized into textual HTTP fields.

In summary:

@@ -498,7 +498,7 @@ Parsers MUST support Dates whose values include all days in years 1 to 9999 (i.e

### Display Strings {#displaystring}

-Display Strings are similar to Strings, in that they consist of zero or more characters, but they allow Unicode content, unlike Strings.
+Display Strings are similar to Strings, in that they consist of zero or more characters, but they allow Unicode scalar values (i.e., all Unicode code points except for surrogates), unlike Strings.

Display Strings are intended for use in cases where a value is displayed to end users, and therefore may need to carry non-ASCII content. It is NOT RECOMMENDED that they be used in situations where a String ({{string}}) or Token ({{token}}) would be adequate, because Unicode has processing considerations (e.g., normalization) and security considerations (e.g., homograph attacks) that make it more difficult to handle correctly.

@@ -516,7 +516,7 @@ See {{security}} for additional security considerations when handling Display St

# Working with Structured Fields in HTTP {#text}

-This section defines how to serialize and parse Structured Fields in textual HTTP field values and other encodings compatible with them (e.g., in HTTP/2 {{RFC9113}} before compression with HPACK {{HPACK}}).
+This section defines how to serialize and parse the abstract types defined by {{types}} into textual HTTP field values and other encodings compatible with them (e.g., in HTTP/2 {{RFC9113}} before compression with HPACK {{HPACK}}).

## Serializing Structured Fields {#text-serialize}

@@ -731,6 +731,8 @@ Given a sequence of Unicode codepoints as input_sequence, return an ASCII string
4. Append DQUOTE to encoded_string.
5. Return encoded_string.

+Note that {{UTF8}} prohibits the encoding of codepoints between U+D800 and U+DFFF (surrogates); if they occur in input_sequence, serialization will fail.

@reschke

reschke Feb 13, 2024

Contributor

Nit: we probably should explain the "U+" notation. I did that in an earlier commit that never made it into a PR (see cbfba24). That change also had examples for non-serializable strings.
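
To make the surrogate note added in this hunk concrete, here is a minimal Python sketch of Display String serialization. It is not the spec's reference code: the function name `serialize_display_string` is hypothetical, and the steps that the diff elides (UTF-8 encoding, then percent-encoding of `%`, DQUOTE, and bytes outside printable ASCII as lowercase hex) are assumed from a reading of the draft. Python's strict UTF-8 encoder rejects lone surrogates, which matches the failure case the note describes.

```python
def serialize_display_string(input_sequence: str) -> str:
    """Sketch of Display String serialization (assumed steps, hypothetical name)."""
    try:
        # Strict UTF-8 encoding fails on surrogates (U+D800 to U+DFFF).
        byte_array = input_sequence.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError("input contains a surrogate; serialization fails") from exc

    encoded = ['%"']  # "%" followed by DQUOTE opens the Display String
    for byte in byte_array:
        # Assumed rule: percent-encode "%", DQUOTE, control bytes, and bytes >= 0x7f.
        if byte in (0x25, 0x22) or byte <= 0x1F or byte >= 0x7F:
            encoded.append("%{:02x}".format(byte))
        else:
            encoded.append(chr(byte))
    encoded.append('"')  # closing DQUOTE
    return "".join(encoded)
```

Under these assumptions, a string containing `ü` (U+00FC) has that character serialized as `%c3%bc`, while an input such as `"\ud800"` raises `ValueError` instead of producing output.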


## Parsing Structured Fields {#text-parse}

@@ -1056,7 +1058,7 @@ The size of most types defined by Structured Fields is not limited; as a result,
It is possible for parties with the ability to inject new HTTP fields to change the meaning
of a Structured Field. In some circumstances, this will cause parsing to fail, but it is not possible to reliably fail in all such circumstances.

-The Display String type can convey any possible Unicode code point without sanitization; for example, they might contain unassigned code points, control points (including NUL), or noncharacters. Therefore, applications consuming Display Strings need to consider strategies such as filtering or escaping untrusted content before displaying it. See also {{UNICODE-SECURITY}} and {{?I-D.draft-bray-unichars}}.
+The Display String type can convey any possible Unicode code point without sanitization; for example, they might contain unassigned code points, control points (including NUL), or noncharacters. Therefore, applications consuming Display Strings need to consider strategies such as filtering or escaping untrusted content before displaying it. See also {{UNICODE-SECURITY}}.
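
As one illustration of the "filtering or escaping" strategy mentioned in this paragraph, the Python sketch below replaces control characters, unassigned code points (a category that also covers noncharacters), and surrogates with U+FFFD before display. The function name `sanitize_for_display` and the choice of categories are assumptions made for illustration; the spec does not prescribe a particular policy, and a real application may need a different one (for example, allowing line breaks).

```python
import unicodedata

# General categories treated as unsafe in this sketch (an assumed policy):
# Cc = control (includes NUL, and also tab/newline),
# Cn = unassigned (includes noncharacters), Cs = surrogate.
UNSAFE_CATEGORIES = {"Cc", "Cn", "Cs"}

def sanitize_for_display(text: str) -> str:
    """Replace code points in the unsafe categories with U+FFFD."""
    return "".join(
        "\ufffd" if unicodedata.category(ch) in UNSAFE_CATEGORIES else ch
        for ch in text
    )
```

Replacing rather than silently dropping such code points keeps it visible to the user that something was removed.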

--- back

2 comments on commit 902ff45

@duerst duerst commented on 902ff45 Feb 13, 2024

@mnot Looks good, except that I feel the need for an article in "if they occur in input_sequence, serialization will fail."
I think both "if they occur in the input_sequence, serialization will fail." and "if they occur in an input_sequence, serialization will fail." should be fine; what's best may depend on context.

Member Author

@mnot mnot commented on 902ff45 Feb 13, 2024

That's the style used throughout the rest of the spec, for brevity.