parse_unicode_url function #882

alandefreitas · 2024-11-11T18:58:35Z

We should have a function that handles Unicode URLs:

result<url> parse_unicode_url(...)

To produce a valid URL, the host would be converted with Punycode and the other components would be percent-encoded where possible. Errors are still possible, which is why the function returns a result.
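As a rough illustration of the percent-escaping half of that conversion, here is a minimal standard-library sketch. The name `pct_encode_utf8` and its `extra_allowed` parameter are hypothetical and not part of Boost.URL. It treats the input as raw UTF-8 bytes and escapes everything outside the RFC 3986 unreserved set, so an existing `%` is escaped again as `%25`. Punycode conversion of the host is not covered.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.URL API): percent-encodes every byte
// that is neither an RFC 3986 "unreserved" character nor listed in
// `extra_allowed`. Multi-byte UTF-8 sequences are escaped byte by byte.
std::string
pct_encode_utf8(std::string_view s, std::string_view extra_allowed = "")
{
    constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s)
    {
        bool const unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        bool const allowed =
            extra_allowed.find(static_cast<char>(c)) != std::string_view::npos;
        if (unreserved || allowed)
        {
            out += static_cast<char>(c);
        }
        else
        {
            // Emit %XX using the high and low nibbles of the byte.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

For a path, the caller would pass `"/"` as `extra_allowed` so that segment separators survive. A real implementation would probably use a per-component allowed set instead.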

We can either identify the components by locating their delimiters (as urls::format does), or provide functions that create a URL from its components:

result<url>
make_url(
  string_view scheme,
  utf8_string_view authority,
  utf8_string_view path,
  utf8_string_view query,
  utf8_string_view fragment)
{
    url u;
    u.reserve(...);
    u.set_scheme(scheme);
    u.set_authority(authority);
    u.set_path(path);
    // ...
    return u;
}

result<url>
make_iri(
  string_view scheme,
  utf8_string_view authority,
  utf8_string_view path,
  utf8_string_view query,
  utf8_string_view fragment)
{
    url u;
    u.reserve(...);
    u.set_scheme(scheme);
    // Host is converted with punycode; other components are percent-encoded
    u.set_authority(detail::parse_punycode(authority));
    u.set_path(detail::pct_encode(path));
    // ...
    return u;
}
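For the first option, identifying components by their delimiters, a minimal sketch could look like the following. The `url_parts` struct and `split_components` function are hypothetical names used only for illustration. The splitting follows the general shape of the RFC 3986 generic syntax, does no validation, and does not distinguish an absent component from an empty one.

```cpp
#include <string_view>

// Hypothetical holder for the raw (still unencoded) components.
struct url_parts
{
    std::string_view scheme;
    std::string_view authority;
    std::string_view path;
    std::string_view query;
    std::string_view fragment;
};

// Hypothetical helper: splits on the first '#', then the first '?',
// then a leading "scheme:" and an optional "//authority".
url_parts
split_components(std::string_view s)
{
    url_parts p;
    auto const npos = std::string_view::npos;

    // Everything after the first '#' is the fragment.
    auto pos = s.find('#');
    if (pos != npos)
    {
        p.fragment = s.substr(pos + 1);
        s = s.substr(0, pos);
    }

    // Everything after the first '?' is the query.
    pos = s.find('?');
    if (pos != npos)
    {
        p.query = s.substr(pos + 1);
        s = s.substr(0, pos);
    }

    // A non-empty prefix before ':' is the scheme, provided the ':'
    // appears before any '/'.
    auto const colon = s.find(':');
    auto const slash = s.find('/');
    if (colon != npos && colon > 0 && (slash == npos || colon < slash))
    {
        p.scheme = s.substr(0, colon);
        s = s.substr(colon + 1);
    }

    // "//" introduces the authority, which runs up to the next '/'.
    if (s.substr(0, 2) == "//")
    {
        s.remove_prefix(2);
        auto const end = s.find('/');
        p.authority = s.substr(0, end);
        s = (end == npos) ? std::string_view{} : s.substr(end);
    }

    // Whatever remains is the path.
    p.path = s;
    return p;
}
```

The resulting views could then be handed to something like `make_iri` above, which would apply the per-component conversion.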

luz-arreola · 2024-11-18T18:46:17Z

It is probably a very good idea to use ICU for this. I currently use ICU to create and parse Unicode URLs, and it works perfectly with all kinds of Unicode characters. Adding ICU support to Boost.URL would be excellent! Alternatively, there are some very lightweight Unicode libraries that I think would work for this, but ICU is the only fully featured Unicode library. I believe people who work with Unicode are very likely already using many features that only ICU provides.
