document grammar rules
alandefreitas authored and vinniefalco committed Feb 21, 2022
1 parent 3dba527 commit 015da40
Showing 7 changed files with 293 additions and 48 deletions.
1 change: 0 additions & 1 deletion Jamfile
@@ -20,4 +20,3 @@ constant c11-requires :
;

build-project test ;
build-project example ;
10 changes: 4 additions & 6 deletions doc/qbk/0.main.qbk
@@ -45,7 +45,7 @@
[def __MoveConstructible__ [@https://en.cppreference.com/w/cpp/named_req/MoveConstructible ['MoveConstructible]]]
[def __SemiRegular__ [@https://en.cppreference.com/w/cpp/concepts/semiregular ['SemiRegular]]]
[def __Swappable__ [@https://en.cppreference.com/w/cpp/named_req/Swappable ['Swappable]]]
[def __CharSet__ [link url.grammar.charset ['CharSet]]]
[def __CharSet__ [link url.grammars_rules.charset ['CharSet]]]

[def __std_swap__ [@https://en.cppreference.com/w/cpp/algorithm/swap `std::swap`]]
[def __authority_view__ [link url.ref.boost__urls__authority_view `authority_view`]]
@@ -88,12 +88,10 @@

[include 4.0.modifying.qbk]

[section Allocators]
[endsect]

[section Grammars Rules]
[include 5.0.grammars.qbk]

[section Examples]
[include 5.1.customization.qbk]
[include 5.2.CharSet.qbk]
[endsect]

[section:ref Reference]
67 changes: 58 additions & 9 deletions doc/qbk/5.0.grammars.qbk
@@ -8,24 +8,73 @@
Official repository: https://github.com/CPPAlliance/url
]

[section Grammar]

[heading Design of grammar rules]
[section Design]

The function [link url.ref.boost__urls__grammar__parse `parse`] implements the logic for parsing strings
according to grammar rules.

A grammar rule type, henceforth called a "rule", provides an algorithm for parsing an input string. An
instance of the rule is used to store the results.

[heading Customization points]
[table [[Code][Output]] [[
[c++]
[snippet_parse_1]
][
[teletype]
```
scheme: http
suffix: :after_scheme
```
]]]

In this example, the function [link url.ref.boost__urls__grammar__parse `parse`]
returns `true` if the specified range of characters begins with a scheme. When
the operation completes successfully, the rule instance holds the results.

The iterator is updated to the position where the rule ended, leaving the unparsed
suffix in the range between the updated iterator and the end iterator. This behavior
is useful when parsing a sequence of rules.
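A minimal, self-contained sketch of this iterator behavior, assuming a hypothetical `scheme_rule` type and a free `parse` overload with the signature `parse(char const*& it, char const* end, scheme_rule&)`. Neither is Boost.URL's actual API; the sketch only shows how a rule can store its result and advance the iterator so that `[it, end)` becomes the suffix.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical rule type: an instance stores the parse result.
struct scheme_rule
{
    std::string scheme;
};

// Hypothetical parse overload for scheme_rule, following RFC 3986:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// On success, `it` is advanced to the first character after the scheme,
// so [it, end) is the unparsed suffix. On failure, `it` is left unchanged.
inline bool parse(char const*& it, char const* end, scheme_rule& r)
{
    assert(it <= end);
    char const* p = it;
    if (p == end || !std::isalpha(static_cast<unsigned char>(*p)))
        return false;
    ++p;
    while (p != end)
    {
        unsigned char c = static_cast<unsigned char>(*p);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            break;
        ++p;
    }
    r.scheme.assign(it, p);
    it = p;
    return true;
}
```

Applied to `"http:after_scheme"`, this sketch stores `"http"` in the rule and leaves `":after_scheme"` as the suffix, matching the output in the table above.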

[table [[Code][Output]] [[
[c++]
[snippet_parse_2]
][
[teletype]
```
query: ?key=value
fragment: anchor
```
]]]

Parsing a sequence of rules is such a common pattern that a special overload is
provided:

[table [[Code][Output]] [[
[c++]
[snippet_parse_3]
][
[teletype]
```
query: ?key=value
fragment: anchor
```
]]]
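One way such a sequence overload could be built is sketched below with hypothetical `query_part` and `fragment_part` rules and a C++17 fold expression. The rule names, the signatures, and the choice to restore the iterator when any rule fails are assumptions of this sketch, not the library's documented behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical rules; the names are illustrative only.
struct query_part    { std::string value; };  // "?" *( any char except '#' ), '?' kept
struct fragment_part { std::string value; };  // "#" *( any char ), '#' dropped

inline bool parse(char const*& it, char const* end, query_part& r)
{
    if (it == end || *it != '?')
        return false;
    char const* p = it;
    while (p != end && *p != '#')
        ++p;
    r.value.assign(it, p);
    it = p;
    return true;
}

inline bool parse(char const*& it, char const* end, fragment_part& r)
{
    if (it == end || *it != '#')
        return false;
    r.value.assign(it + 1, end);
    it = end;
    return true;
}

// Sequence overload: apply each rule in order, stopping at the first
// failure (&& short-circuits left to right). On failure the iterator is
// restored to where the sequence started.
template <class R0, class R1, class... Rn>
bool parse(char const*& it, char const* end, R0& r0, R1& r1, Rn&... rn)
{
    assert(it <= end);
    char const* const start = it;
    bool ok = parse(it, end, r0) && parse(it, end, r1) &&
              (... && parse(it, end, rn));  // empty pack folds to true
    if (!ok)
        it = start;
    return ok;
}
```

With the input `"?key=value#anchor"`, the two rules receive `"?key=value"` and `"anchor"`, as in the output column above.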

Users can define a free function `parse` as a customization point defining how to parse their
grammar rules as part of the same architecture that might include arbitrary grammar rules in expressions.
If all the logic has been represented in a single rule, we often want to parse
a complete string as a rule.

These new function overloads may be defined in other namespaces. As with __std_swap__, the design relies
on [@https://en.cppreference.com/w/cpp/language/adl argument-dependent lookup] to find these overloads.
[table [[Code][Output]] [[
[c++]
[snippet_parse_4]
][
[teletype]
```
scheme: http
host: www.boost.org
```
]]]

[include CharSet.qbk]
The function [link url.ref.boost__urls__grammar__parse_string `parse_string`] only returns
true when the whole string matches the rule.
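The difference between a prefix match and a whole-string match can be sketched with a hypothetical `lower_rule` and a `parse_string`-style helper written on top of an iterator-advancing `parse`. These names and signatures are assumptions for illustration, not the library's declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical rule: one or more lowercase ASCII letters.
struct lower_rule
{
    std::string_view value;
};

inline bool parse(char const*& it, char const* end, lower_rule& r)
{
    assert(it <= end);
    char const* p = it;
    while (p != end && *p >= 'a' && *p <= 'z')
        ++p;
    if (p == it)
        return false;  // at least one letter is required
    r.value = std::string_view(it, static_cast<std::size_t>(p - it));
    it = p;
    return true;
}

// Sketch of a parse_string-style helper: a prefix match is not enough,
// the rule must consume the entire input.
template <class Rule>
bool parse_string(std::string_view s, Rule& r)
{
    char const* it = s.data();
    char const* const end = s.data() + s.size();
    return parse(it, end, r) && it == end;
}
```

Here `"lower123"` is rejected even though `parse` alone would accept its `"lower"` prefix.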

[endsect]
45 changes: 45 additions & 0 deletions doc/qbk/5.1.customization.qbk
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
[/
Copyright (c) 2019 Vinnie Falco ([email protected])
Copyright (c) 2021 Alan de Freitas ([email protected])

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Official repository: https://github.com/CPPAlliance/url
]

[section Customization points]

Users can define customization points that specify how to parse and store the results of
grammar rules within the same library architecture.

This allows arbitrary grammar logic to be used in expressions that interact with the existing rules.
Use cases include alternative or extended syntax for URLs and their components.

The parsing logic of a new rule is provided by overloading the `tag_invoke`
customization point, and these overloads may be defined in other namespaces.

[snippet_customization_1]

The function [link url.ref.boost__urls__grammar__parse `parse`] relies on
[@https://en.cppreference.com/w/cpp/language/adl argument-dependent lookup] to find these function
overloads with the appropriate tag [link url.ref.boost__urls__grammar__parse_tag `grammar::parse_tag`].
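A minimal sketch of the `tag_invoke` pattern, using a hypothetical `grammar_sketch::parse_tag` in place of `grammar::parse_tag` and a user-defined `user::lowercase_rule`. The exact overload signature the library expects may differ; the point is that the generic `parse` calls `tag_invoke` unqualified, so argument-dependent lookup finds the overload declared next to the rule type.

```cpp
#include <cassert>
#include <string>

namespace grammar_sketch {

// Hypothetical tag type playing the role of grammar::parse_tag.
struct parse_tag {};

// Generic entry point: the unqualified, dependent call to tag_invoke is
// resolved at instantiation time, so ADL finds overloads in the
// namespace of the rule type R.
template <class R>
bool parse(char const*& it, char const* end, R& r)
{
    return tag_invoke(parse_tag{}, it, end, r);
}

} // namespace grammar_sketch

namespace user {

// A user-defined rule living in its own namespace.
struct lowercase_rule
{
    std::string value;
};

// Found by ADL because lowercase_rule is declared in namespace user.
inline bool tag_invoke(grammar_sketch::parse_tag, char const*& it,
                       char const* end, lowercase_rule& r)
{
    assert(it <= end);
    char const* p = it;
    while (p != end && *p >= 'a' && *p <= 'z')
        ++p;
    if (p == it)
        return false;  // require at least one lowercase letter
    r.value.assign(it, p);
    it = p;
    return true;
}

} // namespace user
```

This mirrors the design of __std_swap__: the library owns the call site and the tag, while users own the overload.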

At this point, the new rule can interact with existing rules in any of the parsing functions:

[table [[Code][Output]] [[
[c++]
[snippet_customization_2]
][
[teletype]
```
scheme: http
lower: somelowercase
```
]]]





[endsect]
50 changes: 26 additions & 24 deletions doc/qbk/CharSet.qbk → doc/qbk/5.2.CharSet.qbk
@@ -12,10 +12,31 @@
A ['CharSet] is a unary predicate which accepts as its single
argument a value of type `char`. The return value of the predicate
is a `bool` whose value is true if the character is a member of the
notional character set, or false otherwise. A character set can be
used to specify which characters are unreserved and thus, do not
need to be escaped when used in percent-encoding algorithms.
Character sets may also be used by parsers; some character sets
notional character set, or false otherwise.

[snippet_charset_1]
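As an illustration, a ['CharSet] can be an empty function object with a `constexpr` call operator. The set below models the RFC 3986 `unreserved` characters; `unreserved_chars_t` and `all_of_set` are hypothetical names, not the predicates the library ships.

```cpp
#include <string_view>

// Hypothetical CharSet modeling the RFC 3986 "unreserved" characters:
//   ALPHA / DIGIT / "-" / "." / "_" / "~"
struct unreserved_chars_t
{
    constexpr bool operator()(char c) const noexcept
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') ||
               c == '-' || c == '.' || c == '_' || c == '~';
    }
};

constexpr unreserved_chars_t unreserved_chars{};

// Because a CharSet is an ordinary predicate, generic code can accept any
// CharSet and test each character of a string against it.
template <class CharSet>
constexpr bool all_of_set(std::string_view s, CharSet const& cs) noexcept
{
    for (char c : s)
        if (!cs(c))
            return false;
    return true;
}
```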

The library provides a number of ['CharSet] predicates related to
URL components.

[snippet_charset_2]

Character sets are used to specify which characters are unreserved
in grammar rules. In URLs, they determine which characters do not
need to be escaped by percent-encoding algorithms.
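To make the connection with percent-encoding concrete, here is a sketch of an encoder parameterized on a character set: members of the set are copied as-is, and every other character becomes `%XX`. `query_chars_t` and `percent_encode` are hypothetical names for this sketch, not library functions, and the set chosen here (unreserved plus `=`) is only an example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical CharSet for this sketch: RFC 3986 unreserved characters
// plus '=' may appear literally in the output.
struct query_chars_t
{
    constexpr bool operator()(char c) const noexcept
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') ||
               c == '-' || c == '.' || c == '_' || c == '~' || c == '=';
    }
};

// Percent-encode every character that is NOT in the character set cs.
template <class CharSet>
std::string percent_encode(std::string_view s, CharSet const& cs)
{
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(s.size());
    for (char c : s)
    {
        if (cs(c))
        {
            out.push_back(c);  // member of the set: copied unchanged
        }
        else
        {
            unsigned char u = static_cast<unsigned char>(c);
            out.push_back('%');
            out.push_back(hex[u >> 4]);
            out.push_back(hex[u & 0x0F]);
        }
    }
    return out;
}
```

With this set, `"key=the value"` (13 characters) encodes to `"key=the%20value"`.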

[table [[Code][Output]] [[
[c++]
[snippet_charset_3]
][
[teletype]
```
query: key=the%20value
decoded size: 13
```
]]]

Character sets may also be used directly by parsers; some character sets
have optimized implementations for finding matching elements.
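A ['CharSet] can expose such an optimized search through `find_if` and `find_if_not` members, the member names used by this concept's exemplar. In the hypothetical sketch below, a single-character set delegates `find_if` to `std::memchr`, which standard C libraries often implement with faster-than-bytewise scans; `single_char_set` is an illustrative name, not a library type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical CharSet containing a single character. Besides the
// required call operator, it provides find_if / find_if_not members so
// that searches can use a faster routine than a per-character loop.
struct single_char_set
{
    char ch;

    constexpr bool operator()(char c) const noexcept
    {
        return c == ch;
    }

    // First position in [first, last) whose character IS in the set, or last.
    char const* find_if(char const* first, char const* last) const noexcept
    {
        assert(first <= last);
        if (first == last)
            return last;
        void const* p = std::memchr(
            first, ch, static_cast<std::size_t>(last - first));
        return p ? static_cast<char const*>(p) : last;
    }

    // First position in [first, last) whose character is NOT in the set, or last.
    char const* find_if_not(char const* first, char const* last) const noexcept
    {
        assert(first <= last);
        while (first != last && *first == ch)
            ++first;
        return first;
    }
};
```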

[heading Related Identifiers]
@@ -90,26 +111,7 @@ In this table:

[heading Exemplar]

```
struct CharSet
{
bool operator()( char c ) const noexcept;

char const* find_if ( char const* first, char const* last ) const noexcept;
char const* find_if_not ( char const* first, char const* last ) const noexcept;
};
```

[heading Example]
```
struct digit_chars_t
{
constexpr bool operator()( char c ) const noexcept
{
return c >= '0' && c <= '9';
}
};
```
[snippet_charset_4]

[heading Models]

2 changes: 1 addition & 1 deletion doc/qbk/quickref.xml
@@ -106,7 +106,7 @@
</simplelist>
<bridgehead renderas="sect3">Concepts</bridgehead>
<simplelist type="vert" columns="1">
<member><link linkend="url.grammar.charset"><emphasis>CharSet</emphasis></link></member>
<member><link linkend="url.grammars_rules.charset"><emphasis>CharSet</emphasis></link></member>
</simplelist>
</entry>
</row></tbody>