-
Imagine a scenario where I have a file with a lot of key-value pairs (in the form of …). A simple example (omitting some details):
struct variable_key : lexy::transparent_production
{
    [[nodiscard]] CONSTEVAL static auto name() noexcept -> const char* { return "[key]"; }

    struct invalid_key
    {
        constexpr static auto name = "a valid key was required here";
    };

    constexpr static auto rule = []
    {
        // begin with not '\r', '\n', '\r\n', whitespace or '='
        constexpr auto begin_with_not_blank = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;
        // continue with printable, but excluding '\r', '\n', '\r\n', whitespace and '='
        constexpr auto continue_with_printable = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;

        return dsl::peek(begin_with_not_blank) >>
                   (LEXY_DEBUG("parse variable key begin") +
                    dsl::identifier(begin_with_not_blank, continue_with_printable) +
                    LEXY_DEBUG("parse variable key end")) |
               dsl::error<invalid_key>;
    }();

    constexpr static auto value = lexy::forward<lexeme_type>;
};

struct variable_value : lexy::transparent_production
{
    [[nodiscard]] CONSTEVAL static auto name() noexcept -> const char* { return "[value]"; }

    constexpr static auto rule = []
    {
        // begin with not '\r', '\n', '\r\n', whitespace or '='
        constexpr auto begin_with_not_blank = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;
        // continue with printable, but excluding '\r', '\n', '\r\n', whitespace and '='
        constexpr auto continue_with_printable = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;

        return dsl::peek(begin_with_not_blank) >>
               (LEXY_DEBUG("parse variable value begin") +
                dsl::identifier(begin_with_not_blank, continue_with_printable) +
                LEXY_DEBUG("parse variable value end"));
    }();

    constexpr static auto value = lexy::forward<lexeme_type>;
};
Note that the above example ignores the details of the …
It looks fine; it can successfully parse the following example:
However, if the following content appears, the parsing will fail:
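As a rough illustration with a made-up input, the strict character class can be mimicked in plain C++ without lexy. This is only a sketch under assumptions: it handles ASCII only, and the helper names (`is_token_char`, `scan_token`) are hypothetical and not part of lexy. It approximates what `dsl::identifier(begin_with_not_blank, continue_with_printable)` accepts when both classes exclude blanks.

```cpp
#include <cstddef>
#include <string_view>

// Plain C++ stand-ins for the character classes used in the rule (ASCII only).
constexpr bool is_newline(char c) { return c == '\r' || c == '\n'; }
constexpr bool is_blank(char c) { return c == ' ' || c == '\t'; }
constexpr bool is_printable(char c) { return c >= 0x20 && c < 0x7F; }

// Approximates `print - newline - blank - equal_sign`.
constexpr bool is_token_char(char c)
{
    return is_printable(c) && !is_newline(c) && !is_blank(c) && c != '=';
}

// Consumes the longest prefix made of token characters.
// An empty result means the input does not start with a valid character.
constexpr std::string_view scan_token(std::string_view input)
{
    std::size_t length = 0;
    while (length < input.size() && is_token_char(input[length]))
        ++length;
    return input.substr(0, length);
}
```

With this sketch, `scan_token("hello world")` yields `"hello"`: the scan stops at the first blank, so the rest of the line is left unconsumed.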
The reason is that whitespace is not allowed in the key/value. We can change the code to look like the following:
struct variable_key : lexy::transparent_production
{
    [[nodiscard]] CONSTEVAL static auto name() noexcept -> const char* { return "[key]"; }

    struct invalid_key
    {
        constexpr static auto name = "a valid key was required here";
    };

    constexpr static auto rule = []
    {
        // begin with not '\r', '\n', '\r\n', whitespace or '='
        constexpr auto begin_with_not_blank = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;
        // continue with printable, but excluding '\r', '\n', '\r\n' and '='
        constexpr auto continue_with_printable = dsl::unicode::print - dsl::unicode::newline - dsl::equal_sign;

        return dsl::peek(begin_with_not_blank) >>
                   (LEXY_DEBUG("parse variable key begin") +
                    dsl::identifier(begin_with_not_blank, continue_with_printable) +
                    LEXY_DEBUG("parse variable key end")) |
               dsl::error<invalid_key>;
    }();

    constexpr static auto value = lexy::forward<lexeme_type>;
};

struct variable_value : lexy::transparent_production
{
    [[nodiscard]] CONSTEVAL static auto name() noexcept -> const char* { return "[value]"; }

    constexpr static auto rule = []
    {
        // begin with not '\r', '\n', '\r\n', whitespace or '='
        constexpr auto begin_with_not_blank = dsl::unicode::print - dsl::unicode::newline - dsl::unicode::blank - dsl::equal_sign;
        // continue with printable, but excluding '\r', '\n', '\r\n' and '='
        constexpr auto continue_with_printable = dsl::unicode::print - dsl::unicode::newline - dsl::equal_sign;

        return dsl::peek(begin_with_not_blank) >>
               (LEXY_DEBUG("parse variable value begin") +
                dsl::identifier(begin_with_not_blank, continue_with_printable) +
                LEXY_DEBUG("parse variable value end"));
    }();

    constexpr static auto value = lexy::forward<lexeme_type>;
};
Almost the same code as before, except that we remove dsl::unicode::blank from continue_with_printable. But that doesn't satisfy us either: sometimes we do want to parse without whitespace. So here's an idea. Suppose the following is a file to be parsed:
We have added a new entity:
class State
{
    // ...
private:
    bool allow_key_whitespace_;
    bool allow_value_whitespace_;
};
These two values of state are then used to determine whether whitespace is allowed when parsing key/value. But …
-
Things are much more problematic than I thought. I also have an entity:
While whitespace was disallowed, it was easy to determine that the parsing of a value ends at the space after … But once whitespace is allowed, we can't be sure whether the content that follows needs to be parsed. And we can't use "no …". The expected rule should be as follows.
Parsed as:
Parsed as:
Question 1: If an identifier (the key must be followed by …
I know that requiring values to be enclosed in quotes would solve this problem very well, but in practice I can't do that, so I have to find another way to solve this problem.
-
This is context-sensitive parsing, which can't easily be done with pure lexy rules. But your idea with the state is on the right track. You can use dsl::scan to dispatch between rules based on members of the state. Something like:
struct value : lexy::scan_production<lexeme_type>
{
    template <typename Context, typename Reader>
    static constexpr scan_result scan(lexy::rule_scanner<Context, Reader>& scanner, const State& state)
    {
        if (state.allow_whitespace())
            return scanner.parse(value_with_whitespace_production{});
        else
            return scanner.parse(value_without_whitespace_production{});
    }
};
Likewise, you can use a different scan production to set the members of the state. My partial C parser does it a lot: https://github.com/foonathan/clauf/blob/main/src/compiler.cpp#L1453
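The dispatch idea can also be tried out without lexy. The following is a minimal sketch under assumptions, not lexy code: `State` here is a cut-down, hypothetical version of the class from the question, the two `scan_value_*` functions stand in for `value_with_whitespace_production` and `value_without_whitespace_production`, and only ASCII blanks, newlines and '=' are considered.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, simplified stand-in for the parse state from the question.
class State
{
public:
    constexpr explicit State(bool allow_value_whitespace) noexcept
    : allow_value_whitespace_{allow_value_whitespace}
    {}

    [[nodiscard]] constexpr bool allow_whitespace() const noexcept { return allow_value_whitespace_; }

private:
    bool allow_value_whitespace_;
};

constexpr bool is_blank(char c) { return c == ' ' || c == '\t'; }
constexpr bool ends_value(char c) { return c == '\r' || c == '\n' || c == '='; }

// Stand-in for value_without_whitespace_production: stops at the first blank.
constexpr std::string_view scan_value_without_whitespace(std::string_view input)
{
    std::size_t length = 0;
    while (length < input.size() && !ends_value(input[length]) && !is_blank(input[length]))
        ++length;
    return input.substr(0, length);
}

// Stand-in for value_with_whitespace_production: blanks are kept once the value has started.
constexpr std::string_view scan_value_with_whitespace(std::string_view input)
{
    if (input.empty() || is_blank(input.front()))
        return {}; // a value must not begin with a blank
    std::size_t length = 0;
    while (length < input.size() && !ends_value(input[length]))
        ++length;
    return input.substr(0, length);
}

// The dispatch itself: pick the scanner based on a member of the state.
constexpr std::string_view scan_value(std::string_view input, const State& state)
{
    return state.allow_whitespace() ? scan_value_with_whitespace(input)
                                    : scan_value_without_whitespace(input);
}
```

For the same input `"hello world\nnext"`, `scan_value` returns `"hello"` with `State{false}` and `"hello world"` with `State{true}`. The grammar for each case stays fixed; only the choice between them depends on the state.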
-
Another point: what if the two structures return different values? For example, if whitespace is not allowed, we can just return a lexeme_type (or std::string_view).
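One general C++ option, which is not something confirmed for lexy in this thread, is to give both branches a common result type such as `std::variant`. The sketch below is only an assumption-laden illustration: `spaced_value`, `value_result` and `make_value` are hypothetical names, and what the whitespace-allowed branch would really return is not known here.

```cpp
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical result for the whitespace-allowed case:
// the text plus the number of blanks it contains.
struct spaced_value
{
    std::string_view text;
    std::size_t      blank_count;
};

// A single result type that can hold either alternative.
using value_result = std::variant<std::string_view, spaced_value>;

constexpr value_result make_value(std::string_view text, bool allow_whitespace)
{
    if (!allow_whitespace)
        return text; // plain view, as in the no-whitespace case

    std::size_t blanks = 0;
    for (char c : text)
        if (c == ' ' || c == '\t')
            ++blanks;
    return spaced_value{text, blanks};
}
```

A caller can then branch with `std::holds_alternative` or `std::visit`, so a single production type can cover both cases.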
-
I'm not sure I understand correctly. I found this function in the link you gave me. At first I didn't understand: is this all that is needed for lexy::scan_production? How can you assume that you can parse and produce a production? Then I found this line of code; can I understand that …