Adds `LazyRawTextReader` support for reading symbols #616
🗺️ PR tour
fn match_long_string(self) -> IonParseResult<'data, MatchedString> {
    // TODO: implement long string matching
    // The `fail` parser is a nom builtin that never matches.
    fail(self)
}
🗺️ This placeholder method was moved from further down in the file.
/// A helper method for matching bytes until the specified delimiter. Ignores any byte
/// (including the delimiter) that is prefaced by the escape character `\`.
fn match_text_until_unescaped(self, delimiter: u8) -> IonParseResult<'data, (Self, bool)> {
🗺️ This method was previously `match_short_string`, but it's generally useful for both strings and symbols. `match_short_string` and `match_quoted_symbol` now call this.
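As a rough illustration of the technique described above, here is a minimal sketch that works on a plain byte slice rather than the PR's buffer type and `nom` result types. The function name `match_until_unescaped`, its return shape, and the meaning of the `bool` (whether any escape was seen) are assumptions made for this example, not the PR's actual implementation.

```rust
/// Scans `input` for the first `delimiter` byte that is not preceded by the
/// escape character `\`. Returns the bytes before that delimiter plus a flag
/// saying whether any escape was seen, or `None` if no unescaped delimiter
/// is found. (Hypothetical helper for illustration only.)
fn match_until_unescaped(input: &[u8], delimiter: u8) -> Option<(&[u8], bool)> {
    let mut contains_escapes = false;
    let mut index = 0;
    while index < input.len() {
        let byte = input[index];
        if byte == b'\\' {
            // Skip the escape character and the byte it escapes, even if
            // that byte happens to be the delimiter.
            contains_escapes = true;
            index += 2;
        } else if byte == delimiter {
            return Some((&input[..index], contains_escapes));
        } else {
            index += 1;
        }
    }
    None
}
```

Because the delimiter is a parameter, the same loop could serve a `"`-delimited short string and a `'`-delimited quoted symbol.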
        self.input,
        result
    );
}
🗺️ Prior to this change, this unit test method would assert that there was no match. However, it was possible for the parser to match part of the input and report success. Now this method requires that the parser match the entire test input to be considered a successful match.
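The distinction between a partial match and a full match can be sketched as follows. The digit parser and the helper name `matches_entire_input` are hypothetical stand-ins; the PR's real test helpers operate on its own buffer and result types.

```rust
/// Hypothetical parser used only for this example: matches one or more
/// leading ASCII digits and returns `(remaining_input, matched_text)`.
fn match_digits(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Returns `true` only if the parser succeeds AND consumes all of `input`.
/// A parser that matches just a prefix is treated as a mismatch.
fn matches_entire_input(
    parser: impl Fn(&str) -> Option<(&str, &str)>,
    input: &str,
) -> bool {
    matches!(parser(input), Some((remaining, _)) if remaining.is_empty())
}
```

With a check like this, an input such as `123abc` counts as a mismatch even though the parser itself reports success on the `123` prefix.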
fn escape_short_string(
🗺️ This method was also generally useful for text types and has been broken out into a helper method.
  #[test]
  fn test_top_level() -> IonResult<()> {
-     let data = r#"
+     let mut data = String::new();
🗺️ Previously, `data` was just a `&str` literal. However, some of the test cases require actual escaped bytes to appear in them, which isn't possible within a raw string (`r#""#`). Now it's a mutable `String` that we can append things to in bulk.
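A small sketch of why a single raw string literal is limiting: Rust does not process escape sequences inside `r#"..."#`, while an ordinary literal does. The function name and the test inputs below are made up for illustration and are not the PR's test data.

```rust
/// Builds test input in pieces so that raw and escape-processed literals
/// can be mixed. (Illustrative inputs only.)
fn build_test_data() -> String {
    let mut data = String::new();
    // Raw string: the backslash and the `n` stay as two separate characters.
    data.push_str(r#"'raw\nsymbol' "#);
    // Ordinary literal: the compiler turns `\n` into the single byte 0x0A.
    data.push_str("'line one\nline two'");
    data
}
```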
  /// Like RawSymbolToken, but the Text variant holds a borrowed reference instead of a String.
  #[derive(Debug, Clone, PartialEq, Eq)]
  pub enum RawSymbolTokenRef<'a> {
      SymbolId(SymbolId),
-     Text(&'a str),
+     Text(Cow<'a, str>),
🗺️ If the raw reader encounters a symbol like `'Hello\nworld!'`, it can't just return a reference to those bytes in the input buffer. It has to make a new `String` with the `\n` replaced by `0x0A`. Using `Cow` allows the `RawSymbolTokenRef` to hold either a borrowed `&str` or an owned `String`.
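The borrow-or-allocate behavior described here could look roughly like the sketch below. The `unescape` function is hypothetical and handles only a small subset of escapes (`\n`, `\t`, and pass-through of the escaped character); the real reader has to support the full set of Ion escape sequences.

```rust
use std::borrow::Cow;

/// Returns the text with escapes resolved. Input without a backslash is
/// returned as a borrowed slice; otherwise a new `String` is allocated.
/// (Simplified sketch: only `\n` and `\t` are translated.)
fn unescape(text: &str) -> Cow<'_, str> {
    if !text.contains('\\') {
        // Nothing to rewrite, so the caller can keep pointing at the input.
        return Cow::Borrowed(text);
    }
    let mut sanitized = String::with_capacity(text.len());
    let mut chars = text.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            sanitized.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => sanitized.push('\n'),
            Some('t') => sanitized.push('\t'),
            // Covers `\\`, `\'`, and similar: keep the escaped character.
            Some(other) => sanitized.push(other),
            // A trailing backslash is kept as-is in this sketch.
            None => sanitized.push('\\'),
        }
    }
    Cow::Owned(sanitized)
}
```

Callers that only need a `&str` can dereference the result either way, so the allocation cost is paid only by inputs that actually contain escapes.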
  use std::fmt::{Debug, Formatter};
  use std::hash::{Hash, Hasher};

  /// A reference to a fully resolved symbol. Like `Symbol` (a fully resolved symbol with a
  /// static lifetime), a `SymbolRef` may have known or undefined text (i.e. `$0`).
  #[derive(PartialEq, Eq, PartialOrd, Ord, Clone)]
  pub struct SymbolRef<'a> {
-     text: Option<&'a str>,
+     text: Option<Cow<'a, str>>,
🗺️ This change is analogous to the one in `RawSymbolTokenRef`.
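To show how an `Option<Cow<'a, str>>` field can be exposed without callers caring whether the text is borrowed or owned, here is a stand-in type. `SymbolRefSketch` and its methods are hypothetical names for this example and do not mirror the crate's actual `SymbolRef` API.

```rust
use std::borrow::Cow;

/// Stand-in for a symbol reference whose text may be borrowed from the
/// input, owned (after unescaping), or undefined (as with `$0`).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SymbolRefSketch<'a> {
    text: Option<Cow<'a, str>>,
}

impl<'a> SymbolRefSketch<'a> {
    /// Accepts either a `&'a str` or a `String`.
    fn with_text(text: impl Into<Cow<'a, str>>) -> Self {
        Self { text: Some(text.into()) }
    }

    fn with_unknown_text() -> Self {
        Self { text: None }
    }

    /// Exposes the text as a plain `&str` regardless of how it is stored.
    fn text(&self) -> Option<&str> {
        self.text.as_deref()
    }
}
```

Equality on `Cow` compares the string contents, so a borrowed and an owned symbol with the same text compare equal.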
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #616 +/- ##
==========================================
+ Coverage 81.64% 81.72% +0.08%
==========================================
Files 119 119
Lines 21547 21778 +231
Branches 21547 21778 +231
==========================================
+ Hits 17591 17799 +208
- Misses 2312 2331 +19
- Partials 1644 1648 +4
// These inputs have leading/trailing whitespace to make them more readable, but the string
// matcher doesn't accept whitespace. We'll trim each one before testing it.
Outdated comment?
sanitized.push(substitute);
Ok(input_after_escape)
fn write_escaped<'data>(
Doc comment for this function would be appreciated. I believe this is responsible for rewriting escaped characters as their unescaped counterparts (in other words, it unescapes any escaped characters in a `TextBufferView`), but the function name had me thinking the opposite at first.
/// The symbol is delimited by single quotes. Holds a `bool` indicating whether the
/// matched input contained any escaped bytes.
Quoted(bool),
Just curious: any particular reason for having `Quoted(bool)` instead of e.g. `Quoted` and `QuotedWithEscaped`?
Oh yeah, that's better. Thanks!
Addressed in #619.
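For readers following the thread, the suggested alternative to a `bool` payload might look something like this. The enum name, the non-quoted variants, and the helper method are assumptions for illustration; the shape that actually landed is in #619.

```rust
/// Hypothetical variant layout in which the "contains escapes" fact is
/// carried by the variant itself rather than by a `bool` payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchedSymbolSketch {
    SymbolId,
    Identifier,
    QuotedWithoutEscapes,
    QuotedWithEscapes,
}

impl MatchedSymbolSketch {
    /// Only quoted symbols that contain escapes need a newly allocated
    /// `String`; every other kind can borrow from the input buffer.
    fn requires_allocation(self) -> bool {
        matches!(self, MatchedSymbolSketch::QuotedWithEscapes)
    }
}
```

Named variants make each `match` arm self-describing, whereas `Quoted(true)` requires the reader to remember what the flag means.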
Builds on #609, #612, #613, and #614.

Adds support for reading quoted symbols (`'foo'`), identifiers (`foo`), and symbol IDs (`$42`). Also modifies the `SymbolRef` and `RawSymbolTokenRef` types to hold a `Cow<'a, str>` instead of a `&str` to accommodate situations where the symbol's text in the input contained escapes and so required allocating a new string.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
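The three symbol encodings named in the description can be told apart by their leading characters. The sketch below is a simplified classifier over a complete token, assuming ASCII identifier rules (a letter, `_`, or `$` followed by letters, digits, `_`, or `$`). It ignores keywords such as `null` and `true` and does not validate escapes inside quotes; `SymbolKind` and `classify_symbol` are hypothetical names, not part of the PR.

```rust
#[derive(Debug, PartialEq, Eq)]
enum SymbolKind {
    /// `$` followed only by decimal digits, e.g. `$42`.
    SymbolId(usize),
    /// An unquoted identifier, e.g. `foo`.
    Identifier,
    /// Text wrapped in single quotes, e.g. `'foo'`.
    Quoted,
}

/// Classifies a complete symbol token. Returns `None` if the token is not
/// one of the three recognized forms. (Simplified sketch.)
fn classify_symbol(token: &str) -> Option<SymbolKind> {
    // `$42` is a symbol ID, while `$foo` falls through to the identifier case.
    if let Some(digits) = token.strip_prefix('$') {
        if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
            return digits.parse::<usize>().ok().map(SymbolKind::SymbolId);
        }
    }
    if token.len() >= 2 && token.starts_with('\'') && token.ends_with('\'') {
        return Some(SymbolKind::Quoted);
    }
    let mut chars = token.chars();
    let first = chars.next()?;
    let valid_start = first.is_ascii_alphabetic() || first == '_' || first == '$';
    let valid_rest = chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$');
    if valid_start && valid_rest {
        Some(SymbolKind::Identifier)
    } else {
        None
    }
}
```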