Lazy reader support for s-expressions #627
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #627 +/- ##
==========================================
- Coverage 81.54% 81.30% -0.25%
==========================================
Files 123 123
Lines 23158 23446 +288
Branches 23158 23446 +288
==========================================
+ Hits 18885 19062 +177
- Misses 2549 2649 +100
- Partials 1724 1735 +11
🗺️ PR tour
🗺️ `cargo fmt` rearranged the imports a bit in this file.
List(s) => count_sequence_children(s.iter())?,
SExp(s) => count_sequence_children(s.iter())?,
🗺️ In binary Ion, the parsing logic for the bodies of lists and s-expressions is identical. In text, however, the parsing is substantially different. Not only do they have different delimiters for opening (`[` vs `(`), closing (`]` vs `)`), and separating values (`,` vs whitespace-or-nothing), but s-expressions also allow for the special grammar production of operators.
To accommodate these differences without introducing runtime overhead via branching or dynamic dispatch, I had to break the `LazySequence` type into `LazyList` and `LazySExp` types that could house their own logic. This change was also made at the raw level and accounts for the majority of this diff.
Meanwhile, the `Sequence` type is still unified in the `Element` API. I think this is reasonable, as the `Element` API is intentionally divorced from the reading logic needed to deserialize a stream into `Element`s.
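As a rough illustration of the operator production, the sketch below splits the body of a text s-expression into identifier and operator tokens. The function `split_sexp_tokens` and its helpers are hypothetical and are not part of this PR. The sketch ignores numbers, quoted symbols, strings, comments, and nested containers, all of which the real parser has to handle. The operator character set is the one listed in the Ion text specification.

```rust
/// Characters that may appear in an operator inside a text s-expression.
const OPERATOR_CHARS: &str = "!#%&*+-./;<=>?@^`|~";

fn is_operator_char(c: char) -> bool {
    OPERATOR_CHARS.contains(c)
}

/// Simplified identifier rule (assumption): ASCII letters, digits, `_`, and `$`.
fn is_identifier_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '$'
}

/// Splits the text between `(` and `)` into identifier and operator tokens.
/// Hypothetical helper for illustration only; it stops at the first
/// character it does not understand.
pub fn split_sexp_tokens(body: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = body.trim_start();
    while let Some(first) = rest.chars().next() {
        // Pick the predicate that describes the token starting here.
        let same_kind: fn(char) -> bool = if is_operator_char(first) {
            is_operator_char
        } else if is_identifier_char(first) {
            is_identifier_char
        } else {
            break;
        };
        // The token runs until the first character of a different kind.
        let end = rest.find(|c: char| !same_kind(c)).unwrap_or(rest.len());
        tokens.push(&rest[..end]);
        rest = rest[end..].trim_start();
    }
    tokens
}
```

With this sketch, `split_sexp_tokens("a+=b")` yields `a`, `+=`, and `b`, which is why `(a+=b)` holds three values even though it contains no whitespace. A list has no such rule, so its parser cannot share this code path.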
- fn count_sequence_children(lazy_sequence: &LazyBinarySequence) -> IonResult<usize> {
+ fn count_sequence_children<'a, 'b>(
+     lazy_sequence: impl Iterator<Item = IonResult<LazyBinaryValue<'a, 'b>>>,
+ ) -> IonResult<usize> {
🗺️ This example program takes an iterator to generically handle either a list or sexp. We can always introduce a `LazySequence` trait later if desired.
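A minimal sketch of the iterator-based approach, using hypothetical stand-in types (`CountList`, `CountSExp`, `CountResult`) rather than the crate's real `LazyBinaryValue` and `IonResult`. It only shows that one function can serve both container types without a shared trait.

```rust
/// Stand-in for `IonResult` (assumption: errors are plain strings here).
pub type CountResult<T> = Result<T, String>;

/// Stand-ins for the distinct list and s-expression types.
pub struct CountList(pub Vec<CountResult<i64>>);
pub struct CountSExp(pub Vec<CountResult<i64>>);

impl CountList {
    pub fn iter(&self) -> impl Iterator<Item = CountResult<i64>> + '_ {
        self.0.iter().cloned()
    }
}

impl CountSExp {
    pub fn iter(&self) -> impl Iterator<Item = CountResult<i64>> + '_ {
        self.0.iter().cloned()
    }
}

/// Counts child values from any iterator of results, so it accepts the
/// iterator of either container. The first error ends the count.
pub fn count_sequence_children(
    children: impl Iterator<Item = CountResult<i64>>,
) -> CountResult<usize> {
    let mut count = 0;
    for child in children {
        child?; // propagate a read error instead of counting the value
        count += 1;
    }
    Ok(count)
}
```

Both `count_sequence_children(list.iter())` and `count_sequence_children(sexp.iter())` compile against the same function, at the cost of callers having to call `.iter()` themselves.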
src/lazy/any_encoding.rs
🗺️ All of the changes in this file come from splitting the `LazyRawAnySequence` type into `LazyRawAnyList` and `LazyRawAnySExp`. There is no interesting new logic.
@@ -188,7 +188,7 @@ mod tests {
  let lazy_list = reader.next()?.expect_value()?.read()?.expect_list()?;
  // Exercise the `Debug` impl
  println!("Lazy Raw Sequence: {:?}", lazy_list);
- let mut list_values = lazy_list.iter();
+ let mut list_values = lazy_list.sequence.iter();
🗺️ In the binary reader, the `LazyRawBinaryList` and `LazyRawBinarySExp` types each wrap an instance of a `LazyRawBinarySequence` helper type that houses their shared parsing logic.
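The wrapping pattern could look roughly like the following sketch. The `Wrap*` types are hypothetical simplifications: the "parsing" here just walks a byte slice, whereas the real helper decodes Ion values. The public `sequence` field mirrors the `lazy_list.sequence.iter()` call in the test above.

```rust
use std::iter::Copied;
use std::slice::Iter;

/// Shared helper (hypothetical): owns the logic common to both containers.
#[derive(Debug, Clone, Copy)]
pub struct WrapBinarySequence<'data> {
    body: &'data [u8],
}

impl<'data> WrapBinarySequence<'data> {
    pub fn new(body: &'data [u8]) -> Self {
        Self { body }
    }

    /// Stand-in for the shared parsing logic: yields each byte as a "value".
    pub fn iter(&self) -> Copied<Iter<'data, u8>> {
        let body: &'data [u8] = self.body;
        body.iter().copied()
    }
}

/// Distinct list type that delegates to the shared helper.
pub struct WrapBinaryList<'data> {
    pub sequence: WrapBinarySequence<'data>,
}

/// Distinct s-expression type that delegates to the same helper.
pub struct WrapBinarySExp<'data> {
    pub sequence: WrapBinarySequence<'data>,
}

impl<'data> WrapBinaryList<'data> {
    pub fn new(body: &'data [u8]) -> Self {
        Self { sequence: WrapBinarySequence::new(body) }
    }
}

impl<'data> WrapBinarySExp<'data> {
    pub fn new(body: &'data [u8]) -> Self {
        Self { sequence: WrapBinarySequence::new(body) }
    }
}
```

Because the two wrappers are separate types, the compiler picks the right behavior statically, while the binary encoding still pays for only one copy of the parsing code.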
🗺️ The changes in this file all come from splitting the `LazySequence` type into distinct `LazyList` and `LazySExp` types. There is no interesting new logic.
match self.ion_type() {
    IonType::SExp => {
        write!(f, "(")?;
        for value in self {
            write!(
                f,
                "{:?} ",
                value
                    .map_err(|_| fmt::Error)?
                    .read()
                    .map_err(|_| fmt::Error)?
            )?;
        }
        write!(f, ")")?;
    }
    IonType::List => {
        write!(f, "[")?;
        for value in self {
            write!(
                f,
                "{:?},",
                value
                    .map_err(|_| fmt::Error)?
                    .read()
                    .map_err(|_| fmt::Error)?
            )?;
        }
        write!(f, "]")?;
    }
    _ => unreachable!("LazySequence is only created for list and sexp"),
🗺️ This removal gives a taste of the sort of split logic a unified `LazySequence` would need to house at each level of abstraction. Now the `List` and `SExp` types can simply house their respective logic.
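For comparison, here is a sketch of what the split looks like when each type carries its own `Debug` impl. The `FmtList` and `FmtSExp` types are hypothetical and hold plain integers; the output format copies the removed code, including its trailing separator.

```rust
use std::fmt;

/// Hypothetical stand-ins for the separate list and s-expression types.
pub struct FmtList(pub Vec<i64>);
pub struct FmtSExp(pub Vec<i64>);

impl fmt::Debug for FmtList {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lists use square brackets and comma separators.
        write!(f, "[")?;
        for value in &self.0 {
            write!(f, "{:?},", value)?;
        }
        write!(f, "]")
    }
}

impl fmt::Debug for FmtSExp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // S-expressions use parentheses and whitespace separators.
        write!(f, "(")?;
        for value in &self.0 {
            write!(f, "{:?} ", value)?;
        }
        write!(f, ")")
    }
}
```

Neither impl needs a `match` on the Ion type or an `unreachable!` arm, because the type itself already says which container is being formatted.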
/// Matches an optional annotation sequence and a trailing value.
pub fn match_annotated_value(self) -> IonParseResult<'data, LazyRawTextValue<'data>> {
    pair(
        opt(Self::match_annotations),
        whitespace_and_then(Self::match_value),
    )
    .map(|(maybe_annotations, mut value)| {
        if let Some(annotations) = maybe_annotations {
            value.encoded_value = value
                .encoded_value
                .with_annotations_sequence(annotations.offset(), annotations.len());
            // Rewind the value's input to include the annotations sequence.
            value.input = self.slice_to_end(annotations.offset() - self.offset());
        }
        value
    })
    .parse(self)
}
🗺️ This method was moved, not changed. You'll see the removal immediately following.
🗺️ Half of the changes in this file come from changing `LazyRawTextSequence` into `LazyRawTextList`. The other half come from introducing the new `LazyRawTextSExp` type; these contain interesting new logic. I'll call out where the new logic begins.
@@ -171,6 +146,139 @@ impl<'data> Iterator for RawTextSequenceIterator<'data> {
    }
}

// ===== S-Expressions =====
🗺️ New logic begins here.
"(a)",
"(a b)",
"(a++)",
"(++a)",
Test case with an operator that contains different characters:
- "(++a)",
+ "(++a)",
+ "(a+=b)",
}

impl MatchedSymbol {
    pub fn read<'data>(
        &self,
        matched_input: TextBufferView<'data>,
    ) -> IonResult<RawSymbolTokenRef<'data>> {
        use MatchedSymbol::*;
Having a local `use` right before the match expression really cleans it up a lot.
This is something new-ish for you, I think. I've noticed this in a few places in the lazy reader PRs, but I don't remember seeing it when I was working on Ion Rust 6-9 months ago. Anyway, it's very nice, and I'm going to be copying this from you.
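For readers unfamiliar with the pattern, a small sketch follows. The enum `ScopedSymbolKind` and its variants are hypothetical and only loosely modeled on `MatchedSymbol`; the point is that the glob import is scoped to the function body, so the short variant names do not leak into the rest of the module.

```rust
/// Hypothetical enum, loosely modeled on `MatchedSymbol`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScopedSymbolKind {
    Identifier,
    Quoted,
    Operator,
}

impl ScopedSymbolKind {
    pub fn describe(&self) -> &'static str {
        // Function-local import: variants are unqualified only inside this body.
        use ScopedSymbolKind::*;
        match self {
            Identifier => "identifier",
            Quoted => "quoted symbol",
            Operator => "operator",
        }
    }
}
```

Without the local `use`, each arm would need the full `ScopedSymbolKind::` prefix, which adds noise as the number of variants grows.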
Builds on outstanding PRs #612, #613, #614, #616, #617, #619, #620, #621, #622, and #623.
Adds support for reading text s-expressions in the lazy reader.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.