Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds the (annotate ...) form and bin read support for flex_uint parameters #801

Merged
merged 2 commits into from
Aug 11, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Jul 30, 2024

Adds:

  • An implementation of the (annotate) TDL form
  • Binary read support for flex_uint::-encoded template parameters
  • Very limited binary write support for flex_uint::-encoded template parameters to enable unit testing for the reader.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR Tour 🧭

@@ -24,7 +24,7 @@ use std::ops::Deref;
/// * Decimal `0.0` and `-0.0` are mathematically equivalent but not Ion equivalent.
/// * Decimal `0.0` and `0.00` are mathematically equivalent but not Ion equivalent.
/// * Timestamps representing the same point in time at different precisions or at different
/// timezone offsets are not Ion equivalent.
/// timezone offsets are not Ion equivalent.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Rust v1.80.0 added some Clippy warnings around doc comment markdown formatting.

@@ -88,6 +90,8 @@ pub(crate) struct EncodedValue<HeaderType: EncodedHeader> {
pub annotations_encoding: AnnotationsEncoding,
// The offset of the type descriptor byte within the overall input stream.
pub header_offset: usize,
// If this value was written with a tagless encoding, this will be 0. Otherwise, it's 1.
pub opcode_length: u8,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Retrofitting support for encodings other than Tagged in EncodedValue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be derived (i.e. fn is_tagless() -> bool) rather than storing the opcode length in a field? Tagless encodings must have value_body_length + length_length == total_length, and anything else must have an opcode, so value_body_length + length_length < total_length is always true for anything that is not a tagless encoded argument. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had the same thought mid-PR. I've opened #805 to track refactoring this.

Comment on lines +242 to +245
// XXX: This *doesn't* slice `self` because FlexUInt::read() is faster if the input
// is at least the size of a u64.
let matched_input = self;
let remaining_input = self.slice_to_end(size_in_bytes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I have a fix for this waiting in the PR that follows this one.

}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ cargo fmt moved some things around in this file.

Comment on lines +185 to +187
/// Confirms that the provided `value` is a symbol with known text. If so, returns `Ok(text)`.
/// If not, returns a decoding error containing the specified label.
fn expect_symbol<'a, Encoding: Decoder>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I added some helper methods in the TemplateCompiler to make its main path a bit easier to read.

EExpArgGroupIterator, EExpressionArgGroup, MacroExpansion, MacroExpansionKind, MacroExpr,
MacroExprArgsIterator, MakeStringExpansion, RawEExpression, TemplateExpansion, ValueExpr,
ValuesExpansion,
AnnotateExpansion, EExpArgGroupIterator, EExpressionArgGroup, MacroExpansion,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Adding AnnotateExpansion to this caused a reflow.

@@ -9,7 +9,7 @@ use std::mem;
use std::ops::{Add, Neg};

/// Represents an unsigned integer of any size.
#[derive(Debug, Copy, Clone)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Starting in v1.80, Clippy correctly identified that a lot of our manual implementations could be derived.

@zslayton zslayton marked this pull request as ready for review July 30, 2024 22:05
Base automatically changed from encoding-directives to main July 31, 2024 00:56
@@ -88,6 +90,8 @@ pub(crate) struct EncodedValue<HeaderType: EncodedHeader> {
pub annotations_encoding: AnnotationsEncoding,
// The offset of the type descriptor byte within the overall input stream.
pub header_offset: usize,
// If this value was written with a tagless encoding, this will be 0. Otherwise, it's 1.
pub opcode_length: u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be derived (i.e. fn is_tagless() -> bool) rather than storing the opcode length in a field? Tagless encodings must have value_body_length + length_length == total_length, and anything else must have an opcode, so value_body_length + length_length < total_length is always true for anything that is not a tagless encoded argument. Right?

pub(crate) fn for_flex_uint(input: ImmutableBuffer<'top>) -> Self {
let encoded_value = EncodedValue {
encoding: ParameterEncoding::FlexUInt,
header: Header {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Header should actually be an enum so that it could have a NoHeader variant in addition to whatever other implementations you have going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this as part of #805.

@@ -236,6 +255,21 @@ impl TemplateCompiler {
}
}

fn encoding_for<Encoding: Decoder>(value: LazyValue<Encoding>) -> IonResult<ParameterEncoding> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check to make sure that there isn't a second annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should. Tracking this in #806 so I can return to it when my existing PR queue merged.

@@ -1091,6 +1202,25 @@ mod tests {
)
}

#[test]
fn annotate() -> IonResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to have another test case where you invoke annotate as an e-expression rather than as a TDL macro invocation. (Just in case there is a regression, it would (hopefully) make it easier to isolate the problem.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #808.

@@ -1087,11 +1087,11 @@ mod tests {
// This directive defines two more.
assert_eq!(new_macro_table.len(), 2 + MacroTable::NUM_SYSTEM_MACROS);
assert_eq!(
new_macro_table.macro_with_id(3),
new_macro_table.macro_with_id(4),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work?

Suggested change
new_macro_table.macro_with_id(4),
new_macro_table.macro_with_id(MacroTable::NUM_SYSTEM_MACROS),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #807.

new_macro_table.macro_with_name("seventeen")
);
assert_eq!(
new_macro_table.macro_with_id(4),
new_macro_table.macro_with_id(5),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this?

Suggested change
new_macro_table.macro_with_id(5),
new_macro_table.macro_with_id(MacroTable::NUM_SYSTEM_MACROS + 1),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #807.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several open PRs building on one another, so I've opened tracking issues to incorporate your feedback after I get them all merged.

@@ -88,6 +90,8 @@ pub(crate) struct EncodedValue<HeaderType: EncodedHeader> {
pub annotations_encoding: AnnotationsEncoding,
// The offset of the type descriptor byte within the overall input stream.
pub header_offset: usize,
// If this value was written with a tagless encoding, this will be 0. Otherwise, it's 1.
pub opcode_length: u8,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had the same thought mid-PR. I've opened #805 to track refactoring this.

pub(crate) fn for_flex_uint(input: ImmutableBuffer<'top>) -> Self {
let encoded_value = EncodedValue {
encoding: ParameterEncoding::FlexUInt,
header: Header {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this as part of #805.

@@ -236,6 +255,21 @@ impl TemplateCompiler {
}
}

fn encoding_for<Encoding: Decoder>(value: LazyValue<Encoding>) -> IonResult<ParameterEncoding> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should. Tracking this in #806 so I can return to it when my existing PR queue merged.

@@ -1087,11 +1087,11 @@ mod tests {
// This directive defines two more.
assert_eq!(new_macro_table.len(), 2 + MacroTable::NUM_SYSTEM_MACROS);
assert_eq!(
new_macro_table.macro_with_id(3),
new_macro_table.macro_with_id(4),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #807.

new_macro_table.macro_with_name("seventeen")
);
assert_eq!(
new_macro_table.macro_with_id(4),
new_macro_table.macro_with_id(5),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #807.

@@ -1091,6 +1202,25 @@ mod tests {
)
}

#[test]
fn annotate() -> IonResult<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking in #808.

@zslayton zslayton merged commit e436f96 into main Aug 11, 2024
29 of 32 checks passed
@zslayton zslayton deleted the read-flex_uint-annotate branch August 11, 2024 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants