
v1.0.0-rc.4

@zslayton released this on 31 May at 20:06 (commit 80088b4)

Caution

This release contains a bug that can cause it to emit invalid binary data if a value is explicitly assigned an empty annotations sequence. It has been fixed in v1.0.0-rc.5; users should upgrade to that version instead.

This release closes out the list of known blocking issues for version 1.0. It includes substantial changes to the experimental streaming reader and writer APIs, but only relatively small changes to the Element API being stabilized in ion-rs v1.0.

Breaking changes to the Element API

The minimum Rust version has been bumped to 1.67

This allowed us to take advantage of the ilog10 integer methods stabilized in Rust 1.67, which greatly simplified much of the code for our Decimal and Int types.
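As a rough illustration (decimal_digits is a hypothetical helper, not code taken from ion-rs), ilog10 turns counting the decimal digits of an integer, a common need when working with Decimal coefficients, into a single expression instead of a loop:

```rust
/// Hypothetical helper for illustration: counts the decimal digits in the
/// magnitude of an i128 using `ilog10`, which was stabilized in Rust 1.67.
fn decimal_digits(value: i128) -> u32 {
    if value == 0 {
        // `ilog10` panics on zero, so zero is special-cased as one digit.
        1
    } else {
        // `unsigned_abs` returns a u128, which avoids overflow for i128::MIN.
        value.unsigned_abs().ilog10() + 1
    }
}
```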

Ints and Decimal coefficients are now limited to the i128 range

The Ion data model does not impose any limitation on the range of integers that it can represent. Previously, the Int and Coefficient types would fall back to heap-allocated space to enable the representation of arbitrarily sized integers. However, in practice there has been no demand for integers that require 17 or more bytes to represent. This simplification allowed us to remove our dependency on BigInt and to remove many branches from the codebase. Reading benchmarks improved by 3-6% across the board.
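For a sense of where the new limit sits, here is a minimal sketch using only the standard library (parse_int is a hypothetical helper, not an ion-rs API): an i128 occupies 16 bytes, and an integer that would need a 17th byte, such as i128::MAX + 1, fails to parse instead of spilling into heap-allocated storage.

```rust
/// Number of bytes in an i128; integers needing more than this are out of range.
const I128_BYTES: usize = std::mem::size_of::<i128>();

/// Hypothetical helper for illustration: parses a decimal integer and reports
/// values that do not fit in an i128 as errors.
fn parse_int(text: &str) -> Result<i128, String> {
    text.parse::<i128>()
        .map_err(|e| format!("{text:?} could not be parsed as an i128: {e}"))
}
```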

Element encoding methods have been replaced

The following Element methods have been removed:

  • write_as (encode data as Ion and write it to an io::Write impl)
  • to_binary (encode data as binary Ion and return a newly allocated Vec<u8>)
  • to_text (encode data as text Ion and return a newly allocated String)

These methods did not offer a means to configure the way the data was encoded beyond a coarse-grained choice of format. In particular, there was no room in their signatures to allow users to specify a version of Ion to use, which will become necessary when Ion 1.1 is released.

To address this, we have added types that represent the available Ion encodings:

  • ion_rs::v1_0::Binary
  • ion_rs::v1_0::Text

v1_0 refers to the Ion specification version, not the crate's version.

These types can be passed to methods to specify which encoding to use. They also serve as entry points to a builder API for the new WriteConfig type, which allows users to specify how their data is encoded.

Together with Element's new encode_as and encode_to methods, these types let users fully specify how their data is encoded, using a list of settings that will grow over time. WriteConfig's builder-style API allows new settings to be added without breaking existing code.
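The value of a builder-style configuration can be sketched with hypothetical stand-in types (ToyWriteConfig and ToyFormat are not ion-rs types): every setting has a default, and each builder method consumes the config and returns an updated copy, so a setting added in a later release leaves existing callers' code unchanged.

```rust
/// Hypothetical stand-ins for illustration; not the ion-rs WriteConfig or TextFormat.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyFormat {
    Compact,
    Pretty,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyWriteConfig {
    format: ToyFormat,
    // Imagine this setting being introduced in a later release.
    indent_width: usize,
}

impl Default for ToyWriteConfig {
    fn default() -> Self {
        ToyWriteConfig { format: ToyFormat::Compact, indent_width: 2 }
    }
}

impl ToyWriteConfig {
    /// Consumes the config and returns a copy with the new format.
    fn with_format(mut self, format: ToyFormat) -> Self {
        self.format = format;
        self
    }

    /// Adding this method later does not break callers that never call it;
    /// they simply keep the default indent width.
    fn with_indent_width(mut self, indent_width: usize) -> Self {
        self.indent_width = indent_width;
        self
    }
}
```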

encode_as

use ion_rs::v1_0::{Binary, Text};
use ion_rs::{Element, TextFormat};

let element: Element = Element::string("hello");

// Encode the element as binary Ion.
let binary_buffer: Vec<u8> = element.encode_as(Binary)?;
assert_eq!(element, Element::read_one(binary_buffer)?);

// Encode the element as text Ion, further specifying that the text should be generously spaced ("pretty").
let text_ion: String = element.encode_as(Text.with_format(TextFormat::Pretty))?;
assert_eq!(element, Element::read_one(text_ion)?);

Notice that using encode_as with a text encoding results in a String while using a binary encoding results in a Vec<u8>.
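One plausible way for the output type to depend on the encoding is an associated type on an encoding trait. The sketch below is hypothetical (ToyEncoding, ToyText, and ToyBinary are stand-ins, and neither of their output formats is real Ion), but it shows how a single generic encode_as function can return a String for one encoding and a Vec<u8> for another.

```rust
/// Hypothetical sketch: these are not the real ion-rs encoding types.
trait ToyEncoding {
    /// The buffer type produced when a value is encoded in this format.
    type Output;
    fn encode_str(&self, value: &str) -> Self::Output;
}

struct ToyText;
struct ToyBinary;

impl ToyEncoding for ToyText {
    type Output = String;
    // Quotes and escapes the string using Rust's Debug format; a stand-in for text Ion.
    fn encode_str(&self, value: &str) -> String {
        format!("{value:?}")
    }
}

impl ToyEncoding for ToyBinary {
    type Output = Vec<u8>;
    // One length byte followed by the UTF-8 bytes; a stand-in for binary Ion.
    fn encode_str(&self, value: &str) -> Vec<u8> {
        let len = u8::try_from(value.len())
            .expect("this toy encoding only supports strings under 256 bytes");
        let mut out = vec![len];
        out.extend_from_slice(value.as_bytes());
        out
    }
}

/// The return type is determined by the encoding passed in.
fn encode_as<E: ToyEncoding>(value: &str, encoding: E) -> E::Output {
    encoding.encode_str(value)
}
```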

encode_to

use ion_rs::v1_0::{Binary, Text};
use ion_rs::{Element, TextFormat};

let element: Element = Element::string("hello");

// Encode the element as binary Ion and write it to an `io::Write` implementation
let mut buffer = Vec::new();
element.encode_to(&mut buffer, Binary)?;
assert_eq!(element, Element::read_one(buffer)?);

// Encode the element as pretty text Ion and write it to an `io::Write` implementation
let mut buffer = Vec::new();
element.encode_to(&mut buffer, Text.with_format(TextFormat::Pretty))?;
assert_eq!(element, Element::read_one(buffer)?);

Changes that do not affect the Element API

The experimental IonReader and IonWriter traits have been replaced

The IonReader and IonWriter APIs mimicked the stateful streaming IonReader/IonWriter APIs used in ion-java. Each method call would modify the state of the reader or writer, potentially changing the operations that were legal to call afterward. On the reading side, it was nearly impossible to return to data that had already been visited, which made handling struct fields (which can arrive in any order) painful.

Because development of the IonReader and IonWriter traits predated the stabilization of generic associated types (GATs) in Rust 1.65, there were also many places in the API where it was necessary to use Box<dyn> to generalize over different encodings and layers of abstraction. Box<dyn> requires heap allocation and dynamic dispatch through vtable lookups, both of which hurt performance.

These traits have been replaced by a Reader type and a Writer type that are generic over the encoding you wish to use.
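To illustrate the difference between the two approaches, here is a minimal sketch using a hypothetical Format trait (not ion-rs's actual Decoder or Encoding traits). A generic associated type lets each format declare a value type that may borrow from the input, and a function that is generic over the format is compiled separately for each format, so it needs neither Box<dyn> nor vtable lookups.

```rust
/// Hypothetical trait for illustration; not ion-rs's Decoder trait.
trait Format {
    /// A GAT: the value type may borrow from the input it was read from.
    type Value<'data>;
    fn first_value<'data>(&self, input: &'data str) -> Option<Self::Value<'data>>;
}

/// Yields the first whitespace-delimited token as a borrowed slice.
struct Tokens;

impl Format for Tokens {
    type Value<'data> = &'data str;
    fn first_value<'data>(&self, input: &'data str) -> Option<&'data str> {
        input.split_whitespace().next()
    }
}

/// Yields the length of the first token as an owned value.
struct TokenLengths;

impl Format for TokenLengths {
    type Value<'data> = usize;
    fn first_value<'data>(&self, input: &'data str) -> Option<usize> {
        input.split_whitespace().next().map(str::len)
    }
}

/// Generic over the format: the compiler generates a specialized copy for
/// each `F`, so calls are statically dispatched with no boxing.
fn read_first<'data, F: Format>(format: &F, input: &'data str) -> Option<F::Value<'data>> {
    format.first_value(input)
}
```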

Reader

A reader instance offers only a few methods, the most important of which is next(). Each call to next() returns the next top-level value in the stream as a LazyValue, or None once the stream has been exhausted.

let ion_data = "1 foo::true 2024T";
let mut reader = Reader::new(ion_data);

while let Some(value) = reader.next()? {
  println!("It's a(n) {:?}", value.ion_type());
}

In the above example, the reader visits each value in the stream but, because value is a LazyValue, does not read it. A LazyValue can tell you the value's data type, whether it is null, and its annotations; it reads the value's data only when you ask it to.
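The idea behind a lazy value can be sketched with hypothetical toy types (LazyToken, Kind, and Parsed are not part of ion-rs, and the toy format is whitespace-separated tokens, not Ion): the value records where its text lives, answers cheap questions about its type, and performs the full conversion only when read() is called, as many times as needed.

```rust
/// Hypothetical toy types for illustration; not part of ion-rs.
#[derive(Debug, PartialEq)]
enum Kind {
    Int,
    Bool,
    Other,
}

#[derive(Debug, PartialEq)]
enum Parsed<'data> {
    Int(i64),
    Bool(bool),
    Other(&'data str),
}

/// A "lazy" value: it only remembers where its text is in the input.
struct LazyToken<'data> {
    raw: &'data str,
}

impl<'data> LazyToken<'data> {
    /// Cheap check: inspects the token without converting it.
    fn kind(&self) -> Kind {
        match self.raw {
            "true" | "false" => Kind::Bool,
            s if s.starts_with(|c: char| c.is_ascii_digit() || c == '-') => Kind::Int,
            _ => Kind::Other,
        }
    }

    /// Full conversion, performed only on request. It can be repeated because
    /// the token still refers to the original input.
    fn read(&self) -> Result<Parsed<'data>, std::num::ParseIntError> {
        Ok(match self.kind() {
            Kind::Int => Parsed::Int(self.raw.parse()?),
            Kind::Bool => Parsed::Bool(self.raw == "true"),
            Kind::Other => Parsed::Other(self.raw),
        })
    }
}

/// Splits the input into lazy tokens without parsing any of them.
fn lazy_tokens(input: &str) -> impl Iterator<Item = LazyToken<'_>> {
    input.split_whitespace().map(|raw| LazyToken { raw })
}
```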

The old IonReader trait had read_TYPE methods for each Ion type (read_bool, read_int, etc.). These methods would fail if the reader was not positioned on a value of the expected type or if the value was null, so applications had to inspect the reader's state ahead of time. Even after the type had been confirmed, however, there was no way to avoid checking the Result wrapping each read_TYPE method's output.

The StreamItem enum had distinct Null and (non-null) Value variants so that users could call read_TYPE with confidence that the value returned was non-null.

Together, these characteristics required unwieldy code like the following, which reads and counts the values at every level of depth in the stream:

    fn read_all_values<R: IonReader<Item = StreamItem>>(reader: &mut R) -> IonResult<usize> {
        use IonType::*;
        use StreamItem::{Nothing, Null as NullValue, Value};
        let mut count: usize = 0;
        loop {
            match reader.next()? {
                NullValue(_ion_type) => { // null values get their own code path
                    count += 1;
                    continue;
                }
                Value(ion_type) => {
                    count += 1;
                    // We need to match against the IonType of this value to know what read_* method to call
                    match ion_type {
                        String => {
                            // Each scalar read method has a `?` that handles both invalid data and
                            // the case where the reader is on a valid value of an unexpected type.
                            let _string = reader.read_str()?;
                        }
                        Symbol => {
                            // This demonstration code would read the value and discard it for timing purposes.
                            let _symbol_id = reader.read_symbol()?;
                        }
                        Int => {
                            let _int = reader.read_i64()?;
                        }
                        Float => {
                            let _float = reader.read_f64()?;
                        }
                        Decimal => {
                            let _decimal = reader.read_decimal()?;
                        }
                        Timestamp => {
                            let _timestamp = reader.read_timestamp()?;
                        }
                        Bool => {
                            let _boolean = reader.read_bool()?;
                        }
                        Blob => {
                            let _blob = reader.read_blob()?;
                        }
                        Clob => {
                            let _clob = reader.read_clob()?;
                        }
                        Null => {
                            // Matching against the IonType requires us to handle `Null` even though our StreamItem
                            // variants make that impossible.
                        }
                        // Reading a container requires you to mutate the state of the reader and then continue the loop,
                        // which can be difficult for developers to mentally model.
                        Struct | List | SExp => reader.step_in()?,
                    }
                }
                Nothing if reader.depth() > 0 => {
                    reader.step_out()?;
                }
                _ => break,
            }
        }
        Ok(count)
    }

In contrast, calling LazyValue::read()? returns a ValueRef, an enum with a variant for each type of value it can return. Here's updated code that does the same thing as the IonReader code above:

    fn count_value_and_children<D: Decoder>(lazy_value: &LazyValue<D>) -> IonResult<usize> {
        use ValueRef::*;
        // Calling `read()` on the lazy value returns a `ValueRef`
        let child_count = match lazy_value.read()? {
            // For the container types, we can pass the container to another method to process
            // its child values
            List(s) => count_sequence_children(s.iter())?,
            SExp(s) => count_sequence_children(s.iter())?,
            Struct(s) => count_struct_children(&s)?,
            // While the IonReader needed `match` arms for every Ion type to make sure that
            // the value was consumed for timing purposes, in this API we have already called
            // `read()` so there's no need to spell out all of the other cases.
            _ => 0,
        };
        Ok(1 + child_count)
    }

    fn count_sequence_children<'a, D: Decoder>(
        lazy_sequence: impl Iterator<Item = IonResult<LazyValue<'a, D>>>,
    ) -> IonResult<usize> {
        let mut count = 0;
        for value in lazy_sequence {
            count += count_value_and_children(&value?)?;
        }
        Ok(count)
    }

    fn count_struct_children<D: Decoder>(lazy_struct: &LazyStruct<D>) -> IonResult<usize> {
        let mut count = 0;
        for field in lazy_struct {
            count += count_value_and_children(&field?.value())?;
        }
        Ok(count)
    }

Lazy values hold a reference to a slice of the input stream. As a result, the Reader cannot be advanced to the next top-level value while a LazyValue is still in use. However, it also means that a LazyValue (and its child values) can be read and re-read any number of times. This makes working with structs much easier, as a LazyStruct (returned in a ValueRef::Struct(...)) provides a get() method that finds the requested field for you.

let data = r#"{foo: "red", bar: "blue", baz: "green"}"#;
let mut reader = Reader::new(data);
let lazy_value = reader.expect_next()?; // If there isn't a next value, returns an error
let lazy_struct = lazy_value.read()?.expect_struct()?; // If it isn't a struct, returns an error

// Use `get()` to find and read the value associated with the requested field name.
assert_eq!(lazy_struct.get("bar")?, Some("blue"));
assert_eq!(lazy_struct.get("baz")?, Some("green"));
assert_eq!(lazy_struct.get("waffle")?, None);
// `get_expected` will return an error if the field is not found
assert_eq!(lazy_struct.get_expected("foo")?, "red");

LazyList and LazySExp offer value iterators, and LazyStruct offers a field iterator for more manual operations.
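The borrowing rule described above for Reader and LazyValue comes from Rust's borrow checker. A minimal sketch with a hypothetical ToyReader (not ion-rs's Reader, and reading whitespace-separated tokens rather than Ion) shows the mechanism: next() takes &mut self and returns a slice of the reader's own buffer, so the reader stays borrowed for as long as that slice is in use.

```rust
/// Hypothetical toy reader for illustration; not ion-rs's Reader.
struct ToyReader {
    input: String,
    position: usize,
}

impl ToyReader {
    fn new(input: &str) -> Self {
        ToyReader { input: input.to_string(), position: 0 }
    }

    /// Returns the next whitespace-delimited value as a slice of the reader's
    /// buffer. Because the slice borrows from `&mut self`, the reader cannot
    /// be advanced again until the returned slice is no longer used.
    fn next(&mut self) -> Option<&str> {
        let rest = &self.input[self.position..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        // Byte offset where the value starts, and its length in bytes.
        let start = self.position + (rest.len() - trimmed.len());
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.position = start + len;
        Some(&self.input[start..start + len])
    }
}
```

With this signature, calling reader.next() a second time while the slice returned by the first call is still in use is rejected at compile time, while the slice itself can be inspected any number of times.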

Writer

The Writer type can write any value whose type implements the WriteAsIon trait. WriteAsIon is already implemented for the expected scalar types:

use ion_rs::{Decimal, Writer, v1_0::Binary};
let mut writer = Writer::new(Binary, Vec::new())?;
writer.write(1)?;
writer.write(false.annotated_with("teeth"))?;
writer.write(Decimal::new(1999, -2).annotated_with("USD"))?;
let bytes = writer.close()?;

To write a container like a list or a struct, you can get a specialized container writer:

let mut list = writer.list_writer()?;
list
  .write(1)?
  .write(2)?
  .write(3)?;
list.close()?;

let mut struct_ = writer.struct_writer()?;
struct_
  .write("foo", 1)?
  .write("bar", 2)?
  .write("baz", 3)?;
struct_.close()?;

The WriteAsIon trait has a single method and can be implemented for custom user types. Here's an example:

struct Point2D {
  x: u32,
  y: u32,
}

impl Point2D {
    fn new(x: u32, y: u32) -> Self {
        Point2D {x, y}
    }
}

impl WriteAsIon for Point2D {
    fn write_as_ion<V: ValueWriter>(&self, writer: V) -> IonResult<()> {
        let mut struct_ = writer.struct_writer()?;
        struct_.write("x", self.x)?
               .write("y", self.y)?;
        struct_.close()
    }
}

let mut writer = Writer::new(Text, Vec::new())?;
writer.write(Point2D::new(1, 2))?;
writer.write(Point2D::new(3, 4))?;
writer.write(Point2D::new(5, 6))?;
let bytes = writer.close()?;

Other changes

  • Experimental Ion 1.1 feature implementations
  • Experimental tooling APIs to access the encoded bytes of items in the stream
  • The CI test matrix now includes Amazon Linux (both arm64 and x86_64)
  • serde now serializes unit structs as symbols (struct Foo; is serialized as Foo instead of "Foo")

What's Changed

  • Adds Amazon Linux (arm64 and x86_64) to the build matrix by @popematt in #725
  • Implements writing e-expressions in binary 1.1 by @zslayton in #722
  • Adds a StreamingRawReader, switches Element to the LazyReader by @zslayton in #727
  • Removes experimental token stream, updates serde to use the lazy reader by @zslayton in #729
  • Adds trait implementations by @zslayton in #730
  • Updates dep versions, remove deprecated method use from example by @zslayton in #731
  • Simplifies the ValueWriter trait family by @zslayton in #732
  • Adds support for parameterized Rust enums in serde by @desaikd in #733
  • Update CodeBuild Runner integration to include the run id and attempt by @popematt in #742
  • Add support for binary 1.1 reader by @nirosys in #737
  • Update GHA workflow to eliminate some redundant jobs by @popematt in #744
  • Drops closure-based container writers by @zslayton in #741
  • Introduces an application-level lazy writer by @zslayton in #745
  • Removes the IonWriter trait and its implementations by @zslayton in #749
  • Adds a raw text writer for Ion v1.1 by @zslayton in #750
  • Values are no longer serialized with a type identifier annotation. by @zslayton in #751
  • Removes the IonReader trait and its implementations. by @zslayton in #752
  • Add 1.1 binary reader support for bools by @nirosys in #753
  • Add 1.1 binary reader support for strings and integers. by @nirosys in #754
  • Adds APIs for accessing encoding of raw stream items by @zslayton in #760
  • Ensure that FixedInt and FixedUInt handle zero-length integers by @popematt in #762
  • Text reader implementation cleanup by @zslayton in #763
  • Add 1.1 binary reader support for symbols by @nirosys in #758
  • Add 1.1 binary reader support for blobs and clobs by @nirosys in #755
  • Add 1.1 binary reader support for floats by @nirosys in #756
  • Add 1.1 binary reader support for decimals by @nirosys in #757
  • Add 1.1 binary reader support for length-prefixed Lists & S-Expressions by @nirosys in #769
  • Removes the crate's dependency on BigInt, BigUint by @zslayton in #767
  • Fix serde string deserialization for Ion symbols by @sajidanw in #771
  • Feature gating and renames by @zslayton in #772
  • Renames TextKind to TextFormat, adds with_format method by @zslayton in #773
  • Doc cleanup and 2x bug fixes by @zslayton in #774

New Contributors

Full Changelog: v1.0.0-rc.3...v1.0.0-rc.4