v1.0.0-rc.4
Caution
This release contains a bug that can cause it to emit invalid binary data if a value is explicitly assigned an empty annotations sequence. It has been fixed in v1.0.0-rc.5; users should upgrade to that version instead.
This release closes out the list of known blocking issues for version 1.0. It includes substantial changes to the experimental streaming reader and writer APIs, but only relatively small changes to the Element API being stabilized in ion-rs v1.0.
Breaking changes to the Element API
The minimum Rust version has been bumped to 1.67
This allowed us to benefit from the stabilization of the ilog10 operation, which greatly simplified much of the code for our Decimal and Int types.
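As a rough illustration of why ilog10 helps with decimal bookkeeping, here is a minimal sketch that counts the base-10 digits of an integer using only the standard library. The helper names are hypothetical and are not taken from ion-rs; the real Decimal and Int code may use the operation differently.

```rust
/// Returns how many base-10 digits are needed to write `magnitude`.
/// Hypothetical helper for illustration; not a function from ion-rs.
fn decimal_digit_count(magnitude: u128) -> u32 {
    // `checked_ilog10` returns `None` for zero, which still needs one digit.
    magnitude.checked_ilog10().map_or(1, |log| log + 1)
}

/// Same idea for a signed value: the sign does not count as a digit.
/// `unsigned_abs` avoids overflow when the input is `i128::MIN`.
fn signed_decimal_digit_count(value: i128) -> u32 {
    decimal_digit_count(value.unsigned_abs())
}
```

Before Rust 1.67, a digit count like this typically needed a loop of divisions or a lookup table.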
Ints and Decimal coefficients are now limited to the i128 range
The Ion data model does not impose any limitation on the range of integers that it can represent. Previously, the Int and Coefficient types would fall back to heap-allocated space to enable the representation of arbitrarily-sized integers. However, in practice there has been no call to support integers that require 17 or more bytes to represent. This simplification allowed us to remove our dependency on BigInt and to remove many branches from the codebase. We saw reading benchmark improvements in the 3-6% range across the board.
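To make the new limit concrete, the following standard-library sketch shows what a fixed i128 range means in practice: values one past the boundary are rejected, and arithmetic reports overflow rather than growing into a heap-allocated big integer. Both function names are hypothetical and are not part of the ion-rs API; how ion-rs itself reports out-of-range integers is not shown here.

```rust
/// Parses decimal integer text, rejecting anything that does not fit in an `i128`.
/// Hypothetical helper for illustration; not a function from ion-rs.
fn parse_bounded_int(text: &str) -> Result<i128, String> {
    text.parse::<i128>()
        .map_err(|error| format!("{text:?} cannot be read as an i128: {error}"))
}

/// Multiplies two values, returning `None` on overflow instead of
/// falling back to an arbitrarily sized representation.
fn checked_scale(coefficient: i128, factor: i128) -> Option<i128> {
    coefficient.checked_mul(factor)
}
```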
Element encoding methods have been replaced
The following Element methods have been removed:
- write_as (encode data as Ion and write it to an io::Write impl)
- to_binary (encode data as binary Ion and return a newly allocated Vec<u8>)
- to_text (encode data as text Ion and return a newly allocated String)
These methods did not offer a means to configure the way the data was encoded beyond a coarse-grained choice of format. In particular, there was no room in their signatures to allow users to specify a version of Ion to use, which will become necessary when Ion 1.1 is released.
To address this, we have added types that represent the available Ion encodings:
- ion_rs::v1_0::Binary
- ion_rs::v1_0::Text
Here, v1_0 refers to the Ion specification version, not the crate's version.
These types can be passed to methods to specify an encoding to use. They also serve as entry points to a builder API for the new WriteConfig type, which allows users to specify how their data is encoded.
Together with Element's new encode_as and encode_to methods, users will be able to fully specify how their data is encoded with a list of settings that will grow over time. WriteConfig's builder-style API gives us an evolution path.
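The pattern described above, where an encoding marker can be passed on its own or refined through a builder, can be sketched with the standard library alone. Everything below is a hypothetical stand-in (SketchText, SketchTextFormat, SketchWriteConfig, chosen_format, and the choice of Compact as a default); it is not the ion-rs definition of Text, TextFormat, or WriteConfig.

```rust
/// Stand-in for a text formatting choice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchTextFormat {
    Compact,
    Pretty,
}

/// Stand-in for a configuration object that can gain settings over time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SketchWriteConfig {
    format: SketchTextFormat,
}

/// Zero-sized marker that names an encoding.
#[derive(Debug, Clone, Copy)]
struct SketchText;

impl SketchText {
    /// Builder entry point: start from the marker, end with a config.
    fn with_format(self, format: SketchTextFormat) -> SketchWriteConfig {
        SketchWriteConfig { format }
    }
}

/// A bare marker converts into a config with default settings (assumed default).
impl From<SketchText> for SketchWriteConfig {
    fn from(_encoding: SketchText) -> Self {
        SketchWriteConfig { format: SketchTextFormat::Compact }
    }
}

/// Accepts either a bare marker or a fully built config.
fn chosen_format(config: impl Into<SketchWriteConfig>) -> SketchTextFormat {
    config.into().format
}
```

Because the method takes anything convertible into the config type, new settings can be added to the builder later without changing the method's signature.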
encode_as
use ion_rs::v1_0::{Binary, Text};
let element: Element = Element::string("hello");
// Encode the element as binary Ion.
let binary_buffer: Vec<u8> = element.encode_as(Binary)?;
assert_eq!(element, Element::read_one(binary_buffer)?);
// Encode the element as text Ion, further specifying that the text should be generously spaced ("pretty").
let text_ion: String = element.encode_as(Text.with_format(TextFormat::Pretty))?;
assert_eq!(element, Element::read_one(text_ion)?);
Notice that using encode_as with a text encoding results in a String while using a binary encoding results in a Vec<u8>.
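One way a single method can return a String for one encoding and a Vec<u8> for another is an associated type on the encoding trait. The sketch below shows that technique with hypothetical names (SketchEncoding, ToyText, ToyBinary, toy_encode_as). It is not the ion-rs implementation, and the "binary" output here is just the raw UTF-8 bytes, not real binary Ion.

```rust
/// Each encoding names the type of buffer it produces.
trait SketchEncoding {
    type Output;
    fn encode_str(&self, value: &str) -> Self::Output;
}

struct ToyText;
struct ToyBinary;

impl SketchEncoding for ToyText {
    type Output = String;
    fn encode_str(&self, value: &str) -> String {
        // Debug formatting adds the surrounding quotes and escapes.
        format!("{value:?}")
    }
}

impl SketchEncoding for ToyBinary {
    type Output = Vec<u8>;
    fn encode_str(&self, value: &str) -> Vec<u8> {
        // Toy encoding: raw UTF-8 bytes only.
        value.as_bytes().to_vec()
    }
}

/// The return type is decided at compile time by the encoding argument.
fn toy_encode_as<E: SketchEncoding>(value: &str, encoding: E) -> E::Output {
    encoding.encode_str(value)
}
```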
encode_to
use ion_rs::v1_0::{Binary, Text};
let element: Element = Element::string("hello");
// Encode the element as binary Ion and write it to an `io::Write` implementation
let mut buffer = Vec::new();
element.encode_to(&mut buffer, Binary)?;
assert_eq!(element, Element::read_one(buffer)?);
// Encode the element as pretty text Ion and write it to an `io::Write` implementation
let mut buffer = Vec::new();
element.encode_to(&mut buffer, Text.with_format(TextFormat::Pretty))?;
assert_eq!(element, Element::read_one(buffer)?);
Changes that do not affect the Element API
The experimental IonReader and IonWriter traits have been replaced
The IonReader and IonWriter APIs mimicked the stateful streaming IonReader/IonWriter APIs used in ion-java. Each method call would modify the state of the reader or writer, potentially changing the operations that were legal to call afterward. On the reading side, it was nearly impossible to return to data that had already been visited, which made handling struct fields (which can arrive in any order) painful.
Because development of the IonReader and IonWriter traits predated the availability of GATs, there were also many places in the API where it was necessary to use Box<dyn> to generalize over different encodings and layers of abstraction. Box<dyn> requires heap allocation and vtable lookups to function, which negatively affected performance.
These traits have been replaced by a Reader type and a Writer type that are generic over the encoding you wish to use.
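The trade-off between Box<dyn> and a generic parameter can be seen in a small standard-library sketch. The trait and types below (ValueSource, SliceSource, sum_boxed, sum_generic) are hypothetical and unrelated to the actual Reader and Writer definitions; they only contrast the two dispatch styles.

```rust
/// A minimal source of values, standing in for a reader abstraction.
trait ValueSource {
    fn next_value(&mut self) -> Option<i64>;
}

/// A source backed by a borrowed slice.
struct SliceSource<'a> {
    remaining: &'a [i64],
}

impl<'a> ValueSource for SliceSource<'a> {
    fn next_value(&mut self) -> Option<i64> {
        let (first, rest) = self.remaining.split_first()?;
        self.remaining = rest;
        Some(*first)
    }
}

/// Dynamic dispatch: the source lives on the heap and every call goes through a vtable.
fn sum_boxed(mut source: Box<dyn ValueSource + '_>) -> i64 {
    let mut total = 0;
    while let Some(value) = source.next_value() {
        total += value;
    }
    total
}

/// Static dispatch: the compiler generates a copy of this function per source type,
/// so no allocation or vtable lookup is needed.
fn sum_generic<S: ValueSource>(mut source: S) -> i64 {
    let mut total = 0;
    while let Some(value) = source.next_value() {
        total += value;
    }
    total
}
```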
Reader
A reader instance only offers a few methods, the most central of which is next(). Each call to next() returns a LazyValue representing the next top-level value in the stream.
let ion_data = "1 foo::true 2024T";
let mut reader = Reader::new(ion_data);
while let Some(value) = reader.next()? {
    println!("It's a(n) {:?}", value.ion_type());
}
In the above example, the reader visits each value in the stream but--because value is a LazyValue--does not read the value. A LazyValue can tell you the value's data type, whether it's null, its annotations, and upon request, its data.
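The idea of a lazy value, one that borrows a slice of the input and only parses it when asked, can be sketched with the standard library. SketchLazyInt and lazy_ints are hypothetical names for illustration and handle only whitespace-separated integers; the real LazyValue is considerably richer.

```rust
/// Holds a borrowed slice of the input; nothing is parsed at construction time.
struct SketchLazyInt<'a> {
    text: &'a str,
}

impl<'a> SketchLazyInt<'a> {
    /// Cheap inspection that does not require parsing the whole value.
    fn is_negative(&self) -> bool {
        self.text.starts_with('-')
    }

    /// Parses on demand; can be called any number of times.
    fn read(&self) -> Result<i64, std::num::ParseIntError> {
        self.text.parse()
    }
}

/// Splits the input into lazy values without validating their contents.
fn lazy_ints(input: &str) -> impl Iterator<Item = SketchLazyInt<'_>> {
    input.split_whitespace().map(|text| SketchLazyInt { text })
}
```

In this sketch, malformed input is only reported when read() is called on the affected value, so a caller that skips a value never pays for parsing it.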
The old IonReader trait had read_TYPE methods for each Ion type (read_bool, read_int, etc). These methods would fail if the reader was not positioned on a value of the correct type or if the value was null, requiring applications to inspect its state ahead of time. Even once the type was confirmed, however, there was no way to avoid checking the Result wrapping the read_TYPE method's output.
The StreamItem enum's Null and (non-null) Value variants were distinct to allow users to call read_TYPE with the confidence that the value returned was non-null.
This combination of characteristics required unwieldy code like the following, which recursively reads and counts the values at all levels of depth in the stream:
fn read_all_values<R: IonReader<Item = StreamItem>>(reader: &mut R) -> IonResult<usize> {
    use IonType::*;
    use StreamItem::{Nothing, Null as NullValue, Value};
    let mut count: usize = 0;
    loop {
        match reader.next()? {
            NullValue(_ion_type) => { // null values get their own code path
                count += 1;
                continue;
            }
            Value(ion_type) => {
                count += 1;
                // We need to match against the IonType of this value to know what read_* method to call
                match ion_type {
                    String => {
                        // Each scalar read method has a `?` that handles both invalid data and
                        // the case where the reader is on a valid value of an unexpected type.
                        let _string = reader.read_str()?;
                    }
                    Symbol => {
                        // This demonstration code would read the value and discard it for timing purposes.
                        let _symbol_id = reader.read_symbol()?;
                    }
                    Int => {
                        let _int = reader.read_i64()?;
                    }
                    Float => {
                        let _float = reader.read_f64()?;
                    }
                    Decimal => {
                        let _decimal = reader.read_decimal()?;
                    }
                    Timestamp => {
                        let _timestamp = reader.read_timestamp()?;
                    }
                    Bool => {
                        let _boolean = reader.read_bool()?;
                    }
                    Blob => {
                        let _blob = reader.read_blob()?;
                    }
                    Clob => {
                        let _clob = reader.read_clob()?;
                    }
                    Null => {
                        // Matching against the IonType requires us to handle `Null` even though our StreamItem
                        // variants make that impossible.
                    }
                    // Reading a container requires you to mutate the state of the reader and then continue the loop,
                    // which can be difficult for developers to mentally model.
                    Struct | List | SExp => reader.step_in()?,
                }
            }
            Nothing if reader.depth() > 0 => {
                reader.step_out()?;
            }
            _ => break,
        }
    }
    Ok(count)
}
In contrast, when you call LazyValue::read()?, it returns a ValueRef--an enum of the possible types it can return. Here's updated code that does the same thing as the IonReader code above:
fn count_value_and_children<D: Decoder>(lazy_value: &LazyValue<D>) -> IonResult<usize> {
    use ValueRef::*;
    // Calling `read()` on the lazy value returns a `ValueRef`
    let child_count = match lazy_value.read()? {
        // For the container types, we can pass the container to another method to process
        // its child values
        List(s) => count_sequence_children(s.iter())?,
        SExp(s) => count_sequence_children(s.iter())?,
        Struct(s) => count_struct_children(&s)?,
        // While the IonReader needed `match` arms for every Ion type to make sure that
        // the value was consumed for timing purposes, in this API we have already called
        // `read()` so there's no need to spell out all of the other cases.
        _ => 0,
    };
    Ok(1 + child_count)
}

fn count_sequence_children<'a, D: Decoder>(
    lazy_sequence: impl Iterator<Item = IonResult<LazyValue<'a, D>>>,
) -> IonResult<usize> {
    let mut count = 0;
    for value in lazy_sequence {
        count += count_value_and_children(&value?)?;
    }
    Ok(count)
}

fn count_struct_children<D: Decoder>(lazy_struct: &LazyStruct<D>) -> IonResult<usize> {
    let mut count = 0;
    for field in lazy_struct {
        count += count_value_and_children(&field?.value())?;
    }
    Ok(count)
}
Lazy values hold a reference to a slice of the input stream. As such, the Reader cannot be advanced to the next top-level value while a LazyValue is still in use. However, it also means that a LazyValue--and its child values--can be read and re-read any number of times. This makes working with structs much easier, as a LazyStruct (returned in a ValueRef::Struct(...)) provides a get() method that will find the requested field for you.
let data = r#"{foo: "red", bar: "blue", baz: "green"}"#;
let mut reader = Reader::new(data);
let lazy_value = reader.expect_next()?; // If there isn't a next value, returns an error
let lazy_struct = lazy_value.read()?.expect_struct()?; // If it isn't a struct, returns an error
// Use `get()` to find and read the value associated with the requested field name.
assert_eq!(lazy_struct.get("bar")?, Some("blue"));
assert_eq!(lazy_struct.get("baz")?, Some("green"));
assert_eq!(lazy_struct.get("waffle")?, None);
// `get_expected` will return an error if the field is not found
assert_eq!(lazy_struct.get_expected("foo")?, "red");
LazyList and LazySExp offer value iterators, and LazyStruct offers a field iterator for more manual operations.
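The relationship between a field iterator and a get() convenience method can be sketched as a linear scan over borrowed fields. SketchLazyStruct and its methods are hypothetical, with pre-split string pairs standing in for encoded Ion; this is not how LazyStruct is implemented.

```rust
/// Borrows its fields from the caller, so it can be scanned repeatedly.
struct SketchLazyStruct<'a> {
    fields: &'a [(&'a str, &'a str)],
}

impl<'a> SketchLazyStruct<'a> {
    /// Field iterator for manual processing, in stream order.
    fn iter(&self) -> impl Iterator<Item = (&'a str, &'a str)> + '_ {
        self.fields.iter().copied()
    }

    /// Scans for the first field with the requested name.
    fn get(&self, name: &str) -> Option<&'a str> {
        self.iter()
            .find(|(field_name, _)| *field_name == name)
            .map(|(_, value)| value)
    }

    /// Like `get`, but a missing field is an error.
    fn get_expected(&self, name: &str) -> Result<&'a str, String> {
        self.get(name)
            .ok_or_else(|| format!("missing field: {name}"))
    }
}
```

Because nothing is consumed by a lookup, fields can be requested in any order regardless of how they were written in the stream.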
Writer
The Writer type can write any value of a type that implements the WriteAsIon trait, which includes the expected scalar types:
use ion_rs::{Writer, v1_0::Binary};
let mut writer = Writer::new(Binary, Vec::new())?;
writer.write(1)?;
writer.write(false.annotated_with("teeth"))?;
writer.write(Decimal::new(1999, -2).annotated_with("USD"))?;
let bytes = writer.close()?;
To write a container like a list or a struct, you can get a specialized container writer:
let mut list = writer.list_writer()?;
list
    .write(1)?
    .write(2)?
    .write(3)?;
list.close()?;
let mut struct_ = writer.struct_writer()?;
struct_
    .write("foo", 1)?
    .write("bar", 2)?
    .write("baz", 3)?;
struct_.close()?;
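The chained .write(...)? calls above rely on each write returning a result that wraps a mutable reference back to the writer. A minimal standard-library sketch of that chaining pattern follows; SketchListWriter and write_small_list are hypothetical, and the bracketed text output is a toy format rather than anything produced by ion-rs.

```rust
/// Collects values and renders them as a bracketed list when closed.
struct SketchListWriter {
    encoded_items: Vec<String>,
}

impl SketchListWriter {
    fn new() -> Self {
        SketchListWriter { encoded_items: Vec::new() }
    }

    /// Returning `&mut Self` inside the `Result` lets calls be chained with `?`.
    fn write(&mut self, value: i64) -> Result<&mut Self, String> {
        self.encoded_items.push(value.to_string());
        Ok(self)
    }

    /// Consumes the writer so nothing can be written after closing.
    fn close(self) -> Result<String, String> {
        Ok(format!("[{}]", self.encoded_items.join(", ")))
    }
}

/// Writes three values with chained calls, then closes the list.
fn write_small_list() -> Result<String, String> {
    let mut list = SketchListWriter::new();
    list
        .write(1)?
        .write(2)?
        .write(3)?;
    list.close()
}
```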
The WriteAsIon trait has a single method and can be implemented for custom user types. Here's an example:
struct Point2D {
    x: u32,
    y: u32,
}

impl Point2D {
    fn new(x: u32, y: u32) -> Self {
        Point2D {x, y}
    }
}

impl WriteAsIon for Point2D {
    fn write_as_ion<V: ValueWriter>(&self, writer: V) -> IonResult<()> {
        let mut struct_ = writer.struct_writer()?;
        struct_.write("x", self.x)?
            .write("y", self.y)?;
        struct_.close()
    }
}

let mut writer = Writer::new(Text, Vec::new())?;
writer.write(Point2D::new(1, 2))?;
writer.write(Point2D::new(3, 4))?;
writer.write(Point2D::new(5, 6))?;
let bytes = writer.close()?;
Other changes
- Experimental Ion 1.1 feature implementations
- Experimental tooling APIs to access the encoded bytes of items in the stream
- The CI test matrix now includes Amazon Linux (both arm64 and x86_64)
- serde serializes unit structs as a symbol. (struct Foo; is serialized as Foo instead of "Foo")
What's Changed
- Adds Amazon Linux (arm64 and x86_64) to the build matrix by @popematt in #725
- Implements writing e-expressions in binary 1.1 by @zslayton in #722
- Adds a StreamingRawReader, switches Element to the LazyReader by @zslayton in #727
- Removes experimental token stream, updates serde to use the lazy reader by @zslayton in #729
- Adds trait implementations by @zslayton in #730
- Updates dep versions, remove deprecated method use from example by @zslayton in #731
- Simplifies the ValueWriter trait family by @zslayton in #732
- Adds support for parameterized Rust enums in serde by @desaikd in #733
- Update CodeBuild Runner integration to include the run id and attempt by @popematt in #742
- Add support for binary 1.1 reader by @nirosys in #737
- Update GHA workflow to eliminate some redundant jobs by @popematt in #744
- Drops closure-based container writers by @zslayton in #741
- Introduces an application-level lazy writer by @zslayton in #745
- Removes the IonWriter trait and its implementations by @zslayton in #749
- Adds a raw text writer for Ion v1.1 by @zslayton in #750
- Values are no longer serialized with a type identifier annotation. by @zslayton in #751
- Removes the IonReader trait and its implementations. by @zslayton in #752
- Add 1.1 binary reader support for bools by @nirosys in #753
- Add 1.1 binary reader support for strings and integers. by @nirosys in #754
- Adds APIs for accessing encoding of raw stream items by @zslayton in #760
- Ensure that FixedInt and FixedUInt handle zero-length integers by @popematt in #762
- Text reader implementation cleanup by @zslayton in #763
- Add 1.1 binary reader support for symbols by @nirosys in #758
- Add 1.1 binary reader support for blobs and clobs by @nirosys in #755
- Add 1.1 binary reader support for floats by @nirosys in #756
- Add 1.1 binary reader support for decimals by @nirosys in #757
- Add 1.1 binary reader support for length-prefixed Lists & S-Expressions by @nirosys in #769
- Removes the crate's dependency on BigInt, BigUint by @zslayton in #767
- Fix serde string deserialization for Ion symbols by @sajidanw in #771
- Feature gating and renames by @zslayton in #772
- Renames TextKind to TextFormat, adds with_format method by @zslayton in #773
- Doc cleanup and 2x bug fixes by @zslayton in #774
Full Changelog: v1.0.0-rc.3...v1.0.0-rc.4