Verify that lazy containers require fully available bodies #739

zslayton · 2024-04-10T13:26:46Z

This comment refers to a no-longer-true invariant.

At one time, the binary container types stored however much of their body was available in the input buffer at read time. Because 1.0 containers are always length-prefixed, only the header's encoding needed to be available in the input for the creation of the lazy container value (hereafter: LCV) to succeed.

Conceptually, this allowed LCVs to avoid needing to buffer entire top-level values. The LCV could successfully visit and read however many of its child values' encodings were fully available and eventually fail at an incomplete value. In practice, however, this offers little benefit. It's much easier for applications (including the streaming reader wrapper) to handle early-bound incompleteness errors; writing data processing logic that can transactionally roll back and try again when more data is available is not fun.
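To make the "early-bound" behavior concrete, here is a minimal sketch of the check for a length-prefixed container. It is not ion-rust code: the one-byte length prefix is a toy encoding rather than the Ion 1.0 format, and `Incomplete`, `LazyContainer`, and `read_length_prefixed` are hypothetical names chosen for illustration.

```rust
/// Reported when the buffer ends before the value does.
#[derive(Debug, PartialEq)]
struct Incomplete {
    /// How many more bytes are required before the read can succeed.
    needed: usize,
}

/// A toy lazy container: a view over a body that is known to be complete.
#[derive(Debug, PartialEq)]
struct LazyContainer<'a> {
    body: &'a [u8],
}

/// Toy encoding (NOT real Ion): one length byte followed by that many body bytes.
/// The container is only created if the whole body is already in `input`.
fn read_length_prefixed(input: &[u8]) -> Result<LazyContainer<'_>, Incomplete> {
    // The header itself may be missing.
    let Some((&len, rest)) = input.split_first() else {
        return Err(Incomplete { needed: 1 });
    };
    let len = len as usize;
    // Early-bound check: fail now rather than partway through iteration.
    if rest.len() < len {
        return Err(Incomplete { needed: len - rest.len() });
    }
    Ok(LazyContainer { body: &rest[..len] })
}
```

With this shape, a caller that gets `Incomplete` has not yet processed any child values, so it can simply fetch more data and retry the same read.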

Additionally, the introduction of delimited binary containers meant that skipping to the next value required that the end of the container be found via scanning. This could either happen when the container was first encountered (guaranteeing that the entire container was available in the process) or on demand when the container's next sibling was requested from the parent. Finding the end of the container at the outset means that:

- We can guarantee the container is fully available
- We can cache the lazy child values that we encountered
- Future iterators over the container's contents can iterate over the cache instead of re-reading the data each time
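The scan-at-the-outset approach can be sketched roughly as follows. This is a toy model, not the ion-rust implementation: the `END` marker value, the one-byte child length prefix, and the names `LazyDelimitedContainer` and `scan_delimited` are all assumptions made for illustration.

```rust
/// Hypothetical end-of-container marker for this toy encoding.
const END: u8 = 0xF0;

/// Reported when the buffer ends before the end marker is found.
#[derive(Debug, PartialEq)]
struct Incomplete;

#[derive(Debug, PartialEq)]
struct LazyDelimitedContainer<'a> {
    /// Child bodies discovered while scanning for the end marker.
    children: Vec<&'a [u8]>,
    /// Total encoded length, including the end marker.
    encoded_len: usize,
}

impl<'a> LazyDelimitedContainer<'a> {
    /// Later iteration walks the cache instead of re-reading the input.
    fn children(&self) -> &[&'a [u8]] {
        &self.children
    }
}

/// Toy encoding (NOT real Ion): each child is a length byte (below `END`)
/// followed by that many bytes; the container ends at the first `END` marker.
fn scan_delimited(input: &[u8]) -> Result<LazyDelimitedContainer<'_>, Incomplete> {
    let mut children = Vec::new();
    let mut pos = 0;
    loop {
        // Running out of input before `END` means the container is incomplete.
        let marker = *input.get(pos).ok_or(Incomplete)?;
        if marker == END {
            return Ok(LazyDelimitedContainer { children, encoded_len: pos + 1 });
        }
        let start = pos + 1;
        let end = start + marker as usize;
        // A truncated child also makes the whole container incomplete.
        let child = input.get(start..end).ok_or(Incomplete)?;
        children.push(child);
        pos = end;
    }
}
```

Because the scan either reaches the end marker or fails, a successfully created container is both fully available and already indexed, and `encoded_len` tells the parent where the next sibling starts.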

Collectively, this model does mean that:

- Truncated Ion data (i.e. partial data that will never be complete) cannot be read with this API. (We could extend the API to support this in the future with "advanced" methods.)
- Top-level values must fit in the buffer. This was once a problem in ion-java because Java uses signed 32-bit integers as its array indices, limiting arrays (and thus buffers) to a size of ~2GB. Rust does not have that limitation; applications wishing to structure their data this way are free to do so at the expense of RAM.

We need to add unit tests for the 1.0 and 1.1 binary readers that demonstrate early-bound availability errors for delimited and length-prefixed container types.
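One possible shape for such tests is to take a complete encoding and check that every proper prefix of it is rejected up front. The sketch below is only an illustration of that idea: `assert_early_bound` and `toy_read` are hypothetical, and `toy_read` uses a toy one-byte length prefix rather than the real Ion encoding or the ion-rust reader API.

```rust
#[derive(Debug, PartialEq)]
struct Incomplete;

/// Toy reader (NOT real Ion): one length byte followed by that many body bytes.
/// Returns the total encoded length if the whole value is available.
fn toy_read(input: &[u8]) -> Result<usize, Incomplete> {
    let len = *input.first().ok_or(Incomplete)? as usize;
    if input.len() - 1 < len {
        return Err(Incomplete);
    }
    Ok(1 + len)
}

/// Asserts that `read` succeeds on the complete encoding and fails on
/// every proper prefix of it, i.e. incompleteness is reported at creation.
fn assert_early_bound<T, E>(complete: &[u8], read: impl Fn(&[u8]) -> Result<T, E>) {
    assert!(read(complete).is_ok(), "complete input should be readable");
    for cut in 0..complete.len() {
        assert!(
            read(&complete[..cut]).is_err(),
            "prefix of {cut} bytes should be reported as incomplete"
        );
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn length_prefixed_container_requires_full_body() {
        // Length byte 2 followed by a two-byte body.
        assert_early_bound(&[2, 7, 8], toy_read);
    }
}
```

The same helper could be pointed at a delimited-container reader, where every prefix that stops short of the end marker should fail in the same way.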

Originally posted by @zslayton in #737 (comment)


zslayton · 2024-04-10T13:30:15Z

Related comment: #737 (comment)

zslayton mentioned this issue Apr 10, 2024

Add support for binary 1.1 reader #737

Merged
