Add text reader support for iterating through incomplete/invalid containers #759

sajidanw · 2024-05-03T02:00:54Z

Description

Add support in the Ion text lazy readers to be able to step into and iterate through incomplete/invalid containers (i.e., the closing token is missing or there's a syntax error within).

Some Ion readers in other languages, as well as the ion-rust reader that's present in the pre-1.0.0 releases, can support this through the next/step_in/step_out methods, but they don't report spans/ranges of the source text (requirement for me) and this doesn't map onto the APIs of the lazy readers that are in the current codebase.

Background

I've been experimenting with ion-rust for use within a language server for Ion text. As a language server, the parser I'm looking to develop on top of ion-rust needs to be tolerant to parsing errors, since a user may be in the middle of writing code but still expects some amount of language server functionality (completions, find references, etc.) to be available.

For my purposes, that generally means that I need to be able to parse "incomplete" containers — as in, lists, sexps, and structs whose closing tokens aren't present in the document; or which have an Ion value in the container that is unparseable (due to a syntax error or its own incompleteness). This is because I want to be able to semantically analyze all of the parseable data before the error occurred.

For example, imagine the user is developing the Ion Schema below:
type::{
  name: Person,
  type: struct,
  fields: {
    age: {
       type: int,
       valid_values: range::[1, // <-- The user's cursor is here!
    }
  }
}
At this cursor location, the user requests completions to know which values are available for them to specify (such as max). To an Ion parser that only supports complete values, this whole document would fail to parse entirely, even though there's enough information prior to the parsing error to build semantic knowledge and know what values can be suggested.

The lazy reader that's currently in the codebase doesn't support incomplete containers, since it's designed to match the container's full span from the opening [/(/{ to the closing ]/)/}. This design allows a client of the library to know that a matched value is fully complete and syntactically valid, but prevents a tolerant language server use-case in its current state.

Suggested Proposals

I think we don't want to change the default behavior of the list/sexp/struct parsers to return the container without identifying it as being incomplete or invalid (such as by returning an Ok match). Most clients are likely just in the business of parsing valid Ion, or at most surfacing which position in the input stream caused the violation, so changing this default behavior would be disruptive to the majority of users.

A couple of ideas I've thought about so far:

Allow the lazy reader to be configured to match on incomplete/invalid containers.
For example, adding a with_match_incomplete_containers method on the reader that sets an internal flag on the TextBufferView (and is copied to any new slices of the buffer). When the list/sexp/struct parser methods try to find the full span of the container but fail to match (either due to an IncompleteError or a DecodingError), this flag being set to true will change the behavior from propagating that Err to instead returning an Ok indicating a matched span of (<beginning offset of container>, <position of the incomplete/decoding error>). This would allow the user to use the resulting list/sexp/struct iterator to themselves iterate up to the parsing error.
- I kind of like this one because it's easy to implement (I was able to get it working well enough for my needs — at least for all of the parse error cases I've thought of), but something about it doesn't feel "idiomatic". Probably because of the way that the "matched value" has completely different characteristics based on this single flag despite no changes in the data structures or result objects. But it does make it much easier to use as a client of the library?
Model the parsing errors of containers explicitly, allowing the user to retrieve (or construct) a container iterator for the incomplete/invalid container.
When the list/sexp/struct parser attempts to find the full span of the container but receives an IncompleteError or DecodingError, propagate a container-specific parsing error that includes details of the span that failed to match (of (<beginning offset of container>, <position of the incomplete/decoding error>)) and the option to get an iterator anyway to step through the span.
- This seems like the better design in terms of the API surface, but it's a little more opaque to me about how this would be implemented, mainly due to my unfamiliarity with the code base. Would we still be able to do things like get the annotations that were on the incomplete container (I think so?)? How would we actually get an error with the structured detail propagated up to the user through the nom combinators?

And I'd absolutely be happy to try to do this as a PR — given some initial guidance/discussion on what the best approach might be. 😄

The text was updated successfully, but these errors were encountered:

sajidanw added the enhancement New feature or request label May 3, 2024

zslayton mentioned this issue May 21, 2024

Removes the crate's dependency on BigInt, BigUint #767

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add text reader support for iterating through incomplete/invalid containers #759

Add text reader support for iterating through incomplete/invalid containers #759

sajidanw commented May 3, 2024 •

edited

Loading

Add text reader support for iterating through incomplete/invalid containers #759

Add text reader support for iterating through incomplete/invalid containers #759

Comments

sajidanw commented May 3, 2024 • edited Loading

Description

Background

Suggested Proposals

sajidanw commented May 3, 2024 •

edited

Loading