
Architecture

V0ldek edited this page Apr 7, 2023 · 9 revisions

The rsonpath library has to be finely modular for at least three separate reasons. We need to be able to (not necessarily in order of priority):

  1. swap parts of the algorithm between generic portable versions and hyperoptimized architecture-specific implementations;
  2. swap classifier algorithms in-flight for best engine performance (we call this the State-driven Classifier Pipeline);
  3. manage the complexity of the solution, since it would otherwise drive anyone trying to grasp it insane.

If anything, the last point should be the one driving engineering decisions. Remember that Correctness is the core principle, and even the fastest algorithm is useless if it is so complex that it cannot be maintained. Managing complexity is, after all, the primary objective of software engineering.
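As an illustration of the first requirement, here is a minimal sketch of one way to hide a portable and an architecture-specific implementation behind a single trait. The names `ByteCounter`, `Portable`, `Avx2` and `best_counter` are hypothetical and are not taken from rsonpath; the choice of runtime feature detection is also an assumption, and a real implementation might select at compile time instead.

```rust
/// Hypothetical trait (not rsonpath's API): counts how often a byte occurs in a block.
pub trait ByteCounter {
    fn count(&self, haystack: &[u8], needle: u8) -> usize;
}

/// Generic portable version that works on every target.
pub struct Portable;

impl ByteCounter for Portable {
    fn count(&self, haystack: &[u8], needle: u8) -> usize {
        haystack.iter().filter(|&&b| b == needle).count()
    }
}

/// Architecture-specific version; the private field forces construction via `detect`.
#[cfg(target_arch = "x86_64")]
pub struct Avx2(());

#[cfg(target_arch = "x86_64")]
impl Avx2 {
    /// Returns the AVX2 implementation only if the running CPU supports it.
    pub fn detect() -> Option<Self> {
        if is_x86_feature_detected!("avx2") {
            Some(Avx2(()))
        } else {
            None
        }
    }

    #[target_feature(enable = "avx2")]
    unsafe fn count_avx2(haystack: &[u8], needle: u8) -> usize {
        use std::arch::x86_64::{
            __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8,
            _mm256_set1_epi8,
        };
        let mut chunks = haystack.chunks_exact(32);
        let mut total = 0usize;
        unsafe {
            let needle_vec = _mm256_set1_epi8(needle as i8);
            for chunk in &mut chunks {
                // Compare 32 bytes at once and count the set bits of the match mask.
                let block = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
                let eq = _mm256_cmpeq_epi8(block, needle_vec);
                total += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
            }
        }
        // The tail shorter than 32 bytes is handled by the portable loop.
        total + chunks.remainder().iter().filter(|&&b| b == needle).count()
    }
}

#[cfg(target_arch = "x86_64")]
impl ByteCounter for Avx2 {
    fn count(&self, haystack: &[u8], needle: u8) -> usize {
        // SAFETY: an `Avx2` value only exists if `detect` confirmed AVX2 support.
        unsafe { Self::count_avx2(haystack, needle) }
    }
}

/// Picks the fastest available implementation, falling back to the portable one.
pub fn best_counter() -> Box<dyn ByteCounter> {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(fast) = Avx2::detect() {
            return Box::new(fast);
        }
    }
    Box::new(Portable)
}
```

Callers depend only on the trait, so the rest of the code does not need to know which version was selected.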

Overview

Figure: rsonpath simplified architecture

The query string is parsed and compiled into a DFA. The input is read from a file or stdin. The quote classifier handles strings and escapes in the JSON. The engine executes the DFA on the input by consuming the output of the structural classifier and examining the stream of structural characters. It calls the provided result implementation when a query match occurs. For performance it interacts with the classifier pipeline by switching between the structural and depth classifiers, and by toggling classification of some structural characters on and off.
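The core loop described above can be sketched as follows. This is a deliberately simplified model, not rsonpath's engine: the `Event`, `Dfa` and `run` names are hypothetical, the DFA only supports child-name queries such as `$.a.b`, the events are assumed to be already lexed, and the switching between classifiers is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical structural event, roughly what a classifier might hand to the engine.
#[derive(Debug, Clone, Copy)]
pub enum Event<'a> {
    /// `{` or `[`, with the member name it is the value of, if any.
    Open(Option<&'a str>),
    /// `}` or `]`.
    Close,
    /// An atomic value, with the member name it is the value of, if any.
    Leaf(Option<&'a str>),
}

/// A tiny DFA whose transitions are keyed by member name.
pub struct Dfa {
    transitions: Vec<HashMap<String, usize>>,
    accepting: usize,
    reject: usize,
}

impl Dfa {
    /// Builds the DFA for a child-only query from its labels, e.g. ["a", "b"] for `$.a.b`.
    pub fn for_child_path(labels: &[&str]) -> Self {
        let mut transitions: Vec<HashMap<String, usize>> =
            vec![HashMap::new(); labels.len() + 2];
        for (state, label) in labels.iter().enumerate() {
            transitions[state].insert(label.to_string(), state + 1);
        }
        Dfa { transitions, accepting: labels.len(), reject: labels.len() + 1 }
    }

    /// Any label without an explicit transition leads to the rejecting state.
    fn step(&self, state: usize, label: Option<&str>) -> usize {
        label
            .and_then(|l| self.transitions[state].get(l).copied())
            .unwrap_or(self.reject)
    }
}

/// Runs the DFA over the events, calling `on_match` with the index of each matching event.
pub fn run<'a>(
    dfa: &Dfa,
    events: impl IntoIterator<Item = Event<'a>>,
    mut on_match: impl FnMut(usize),
) {
    // One DFA state per currently open object or array.
    let mut stack: Vec<usize> = Vec::new();
    for (idx, event) in events.into_iter().enumerate() {
        match event {
            Event::Open(label) => {
                let next = match stack.last() {
                    None => 0, // the document root starts in the initial state
                    Some(&state) => dfa.step(state, label),
                };
                if next == dfa.accepting {
                    on_match(idx);
                }
                stack.push(next);
            }
            Event::Close => {
                stack.pop();
            }
            Event::Leaf(label) => {
                if let Some(&state) = stack.last() {
                    if dfa.step(state, label) == dfa.accepting {
                        on_match(idx);
                    }
                }
            }
        }
    }
}
```

The `on_match` callback stands in for the result implementation: the engine itself does not decide what to do with a match.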

  • The query module defines the JsonPathQuery structure. It has two submodules, the parser and the automaton.
    • The parser submodule implements the parser that turns a query string into the above structure.
    • The automaton submodule defines the Automaton, the final form of a query, and the compiler that first turns the query into an NFA and then "minimizes" it to a DFA. Most of this compilation is internal, not public.
  • The classification module defines the classifier pipeline.
    • The quotes submodule defines the QuoteClassifiedIterator that serves as the first step of the pipeline, recognizing quoted strings and escape sequences.
    • The structural submodule defines the StructuralIterator that can be started or stopped on demand over a QuoteClassifiedIterator and lexes structural characters from the input.
    • The depth submodule defines the DepthIterator that can be started or stopped on demand over a QuoteClassifiedIterator and facilitates fast-forwarding based on document depth.
  • The engine module defines the core traits Compiler and Engine and the two engine implementations of rsonpath.
  • The result module defines the QueryResult trait for consuming query matches, and its implementations.
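
To make the layering of the classifier pipeline more concrete, here is a minimal byte-at-a-time sketch of a quote stage with a structural stage stacked on top of it. The `QuoteTagged` and `Structurals` types are hypothetical stand-ins, not the actual QuoteClassifiedIterator and StructuralIterator, and the real classifiers presumably work on larger blocks of input rather than single bytes.

```rust
/// Hypothetical first stage: tags every byte with whether it lies inside a
/// JSON string (delimiting quotes included), honouring backslash escapes.
pub struct QuoteTagged<'a> {
    bytes: std::iter::Enumerate<std::slice::Iter<'a, u8>>,
    in_string: bool,
    escaped: bool,
}

impl<'a> QuoteTagged<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        QuoteTagged { bytes: input.iter().enumerate(), in_string: false, escaped: false }
    }
}

impl<'a> Iterator for QuoteTagged<'a> {
    /// (index, byte, is inside a string)
    type Item = (usize, u8, bool);

    fn next(&mut self) -> Option<Self::Item> {
        let (idx, &byte) = self.bytes.next()?;
        let mut inside = self.in_string;
        if self.escaped {
            // The previous byte was a backslash, so this byte is taken literally.
            self.escaped = false;
        } else if self.in_string && byte == b'\\' {
            self.escaped = true;
        } else if byte == b'"' {
            self.in_string = !self.in_string;
            inside = true; // an opening quote also counts as part of the string
        }
        Some((idx, byte, inside))
    }
}

/// Hypothetical second stage: keeps only structural characters outside strings.
pub struct Structurals<I> {
    inner: I,
}

impl<I: Iterator<Item = (usize, u8, bool)>> Structurals<I> {
    /// Starts structural classification on top of a quote stage.
    pub fn start(inner: I) -> Self {
        Structurals { inner }
    }

    /// Stops classification and hands the quote stage back, so that another
    /// classifier can resume from the same position.
    pub fn stop(self) -> I {
        self.inner
    }
}

impl<I: Iterator<Item = (usize, u8, bool)>> Iterator for Structurals<I> {
    type Item = (usize, u8);

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.find_map(|(idx, byte, inside)| {
            let structural = matches!(byte, b'{' | b'}' | b'[' | b']' | b':' | b',');
            (!inside && structural).then_some((idx, byte))
        })
    }
}
```

Because `stop` returns the underlying quote stage, a different classifier can be started over it without rescanning the input, which mirrors the "started or stopped on demand" behaviour described in the list above.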