From 64be2b0aac80a373c02724df1b6ca83f13ed9b6d Mon Sep 17 00:00:00 2001
From: Mike Blaszczak
Date: Sun, 8 May 2022 17:02:08 -0700
Subject: [PATCH] add some documentation and notes

---
 JankSQL/Engines/BTreeEngine/BTreeTable.cs |  2 +-
 README.md                                 |  5 ++-
 docs/TableStructure.md                    | 42 +++++++++++++++++++++++
 docs/index.md                             |  9 +++++
 4 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 docs/TableStructure.md
 create mode 100644 docs/index.md

diff --git a/JankSQL/Engines/BTreeEngine/BTreeTable.cs b/JankSQL/Engines/BTreeEngine/BTreeTable.cs
index 182108c..d122214 100644
--- a/JankSQL/Engines/BTreeEngine/BTreeTable.cs
+++ b/JankSQL/Engines/BTreeEngine/BTreeTable.cs
@@ -75,7 +75,7 @@ internal BTreeTable(string tableName, ExpressionOperandType[] keyTypes, IEnumera
  ///
  /// Initializes a new instance of the class as a heap.
  /// Creates a "heap" table with no unique index. Our approach to this is a table that has a fake
- /// "uniquifier" key as its bookmar_key. That single-column bookmark key maps to the values,
+ /// "uniquifier" key as its bookmark_key. That single-column bookmark key maps to the values,
  /// which are all the columns given.
  ///
  /// string with the name of our table.
diff --git a/README.md b/README.md
index c2a56b8..173ec27 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ On the other hand, we can expect that I'll want to extend the grammar to support
 
 ### Storage
 
-The storage engine is based on the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.
+The storage engine is based on the p. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.
 
 ### Tests
 
@@ -65,6 +65,9 @@ The project is buildable, and I intend that the main branch always has all of it
 
 There are lots of language features being added as I work, so the best way to see what's supported is to scan through the tests.
 
+# Documentation
+
+I've started writing [documentation](docs/index.md).
 
 # Licensing
 
diff --git a/docs/TableStructure.md b/docs/TableStructure.md
new file mode 100644
index 0000000..76bf865
--- /dev/null
+++ b/docs/TableStructure.md
@@ -0,0 +1,42 @@
+
+# Table Structure
+
+JankSQL uses the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation, which is some amazing software. It implements a simple interface in its generic `BTree` class, so that we can make a BTree over any key type and value type. The class supports persistence and locking, and gives enumerators that look up a key and walk the values available starting at that key.
+
+In JankSQL, the `BTree` class is used with the `Tuple` class, which implements a tuple of typed values, each represented with the `ExpressionOperand` class. A `Tuple` represents a set of values, so it's used for both the key and the value. Thus, JankSQL's use of `BTree` is always on `BTree<Tuple, Tuple>`. `Tuple` has helpers that implement the comparison and persistence interfaces that `BTree` requires.
+
+## Tables
+
+JankSQL implements a table, then, as a key-value store built on a `BTree` object. The value `Tuple` contains all of the columns of the table. The key is a `Tuple` that contains a single integer, which is used as a monotonically increasing row ID.
+
+Since the table has no index, any operation against it is a scan. Inserting a new row simply adds one to the last used key and inserts the row as the value for that key. Deleting a row simply removes the row; its key number is not re-used.
+
+For now, this approach is quite adequate, but it does mean that a table can't survive more than 2^32 insert operations because the row ID value will wrap around. (This is tracked by [Issue #2](https://github.com/mikeblas/JankSQL/issues/2).)
+
+Conceptually, we can consider the table's fundamental storage -- sometimes called its "heap", perhaps incorrectly -- to be a `BTree` that maps the row ID to the actual row payload. It's just that the row ID itself is implemented as a `Tuple`, too.
+
+## Unique Indexes
+
+A unique index in Jank augments the fundamental `BTree` with another access path. Each index is implemented as a map from the keys of the index to the row ID. We can consider the table and the first index as an example:
+
+```csharp
+BTree<Tuple, Tuple> theTable;   // key: row ID, value: rows
+BTree<Tuple, Tuple> firstIndex; // key: index key, value: row ID
+```
+
+To find a row, we can look it up by key in `firstIndex` to get a row ID. Then, to get the remaining columns, the row ID is used to probe `theTable` to get that payload.
+
+Any number of indexes can be created, all referencing back to `theTable` via the row ID key.
+
+## Non-unique Indexes
+
+Classically, BTrees implement only unique indexes: keys can't be duplicated. CSharpTest's implementation is no different, so Jank must provide some mechanism for handling duplicate key values in non-unique indexes.
+
+Jank's approach simply appends a unique ID to the key set. If a non-unique index is created with key columns `Col1` and `Col2`, the effective key becomes `(Col1, Col2, uniqueifier)`. A probe for a value in a non-unique index is naturally a scan, since zero, one, or more entries may match the key.
+
+Jank is limited again by using a 32-bit integer here, so any non-unique index can have only 2^32 keys with the same value.
+
+## Index maintenance
+
+Adding a row to the table or removing one updates all indexes. Updating a value in an existing row must update the indexes that cover that column.
+
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..cb35aee
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,9 @@
+# Documentation
+
+There are notes here about the implementation details, as well as information about writing code to use JankSQL.
+
+
+## Implementation
+
+* [Table Structure](TableStructure.md) describes how tables and indexes are built.
+
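
Review note on docs/TableStructure.md: the layout it describes (a row-ID-keyed table, a unique index mapping key to row ID, and a non-unique index whose key carries a uniquifier) can be tried out with a rough sketch like the one below. Everything here is an assumption made for illustration: `SketchTable` and its members are hypothetical names, `SortedDictionary` stands in for the CSharpTest.Net B-Tree, and plain `int` and `string` values stand in for JankSQL's `Tuple`. It is not JankSQL's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: SortedDictionary plays the role of the B-Tree, and
// ints and strings stand in for JankSQL's Tuple keys and values.
public sealed class SketchTable
{
    // The table itself: row ID -> row payload.
    private readonly SortedDictionary<int, string[]> rows =
        new SortedDictionary<int, string[]>();

    // Unique index on column 0: index key -> row ID.
    private readonly SortedDictionary<string, int> uniqueIndex =
        new SortedDictionary<string, int>();

    // Non-unique index on column 1: (index key, uniquifier) -> row ID.
    private readonly SortedDictionary<(string Col, int Uniquifier), int> nonUniqueIndex =
        new SortedDictionary<(string Col, int Uniquifier), int>();

    private int lastRowId;
    private int lastUniquifier;

    // Adding a row updates the table and every index.
    public int Insert(string[] row)
    {
        if (uniqueIndex.ContainsKey(row[0]))
        {
            throw new InvalidOperationException("duplicate key in unique index");
        }

        // checked makes 32-bit overflow throw instead of wrapping around
        // (a choice made for this sketch only).
        int rowId = checked(++lastRowId);
        rows.Add(rowId, row);
        uniqueIndex.Add(row[0], rowId);
        nonUniqueIndex.Add((row[1], checked(++lastUniquifier)), rowId);
        return rowId;
    }

    // Removing a row updates every index; the row ID is not re-used.
    public void Delete(int rowId)
    {
        if (!rows.TryGetValue(rowId, out string[] row))
        {
            return;
        }

        rows.Remove(rowId);
        uniqueIndex.Remove(row[0]);
        var entry = nonUniqueIndex.First(e => e.Key.Col == row[1] && e.Value == rowId);
        nonUniqueIndex.Remove(entry.Key);
    }

    // Unique probe: index key -> row ID -> row. Throws if the key is absent.
    public string[] FindUnique(string key) => rows[uniqueIndex[key]];

    // Non-unique probe: zero, one, or more rows may match, so this scans.
    public IEnumerable<string[]> FindNonUnique(string key) =>
        nonUniqueIndex.Where(e => e.Key.Col == key).Select(e => rows[e.Value]);
}

public static class Program
{
    public static void Main()
    {
        var table = new SketchTable();
        int a = table.Insert(new[] { "alice", "red" });
        table.Insert(new[] { "bob", "red" });
        table.Delete(a);
        int c = table.Insert(new[] { "carol", "blue" });

        Console.WriteLine(c);                                  // 3: row ID 1 is not re-used
        Console.WriteLine(table.FindUnique("bob")[1]);         // red
        Console.WriteLine(table.FindNonUnique("red").Count()); // 1
    }
}
```

The non-unique probe in this sketch filters the whole index because `SortedDictionary` has no seek operation. With a B-Tree enumerator that can start at a given key, the same probe could begin at the first entry for the key and stop once the key columns change.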