add some documentation and notes
mikeblas committed May 9, 2022
1 parent 17d54c0 commit 64be2b0
Showing 4 changed files with 56 additions and 2 deletions.
2 changes: 1 addition & 1 deletion JankSQL/Engines/BTreeEngine/BTreeTable.cs
@@ -75,7 +75,7 @@ internal BTreeTable(string tableName, ExpressionOperandType[] keyTypes, IEnumera
/// <summary>
/// Initializes a new instance of the <see cref="BTreeTable"/> class as a heap.
/// Creates a "heap" table with no unique index. Our approach to this is a table that has a fake
/// "uniquifier" key as its bookmar_key. That single-column bookmark key maps to the values,
/// "uniquifier" key as its bookmark_key. That single-column bookmark key maps to the values,
/// which are all the columns given.
/// </summary>
/// <param name="tableName">string with the name of our table.</param>
5 changes: 4 additions & 1 deletion README.md
@@ -28,7 +28,7 @@ On the other hand, we can expect that I'll want to extend the grammar to support

### Storage

The storage engine is based on the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.
The storage engine is based on the p. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.

### Tests

@@ -65,6 +65,9 @@ The project is buildable, and I intend that the main branch always has all of it

There are lots of language features being added as I work, so the best way to see what's supported is to scan through the tests.

# Documentation

I've started writing [documentation](docs/index.md).

# Licensing

42 changes: 42 additions & 0 deletions docs/TableStructure.md
@@ -0,0 +1,42 @@

# Table Structure

JankSQL uses the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation, which is some amazing software. It implements a simple interface in its `BTree` generic class so that we can make a BTree of keys and values as `BTree<Key, Value>`. The class supports persistence and locking, and it gives enumerators that look up a key and walk the values available starting at that key.

In JankSQL, the `BTree` class is used with the `Tuple` class, which implements a tuple of typed values, each represented with the `ExpressionOperand` class. A `Tuple` represents a set of values, so it's used for both the key and the value. Thus, JankSQL's use of `BTree` is always on `BTree<Tuple, Tuple>`. `Tuple` has helpers that implement the comparison and persistence interfaces that `BTree` requires.

## Tables

JankSQL implements a table, then, with a key-value store built on a `BTree<Tuple, Tuple>` object. The value `Tuple` contains all of the columns of the table. The key is a `Tuple` that contains a single integer which is used as a monotonically increasing row ID.

Since the table has no index, any operation against it is a scan. Inserting a new row simply adds one to the last used key and inserts the row as the value for that key. Deleting a row simply removes the row, and the key number is not reused.

For now, this approach is quite adequate, but it does mean that a table can't survive more than 2<sup>32</sup> inserts, because the row ID value will wrap around. (This is tracked by [Issue #2](https://github.com/mikeblas/JankSQL/issues/2).)

Conceptually, we can consider the table's fundamental storage -- sometimes called its "heap", perhaps incorrectly -- to be a map between the row ID and the actual row payload: `BTree<RowID, Tuple>`. It's just that the row ID itself is implemented as a `Tuple`, too.
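
The row ID bookkeeping described above can be sketched in a few lines. This is only an illustration, not JankSQL's code: a `SortedDictionary<int, object[]>` stands in for `BTree<Tuple, Tuple>`, and the names `table`, `lastRowId`, and `Insert` are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for BTree<Tuple, Tuple>: the key is the row ID, the value holds the row's columns.
// (Assumption: a plain int and object[] replace JankSQL's Tuple class.)
var table = new SortedDictionary<int, object[]>();
int lastRowId = 0;

// Insert: add one to the last used key and store the row under that key.
int Insert(object[] row)
{
    lastRowId += 1;
    table.Add(lastRowId, row);
    return lastRowId;
}

int a = Insert(new object[] { "apple", 3 });
int b = Insert(new object[] { "banana", 7 });

// Delete: remove the row. Its key number is not handed out again.
table.Remove(b);
int c = Insert(new object[] { "cherry", 5 });

Console.WriteLine($"{a} {b} {c}");   // prints "1 2 3"

// With no index, finding rows by column value means visiting every entry.
foreach (var entry in table)
{
    if ((int)entry.Value[1] > 4)
    {
        Console.WriteLine($"{entry.Key}: {entry.Value[0]}");   // prints "3: cherry"
    }
}
```

The counter only ever moves forward, which is why the number of inserts, not the number of live rows, is what runs into the row ID limit.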

## Unique Indexes

A unique index in Jank augments the fundamental `BTree` with another access path. Each index is implemented as a map from the keys of the index to the row ID. We can consider the table and the first index as an example:

```csharp
BTree<Tuple, Tuple> theTable; // key: row ID, value: row
BTree<Tuple, Tuple> firstIndex; // key: index key, value: row ID
```

To find a row, we can look it up by key in `firstIndex` to get a row ID. Then, to get the remaining columns, the row ID is used to probe `theTable` to get that payload.

Any number of indexes can be created, all referencing back to `theTable` via the row ID key.
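
As a rough sketch of that two-step lookup, the example below uses `SortedDictionary` in place of `BTree<Tuple, Tuple>`. The column names (`Name`, `City`) and the sample rows are invented for illustration and don't come from JankSQL.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the two BTree<Tuple, Tuple> objects (assumption: simplified key and value types).
var theTable = new SortedDictionary<int, (string Name, string City)>();   // key: row ID, value: row
var firstIndex = new SortedDictionary<string, int>();                      // key: index key (Name), value: row ID

// Every row added to the table also gets an entry in the index.
theTable.Add(1, ("Ada", "London"));
firstIndex.Add("Ada", 1);
theTable.Add(2, ("Grace", "Arlington"));
firstIndex.Add("Grace", 2);

// Step 1: look up the index key to get a row ID.
if (firstIndex.TryGetValue("Grace", out int rowId))
{
    // Step 2: use the row ID to probe the table for the remaining columns.
    var row = theTable[rowId];
    Console.WriteLine($"row {rowId}: {row.Name}, {row.City}");   // prints "row 2: Grace, Arlington"
}
```

A second index would be another map of the same shape, keyed on different columns and still pointing back at the same row IDs.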

## Non-unique Indexes

Classically, BTrees implement only unique indexes: keys can't be duplicated. CSharpTest's implementation is no different, so Jank must provide some mechanism for handling duplicate key values in non-unique indexes.

Jank's approach simply appends a unique ID to the key set. If a non-unique index is created with key columns `Col1` and `Col2`, the effective key becomes `(Col1, Col2, uniquifier)`. A probe for a value in a non-unique index is naturally a scan, since zero, one, or more entries may match the key.

Jank is limited again by using a 32-bit integer here, so any non-unique index can have only 2<sup>32</sup> keys with the same value.
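
Here is a small sketch of the idea, under some assumptions: integer key columns, a `SortedDictionary` standing in for the `BTree`, and a simple counter as the source of the unique ID. The helper names `AddToIndex` and `Probe` are hypothetical. A real B-tree enumerator could start at the first matching key; `SortedDictionary` has no seek, so this sketch walks from the beginning and stops once it has passed the matching keys.

```csharp
using System;
using System.Collections.Generic;

// Effective key is (Col1, Col2, uniquifier); the value is the row ID.
var index = new SortedDictionary<(int Col1, int Col2, int Uniquifier), int>();
int nextUniquifier = 0;

// Appending a fresh uniquifier makes every stored key distinct,
// even when Col1 and Col2 repeat.
void AddToIndex(int col1, int col2, int rowId)
{
    nextUniquifier += 1;
    index.Add((col1, col2, nextUniquifier), rowId);
}

// A probe is a scan over the run of adjacent keys that share (col1, col2).
List<int> Probe(int col1, int col2)
{
    var rowIds = new List<int>();
    foreach (var entry in index)
    {
        int cmp = (entry.Key.Col1, entry.Key.Col2).CompareTo((col1, col2));
        if (cmp < 0) continue;   // not yet at the matching keys
        if (cmp > 0) break;      // past the matching keys; keys are sorted, so stop
        rowIds.Add(entry.Value);
    }
    return rowIds;
}

AddToIndex(10, 1, rowId: 101);
AddToIndex(10, 1, rowId: 102);   // same key columns as the previous entry
AddToIndex(10, 2, rowId: 103);
AddToIndex(20, 1, rowId: 104);

Console.WriteLine(string.Join(",", Probe(10, 1)));   // prints "101,102"
Console.WriteLine(Probe(30, 9).Count);               // prints "0"
```

Because the uniquifier is the last key column, entries with equal `(Col1, Col2)` sort next to each other, which is what lets the probe stop early.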

## Index maintenance

Adding a row to the table, or removing one from it, updates all indexes. Updating a value in an existing row must update the indexes that cover that column.
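
The sketch below shows one way that maintenance could look for a table with two unique indexes. It is not JankSQL's implementation: the `Name` and `Email` columns, the `SortedDictionary` stand-ins, and the helpers `InsertRow`, `DeleteRow`, and `UpdateEmail` are all assumptions made for the example, and it does no error handling for duplicate keys.

```csharp
using System;
using System.Collections.Generic;

var table = new SortedDictionary<int, (string Name, string Email)>();   // key: row ID, value: row
var nameIndex = new SortedDictionary<string, int>();                     // unique index on Name
var emailIndex = new SortedDictionary<string, int>();                    // unique index on Email
int lastRowId = 0;

// Adding a row touches the table and every index.
int InsertRow(string name, string email)
{
    lastRowId += 1;
    table.Add(lastRowId, (name, email));
    nameIndex.Add(name, lastRowId);
    emailIndex.Add(email, lastRowId);
    return lastRowId;
}

// Removing a row also touches the table and every index.
void DeleteRow(int rowId)
{
    var row = table[rowId];
    nameIndex.Remove(row.Name);
    emailIndex.Remove(row.Email);
    table.Remove(rowId);
}

// Updating Email only touches the index that covers Email; nameIndex is left alone.
void UpdateEmail(int rowId, string newEmail)
{
    var row = table[rowId];
    emailIndex.Remove(row.Email);
    emailIndex.Add(newEmail, rowId);
    table[rowId] = (row.Name, newEmail);
}

int ada = InsertRow("Ada", "ada@example.com");
int grace = InsertRow("Grace", "grace@example.com");
UpdateEmail(ada, "lovelace@example.com");
DeleteRow(grace);

Console.WriteLine(emailIndex.ContainsKey("ada@example.com"));   // prints "False"
Console.WriteLine(emailIndex["lovelace@example.com"]);          // prints "1"
Console.WriteLine(nameIndex.Count);                             // prints "1"
```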

9 changes: 9 additions & 0 deletions docs/index.md
@@ -0,0 +1,9 @@
# Documentation

There are notes here about the implementation details, as well as information about writing code to use JankSQL.


## Implementation

* [Table Structure](TableStructure.md) describes how tables and indexes are built.
