oops I forgot to commit half the docs
1 parent f401dd8, commit d994921
Showing 2 changed files with 61 additions and 0 deletions.
@@ -0,0 +1,19 @@
.. _cartool:

cartool
=======

``cartool`` aims to be a CLI swiss-army-knife for analysing and modifying atproto repos stored inside CAR files.

.. code-block:: text

    USAGE: cartool COMMAND [args...]

    Available commands:

    info <car_path>               : print CAR header and repo info
    list <car_path>               : list all records in the CAR (values as CIDs)
    dump <car_path>               : dump all records in the CAR (values as JSON)
    dump_record <car_path> <key>  : dump a single record, keyed on ('collection/rkey')
    compact <car_in> <car_out>    : rewrite the whole CAR, dropping any duplicated or unreferenced blocks
    diff <car_a> <car_b>          : list the record diff between two CAR files

@@ -0,0 +1,42 @@
.. _overview:

Library Overview
================

If you have some `atproto repository <https://atproto.com/specs/repository>`_ data, and you want to operate on it with Python, you've come to the right place [1]_. The APIs offered here are rather low-level, but I'm planning on adding higher-level helper utilities in the future.

.. [1] Maybe also check out `arroba <https://github.com/snarfed/arroba>`_!


=============
Block Storage
=============

The foundations of repos are content-addressed Blocks of data, as in the `IPLD <https://ipld.io/docs/motivation/benefits-of-content-addressing/>`_ data model. The abstract :class:`~atmst.blockstore.BlockStore` interface facilitates access to blocks, agnostic of the underlying storage medium. The following implementations are available:

* :class:`~atmst.blockstore.MemoryBlockStore` - stores blocks in memory only (inside a dict).

* :class:`~atmst.blockstore.car_file.ReadOnlyCARBlockStore` - accesses the contents of a CAR file.

* :class:`~atmst.blockstore.SqliteBlockStore` - accesses blocks stored in a table of an SQLite database.

Finally, the :class:`~atmst.blockstore.OverlayBlockStore` class allows you to layer one BlockStore over another, with writes going to the top layer only. This is useful in several scenarios: for example, reading blocks from two CAR files at once so that you can diff them, or staging modifications in memory ready to be committed to persistent storage.
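
The layering idea can be sketched with plain dicts. This is a toy illustration, not the ``atmst`` API: ``ToyMemoryStore``, ``ToyOverlayStore`` and ``toy_address`` are hypothetical names, the content address is a bare SHA-256 digest rather than a real CID, and the "upper layer first" read order is an assumption about how such an overlay would normally behave.

```python
import hashlib


class ToyMemoryStore:
    """Hypothetical stand-in for an in-memory block store (a dict of bytes)."""

    def __init__(self):
        self.blocks = {}

    def put(self, key: bytes, value: bytes) -> None:
        self.blocks[key] = value

    def get(self, key: bytes) -> bytes:
        return self.blocks[key]  # raises KeyError if the block is absent


class ToyOverlayStore:
    """Reads try the upper layer first, then the lower; writes go to upper only."""

    def __init__(self, upper, lower):
        self.upper = upper
        self.lower = lower

    def put(self, key: bytes, value: bytes) -> None:
        self.upper.put(key, value)

    def get(self, key: bytes) -> bytes:
        try:
            return self.upper.get(key)
        except KeyError:
            return self.lower.get(key)


def toy_address(value: bytes) -> bytes:
    # Content address: here just a SHA-256 digest, not a real CID.
    return hashlib.sha256(value).digest()


persistent = ToyMemoryStore()  # pretend this is a CAR file or a database
old_block = b"existing block"
persistent.put(toy_address(old_block), old_block)

staging = ToyMemoryStore()
overlay = ToyOverlayStore(upper=staging, lower=persistent)

new_block = b"staged block"
overlay.put(toy_address(new_block), new_block)

found_old = overlay.get(toy_address(old_block))  # served by the lower layer
found_new = overlay.get(toy_address(new_block))  # served by the upper layer
lower_untouched = toy_address(new_block) not in persistent.blocks
```

Because the staged block only ever lands in ``staging``, discarding the overlay leaves the lower store exactly as it was, which is the property that makes this pattern suitable for staging modifications.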

===================
Merkle Search Trees
===================

With a BlockStore, we can read and write content-addressed blocks of data. Content-addressing is cool, but sometimes you want mutability. The `Merkle Search Tree <https://inria.hal.science/hal-02303490/document>`_ data structure builds on top of content-addressed Block storage, providing a mutable map of keys onto values. In atproto, the keys are arbitrary strings (under certain constraints), and the values are "records".

Everything is still immutable under the hood, so modifying an MST results in a new root hash.

:py:mod:`atmst` doesn't have a dedicated class to represent an MST (yet?); instead, we just reference the root node by CID.
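
The "new root hash on every modification" behaviour can be illustrated with a deliberately simplified model. The sketch below is not an MST and does not use ``atmst``: it stores the whole map as a single JSON block, uses a SHA-256 hex digest in place of a CID, and all function names are hypothetical. A real MST splits the keys across many nodes, so only the nodes along the modified path need rewriting.

```python
import hashlib
import json

blocks = {}  # toy content-addressed storage: hex digest -> bytes


def put_block(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def put_map(mapping: dict) -> str:
    # Canonical serialisation, so equal maps always get equal hashes.
    return put_block(json.dumps(mapping, sort_keys=True).encode())


def get_map(root: str) -> dict:
    return json.loads(blocks[root])


def set_key(root: str, key: str, value: str) -> str:
    """Return the root hash of a new map; the old root stays valid."""
    updated = dict(get_map(root))
    updated[key] = value
    return put_map(updated)


# Keys follow the 'collection/rkey' shape; the values are placeholders.
root_v1 = put_map({"app.bsky.feed.post/aaa": "hello"})
root_v2 = set_key(root_v1, "app.bsky.feed.post/bbb", "world")
root_v3 = set_key(root_v2, "app.bsky.feed.post/bbb", "world")  # no-op write
```

Here ``root_v1`` still resolves to the original one-entry map after the update, and the no-op write produces the same hash as ``root_v2``. Identical content always yields an identical address, which is why a tree can be referenced by nothing more than its root.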

=====
Nodes
=====

An MST is composed of one or more Nodes. :py:mod:`atmst` represents Nodes using :class:`~atmst.mst.node.MSTNode`, an immutable dataclass.

Nodes are ultimately stored in a BlockStore, serialised as `DAG-CBOR <https://ipld.io/docs/codecs/known/dag-cbor/>`_, and the :class:`~atmst.mst.node_store.NodeStore` class facilitates this. A NodeStore also maintains an LRU cache, mapping CIDs to MSTNode objects, to reduce the impact of BlockStore read latency, hash verification, and deserialisation overheads.
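
To show why such a cache helps, here is a minimal LRU sketch in front of a slow decode step. It is an illustration under stated assumptions, not ``NodeStore`` itself: ``ToyNodeCache`` is a hypothetical name, a plain dict stands in for the BlockStore, JSON stands in for DAG-CBOR, string keys stand in for CIDs, and hash verification is omitted.

```python
from collections import OrderedDict
import json


class ToyNodeCache:
    """Hypothetical LRU cache mapping block keys to decoded objects."""

    def __init__(self, blocks: dict, capacity: int = 2):
        self.blocks = blocks        # stand-in for a BlockStore: key -> bytes
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> decoded node, oldest first
        self.decodes = 0            # counts the expensive path, for the demo

    def get_node(self, key: str):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.decodes += 1
        node = json.loads(self.blocks[key])  # JSON stands in for DAG-CBOR
        self.cache[key] = node
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return node


blocks = {"a": b'{"n": 1}', "b": b'{"n": 2}', "c": b'{"n": 3}'}
store = ToyNodeCache(blocks, capacity=2)

store.get_node("a")  # miss: read and decode
store.get_node("a")  # hit: served from the cache
store.get_node("b")  # miss
store.get_node("c")  # miss, evicts "a"
store.get_node("a")  # miss again, evicts "b"
```

Five lookups cost four decodes here. With a capacity large enough to hold the upper levels of a tree, repeated traversals would mostly hit the cache.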

The :class:`~atmst.mst.node_wrangler.NodeWrangler` class facilitates modifications to MSTs, and the :class:`~atmst.mst.node_walker.NodeWalker` class facilitates access to MSTs, which the :func:`~atmst.mst.diff.mst_diff` function makes use of.
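
As a mental model for what a record-level diff reports, the naive version simply compares two flat key-to-CID mappings. The sketch below is that naive version only. ``toy_record_diff`` is a hypothetical helper, the CID strings are placeholders, and nothing here reflects the actual signature or return type of ``mst_diff``. A tree-based diff has the advantage that any subtree with an identical CID on both sides can be skipped without being read.

```python
def toy_record_diff(a: dict, b: dict):
    """Compare two key -> CID mappings; return (created, updated, deleted) keys."""
    created = {k for k in b if k not in a}
    deleted = {k for k in a if k not in b}
    updated = {k for k in a if k in b and a[k] != b[k]}
    return created, updated, deleted


# Placeholder values standing in for record CIDs.
repo_a = {
    "app.bsky.feed.post/aaa": "cid-1",
    "app.bsky.feed.post/bbb": "cid-2",
    "app.bsky.feed.like/ccc": "cid-3",
}
repo_b = {
    "app.bsky.feed.post/aaa": "cid-1",      # unchanged
    "app.bsky.feed.post/bbb": "cid-2-new",  # updated
    "app.bsky.feed.post/ddd": "cid-4",      # created
}

created, updated, deleted = toy_record_diff(repo_a, repo_b)
```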