ANS-104 bundle indexing #24
Commits on Jul 14, 2023
feat(bundles): add bundle/data item GQL index schema PE-3769
Adds the DB schema required for indexing data items for GraphQL querying. Also includes a table for tracking bundle status (processed_at + data_item_count). Bundles use a separate SQLite DB (similar to data) to reduce lock contention and support greater bootstrapping flexibility.
Commit: b4418f7
feat(sqlite): add bundle DB support to StandaloneSqlite PE-3769
Adds the wiring needed to use the new bundle DB in both the StandaloneSqlite class and the tests.
Commit: 4f0cee8
refactor(sqlite): extract tx row construction helper functions PE-3769
Extracting some small helper functions so they can be used when constructing data item rows too.
Commit: 8b4d8e2
feat(bundles): index ANS-104 bundles in new data item tables PE-3769
Records ANS-104 metadata in new data item tables. Flushing to the stable data item tables is not yet implemented. Also implements propagation of a root parent transaction ID to the ANS-104 unbundler. A root parent transaction ID is needed to efficiently find and sort data items when executing GQL queries.
Commit: ab15e67
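A minimal sketch of root parent propagation, assuming each unbundling job carries both the ID of the bundle being parsed and the ID of the L1 transaction at the top of the nesting chain. The `UnbundleJob` and `IndexedDataItem` shapes and the function names are hypothetical, not the project's actual types.

```typescript
// Hypothetical job shape passed to the unbundler.
interface UnbundleJob {
  id: string; // ID of the bundle being unbundled (L1 TX or data item)
  rootTxId: string; // ID of the L1 transaction at the top of the nesting chain
}

// Hypothetical row produced for each data item found in a bundle.
interface IndexedDataItem {
  id: string;
  parentId: string;
  rootTxId: string;
}

// For an L1 bundle, the root is the bundle transaction itself.
function jobForL1Bundle(txId: string): UnbundleJob {
  return { id: txId, rootTxId: txId };
}

// For a nested bundle, inherit the root from the job that produced it.
function jobForNestedBundle(dataItemId: string, parentJob: UnbundleJob): UnbundleJob {
  return { id: dataItemId, rootTxId: parentJob.rootTxId };
}

// Every data item records its immediate parent and the shared root TX.
function indexDataItems(job: UnbundleJob, itemIds: string[]): IndexedDataItem[] {
  return itemIds.map((id) => ({ id, parentId: job.id, rootTxId: job.rootTxId }));
}
```

Carrying the root forward means a data item at any nesting depth can be joined directly to its L1 transaction (and therefore its height and block position) without walking the parent chain at query time.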
feat(bundles): save stable ans-104 data items PE-3769
Adds SQL to flush stable data items to the stable data item and data item tags tables as well as remove flushed data from the new data item tables. This is still relatively unoptimized and is not yet exhaustive in its cleanup of stale data.
Commit: bb70d8c
feat(bundles): improve data item height tracking + optimize stable flushing PE-3769
Clears the heights on data items above the fork height when forks occur and updates data items related to L1 TXs when L1 TX heights are set. Also adds a height condition to the query for data items to flush to stable to avoid unnecessary work when joining to L1 stable tables to retrieve canonical heights. Note: further optimization may still be possible. It may be possible to eliminate one of the joins by replacing it with a join to stable_block_transactions if we add height to stable_block_transactions, though it's unclear how much performance improvement that would yield.
Commit: 98e76a9
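An in-memory sketch of the two height rules described above, assuming each new data item stores a nullable height plus the ID of its root L1 transaction (the field names are hypothetical). In the database these would presumably be `UPDATE` statements on the new data item table; the TypeScript form just makes the conditions explicit.

```typescript
// Hypothetical subset of a new (not yet stable) data item row.
interface NewDataItem {
  id: string;
  rootTxId: string;
  height: number | null; // null when the block height is unknown
}

// On a fork, heights above the fork point can no longer be trusted.
function clearHeightsAboveFork(items: NewDataItem[], forkHeight: number): void {
  for (const item of items) {
    if (item.height !== null && item.height > forkHeight) {
      item.height = null;
    }
  }
}

// When an L1 TX height becomes known, copy it to every data item rooted in that TX.
function setHeightForRootTx(items: NewDataItem[], rootTxId: string, height: number): void {
  for (const item of items) {
    if (item.rootTxId === rootTxId) {
      item.height = height;
    }
  }
}
```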
fix(bundles): set data item heights even when L1 TX retrieval fails PE-3769
Sometimes we can't fetch transactions when indexing a block. In those cases we still know the height, so we should ensure the height is set on any associated data items.
Commit: 4534830
perf(sqlite bundles): remove more data item flushing joins PE-3769
Further simplifies joins when copying new data items to the stable tables and cleaning up stale data items. Rather than getting height from stable L1 tables, we rely on height on the new data items and only join to stable L1 tables to get the block transaction index.
Commit: e5122f1
feat(sqlite bundles): add ability to query stable data items PE-3769
Combines stable transactions and stable data items using a UNION in the SQL query. Each subquery in the UNION has its own ORDER BY and LIMIT, which allows the sub-selects to do most of the work before the union is computed. This change also implements returning parent/bundledIn for data items. However, filtering based on bundledIn and sorting data items by ID are not functional yet and will be implemented in future commits.
Commit: adad9a3
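Why a per-subquery ORDER BY and LIMIT is safe: the first N rows of the combined, sorted result must each come from the first N rows of one input or the other, so each side can be trimmed before merging. The in-memory sketch below illustrates that property; the `Row` shape and the ascending (height, block TX index) ordering are assumptions for illustration, not the project's query.

```typescript
// Hypothetical row shape shared by transactions and data items.
interface Row {
  id: string;
  height: number;
  blockTxIndex: number;
}

// Ascending by height, then by block transaction index.
function compareRows(a: Row, b: Row): number {
  return a.height - b.height || a.blockTxIndex - b.blockTxIndex;
}

// Equivalent of one sub-select with its own ORDER BY ... LIMIT.
function topN(rows: Row[], limit: number): Row[] {
  return [...rows].sort(compareRows).slice(0, limit);
}

// Trim each side first, then merge and apply the outer ORDER BY ... LIMIT.
function unionTopN(transactions: Row[], dataItems: Row[], limit: number): Row[] {
  return topN([...topN(transactions, limit), ...topN(dataItems, limit)], limit);
}
```

The result matches sorting and limiting the full union, but each input only ever contributes at most `limit` rows to the merge.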
feat(sqlite graphql): include data items in sorting and cursors PE-3769
Adds data items to GQL sorting and cursors. Data items are sorted by ID after block height and block TX index. ID was chosen over bundle offsets or indexes because we want duplicates of the same item sorted consistently where possible. Also, bundle data item indexes are potentially confusing when data item filtering is used. To accomplish this, the cursor condition in the query was changed from a simple numeric comparison to a set of comparisons against the cursor components. An OR is required in the comparisons to avoid comparing against irrelevant conditions (e.g. the block TX index comparison when height > cursor height). This clutters the WHERE conditions, but they are still fairly readable. It may also perform better, since it makes the height comparison visible to SQLite's query planner.
Commit: 3c904ca
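A sketch of the cursor condition described above, for ascending order over (height, block TX index, data item ID). The column names `height`, `block_transaction_index`, and `data_item_id` and the `Position` shape are assumptions. The nested ORs ensure a later component is only compared when all earlier components are equal, and the leading `height > ?` term is a plain range condition SQLite can see. An equivalent in-memory predicate is included for checking.

```typescript
// Hypothetical cursor components (also used for rows).
interface Position {
  height: number;
  blockTxIndex: number;
  dataItemId: string;
}

// Builds a WHERE fragment selecting rows strictly after the cursor.
function afterCursorSql(c: Position): { sql: string; params: (number | string)[] } {
  const sql =
    "(height > ? OR (height = ? AND (block_transaction_index > ? OR " +
    "(block_transaction_index = ? AND data_item_id > ?))))";
  return { sql, params: [c.height, c.height, c.blockTxIndex, c.blockTxIndex, c.dataItemId] };
}

// Same logic in memory: compare components in order, falling through on ties.
function isAfterCursor(row: Position, c: Position): boolean {
  return (
    row.height > c.height ||
    (row.height === c.height &&
      (row.blockTxIndex > c.blockTxIndex ||
        (row.blockTxIndex === c.blockTxIndex && row.dataItemId > c.dataItemId)))
  );
}
```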
feat(sqlite graphql): add bundledIn/parent filter support PE-3769
Implements the GraphQL bundledIn/parent filter (parent is deprecated). Filtering on 'null' matches only L1 transactions, and data item queries are skipped in that case. This ensures users do not pay a performance penalty if they only want to query L1. Similarly, L1 transactions are skipped if a bundledIn filter is specified.
Commit: 3fdf15a
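The filter rule above reduces to a small decision about which sub-queries to run. A sketch, assuming `undefined` means no bundledIn filter was given, `null` means "L1 only", and an array holds the requested bundle IDs (this representation and the names below are hypothetical):

```typescript
// Hypothetical representation of the GraphQL bundledIn argument.
type BundledInFilter = string[] | null | undefined;

interface QueryPlan {
  queryL1Transactions: boolean;
  queryDataItems: boolean;
}

function planQueries(bundledIn: BundledInFilter): QueryPlan {
  if (bundledIn === null) {
    // Only L1 transactions have no parent, so skip data item queries.
    return { queryL1Transactions: true, queryDataItems: false };
  }
  if (bundledIn !== undefined) {
    // L1 transactions are never bundled, so skip them.
    return { queryL1Transactions: false, queryDataItems: true };
  }
  // No filter: query both and merge.
  return { queryL1Transactions: true, queryDataItems: true };
}
```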
feat(sqlite graphql): support querying "new" data items PE-3769
Adds support for querying data items that have not yet been flushed to the stable (> 50 blocks old) tables. Note: there are still some edge cases to work out with this and with new data querying in general. In particular, we don't currently support querying data that has not yet been associated with a block, or data that is technically stable but was indexed late (e.g. due to missing chunks) and has not yet been flushed.
Commit: 0f8894b
feat(ans-104 bundles): add worker to index data items PE-3769
Adds a simple queue + worker to index data items (similar to the one for indexing nested data). Currently there is no back pressure or other congestion control, so if the queue gets too backed up it may crash the service. This issue will be addressed in a future commit.
Commit: f78b6da
fix(data): pause the cache stream after setting up internal handlers PE-3769
We pause the stream to give the downstream consumer a chance to set up its own handlers before data starts flowing. Of course, this still has to happen relatively quickly, since Node.js and the OS won't buffer indefinitely once data starts flowing over the network, but it should still prevent some obvious app-level races.
Commit: 96d0680
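A sketch of the pattern using Node.js streams. The internal handler here is a hypothetical byte counter standing in for the cache bookkeeping. Attaching a `data` listener switches a readable stream into flowing mode, and `pause()` stops the flow again. Because the stream is then explicitly paused, the consumer's own `data` listener won't restart it; the consumer calls `resume()` (or `pipe()`, which resumes the source) once its handlers are in place.

```typescript
import { Readable } from "node:stream";

// Attach internal handlers, then pause so the caller can attach its own
// handlers before any data is emitted.
function withInternalHandlers(
  source: Readable,
  onBytes: (byteCount: number) => void,
): Readable {
  // Adding a 'data' listener puts the stream into flowing mode...
  source.on("data", (chunk: Buffer | string) => onBytes(chunk.length));
  // ...so pause it immediately; readableFlowing is now false.
  source.pause();
  return source;
}
```

A consumer would then do something like `stream.on("data", handle); stream.resume();`.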
fix(bundles graphql): correctly return data items tags PE-3769
Adds queries to retrieve data item tags and return them in GraphQL. In the SQLite DB implementation these are separate queries for convenience. If we were making requests to something like PostgreSQL, we'd probably bundle this into the main query.
Commit: 6993449
perf(sqlite graphql): add new_data_item data_item_id index PE-3769
Since tags are retrieved in a second query by data_item_id, this significantly improves the performance of retrieving tags for data items that have not yet been flushed to the stable data items table (stable data items already have a similar index).
Commit: 1832d23
feat(sqlite bundles): record all parent/child relationships for matching data items PE-3769
We don't want a data item with the same ID to appear multiple times in GraphQL, so we only insert unique IDs into new_data_items. However, we'd still like to have a record of all the bundles containing a particular ID. This is important if a bundle is removed (due to content moderation) or the parent association needs to be changed for any other unforeseen reason.
Commit: 9218dba
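A minimal in-memory sketch of that split, using hypothetical stand-ins: a map keyed by data item ID plays the role of new_data_items (first parent seen wins, similar to SQLite's `INSERT OR IGNORE`), while a separate list records every parent/child pair.

```typescript
// Hypothetical parent/child record.
interface ParentChild {
  dataItemId: string;
  parentId: string;
}

class DataItemIndex {
  // Stand-in for new_data_items: one row per unique data item ID.
  readonly dataItems = new Map<string, ParentChild>();
  // Stand-in for the relationship table: one row per (data item, parent) pair.
  readonly relationships: ParentChild[] = [];

  record(dataItemId: string, parentId: string): void {
    // Only the first parent seen is exposed through GraphQL.
    if (!this.dataItems.has(dataItemId)) {
      this.dataItems.set(dataItemId, { dataItemId, parentId });
    }
    // Every containing bundle is remembered, e.g. for later moderation.
    this.relationships.push({ dataItemId, parentId });
  }
}
```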
fix(sqlite bundles): correct join condition for data item tags PE-3769
The wrong ID column was being used for new data items, and the data item ID was missing from the stable data item join (it isn't needed for transactions since height and block index are sufficient).
Commit: 29b2ee9
chore(sqlite): improve worker error logging PE-3769
Adds a try/catch in the worker thread to log errors. Also alters error handling so that workers no longer exit immediately when an error occurs. Instead they wait until an error threshold is reached (currently 100 errors) and then exit. This preserves some level of "fail fast" error handling while reducing the overhead of creating a new worker after every error.
Commit: 199bfe4
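A sketch of threshold-based exit, assuming errors are counted over the worker's lifetime (the text doesn't say whether the count ever resets). `ErrorThreshold` and its callback are hypothetical; in a worker thread the callback would likely call `process.exit(1)` so the parent can start a replacement.

```typescript
class ErrorThreshold {
  private errorCount = 0;
  private readonly maxErrors: number;
  private readonly onLimitReached: () => void;

  constructor(maxErrors: number, onLimitReached: () => void) {
    this.maxErrors = maxErrors;
    this.onLimitReached = onLimitReached;
  }

  // Log every error, but only give up once the threshold is hit.
  recordError(error: unknown, log: (message: string) => void = console.error): void {
    this.errorCount += 1;
    log(`worker error ${this.errorCount}/${this.maxErrors}: ${String(error)}`);
    if (this.errorCount >= this.maxErrors) {
      this.onLimitReached();
    }
  }
}
```

In a worker loop this might be used as `const errors = new ErrorThreshold(100, () => process.exit(1));` with `errors.recordError(e)` in each `catch` block.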
doc(sqlite): add WIP bundle schema docs PE-3769
Adds WIP bundle schema docs generated by SchemaSpy. Run ./scripts/schemaspy to generate the docs in ./docs/sqlite/bundles. SchemaSpy properties and schema metadata are stored in ./docs/sqlite/bundles.properties and ./docs/sqlite/bundles.meta.xml respectively.
Commit: 11f7e9d
feat(sqlite bundles): add filter_id and parent_index to bundle_data_items PE-3769
Adds a parent_index and filter_id to bundle_data_items. parent_index (the numeric index of the parent bundle within its own parent bundle) distinguishes between data items contained in duplicate parents in the same bundle. filter_id records the filter that caused the data item to be indexed (useful when determining what may need to be reprocessed later).
Commit: 56a6dfb
refactor(bundles ans-104): push filtering down into worker PE-3769
This moves filtering down into the parser so that we can (in a future commit) emit an event indicating how many data items within each bundle matched the filter. We want that in order to detect bundles that failed to import successfully. There are a couple of side benefits too: 1. it moves more work out of the main thread; 2. it reduces the number of messages sent back to the main thread.
Commit: 2a6bf3e
feat(bundles ans-104): emit unbundle complete events PE-3769
Adds unbundle complete events containing the filter string used to match data items, the total data item count, and the matched data item count. These events will be used to index bundles in the DB. The filter string is included so that we know which bundles need reprocessing when it changes.
Commit: db55aba
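A sketch of how filtering inside the worker yields the counts for an unbundle complete event. The event shape and the `matches`/`emitItem` callbacks are hypothetical. Only matched items are sent back to the main thread, which is also where the reduced message count comes from.

```typescript
// Hypothetical event payload with the fields listed above.
interface UnbundleCompleteEvent {
  parentId: string;
  filter: string;
  itemCount: number;
  matchedItemCount: number;
}

// Runs inside the worker: filter each parsed item, send back only matches,
// and finish with a summary event.
function unbundle<T>(
  parentId: string,
  items: T[],
  filter: string,
  matches: (item: T) => boolean,
  emitItem: (item: T) => void,
): UnbundleCompleteEvent {
  let matchedItemCount = 0;
  for (const item of items) {
    if (matches(item)) {
      matchedItemCount += 1;
      emitItem(item);
    }
  }
  return { parentId, filter, itemCount: items.length, matchedItemCount };
}
```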
feat(bundles filters): canonicalize bundle filter string PE-3769
Use a canonical JSON representation for filters to avoid storing the same filter multiple times in the DB.
Commit: 24b219e
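A minimal canonicalization sketch: object keys are sorted recursively so that logically equal filters serialize to the same string, while array order is preserved. This is a simplified key-sorting approach, not a full RFC 8785 (JSON Canonicalization Scheme) implementation, and not necessarily what the project uses.

```typescript
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order can be meaningful, so keep it; JSON.stringify turns
    // undefined array entries into null, so do the same here.
    return `[${value.map((v) => (v === undefined ? "null" : canonicalize(v))).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Drop undefined properties (as JSON.stringify does) and sort the rest.
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  // Primitives (strings, numbers, booleans, null) use standard JSON encoding.
  return JSON.stringify(value);
}
```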
feat(bundles filters): record data item filters in the DB PE-3769
Records the filter string used to determine which data items to match on the bundle_data_items table in the DB. When filters change, this can help determine what to re-index.
Commit: 3e76b93
feat(bundles): add bundle process tracking PE-3769
Records bundles along with first and last timestamps for queuing, skipping, unbundling, and indexing (note: the indexing timestamp column is present but not yet set). Data item counts, both total and matched by the index filter, are also recorded, as well as the IDs of the filters used to match both the bundle and the data items in it. These can be used later to decide when to reprocess bundles. Note: 'last_fully_indexed_at' is handled slightly differently from the other 'last_*' timestamps. Most are not overwritten if they're already set, but 'last_fully_indexed_at' is. It's assumed that if the bundle record is being updated in some way, the bundle is being reprocessed and its indexing status should be cleared unless it's explicitly set as part of the update.
Commit: 38cb2c1
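A sketch of the timestamp rules under one reading of the note above: existing `last_*` values are kept when an update does not supply a new one, `first_*` values are kept once set (an assumption), and `last_fully_indexed_at` is replaced on every update. In SQLite this kind of rule is often written with `IFNULL`/`COALESCE` in an upsert; the field names here are hypothetical.

```typescript
// Hypothetical subset of a bundle record's timestamp columns.
interface BundleTimestamps {
  firstQueuedAt: number | null;
  lastQueuedAt: number | null;
  lastFullyIndexedAt: number | null;
}

function mergeBundleTimestamps(
  existing: BundleTimestamps,
  update: Partial<BundleTimestamps>,
): BundleTimestamps {
  return {
    // first_* (assumed): once recorded, keep the original value.
    firstQueuedAt: existing.firstQueuedAt ?? update.firstQueuedAt ?? null,
    // Other last_*: take the new value, or keep the old one if none is supplied.
    lastQueuedAt: update.lastQueuedAt ?? existing.lastQueuedAt,
    // last_fully_indexed_at: cleared by any update unless explicitly set.
    lastFullyIndexedAt: update.lastFullyIndexedAt ?? null,
  };
}
```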
fix(bundles data): fix infinite recursion when parent data is missing PE-4054
The recursive case when getting parent data was incorrectly passing the original ID instead of the parent ID. That led to infinite recursion, since it kept finding the same parent and then trying to download it. This change corrects that and fixes what appeared to be an issue with passing the size for nested bundles. The size should always be the original size; only the offset should be added to during recursion.
Commit: ff07818
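A sketch of the corrected recursion, modeling only the offset and size arithmetic (not caching or downloading). `lookupParent` is a hypothetical function returning the containing bundle's ID and the child's byte offset within it, or `undefined` when the ID is an L1 transaction.

```typescript
// Hypothetical parent lookup result.
interface ParentInfo {
  parentId: string;
  offset: number; // byte offset of the child's data within the parent's data
}

interface DataRange {
  rootId: string;
  offset: number;
  size: number;
}

function resolveRange(
  id: string,
  size: number,
  lookupParent: (id: string) => ParentInfo | undefined,
  offset = 0,
): DataRange {
  const parent = lookupParent(id);
  if (parent === undefined) {
    // Reached the L1 transaction: the range is relative to its data.
    return { rootId: id, offset, size };
  }
  // Recurse on the parent's ID (passing `id` again would loop forever),
  // accumulate the offset, and keep the original size unchanged.
  return resolveRange(parent.parentId, size, lookupParent, offset + parent.offset);
}
```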
refactor(data cache): simplify and comment cache size logic PE-4054
Small change - removes one unnecessary fallback and adds a couple comments explaining the size logic.
Commit: 58665d9
feat(bundles repair): add bundle repair worker PE-4041
Adds a bundle repair worker that queries the `bundles` and `bundle_data_items` tables to determine which bundles have been fully imported. It does this by setting bundle `last_fully_indexed_at` based on a comparison of the `bundle_data_items` rows for each bundle to `matched_data_item_count` on the bundle (taking filters into account), and then using those `last_fully_indexed_at` timestamps to determine whether the bundle should be reprocessed.
Commit: 477511e
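A simplified in-memory sketch of the repair check, assuming hypothetical row shapes: a bundle counts as fully indexed once `bundle_data_items` holds at least `matched_data_item_count` rows for it under the same data item filter, and bundles without `last_fully_indexed_at` become reprocessing candidates. The real worker may also apply age thresholds or batching that aren't described here.

```typescript
// Hypothetical subset of a bundles row.
interface BundleRow {
  id: string;
  matchedDataItemCount: number | null; // null until unbundling completes
  dataItemFilterId: number | null;
  lastFullyIndexedAt: number | null;
}

// Hypothetical subset of a bundle_data_items row.
interface BundleDataItemRow {
  parentId: string;
  filterId: number | null;
}

function updateFullyIndexed(bundles: BundleRow[], rows: BundleDataItemRow[], now: number): void {
  for (const bundle of bundles) {
    if (bundle.matchedDataItemCount === null || bundle.lastFullyIndexedAt !== null) continue;
    // Only count rows indexed under the bundle's current data item filter.
    const indexed = rows.filter(
      (r) => r.parentId === bundle.id && r.filterId === bundle.dataItemFilterId,
    ).length;
    if (indexed >= bundle.matchedDataItemCount) {
      bundle.lastFullyIndexedAt = now;
    }
  }
}

function bundlesToReprocess(bundles: BundleRow[]): string[] {
  return bundles.filter((b) => b.lastFullyIndexedAt === null).map((b) => b.id);
}
```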
feat(sqlite bundles): index nested ANS-104 bundles PE-3639
Adds ANS104_NESTED_BUNDLE_INDEXED and ANS104_BUNDLE_INDEXED events. ANS104_NESTED_BUNDLE_INDEXED is emitted when a nested ANS-104 bundle is indexed and ready for processing, and ANS104_BUNDLE_INDEXED is a more general event that is emitted when either a nested ANS-104 bundle or an L1 ANS-104 bundle is ready for processing. Also modifies the existing bundle event handling logic to use the new combined event and handle both L1 TXs and data items.
Commit: 9405022
feat(bundles): add a process to reindex bundles after a filter change PE-4115
Adds a process that resets bundle timestamps for bundles that were processed with different filters than are currently in use. Since the process creates some DB load even if the filters are unchanged, it is only enabled when the FILTER_CHANGE_REPROCESS environment variable is set to true. In the future we may optimize this further by keeping a log of filter changes. That would enable more efficient queries based on comparing timestamps (< filter change time) rather than filter IDs (using an inequality).
Commit: a75455d
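A sketch of the filter-change reset, assuming each bundle stores the IDs of the filters used to select it and to match its data items, and that resetting means clearing its unbundled and fully indexed timestamps so it is picked up again. The field names and the choice of timestamps are hypothetical; only the `FILTER_CHANGE_REPROCESS` flag comes from the text above.

```typescript
// Hypothetical subset of a bundles row.
interface ProcessedBundle {
  id: string;
  unbundleFilterId: number | null; // filter that selected the bundle (null if never processed)
  indexFilterId: number | null; // filter used to match its data items
  lastUnbundledAt: number | null;
  lastFullyIndexedAt: number | null;
}

interface CurrentFilters {
  unbundleFilterId: number;
  indexFilterId: number;
}

// Gate the (DB-heavy) process on an explicit opt-in flag.
function filterChangeReprocessEnabled(env: Record<string, string | undefined>): boolean {
  return env.FILTER_CHANGE_REPROCESS === "true";
}

// Clear timestamps on processed bundles whose filters differ from the current ones.
function resetBundlesForFilterChange(bundles: ProcessedBundle[], current: CurrentFilters): string[] {
  const reset: string[] = [];
  for (const b of bundles) {
    const processed = b.unbundleFilterId !== null || b.indexFilterId !== null;
    const changed =
      b.unbundleFilterId !== current.unbundleFilterId || b.indexFilterId !== current.indexFilterId;
    if (processed && changed) {
      b.lastUnbundledAt = null;
      b.lastFullyIndexedAt = null;
      reset.push(b.id);
    }
  }
  return reset;
}
```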
refactor(bundles ans-104): use owner address from data item instead of rehashing
Prior to this change we were hashing the owner key to get the owner address. This change uses the owner address from the data item instead. These should always be the same value, so rehashing is unnecessary. Note: I ran a test comparing the values, and on the sample of data items I processed there were no differences.
Commit: 66181c7
Commits on Jul 17, 2023
feat(filters): support on-demand owner hashing PE-4214
To simplify filter construction, if owner_address is set in a filter but only owner is present on the matchable item (L1 TXs don't include the address), hash owner on demand to produce an owner_address and match against that.
Commit: d3e9457
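A sketch of on-demand owner hashing. An Arweave address is the base64url-encoded SHA-256 digest of the owner's public key (the base64url-decoded `owner` field). The `Matchable` shape and function names are hypothetical; an address already present on the item is preferred, so data items (which carry one) skip the hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: L1 TXs carry `owner`; data items may also carry `owner_address`.
interface Matchable {
  owner?: string; // base64url-encoded owner public key
  owner_address?: string;
}

// Address = base64url(SHA-256(decoded owner public key)).
function ownerToAddress(owner: string): string {
  return createHash("sha256").update(Buffer.from(owner, "base64url")).digest("base64url");
}

function matchesOwnerAddress(item: Matchable, expected: string): boolean {
  // Prefer an address already present; otherwise hash the owner on demand.
  const address =
    item.owner_address ?? (item.owner !== undefined ? ownerToAddress(item.owner) : undefined);
  return address === expected;
}
```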