Log filter cache design #349
Replies: 2 comments 1 reply
-
is block inclusion guaranteed to be continuous or no? as in if there are logs for a given filter in blocks 0-100, can it also be true that there are logs for that filter in blocks 200-300, without the data in blocks 100-200? |
Beta Was this translation helpful? Give feedback.
-
Expanding on an excellent schema design from @i-norden (https://gist.github.com/i-norden/72f322afa9f07de0df3e340782ce1d1d). SchemaCREATE TABLE log_filter_criteria (
id BIGSERIAL PRIMARY KEY,
contract VARCHAR(66),
topic0 VARCHAR(66),
topic1 VARCHAR(66),
topic2 VARCHAR(66),
topic3 VARCHAR(66),
UNIQUE (contract, topic0, topic1, topic2, topic3)
);
CREATE TABLE filter_ranges (
filter_criteria_id BIGINT NOT NULL,
start BIGINT NOT NULL,
stop BIGINT NOT NULL,
PRIMARY KEY (filter_criteria_id, start, stop),
FOREIGN KEY (filter_criteria_id) REFERENCES eth.filter_criteria (id)
); OperationsRecord a range as cachedLet's say we have inserted a batch of logs returned from this const filter = {
address: ["0xabc"],
topics: ["blah"],
fromBlock: 0,
toBlock: 100,
} First, we insert a INSERT INTO log_filter_criteria
VALUES (
contract = "0xabc",
topic0 = "blah",
topic1 = null,
topic2 = null,
topic3 = null,
)
ON CONFLICT ("primaryKey") DO NOTHING
RETURNING id; Then, we insert a row representing the block range that we just inserted for this filter: INSERT INTO filter_ranges VALUES (
filter_criteria_id = $id,
start = 0,
end = 100,
); Check if logs are availableNow, let's say we have another eth_getLogs request that looks like: const filter = {
address: ["0xabc"],
topics: ["blah", ["sandwiches", "cookies"]],
fromBlock: 0,
toBlock: 100,
} We've added two values to match for topic1. This query is more specific than the previous one that we already ran. In other words, all the logs matching this query are already present in the database. We can use the following query to get all ranges that are available for this filter: SELECT start, stop
FROM filter_ranges
INNER JOIN log_filter_criteria ON (filter_ranges.filter_criteria_id = log_filter_criteria.id)
WHERE ( contract IS NULL OR contract = "0xabc" )
AND ( topic0 IS NULL OR topic0 = "blah" )
AND ( topic1 IS NULL OR ( topic1 = "sandwiches" OR topic1 = "cookies" ) )
AND topic2 IS NULL
AND topic3 IS NULL; The range that was inserted for the broader query above will be included in the result because In a complex example, the result list might look like: const data = [
[0, 100],
[1, 105],
[50, 150],
[100, 105],
[500, 502],
] It's possible to have redundant and overlapping ranges. To resolve this, we can use a utility function in JS that merges the ranges to return a "reduced" list of ranges. It's likely possible to move this logic into the SQL query, which would be cool (if you know how to do this, pls share!). Once we have the cached ranges, we can again use a JS utility function to determine which block ranges need to be fetched for this log filter, queue up tasks for each missing range, and proceed. Merging/reducing filter rangesFor "hot" log filters that contain logs in every block, Ponder inserts filter ranges one block at a time. So, the Ideally, the merging procedure would also take into account the |
Beta Was this translation helpful? Give feedback.
-
Edit: A clear(er) articulation of the problem
Say I’m a Ponder instance, and I call eth_getLogs for contract 0xabc in block range (0, 100). I get 3 logs back from the node at blocks 10, 25, 65. I insert them into a “logs” table in postgres. (The database indexes don’t matter).
Now, I’m a different Ponder instance. I’m also interested in contract 0xabc in block range (0, 100). I go to that same database and want to know: are the logs that I care about already available and cached in this database? If they are that’s great, I can skip calling eth_getLogs myself and query the database instead.
So, I run a SQL query to get all logs for contract 0xabc in block range (0, 100), and I would get those 3 logs back. But I have no way of knowing that the previous client cared about the same log filter as me. Maybe actually their filter was for block range (0, 80), and if I queried an ethereum node, I would find that there is another log at block 85 that I’ve now missed.
Question
Consider a service that inserts EVM logs into a SQL database and keeps track of which logs have been inserted.
Log filters have some fancy inclusion rules, which are the key consideration here. One simplification is that it's safe to assume
fromBlock
andtoBlock
will always be block numbers (not tags like "latest") and they will always be defined.Examples
Here's an example of the expected behavior:
Another example:
How could the insertLogs and hasLogs functions be implemented?
If we solve this, it would simplify our design, speed up the historical sync, and potentially unlock multi-tenant remote caching - multiple Ponder instances sharing the same remote cache (like turborepo).
Beta Was this translation helpful? Give feedback.
All reactions