Log filter cache design #349

typedarray · 2023-09-14T21:49:02Z · Maintainer

Edit: A clear(er) articulation of the problem

Say I'm a Ponder instance, and I call eth_getLogs for contract 0xabc in block range (0, 100). The node returns 3 logs, at blocks 10, 25, and 65. I insert them into a "logs" table in Postgres. (The database indexes don't matter.)

Now, I'm a different Ponder instance, also interested in contract 0xabc in block range (0, 100). I go to that same database and want to know: are the logs I care about already cached in it? If they are, great: I can skip calling eth_getLogs myself and query the database instead.

So I run a SQL query for all logs from contract 0xabc in block range (0, 100), and I get those 3 logs back. But I have no way of knowing whether the previous client used the same log filter as I do. Maybe their filter was actually for block range (0, 80), and if I queried an Ethereum node, I would find another log at block 85 that I've now missed.

Question

Consider a service that inserts EVM logs into a SQL database and keeps track of which logs have been inserted.

```typescript
// Hex and Log as exported by viem.
import type { Hex, Log } from "viem";

type Service = {
  insertLogs(filter: LogFilter, logs: Log[]): void;
  hasLogs(filter: LogFilter): boolean;
};

// This is the input object to eth_getLogs.
type LogFilter = {
  address?: Hex | Hex[];
  topics?: (Hex | Hex[] | undefined)[];
  fromBlock?: number;
  toBlock?: number;
};
```

Log filters have non-trivial inclusion rules, and these are the key consideration here: an omitted (undefined) address or topic position matches any value, and an array matches any of its elements. One simplification: it's safe to assume fromBlock and toBlock will always be defined and will always be block numbers (not tags like "latest").
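To make these rules concrete, here is a minimal sketch of how a single log could be matched against a filter. `slotMatches`, `logMatchesFilter`, and `MinimalLog` are hypothetical names, and the sketch treats an empty array like a wildcard; real clients may handle edge cases (such as logs with fewer topics than the filter lists) differently.

```typescript
type Hex = `0x${string}`;
type Slot = Hex | Hex[] | null | undefined;

// Simplified shapes for this sketch (not Ponder's or viem's types).
type LogFilter = {
  address?: Hex | Hex[];
  topics?: Slot[];
  fromBlock: number;
  toBlock: number;
};
type MinimalLog = { address: Hex; topics: Hex[]; blockNumber: number };

// One filter position: null/undefined/[] matches anything, a single value must be
// equal, an array matches if any element is equal (compared case-insensitively).
function slotMatches(slot: Slot, value: Hex | undefined): boolean {
  if (slot === null || slot === undefined) return true;
  if (Array.isArray(slot) && slot.length === 0) return true;
  if (value === undefined) return false;
  const options = Array.isArray(slot) ? slot : [slot];
  return options.some((option) => option.toLowerCase() === value.toLowerCase());
}

function logMatchesFilter(log: MinimalLog, filter: LogFilter): boolean {
  if (log.blockNumber < filter.fromBlock || log.blockNumber > filter.toBlock) return false;
  if (!slotMatches(filter.address, log.address)) return false;
  // topics[i] constrains log.topics[i]; positions beyond the array are unconstrained.
  return (filter.topics ?? []).every((slot, i) => slotMatches(slot, log.topics[i]));
}
```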

Examples

Here's an example of the expected behavior. For brevity, each filter is written as [address, topics, fromBlock, toBlock]:

```
insertLogs(["0xabc", undefined, 50, 100], logs)
// Insert logs for contract "0xabc" for block range 50-100.

hasLogs(["0xabc", undefined, 50, 250]) -> false
// We don't have logs for block range 101-250.

hasLogs(["0xabc", ["def"], 50, 100]) -> true
// We have all logs for the contract in this range, including any with topic_0 = "def".

insertLogs([undefined, undefined, 101, 150], logs)
// Insert all logs for all contracts for block range 101-150.

hasLogs(["0xabc", ["def"], 50, 150]) -> true
// Now we have all logs from "0xabc" in this range!
```

Another example:

```
insertLogs([undefined, [undefined, "xyz"], 0, 10_000], logs)
// Inserts all logs (from any contract!) with topic_1 = "xyz" for block range 0-10_000.

hasLogs([undefined, undefined, 0, 10_000]) -> false
// This asks for all logs from all contracts, which we don't have.

hasLogs([["0x123", "0xfea"], ["kek", "xyz"], 0, 10_000]) -> true
// We have all logs from all contracts with topic_1 = "xyz". So we have all logs from contracts
// "0x123" and "0xfea" that have topic_1 = "xyz" AND topic_0 = "kek".
```

How could the insertLogs and hasLogs functions be implemented?

If we solve this, it would simplify our design, speed up the historical sync, and potentially unlock multi-tenant remote caching: multiple Ponder instances sharing the same remote cache (like Turborepo).
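One way to pin down the expected behavior is a naive in-memory sketch (not a proposal for the Postgres design). It assumes a filter can be expanded into single-valued combinations (one address or wildcard, and one value or wildcard per topic position), and it treats a combination as cached over a block range when the ranges of all recorded filters that include it jointly cover that range. `createInMemoryService`, `atoms`, `covers`, and `rangesCover` are hypothetical names, and the logs themselves are not stored.

```typescript
type Hex = `0x${string}`;
type Slot = Hex | Hex[] | null | undefined;

type LogFilter = {
  address?: Hex | Hex[];
  topics?: Slot[];
  fromBlock: number;
  toBlock: number;
};

type Service = {
  insertLogs(filter: LogFilter, logs: unknown[]): void;
  hasLogs(filter: LogFilter): boolean;
};

// Position 0 is the address, positions 1-4 are topic0..topic3; null means "any value".
type Atom = (Hex | null)[];

// Options per position: [null] for a wildcard, otherwise the listed values.
function slotOptions(filter: LogFilter): (Hex | null)[][] {
  const slots: Slot[] = [filter.address, ...[0, 1, 2, 3].map((i) => filter.topics?.[i])];
  return slots.map((slot) =>
    slot == null || (Array.isArray(slot) && slot.length === 0) ? [null] : Array.isArray(slot) ? slot : [slot],
  );
}

// Cartesian product of the options: every single-valued combination in the filter.
function atoms(filter: LogFilter): Atom[] {
  return slotOptions(filter).reduce<Atom[]>(
    (acc, options) => acc.flatMap((prefix) => options.map((v) => [...prefix, v])),
    [[]],
  );
}

// A recorded filter includes an atom if, at every position, it is a wildcard or
// lists the atom's concrete value. A wildcard in the atom needs a wildcard here.
function covers(recorded: LogFilter, atom: Atom): boolean {
  return slotOptions(recorded).every(
    (options, p) => options[0] === null || (atom[p] !== null && options.includes(atom[p])),
  );
}

// True if the inclusive ranges jointly cover every block in [from, to].
function rangesCover(ranges: [number, number][], from: number, to: number): boolean {
  let next = from; // first block not yet known to be covered
  for (const [start, stop] of [...ranges].sort((a, b) => a[0] - b[0])) {
    if (start > next) break; // gap at `next`
    next = Math.max(next, stop + 1);
  }
  return next > to;
}

function createInMemoryService(): Service {
  const recorded: LogFilter[] = []; // each entry carries its own block range
  return {
    insertLogs(filter, _logs) {
      // Only coverage is tracked here; a real service would also store the logs.
      recorded.push(filter);
    },
    hasLogs(filter) {
      return atoms(filter).every((atom) =>
        rangesCover(
          recorded.filter((r) => covers(r, atom)).map((r): [number, number] => [r.fromBlock, r.toBlock]),
          filter.fromBlock,
          filter.toBlock,
        ),
      );
    },
  };
}
```

With the example topics written as hex-like strings (e.g. "def" as "0xdef"), this reproduces both examples. For instance, after the two inserts in the first example, `hasLogs({ address: "0xabc", topics: ["0xdef"], fromBlock: 50, toBlock: 150 })` returns true because [50, 100] and [101, 150] together cover the range. The expansion can get large for filters with long address or topic lists, since the number of combinations is the product of the list lengths.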

shrugs · 2023-09-14T22:38:11Z


Is block inclusion guaranteed to be contiguous? That is, if there are logs for a given filter in blocks 0-100, can there also be logs for that filter in blocks 200-300 without the data for blocks 101-199?


typedarray Sep 14, 2023
Maintainer Author

Yes, that is possible. In practice it's actually quite common to (temporarily) have "gaps" like you describe, because Ponder fetches batches of logs concurrently, and the HTTP responses come back at different times.

For example, we might simultaneously initiate 3 eth_getLogs requests for the same filter with ranges (0, 100), (101, 200), and (201, 300), and insert each into the database as soon as we get a response.
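A small sketch of how such aligned chunks could be produced and fetched concurrently. `chunkRange` and `syncConcurrently` are hypothetical helpers, and `getLogs`/`record` stand in for the RPC call and the database insert. Because every chunk is recorded as soon as its own response arrives, the recorded ranges can contain gaps until all requests finish.

```typescript
type Range = [number, number]; // inclusive [fromBlock, toBlock]

// Split [from, to] into chunks that end on multiples of `size`,
// e.g. chunkRange(0, 300, 100) -> (0, 100), (101, 200), (201, 300).
function chunkRange(from: number, to: number, size: number): Range[] {
  const chunks: Range[] = [];
  let start = from;
  while (start <= to) {
    // Smallest multiple of `size` strictly greater than `start`, capped at `to`.
    const stop = Math.min(to, (Math.floor(start / size) + 1) * size);
    chunks.push([start, stop]);
    start = stop + 1;
  }
  return chunks;
}

// Start every request at once and record each range as soon as its own response
// arrives, so recorded ranges can have gaps until all requests have finished.
async function syncConcurrently(
  chunks: Range[],
  getLogs: (range: Range) => Promise<unknown[]>,
  record: (range: Range, logs: unknown[]) => void,
): Promise<void> {
  await Promise.all(
    chunks.map(async (range) => {
      const logs = await getLogs(range);
      record(range, logs);
    }),
  );
}
```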

typedarray · 2023-09-18T20:19:29Z · Maintainer Author

Expanding on an excellent schema design from @i-norden (https://gist.github.com/i-norden/72f322afa9f07de0df3e340782ce1d1d).

Schema

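```sql
CREATE TABLE log_filter_criteria (
    id BIGSERIAL PRIMARY KEY,
    contract VARCHAR(66),  -- NULL means "any contract"
    topic0 VARCHAR(66),    -- NULL means "any value" for this topic
    topic1 VARCHAR(66),
    topic2 VARCHAR(66),
    topic3 VARCHAR(66),
    -- NULLS NOT DISTINCT (PostgreSQL 15+) makes rows with the same NULL columns
    -- collide; a plain UNIQUE would allow duplicate criteria containing NULLs.
    UNIQUE NULLS NOT DISTINCT (contract, topic0, topic1, topic2, topic3)
);

CREATE TABLE filter_ranges (
    filter_criteria_id BIGINT NOT NULL,
    start BIGINT NOT NULL,  -- inclusive
    stop BIGINT NOT NULL,   -- inclusive
    PRIMARY KEY (filter_criteria_id, start, stop),
    FOREIGN KEY (filter_criteria_id) REFERENCES log_filter_criteria (id)
);
```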

Operations

Record a range as cached

Let's say we have inserted a batch of logs returned from this eth_getLogs request:

```typescript
const filter = {
  address: ["0xabc"],
  topics: ["blah"],
  fromBlock: 0,
  toBlock: 100,
};
```

First, we insert a log_filter_criteria row to represent the filter, or get the id of the existing row if the same criteria were cached before:

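```sql
-- ON CONFLICT ... DO NOTHING returns no row when the criteria already exist,
-- so a no-op update is used to make RETURNING yield the existing id too.
-- For criteria containing NULLs, a conflict is only detected if the unique
-- constraint is declared NULLS NOT DISTINCT (PostgreSQL 15+).
INSERT INTO log_filter_criteria (contract, topic0, topic1, topic2, topic3)
VALUES ('0xabc', 'blah', NULL, NULL, NULL)
ON CONFLICT (contract, topic0, topic1, topic2, topic3)
    DO UPDATE SET contract = EXCLUDED.contract
RETURNING id;
```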

Then, we insert a row representing the block range that we just inserted for this filter:

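```sql
-- $1 is the id returned by the previous statement.
INSERT INTO filter_ranges (filter_criteria_id, start, stop)
VALUES ($1, 0, 100)
ON CONFLICT DO NOTHING;  -- the same range may already be recorded
```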
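Because each criteria column holds a single value, a filter with array-valued address or topics presumably has to be stored as several criteria rows, one per combination, each getting its own filter_ranges entry. Here is a sketch of that expansion under this assumption. `filterToCriteriaRows` and `CriteriaRow` are hypothetical names, and the row count is the product of the array lengths.

```typescript
type Hex = `0x${string}`;
type Slot = Hex | Hex[] | null | undefined;
type LogFilter = { address?: Hex | Hex[]; topics?: Slot[]; fromBlock: number; toBlock: number };

// Mirrors the log_filter_criteria columns; null means "any value".
type CriteriaRow = {
  contract: Hex | null;
  topic0: Hex | null;
  topic1: Hex | null;
  topic2: Hex | null;
  topic3: Hex | null;
};

// [null] for a wildcard position, otherwise the listed values.
function options(slot: Slot): (Hex | null)[] {
  if (slot == null || (Array.isArray(slot) && slot.length === 0)) return [null];
  return Array.isArray(slot) ? slot : [slot];
}

// One criteria row per combination (cartesian product of the five positions).
function filterToCriteriaRows(filter: LogFilter): CriteriaRow[] {
  const positions = [filter.address, ...[0, 1, 2, 3].map((i) => filter.topics?.[i])].map(options);
  const combos = positions.reduce<(Hex | null)[][]>(
    (acc, opts) => acc.flatMap((prefix) => opts.map((v) => [...prefix, v])),
    [[]],
  );
  return combos.map(([contract, topic0, topic1, topic2, topic3]) => ({ contract, topic0, topic1, topic2, topic3 }));
}
```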

Check if logs are available

Now, let's say we have another eth_getLogs request that looks like:

```typescript
const filter = {
  address: ["0xabc"],
  topics: ["blah", ["sandwiches", "cookies"]],
  fromBlock: 0,
  toBlock: 100,
};
```

We've added two values to match for topic1, so this query is more specific than the one we already ran over the same block range. In other words, every log matching this query is already in the database.

We can use the following query to get all ranges that are available for this filter:

```sql
SELECT start, stop
FROM filter_ranges
INNER JOIN log_filter_criteria ON (filter_ranges.filter_criteria_id = log_filter_criteria.id)
WHERE ( contract IS NULL OR contract = '0xabc' )
AND ( topic0 IS NULL OR topic0 = 'blah' )
AND ( topic1 IS NULL OR topic1 IN ('sandwiches', 'cookies') )
AND topic2 IS NULL
AND topic3 IS NULL;
```

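The range that was inserted for the broader query above will be included in the result because topic1 IS NULL for that filter range. One caveat: a cached row with topic1 = 'sandwiches' would also be returned, even though it says nothing about 'cookies'. Such a row covers only part of this filter, so for array-valued criteria each value needs to be checked separately.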
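A sketch of generating this query for one single-valued combination of criteria, with `$n` placeholders as used by node-postgres. The helper name `coverageQuery` and the added block-overlap condition are assumptions. Running it once per combination, and requiring each combination's returned ranges to cover the requested block range, avoids counting a row that matches only one of several requested values.

```typescript
type Hex = `0x${string}`;

// One single-valued combination; null = the request leaves this column open.
type CriteriaValues = {
  contract: Hex | null;
  topic0: Hex | null;
  topic1: Hex | null;
  topic2: Hex | null;
  topic3: Hex | null;
};

const COLUMNS = ["contract", "topic0", "topic1", "topic2", "topic3"] as const;

// Builds a parameterized query that returns cached ranges whose criteria are at
// least as broad as `values` and that overlap [fromBlock, toBlock].
function coverageQuery(
  values: CriteriaValues,
  fromBlock: number,
  toBlock: number,
): { text: string; params: (string | number)[] } {
  const params: (string | number)[] = [];
  const conditions = COLUMNS.map((column) => {
    const value = values[column];
    // An open column in the request is only satisfied by an open (NULL) column.
    if (value === null) return `${column} IS NULL`;
    params.push(value);
    return `(${column} IS NULL OR ${column} = $${params.length})`;
  });
  params.push(fromBlock, toBlock);
  conditions.push(`stop >= $${params.length - 1}`, `start <= $${params.length}`);
  return {
    text:
      "SELECT start, stop FROM filter_ranges " +
      "INNER JOIN log_filter_criteria ON filter_ranges.filter_criteria_id = log_filter_criteria.id " +
      `WHERE ${conditions.join(" AND ")}`,
    params,
  };
}
```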

In a complex example, the result list might look like:

```typescript
const data = [
  [0, 100],
  [1, 105],
  [50, 150],
  [100, 105],
  [500, 502],
];
```

It's possible to have redundant and overlapping ranges. To resolve this, we can use a utility function in JS that merges the ranges to return a "reduced" list of ranges. It's likely possible to move this logic into the SQL query, which would be cool (if you know how to do this, pls share!).
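A minimal version of that merge utility, assuming inclusive integer block ranges (so [0, 100] and [101, 105] count as adjacent and are merged). `mergeRanges` is a hypothetical name.

```typescript
type Range = [number, number]; // inclusive [start, stop]

// Merge overlapping or adjacent ranges into a minimal, sorted list.
function mergeRanges(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged: Range[] = [];
  for (const [start, stop] of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && start <= last[1] + 1) {
      // Overlaps or touches the previous range: extend it.
      last[1] = Math.max(last[1], stop);
    } else {
      merged.push([start, stop]);
    }
  }
  return merged;
}
```

Applied to the example list above, it returns [[0, 150], [500, 502]].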

Once we have the cached ranges, we can again use a JS utility function to determine which block ranges need to be fetched for this log filter, queue up tasks for each missing range, and proceed.
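A sketch of the "which ranges are missing" step, again assuming inclusive ranges. `getMissingRanges` is a hypothetical name; it sorts its input itself, so it also works on unmerged, overlapping ranges.

```typescript
type Range = [number, number]; // inclusive [start, stop]

// Return the sub-ranges of [fromBlock, toBlock] not covered by `cached`.
function getMissingRanges(cached: Range[], fromBlock: number, toBlock: number): Range[] {
  const missing: Range[] = [];
  let next = fromBlock; // first block not yet known to be cached
  for (const [start, stop] of [...cached].sort((a, b) => a[0] - b[0])) {
    if (stop < next) continue; // lies entirely before the uncovered part
    if (start > toBlock) break; // lies entirely after the requested range
    if (start > next) missing.push([next, start - 1]); // gap before this range
    next = Math.max(next, stop + 1);
    if (next > toBlock) return missing;
  }
  if (next <= toBlock) missing.push([next, toBlock]);
  return missing;
}
```

For the example list and a requested range of 0 to 600, it returns [[151, 499], [503, 600]]; each of those ranges could then be queued as a fetch task.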

Merging/reducing filter ranges

For "hot" log filters that match logs in every block, Ponder inserts filter ranges one block at a time, so the filter_ranges table can end up looking like [[start=1, stop=1], [start=2, stop=2], [start=3, stop=3], ...], which isn't great. Today we occasionally merge adjacent or overlapping rows, but this relies on the simple logFilterKey string key that we're trying to move away from.

Ideally, the merging procedure would also take into account the contract/topics logic, where a NULL value encompasses all specified values.
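One conservative way to apply the NULL rule when compacting: drop a row if a single other row, whose criteria are NULL or equal in every column, fully contains its block range. This is only a sketch: `isAtLeastAsBroad` and `dropRedundantRows` are hypothetical names, the check is quadratic in the number of rows, and it does not remove rows that are covered only by a union of several broader rows.

```typescript
type Hex = `0x${string}`;
// contract, topic0, topic1, topic2, topic3; null means "any value" (SQL NULL).
type Criteria = [Hex | null, Hex | null, Hex | null, Hex | null, Hex | null];
type RangeRow = { criteria: Criteria; start: number; stop: number };

// True if every column of `broad` is null or equal to the same column of `narrow`.
function isAtLeastAsBroad(broad: Criteria, narrow: Criteria): boolean {
  return broad.every((value, i) => value === null || value === narrow[i]);
}

function sameRow(a: RangeRow, b: RangeRow): boolean {
  return (
    a.start === b.start &&
    a.stop === b.stop &&
    isAtLeastAsBroad(a.criteria, b.criteria) &&
    isAtLeastAsBroad(b.criteria, a.criteria)
  );
}

// Drop a row when another row with criteria at least as broad fully contains
// its block range. Of several identical rows, only the first is kept.
function dropRedundantRows(rows: RangeRow[]): RangeRow[] {
  return rows.filter(
    (row, i) =>
      !rows.some(
        (other, j) =>
          j !== i &&
          isAtLeastAsBroad(other.criteria, row.criteria) &&
          other.start <= row.start &&
          row.stop <= other.stop &&
          !(sameRow(row, other) && j > i), // a later duplicate never removes an earlier one
      ),
  );
}
```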
