Test to verify if filtering by merkleRoot works #1170

mateusz-sekara · 2024-07-09T13:54:12Z

Motivation

We've implemented Solution 2 from this #726 and it caused problems during the reorgs so it was reverted #1155

Caching block_timestamps of the CommitRoots that are not finalized is extremely tricky because during the reorg CommitReport can come back to the logpoller with a different timestamp. This causes issues when some of the CommitReports might be forever skipped because the cached timestamp is higher than a new value. Keeping consistency between the cache and LogPoller would be an overkill so we need to pursue an alternative solution.

Solution

After adding DSL to the LogPoller, Solution 1 #726 sounds like a very reasonable way to go. It doesn't require any changes to LogPoller, it could be implemented using the FilteredLogs and it's completely immune to all reorgs - we move in-memory skipping snoozedRoots to the database level. Therefore, we keep fetching a reasonable number of logs to the memory. It works only on finalized logs so no need to sync any state between LP and cache. Also, the snoozed roots cache is very simple again, I have to admit that #726 led to a very complex solution at some point.
On the other hand, it's critical to load test that change to verify how filtering out snoozed roots on the db level behaves.

Based on the &726 we want to pass executed roots to the database query to fetch only unexpired roots from the database. Unexpired roots can be determined from the snoozedRoots cache. This is probably the most accurate approach because it always fetches the minimal required set from the database.
You can think of the query like this

select * from evm.logs
where address = :addr
and event_sig = :event_sig
and substring(data, X, Y) not in (:executed_roots_array)
and block_timestamp > :permissionlessThresholdWindow

0xnogo · 2024-07-09T14:11:32Z

core/services/ocr2/plugins/ccip/internal/ccipdata/v1_2_0/commit_store_test.go

+
+	logs, err = lp.LogsDataWordRange(ctx, eventSig, commitStoreAddr, 4, merkleRoot, merkleRoot, 0)
+	require.NoError(t, err)
+	require.Len(t, logs, 3)


nit/ we could also verify we are getting the expect log by checking on the merkleRoot

) ## Motivation Detailed context is here #726, but long story short scanning the entire `permissionLessExecThreshold` / `messageIntervalVisibility` from every Commit plugin instance (think of it as a lane) puts a lot of pressure to the database. For most of the cases, we don't need the entire window but only Roots that are not executed yet. Unfortunately, the solution suggested in the mentioned PR was faulty because it relied on `block_timestamp` of unfinalized blocks. Re-orgs on Polygon exposed that problem, details and revert here #1155 ## Solution Divide the cache into two layers * finalized commit roots are kept in memory because we assume these never change * unfinalized commit roots are fetched from LogPoller, and finalized logs are promoted to the finalized part There should be no additional memory footprint because we kept all executed and finalized roots in cache memory anyway. The performance impact of this approach should be similar to the #726 , because we keep scanning only unfinalized chunks of the LogPoller so the number of logs fetched from the database would be very limited. However, there is one caveat here, LogPoller doesn't mark logs as finalized/not finalized by default so we need to ask for the latest finalized block in the Commit Reader directly and enrich logs with that information (similar approach to what we do for Offramp reader). Thankfully, asking the database for the latest log is extremely fast and (if needed) could be optimized by caching latest finalized block in the LogPoller/HeadTracker. There will be load tests conducted here to confirm performance assumptions. Please see the code and tests for more details chainlink-common PR smartcontractkit/chainlink-common#650 ### High level picture ![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a) ### Alternative solutions: [I've tried implementing the solution in which the cache is simplified and keeps only executed/snoozed roots](#1170). Snoozed Roots are passed to the database query and used for filtering out executed and finalized roots. However, it works only for some number of snoozed roots. After adding more than 250 snoozed roots to the LogPoller query, the driver/database started to cut off some of the filters making it unreliable (erroring or not filtering properly). Also, there is a risk of hitting the Postgres query size limit (because every root has to be passed in SQL) which might lead to some limbo state which the plugin can't recover from. ## Tests ### Before - fetching all CommitReports always <img width="1293" alt="image" src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7"> ### After - fetching only unfinalized CommitReports and caching finalized ones <img width="1283" alt="image" src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970"> ### Summary Caching roots improved the performance, query is smaller in terms of the objects scanned and returned. Number of roots returned is constant almost during the entire test (around ~5 reports at once) Sightly higher memory footprint (around 800MB higher) but lower CPU utilization during tests (~15% drop) <img width="2500" alt="image" src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">

Test to verify if filtering by merkleRoot works

e2cd01f

mateusz-sekara requested a review from a team as a code owner July 9, 2024 13:54

mateusz-sekara temporarily deployed to sdlc July 9, 2024 13:54 — with GitHub Actions Inactive

0xnogo approved these changes Jul 9, 2024

View reviewed changes

Filtering out only snoozed roots directly from the LP query

77a7395

mateusz-sekara requested a review from a team as a code owner July 10, 2024 10:15

Tests to verify how query works

e094b17

mateusz-sekara mentioned this pull request Jul 11, 2024

CCIP-2791 Long term solution for snoozing and caching CommitRoots #1182

Merged

mateusz-sekara closed this Jul 15, 2024

mateusz-sekara deleted the commit-roots-cache-impro branch July 15, 2024 10:45

0xnogo mentioned this pull request Jul 31, 2024

Merge Chainlink v2.14.0 #1241

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test to verify if filtering by merkleRoot works #1170

Test to verify if filtering by merkleRoot works #1170

mateusz-sekara commented Jul 9, 2024 •

edited

Loading

0xnogo Jul 9, 2024

Test to verify if filtering by merkleRoot works #1170

Test to verify if filtering by merkleRoot works #1170

Conversation

mateusz-sekara commented Jul 9, 2024 • edited Loading

Motivation

Solution

0xnogo Jul 9, 2024

Choose a reason for hiding this comment

mateusz-sekara commented Jul 9, 2024 •

edited

Loading