
Add batch and order data jobs #448

Merged: 33 commits into main on Dec 13, 2024

Conversation

@harisang (Contributor) commented on Dec 6, 2024

This PR adds functionality that is disjoint from the rest of the payouts script and can be used to set up a cron job that syncs batch and order data to the analytics DB. A separate dune-sync job is meant to be introduced later to sync these tables to the corresponding Dune tables.

To avoid conflicts with the current state of the script, the batch and order data queries, which are almost identical to the order rewards and batch rewards queries, are kept in separate files. The block intervals on which the queries operate still need to be double-checked.

To trigger the scripts locally and test the functionality, one can run

python -m src.data_sync.sync_data --sync-table order_data

or

python -m src.data_sync.sync_data --sync-table batch_data
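
For context, a minimal sketch of what such an entrypoint might look like; the argparse wiring below is an assumption, and only the module path and the --sync-table flag come from this PR:

import argparse

def main() -> None:
    # Hypothetical CLI wiring; the actual script may parse arguments differently.
    parser = argparse.ArgumentParser(
        description="Sync batch or order data to the analytics DB"
    )
    parser.add_argument(
        "--sync-table",
        choices=["order_data", "batch_data"],
        required=True,
        help="which table to sync",
    )
    args = parser.parse_args()
    if args.sync_table == "order_data":
        ...  # fetch order data and upload it to the analytics DB
    else:
        ...  # fetch batch data and upload it to the analytics DB

if __name__ == "__main__":
    main()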

@harisang harisang marked this pull request as ready for review December 9, 2024 01:20
@harisang harisang requested a review from fhenneke December 9, 2024 01:55
@fhenneke (Collaborator) left a comment

It seems that you want to merge the solver rewards code with the dune-sync code. That would require some code restructuring to make the two codebases consistent, or a massive cleanup if this is merged in a state close to the current state of the PR.

Is the long-term plan to make this repo into a solver-accounting repo? Would all code have to be compatible, or should it be separate folders with independent code?

Resolved review threads (outdated): src/dbt/__init__.py, src/dbt/config.py, src/models/block_range.py
@@ -0,0 +1,251 @@
"""Basic client for connecting to postgres database with login credentials"""
@fhenneke (Collaborator) commented:

If we really want to merge the two codebases, this needs to be combined with pg_client.py.

Resolved review thread (outdated): src/fetch/orderbook.py
Comment on lines 58 to 64
def compute_block_and_month_range( # pylint: disable=too-many-locals
node: Web3, recompute_previous_month: bool
) -> Tuple[List[Tuple[int, int]], List[str], List[bool]]:
"""
This determines the block range and the relevant months
for which we will compute and upload data on Dune.
"""
@fhenneke (Collaborator) commented:

This function might benefit from documenting the return value in a bit more detail.
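
For illustration, a hedged sketch of a more detailed docstring, derived from the annotated return type; the exact semantics of each list (in particular the month label format and the meaning of the booleans) are assumptions:

from typing import List, Tuple
from web3 import Web3

def compute_block_and_month_range(
    node: Web3, recompute_previous_month: bool
) -> Tuple[List[Tuple[int, int]], List[str], List[bool]]:
    """Determine block ranges and months for computing and uploading data to Dune.

    Returns a 3-tuple of parallel lists, one entry per month:
    - block ranges: (start_block, end_block) pairs, inclusive on both ends (assumed),
    - months: a label for each month, e.g. "2024-12" (format assumed),
    - flags: one boolean per month, e.g. marking months whose data is
      (re)computed from scratch (semantics assumed).
    """
    ...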

Comment on lines 53 to 55
for i, _ in enumerate(block_range_list):
start_block = block_range_list[i][0]
end_block = block_range_list[i][1]
@fhenneke (Collaborator) commented:

Suggested change:
-for i, _ in enumerate(block_range_list):
-    start_block = block_range_list[i][0]
-    end_block = block_range_list[i][1]
+for i, (start_block, end_block) in enumerate(block_range_list):

@fhenneke (Collaborator) commented:

Or better yet, zip all those lists and avoid the index i.
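
A sketch of the zip-based loop suggested here; the names months and is_new_month are assumptions based on the return annotation of compute_block_and_month_range:

block_range_list, months, is_new_month = compute_block_and_month_range(node, recompute)
for (start_block, end_block), month, is_new in zip(block_range_list, months, is_new_month):
    ...  # process one month's block range without indexing into the lists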

Resolved review thread (outdated): src/fetch/orderbook.py
@harisang (Contributor, Author) commented:

Some code is currently commented out; it is meant to do proper type casting but does not work as expected. It is also not clear whether type casting is really needed, so I will investigate further.

For now, I am creating a tag from this branch so that we can proceed with a test deployment.

@fhenneke (Collaborator) left a comment

I checked the logic of get_batch_data and it looks fine.

Resolved review threads on src/fetch/orderbook.py (two outdated)
) -- Most efficient column order for sorting would be having tx_hash or order_uid first

select
'{{env}}' as environment,
@fhenneke (Collaborator) commented:

Looks like a good idea!

Comment on lines +195 to +217
def get_order_data(
cls, block_range: BlockRange, config: AccountingConfig
) -> DataFrame:
"""
Decomposes the block range into buckets of 10k blocks each,
so as to ensure the batch data query runs fast enough.
At the end, it concatenates everything into one data frame
"""
load_dotenv()
start = block_range.block_from
end = block_range.block_to
bucket_size = config.data_processing_config.bucket_size
res = []
while start < end:
size = min(end - start + 1, bucket_size)
log.info(f"About to process block range ({start}, {start + size - 1})")
res.append(
cls.run_order_data_query(
BlockRange(block_from=start, block_to=start + size - 1)
)
)
start = start + size
return pd.concat(res)
@fhenneke (Collaborator) commented:

I know that this is not horrible code duplication: not many lines can be saved by moving the chunking into its own function, and one would have to awkwardly pass a function as an argument to it.
But then again, chunking logic always has potential for off-by-one errors, so I would still think it is worth refactoring this at some point into a single chunking function, which can then also be properly tested.
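
A hedged sketch of such a chunking helper; the name run_in_buckets and the callback-based signature are illustrative, not part of this PR:

from typing import Callable, List

import pandas as pd
from pandas import DataFrame

def run_in_buckets(
    block_from: int,
    block_to: int,
    bucket_size: int,
    run_query: Callable[[int, int], DataFrame],
) -> DataFrame:
    """Split [block_from, block_to] into buckets of at most bucket_size blocks,
    run run_query on each bucket, and concatenate the results."""
    frames: List[DataFrame] = []
    start = block_from
    while start <= block_to:
        end = min(start + bucket_size - 1, block_to)
        frames.append(run_query(start, end))
        start = end + 1
    return pd.concat(frames)

A helper like this, with an inclusive and single-block-safe loop, is exactly the kind of code that can be unit-tested for the off-by-one cases mentioned above.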

.replace(
"{{EPSILON_UPPER}}", str(config.reward_config.batch_reward_cap_upper)
)
.replace("{{results}}", "dune_sync_batch_data_table")
@fhenneke (Collaborator) commented:

Is it possible to also add an {{env}} to the batch rewards query?
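
A sketch of the suggested change, extending the existing replace chain in the same way as the order data query; raw_query and environment are assumed variable names:

query = (
    raw_query.replace(
        "{{EPSILON_UPPER}}", str(config.reward_config.batch_reward_cap_upper)
    )
    .replace("{{results}}", "dune_sync_batch_data_table")
    .replace("{{env}}", environment)
)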

@fhenneke fhenneke self-requested a review December 11, 2024 23:00
@fhenneke fhenneke merged commit 7fc0851 into main Dec 13, 2024
6 checks passed
@fhenneke fhenneke deleted the add_data_processing_job branch December 13, 2024 16:42
@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2024