
Add raw batch data wrapper query #77

Open · harisang wants to merge 4 commits into main

Conversation

@harisang (Contributor) commented Dec 2, 2024:

This PR introduces query 4351957, which aims to provide a wrapper for all raw batch data tables we might upload to Dune, for all chains we operate on.

It unifies the tables that already exist, and also finally introduces the auction id and environment columns for each batch/auction.

The plan is that this query will be updated in an automated fashion whenever a new table is introduced (at the beginning of each month).
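
(For illustration: a minimal sketch of the wrapper's shape, assuming the monthly table naming pattern shown in the diff below. Only the columns quoted in this PR are listed, and the 2024_11 table stands in for a hypothetical next month.)

    -- sketch: one select per monthly upload, unioned together,
    -- exposing auction_id and environment alongside the raw columns
    select environment, auction_id, capped_payment, winning_score, reference_score
    from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_10
    union all
    select environment, auction_id, capped_payment, winning_score, reference_score
    from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_11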

@fleupold (Contributor) left a comment:

I don't think the folder structure is a good choice. This seems strictly accounting-related, so it should go under cowprotocol/accounting.

cast(capped_payment as decimal(38, 0)) as capped_payment,
cast(winning_score as decimal(38, 0)) as winning_score,
cast(reference_score as decimal(38, 0)) as reference_score
from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_10
Contributor commented:

Is the plan to add these 15 lines manually every month? Can we not come up with a better solution where this is automatically generated based on the current date?

If we go down this route, I'd at least like to see this query rewritten in a way that the redundant part becomes just

    union all
    select * from <new month table>

Otherwise this file will be a horror in 6 months from now.
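
(A sketch of that suggested shape, hoisting the casts from the diff above into a single final select so that each new month adds only two lines; the 2024_11 table is illustrative:)

    with raw_batch_data as (
        select * from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_10
        union all
        select * from dune.cowprotocol.dataset_batch_data_{{blockchain}}_2024_11
    )
    -- casts applied once here, rather than repeated per monthly table
    select
        cast(capped_payment as decimal(38, 0)) as capped_payment,
        cast(winning_score as decimal(38, 0)) as winning_score,
        cast(reference_score as decimal(38, 0)) as reference_score
    from raw_batch_data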

@harisang (Contributor, Author) commented Dec 3, 2024:

The goal is that no one touches this query and that it is updated automatically by the script that uploads data to Dune.

Indeed, ideally we should have a select *. These tables were uploaded using dune-sync-v1; if we are able to do proper type casting with dune-sync-v2, the explicit casts could go away.

Contributor commented:

> and this is updated automatically by the script that uploads data on Dune.

How would this work specifically? Would the script make a pull request to this repository and automatically merge a change to the query on the first of each month?

Let's please think this process through to avoid a bad surprise 4 weeks from now when we think we are "done" with this project.

@harisang (Contributor, Author) commented:

Ok, good point about how to sync with this repo; I hadn't thought about this. What I have in mind is basically what Bram did when testing some versions of the sync in the Prefect repo: https://github.com/cowprotocol/Prefect/blob/a233d2831e936aa05ff4a7984aa1116580402e11/config/tasks/dune_sync.py#L121

Basically, since the dune-sync job takes a timestamp in order to work, if the timestamp says that it is the first day of the month, the job would update the query automatically by pushing a change directly to Dune. This means that the version of the query in this repo would get outdated, unless we also allow the script to push directly to main; but I am not sure that is necessary, since, again, this query is not meant to be actively maintained by anyone.

@harisang (Contributor, Author) commented Dec 4, 2024:

An alternative that I tried was to prefill the query with all tables for the next few years (!). But Dune complains, as expected, about non-existent tables. I am not sure whether there is a workaround for that (we could, for example, create dummy tables for the next 48 months, but I am not sure we want that).

Contributor commented:

I think having this be an automated job makes sense (though it would likely still benefit from having the only redundant part be select * from table_x rather than all the select statements). Therefore, I believe this query should not live here.
