ethereum_blobs exclude from prod #6707

Merged
merged 1 commit into from
Sep 11, 2024


jeff-dude
Member

fyi @lorenz234 @0xRobin @hildobby @MSilb7 @Jam516
tagging you all since the history shows everyone was involved
this has started to fail in prod with the following error:

19:54:30    Database Error in model ethereum_blobs (models/_sector/blobs/ethereum/ethereum_blobs.sql)
  TrinoUserError(type=USER_ERROR, name=GENERIC_USER_ERROR, message="Task 20240911_193845_06449_mhikh.5.1.0 is failed, due to containing long running stuck splits.", query_id=20240911_193845_06449_mhikh)

i raised this with the platform team to check whether it's query engine related, but it looks like new data coming through is breaking the query planning.
first thought revolves around:

The blobs table contains lots of fun UTF8ish data, right? And that file in particular has a very fun blob that appears to contain a single blob with 300,000 null characters

seems like lots of nulls coming through unexpectedly somewhere?

if we want this spell re-enabled, feel free to open a PR to remove the prod exclude tag and apply fixes as necessary (or continue the conversation on how to fix).

@jeff-dude jeff-dude merged commit df8530a into main Sep 11, 2024
1 of 2 checks passed
@jeff-dude jeff-dude deleted the exclude-failing-model branch September 11, 2024 20:28
@github-actions github-actions bot locked and limited conversation to collaborators Sep 11, 2024
@Pluies

Pluies commented Sep 11, 2024

I've managed to reproduce this on dune.com, see query 4063379.

This expression:

,ceil(cast(length(regexp_replace(cast(blob as varchar), '0*$', '')) - 2 as double) /2 ) AS used_blob_byte_count

is an absolute killer on one of these blocks (not sure which!). Getting the blocks themselves, without computing used_blob_byte_count, is instant.
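To make the arithmetic in that expression easier to follow, here is a rough Python model of what it appears to compute. It assumes that casting the blob to varchar yields a `0x`-prefixed hex string (which is what the `- 2` suggests); the function name `used_blob_byte_count_hex` is made up for illustration and is not part of the spell.

```python
import math
import re


def used_blob_byte_count_hex(blob: bytes) -> int:
    """Model of the SQL expression: bytes up to the last non-zero byte."""
    # Assumption: cast(blob as varchar) gives a "0x"-prefixed hex string.
    hex_text = "0x" + blob.hex()
    # Strip trailing '0' characters. These are hex digits, not whole bytes.
    trimmed = re.sub(r"0*$", "", hex_text)
    # Drop the 2-character prefix, then convert hex digits to bytes.
    # ceil() restores a final byte whose low nibble was 0 (e.g. 0x10 -> "1").
    return math.ceil((len(trimmed) - 2) / 2)


# 0xab10 followed by four zero bytes: only the first two bytes are "used".
print(used_blob_byte_count_hex(bytes.fromhex("ab10") + bytes(4)))  # prints 2
```

The model only mirrors the logic; it says nothing about how the query engine evaluates the regex on very large blobs.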

@belen-pruvost
Contributor

Here is an alternative for trimming empty characters in a performant, varbinary-native way. You can find more information about these functions in our docs.
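The specific functions are not named in this comment, so the following is only a Python sketch of the general idea rather than the suggested SQL: trim trailing zero bytes on the binary value itself and take its length, with no hex text or regex involved. The name `used_blob_byte_count_native` is hypothetical.

```python
def used_blob_byte_count_native(blob: bytes) -> int:
    """Count bytes up to and including the last non-zero byte."""
    # rstrip removes trailing zero bytes directly from the binary value.
    return len(blob.rstrip(b"\x00"))


# Two meaningful bytes followed by 300,000 zero bytes of padding.
blob = bytes.fromhex("ab10") + bytes(300_000)
print(used_blob_byte_count_native(blob))  # prints 2
```

Zero bytes in the middle of the blob are kept, because only the trailing run is trimmed. That matches what the hex-and-regex expression computes under the `0x`-prefix assumption.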

@0xRobin
Collaborator

0xRobin commented Sep 12, 2024

Thank you for the detailed suggestions @Pluies, @belen-pruvost!
PR with a fix is already up. ✔️


4 participants