AppView for aggregating block events with Pindexer #4752

Closed
ejmg wants to merge 7 commits

Conversation

@ejmg (Collaborator) commented Jul 24, 2024

Describe your changes

This PR provides an AppView for aggregating block events using Pindexer.

This PR is a complete 180° from my original approach. Initially I was going to fan out a series of AppViews, one for every event type relevant to block events, but I ended up with a wide filter on the indexer that dispatches based on whether or not the event has a transaction id. From there, the handler function handle_block_event inserts the new block event.
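Roughly, the dispatch looks like the sketch below. This is a minimal, self-contained illustration only: the ContextualizedEvent struct, the function signatures, and the block_events table layout are stand-ins I made up for this description, not the actual cometindex/pindexer types.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for an event with block context (not the real pindexer type).
struct ContextualizedEvent {
    block_height: u64,
    /// `Some(..)` when the event was emitted by a transaction, `None` for block-level events.
    tx_hash: Option<[u8; 32]>,
    kind: String,
    attributes: BTreeMap<String, String>,
}

/// Hypothetical upsert run once per block-level event: append the event, keyed
/// by its proto type, to the JSON array stored for that height.
const INSERT_BLOCK_EVENT: &str = "
    INSERT INTO block_events (height, events)
    VALUES ($1, jsonb_build_array($2::jsonb))
    ON CONFLICT (height)
    DO UPDATE SET events = block_events.events || EXCLUDED.events
";

/// Wide filter: look at every event, but only dispatch to the block handler
/// when the event carries no transaction id.
fn index_event(event: &ContextualizedEvent) {
    if event.tx_hash.is_none() {
        handle_block_event(event);
    }
}

/// Build the `{ $EventType: $EventJson }` payload; here we just print the SQL
/// and its parameters instead of executing them against a database.
fn handle_block_event(event: &ContextualizedEvent) {
    let mut obj = serde_json::Map::new();
    obj.insert(
        event.kind.clone(),
        serde_json::to_value(&event.attributes).expect("string map serializes"),
    );
    let payload = serde_json::Value::Object(obj);
    println!("{INSERT_BLOCK_EVENT} -- $1 = {}, $2 = {payload}", event.block_height);
}

fn main() {
    index_event(&ContextualizedEvent {
        block_height: 100,
        tx_hash: None,
        kind: "penumbra.core.component.sct.v1.EventBlockRoot".to_string(),
        attributes: BTreeMap::new(),
    });
}
```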

Besides any concerns with the design, known issues/concerns:

  • comprehensive coverage of all known block_event types. The current set checked is drawn from my own archives of older testnets.
  • design: JSON operations aren't free, but in the case of a block explorer this form allows a single, trivial query to pull all data efficiently from the AppView (SELECT * FROM block_events WHERE height=$1; see the sketch after this list).
  • I realized that I have no clue where block as an event type originates. I couldn't find the emitting code in the Penumbra codebase (especially the protos crate), so I assumed it's a pseudo-event of some sort. If I'm wrong, that needs fixing.
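The consumer side of that query could look roughly like the following; the connection string, the block_events(height BIGINT PRIMARY KEY, events JSONB) layout, and the use of sqlx (with its json feature) are assumptions for illustration, not the actual Cuiloa code:

```rust
use sqlx::{postgres::PgPoolOptions, Row};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Hypothetical connection string and schema.
    let pool = PgPoolOptions::new()
        .connect("postgres://localhost/pindexer")
        .await?;

    // The single, trivial query from the bullet above: one row per block,
    // with every block event already aggregated into one JSON array.
    let row = sqlx::query("SELECT events FROM block_events WHERE height = $1")
        .bind(100_i64)
        .fetch_one(&pool)
        .await?;
    let events: serde_json::Value = row.try_get("events")?;
    println!("{events}");
    Ok(())
}
```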

Issue ticket number and link

Checklist before requesting a review

  • If this code contains consensus-breaking changes, I have added the "consensus-breaking" label. Otherwise, I declare my belief that there are not consensus-breaking changes, for the following reason:

This PR only adds an AppView to those already provided by the pindexer crate.

@ejmg added the A-indexing (Area: Relates to event indexing) label on Jul 24, 2024
@ejmg (Collaborator, Author) commented Jul 24, 2024

I didn't get a reply on the indexer channel, so apologies for going ahead with this approach in the end.

That said, if this design is considered OK, I plan on doing the same with transactions.

If not, then my best idea for a different design is to create a series of modules (like those @hdevalence wrote for staking, delegation, etc.) that each aggregate one relevant event type, leaving it up to the consuming app/client to join over all of them. Alternatively, using the raw indexer might not be too bad of a choice? Querying info specific to a single block is currently the simplest query in Cuiloa and relies on only a single CTE for aggregating transactions.
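For comparison, the raw-indexer query I have in mind is along these lines; the table and view names follow my recollection of the CometBFT psql indexer schema, so treat this as an approximation rather than the exact query Cuiloa uses:

```rust
/// Approximate raw-indexer query: a small CTE aggregates a block's transaction
/// hashes, then the block-level events for that height are joined in.
/// Table/view names (blocks, tx_results, block_events) are assumptions.
const RAW_BLOCK_QUERY: &str = "
    WITH txs AS (
        SELECT block_id, array_agg(tx_hash) AS tx_hashes
        FROM tx_results
        GROUP BY block_id
    )
    SELECT b.height, be.type, be.key, be.value, txs.tx_hashes
    FROM blocks b
    JOIN block_events be ON be.block_id = b.rowid
    LEFT JOIN txs ON txs.block_id = b.rowid
    WHERE b.height = $1;
";

fn main() {
    println!("{RAW_BLOCK_QUERY}");
}
```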

@ejmg changed the title from "WIP AppView for aggregating block events with Pindexer" to "AppView for aggregating block events with Pindexer" on Jul 25, 2024
@cronokirby (Contributor)

How will this data be presented in the block explorer? Will we just spit out JSON blobs?

@ejmg (Collaborator, Author) commented Jul 25, 2024

Yep. The strategy is fanning out wide with is_relevant and aggregating all associated block events into a single JSON array of objects with the shape { $ProtoEventSchemaURI: $ProtoEventJsonString }. The key lets the consuming client (Cuiloa, etc.) dispatch to the correct protobuf type for decoding.
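On the consuming side, dispatching on that key could look roughly like this (a Rust sketch rather than actual client code; the payloads and type URLs below are made up for illustration):

```rust
use serde_json::Value;

fn main() -> anyhow::Result<()> {
    // A made-up value in the { $ProtoEventSchemaURI: $ProtoEventJsonString } shape.
    let events: Value = serde_json::from_str(
        r#"[
            { "penumbra.core.component.sct.v1.EventBlockRoot": { "height": "100" } },
            { "penumbra.core.component.fee.v1.EventBlockFees": { "swapped_fee_total": {} } }
        ]"#,
    )?;

    for entry in events.as_array().into_iter().flatten() {
        let Some(obj) = entry.as_object() else { continue };
        for (type_url, event) in obj {
            // Dispatch on the key to pick the matching protobuf type for decoding.
            match type_url.as_str() {
                "penumbra.core.component.sct.v1.EventBlockRoot" => {
                    println!("decode as EventBlockRoot: {event}");
                }
                other => println!("unhandled event type {other}: {event}"),
            }
        }
    }
    Ok(())
}
```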

@cronokirby (Contributor) commented Jul 25, 2024

What if we had a static table layout instead, with one table for each kind of event we want to display?

Basically, I'm not quite sure what the benefit of having this table is over reading the raw indexing database?

Also, storing an array of JSON objects in a column is not what we want, I think; instead we would want multiple events as different rows.

@cronokirby (Contributor)

For example, instead of recording that BlockRoot was one of the events in the block, we want to use that information to record that the block had a particular root hash. Similarly, for fees, we don't want to record them generically; we want to record that this block had particular fees, etc.
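Concretely, I'm imagining something like the following layout; the table and column names here are only illustrative:

```rust
/// Illustrative "one table per event kind" layout; names and columns are
/// hypothetical, not an actual proposed schema.
const STATIC_SCHEMA: &str = "
    -- from the block root event: one row per block, with its root hash
    CREATE TABLE IF NOT EXISTS block_details (
        height BIGINT PRIMARY KEY,
        root   BYTEA  NOT NULL
    );

    -- from the block fee event: one row per block, with its total fees
    CREATE TABLE IF NOT EXISTS block_fees (
        height    BIGINT PRIMARY KEY,
        total_fee BIGINT NOT NULL
    );
";

fn main() {
    println!("{STATIC_SCHEMA}");
}
```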

@ejmg (Collaborator, Author) commented Jul 25, 2024

Basically, I'm not quite sure what the benefit of having this table is over reading the raw indexing database?

This, more than anything, is something I was thinking about the entire time. I don't know if you have looked closely at the raw indexer query I use for pulling in block info on Cuiloa, but I achieve the same thing in roughly 20 lines of SQL, and that includes a small CTE used for aggregating any related transactions. block_events is a view, so it hides an additional join, but nothing else.

The reason I still went ahead with writing this table was to see how it would perform in practice and to give the consuming client (Cuiloa) an extremely simple API to consume. Using the same design to aggregate transaction hashes associated with a block_id into a block_txs table would basically allow a three-liner of SQL for grabbing all events and transaction hashes for a given height (sketched below). It's really convenient for the consumer; that's the biggest and only real argument I have for it.
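To make that concrete, the companion table and the consumer query I have in mind look roughly like this (all names hypothetical):

```rust
/// Hypothetical block_txs companion table plus the "three-liner" consumer
/// query: all block events and all transaction hashes for a given height.
const BLOCK_TXS_DDL: &str = "
    CREATE TABLE IF NOT EXISTS block_txs (
        height  BIGINT NOT NULL,
        tx_hash BYTEA  NOT NULL
    );
";

const BLOCK_AT_HEIGHT: &str = "
    SELECT e.events, array_agg(t.tx_hash) AS tx_hashes
    FROM block_events e
    LEFT JOIN block_txs t ON t.height = e.height
    WHERE e.height = $1
    GROUP BY e.height, e.events;
";

fn main() {
    println!("{BLOCK_TXS_DDL}\n{BLOCK_AT_HEIGHT}");
}
```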

Also, storing an array of JSON objects in a column is not what we want, I think; instead we would want multiple events as different rows.

I wrote this with only Cuiloa in mind, not as a custom indexer for others to consume, per se, with use cases that differ from Cuiloa's. That said, I'm not making a principled argument that my approach is good.

What if we had a static table layout instead, with one table for each kind of event we want to display?
...
For example, instead of recording that BlockRoot was one of the events in the block, we want to use that information to record that the block had a particular root hash. Similarly, for fees, we don't want to record them generically; we want to record that this block had particular fees, etc.

These are the questions I started with and the solution I was originally working out. This PR ended up going to the very opposite end of the spectrum by shoving every event into a single column. I'm not opposed to spinning out a bunch of custom indexers for each event type, but I'd need a list of all ABCI events that are valid block events. It would also mean that a consuming application needs to perform at least 18 joins to comprehensively aggregate all possible block events; for transactions, that would be at least 25 joins. Is there a way we could simplify that, or is that an acceptable trade-off?
