Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FLIP proposal to expand Access Streaming API #229

Conversation

Guitarheroua
Copy link
Contributor

Draft proposal: Expanding the current Access Streaming API by introducing new subscription endpoints.

@Guitarheroua Guitarheroua changed the title Expand Access Streaming API FLIP proposal to expand Access Streaming API Dec 15, 2023
@Guitarheroua Guitarheroua marked this pull request as ready for review December 15, 2023 14:28
Copy link
Contributor

@peterargue peterargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the FLIP @Guitarheroua! Overall it's well outlined. I added a few comments where some more information would help.

@franklywatson franklywatson added the flip: protocol Protocol FLIP label Jan 11, 2024
@Guitarheroua
Copy link
Contributor Author

@peterargue I fixed comments, have one question. Would be great, if you could check it, when you will have some time. Thanks in advance!

@bjartek
Copy link
Contributor

bjartek commented Jan 12, 2024

Is it possible to have a streaming api that could give me sealed blocks with the following information:

  • Header: normal block header
  • Tx: the amount of tx in the block
  • HasSystemEvents: true/false

This would make the indexer in flowdiver a lot easier and more robust.

@Guitarheroua
Copy link
Contributor Author

Is it possible to have a streaming API that could give me sealed blocks with the following information:

* Header: normal block header

* Tx: the amount of tx in the block

* HasSystemEvents: true/false

This would make the indexer in Flowdiver a lot easier and more robust.

I'm unsure if introducing a specific endpoint for this case is the best approach. From my perspective, it could be handled by the general call described in this FLIP - SubscribeBlocks with the BlockStatusSealed argument. However, there is a concern that this approach would bring unnecessary data that needs to be filtered on the client side.

What solution I see here:

  1. Add a new endpoint for this specific case:
    • Pros: Access Node (AN) returns a ready result for this specific case.
    • Cons: Sets a precedent for adding an endpoint for specific cases on AN, potentially leading to a specialized rather than generalized subscribe API structure and overwhelming AN with unnecessary workload.
  2. Move responsibility for implementing specific cases to the "flow-go-sdk" or other third-party handler, that could filter out unnecessary information.
  3. Expand the described SubscribeBlockDigests endpoint and add 2 more fields to it - TxCount and HasSystemEvents, so this endpoint would be a kind of Metadata for the block. Then SubscribeBlockDigests and SubscribeBlockHeaderscould be used to retrieve necessary info.
  4. As was proposed in the forum discussion, leave only one SubscribeBlocks endpoint, but add a specific filter for fields, that the client wants to retrieve, for example: ID, Header, HasSystemEvents, TxCount, etc. But this will also add extra workload for the AN node to filter block fields.

@peterargue maybe you have some thoughts on this?

@bjartek
Copy link
Contributor

bjartek commented Jan 16, 2024

The flow of the "engine" i flowdiver atm is as follows:

  • GetTheBlock
  • GetTransactionsByBlockID (we call flowkit GetTransactionByBlockID)
  • GetTransactionsResultByBlockID

Then i have to massage the return data to extract out systemChunkTransactionEvents. If I have any of those i enqueue a job to index them separately since we don't want this job to slow down the main queue.

I am also very conscious to try to minimize the data that goes over the wire since it can be a lot, and with EVM stuff this will probably just increase.

Some of this is not directly related to this FLIP maybe, but having an endpoint that returns all user sent transactions with results that are sealed and information about if this block had systemChunkEvent would also be nice. It just limits the amount of calls that you have to do to the api.

Just this streaming api alone will limit the amount of calls our engine has to do to the api with hundreds of requests a minute.

@peterargue
Copy link
Contributor

I agree that we should keep this endpoint generalized. I think adding counts like txs and events is OK, but I don't think it would be practical to include them in a streaming block endpoints.

The block data only includes a list of collectionIDs, so the AN needs to first sync all of the collections from the network to count the number of tx. Generally this is done before the block is sealed, but not always. Event data is synced from execution nodes after the block is sealed. So they are never available until some time after the block is sealed (usually <1s but depends on the node's resources and location).

I think this type of data would work in an endpoint that notifies when data for a block is indexed. At that point, all of this data would be available. I haven't heard anyone else requesting it these, so it probably would come in a second iteration. We will need to add some endpoint(s) to expose this data anyway.

@bjartek
Copy link
Contributor

bjartek commented Jan 16, 2024

Knowing if it is empty or not is also useful.

protocol/20231215-accessnode-streaming-api-expansion.md Outdated Show resolved Hide resolved
protocol/20231215-accessnode-streaming-api-expansion.md Outdated Show resolved Hide resolved
protocol/20231215-accessnode-streaming-api-expansion.md Outdated Show resolved Hide resolved
protocol/20231215-accessnode-streaming-api-expansion.md Outdated Show resolved Hide resolved
**Expected response:**

- **ID** (The ID of the tracked transaction)
- **Status** (The status of the tracked transaction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be helpful to define what the expected statuses will be e.g. (pending, finalized, executed, sealed, expired)


- **BlockId** (The block ID of the block containing the statuses)
- **Address** (The adress of the tracked account)
- **Status** (The status of the tracked account)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Statuses are effectively events right? Maybe this should just be called "Events" and return the full event objects. In the future if we want to include additional information, we could add other fields to the response.

@bjartek
Copy link
Contributor

bjartek commented Jan 17, 2024

After talking with @peterargue in dms i actually think most of my issues can be solved with a subscrtions on blocks, so header and payload. Since i can know if it has any transactions by looking at the absence of collection guarantees. And I can handle system events in a seperate subscription.

@bjartek
Copy link
Contributor

bjartek commented Jan 18, 2024

What are the semantics of the serviceEvent streaming endpoint?

If it is not covered an endpoint that only streams events emitted in the systemChunk transaction would be very nice to have. So not filtered on event but filtered on how they where emitted.

This would play nicely with streaming either transactions or blocks as you then have primitives to make a full ingestion system based on that.

@bjartek
Copy link
Contributor

bjartek commented Jan 20, 2024

In order to be 100% sure you got everything from a sub would it be possible to include some information in response about what the last message sent was? And have the ability to later get the missing data?

How would this architecture work on a flaky network for instance?

@Guitarheroua
Copy link
Contributor Author

Guitarheroua commented Jan 22, 2024

What are the semantics of the serviceEvent streaming endpoint?

If it is not covered an endpoint that only streams events emitted in the systemChunk transaction would be very nice to have. So not filtered on event but filtered on how they where emitted.

This would play nicely with streaming either transactions or blocks as you then have primitives to make a full ingestion system based on that.

I am sorry for not getting back to you sooner!

Yes, you are correct! Service events are not covered in this flip, this Idea could go to the next iteration of expanding streaming API.

@Guitarheroua
Copy link
Contributor Author

To be 100% sure you got everything from a sub would it be possible to include some information in response about what the last message sent was? And have the ability to later get the missing data?

How would this architecture work on a flaky network for instance?

I need to clarify: Are your questions aimed at the entire Acess API subscription system and include all subscription endpoints, or do you ask about some concrete endpoint?

@bjartek
Copy link
Contributor

bjartek commented Jan 22, 2024

Ideally all. Just some way of knowing that you have gotten it all and can resume for where you left off safely.

@franklywatson franklywatson self-requested a review January 22, 2024 18:05
@peterargue
Copy link
Contributor

Ideally all. Just some way of knowing that you have gotten it all and can resume for where you left off safely.

this is what the "sequence number" comments I made were about. I think it's only relevant for endpoints that don't have data in every block (e.g. events, tx, accounts). With this included, a client could using this identifier to detect when they missed a message, and restart from the last good height if one was missed.

for example. if we included a messageNum field in the events stream. If you were streaming events and got a message with a messageNum that was greater than expected, you could restart the stream using the height from the last successful message as the start height.

@Guitarheroua
Copy link
Contributor Author

Ideally all. Just some way of knowing that you have gotten it all and can resume where you left off safely.

Additionally, both gRPC and WebSocket ensure the preservation of message order. So, if there's a glitch in the network, likely resulting in a disconnect, you can check the last successful message and resume the subscription from that point, for example, starting a new subscription from the last block received, etc.

I didn't realize the number was supposed to be the GH Issue number mapping to this FLIP in this repo
@franklywatson
Copy link
Contributor

Hey everyone. Will be closing this FLIP as accepted on Friday. I think we're down to the last of the feedback by now, but heads up in case of stragglers

@bjartek
Copy link
Contributor

bjartek commented Jan 25, 2024

Hey. Feel free to close if you are ok with current scope.

It is still missing pieces that I need in order to use it. Ref streaming of events emitted in serviceGhunk transaction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flip: protocol Protocol FLIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants