Add new RpcDataIngestSettings for controlling split size #354

Open: wants to merge 2 commits into base: master

@dvli2007 dvli2007 commented Nov 14, 2024

When fetching blocks from RPC, each iteration processes at most 10 blocks. This limit is currently hardcoded as a magic number, splitSize.

This PR removes the splitSize magic number and replaces it with an RpcDataIngestSettings value called maxBlocksPerIteration. Users may now pass in an arbitrary non-negative value. If no value is set, the default remains 10 blocks.
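
For illustration, the effect of the setting can be sketched roughly as follows. This is a minimal sketch, not the SDK's actual code: only the names RpcDataIngestSettings and maxBlocksPerIteration and the default of 10 come from this PR, while the BlockRange type and the splitBlockRange helper are hypothetical. The sketch also assumes the value must be at least 1, since a chunk size of 0 would never advance.

```typescript
// Hypothetical settings shape: only maxBlocksPerIteration is named in this PR.
interface RpcDataIngestSettings {
    maxBlocksPerIteration?: number
}

// Hypothetical inclusive block range, used for illustration only.
interface BlockRange {
    from: number
    to: number
}

// Default taken from the PR description.
const DEFAULT_MAX_BLOCKS_PER_ITERATION = 10

// Split a block range into consecutive chunks of at most
// maxBlocksPerIteration blocks each.
function splitBlockRange(
    range: BlockRange,
    settings: RpcDataIngestSettings = {}
): BlockRange[] {
    const size = settings.maxBlocksPerIteration ?? DEFAULT_MAX_BLOCKS_PER_ITERATION
    // Assumption of this sketch: reject 0, negatives, and non-integers,
    // because the loop below could not make progress with them.
    if (!Number.isInteger(size) || size < 1) {
        throw new Error(`maxBlocksPerIteration must be a positive integer, got ${size}`)
    }
    const chunks: BlockRange[] = []
    for (let from = range.from; from <= range.to; from += size) {
        chunks.push({from, to: Math.min(from + size - 1, range.to)})
    }
    return chunks
}
```

With the default, `splitBlockRange({from: 0, to: 24})` yields three chunks (0-9, 10-19, 20-24); passing `{maxBlocksPerIteration: 25}` yields a single chunk covering the whole range.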

@dvli2007 dvli2007 closed this Nov 14, 2024
@dvli2007 dvli2007 deleted the feat/add-maxBlocksPerIteration-rpc-parameter branch November 14, 2024 21:56
@dvli2007 dvli2007 restored the feat/add-maxBlocksPerIteration-rpc-parameter branch November 14, 2024 21:57
@dvli2007 dvli2007 reopened this Nov 14, 2024
@eldargab (Collaborator) commented

What's your use case for tweaking this parameter?

To increase data ingestion speed, one most likely wants to increase concurrency.

If you are falling behind the head due to non-trivial per-batch processing cost, then you had better optimize your mapping code so that a single-block batch is handled faster than blocks are produced.

@dvli2007 (Author) commented

> What's your use case for tweaking this parameter?
>
> To increase data ingestion speed, one most likely wants to increase concurrency.
>
> If you are falling behind the head due to non-trivial per-batch processing cost, then you had better optimize your mapping code so that a single-block batch is handled faster than blocks are produced.

We have no mapping logic in our code. This is just running the SQD indexer with the @subsquid/bigquery-store store on a chain with fast block production. Nothing is stored other than the SQD processor internals updating the block height. The BigQuery height updates are not streamed, so each update incurs very high latency, which makes the current configuration not really usable.

rmcmk commented Dec 9, 2024

We have a similar use case to @dvli2007's. Our team would appreciate having this tuning option available in the SDK.


3 participants