Indexation from scratch seems abnormally long #345
Comments
Both statements are true, but there's no causal relationship. The logic that causes these errors actually makes the situation better: it speeds up the processing. But if you insist, you can just make these 2 futures run sequentially by adding an await on each of them, as sketched below. You can also speed up the process by dropping some logic. E.g., if you don't need the fungible token data or the account changes table, just comment out the collecting of this data. Be careful, though: not all the tables can be commented out, e.g., most of the tables have an FK on …
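A minimal sketch of the concurrent-vs-sequential difference being described, assuming hypothetical store_transactions / store_receipts helpers (the indexer's real function names differ):

```rust
use futures::try_join;

struct Block; // placeholder for the indexer's block type

async fn store_transactions(_block: &Block) -> anyhow::Result<()> { Ok(()) }
async fn store_receipts(_block: &Block) -> anyhow::Result<()> { Ok(()) }

// Concurrent: both inserts race, so a receipt row can hit the
// tx_receipt_fk constraint before its transaction row exists.
// This is what produces the (recoverable) errors, but it is faster.
async fn store_concurrently(block: &Block) -> anyhow::Result<()> {
    try_join!(store_transactions(block), store_receipts(block))?;
    Ok(())
}

// Sequential: awaiting each future in turn guarantees that the rows
// the FK points at are committed before the dependent rows go in.
async fn store_sequentially(block: &Block) -> anyhow::Result<()> {
    store_transactions(block).await?;
    store_receipts(block).await?;
    Ok(())
}
```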
Hey @klefevre! Happy to hear you've managed to start indexing from the genesis! We do expect database errors to occur from time to time when we try to concurrently store data about receipts and transactions, and it is auto-resolving. However, I am not sure you've shared the expected one. I'll ask @telezhnaya if she remembers.
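For context, the "auto-resolving" behavior is essentially a retry loop around the insert: on a foreign-key violation, the batch is retried until the rows it depends on have landed. A rough sketch of that pattern (not the indexer's actual code; the error matching here is deliberately simplified):

```rust
use std::time::Duration;

// Retry an insert until the rows it references via FK exist.
// `insert` stands in for any hypothetical batch-insert future.
async fn insert_with_retry<F, Fut>(mut insert: F) -> anyhow::Result<()>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<()>>,
{
    loop {
        match insert().await {
            Ok(()) => return Ok(()),
            // Simplified check; real code would match the database
            // driver's ForeignKeyViolation error kind specifically.
            Err(e) if e.to_string().contains("foreign key") => {
                tokio::time::sleep(Duration::from_millis(100)).await;
            }
            Err(e) => return Err(e),
        }
    }
}
```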
Sorry it's not what you expected, but it's normal.
That's actually a feature of this indexer (Indexer for Explorer). It stores the data in a relational database, and we require it to be consistent. The easiest approach for us is to use a bunch of constraints in the database schema. Strict mode is the normal/regular way of running this indexer.
Since you've started your indexer from genesis, I assume you care about consistency, and that is the right way to run it. As a summary: Indexer for Explorer is made to serve the Explorer's needs. It requires all the data to be present and consistent. In most cases, if you're not running your own Explorer, this indexer is overkill, and perhaps you should consider creating your own custom indexer that handles the data you need, in the way you need it. Feel free to ask questions, or close the issue if you got the answers :)
First of all, thank you both for these quick answers!
We indeed expected that the indexing would slow down, but since it started like crazy, we were surprised it slowed down that much after only a few days! :)
I fully understand this requirement. However, my point was to ask why these FK errors are needed in the first place. Again, I need to dig into the code, but does the indexer actually use these FK errors to fetch missing data? Or are they, somehow, just a way to wait before the next block is processed? If the latter, could we imagine dropping all FKs and indexes during the initial indexation, i.e., from the genesis block to the last block, and restoring them afterward?
As I said, I need to dig into the code, but that's actually pretty promising! I'll check what I can do. Also, I indeed don't need every table, but the ones I need are the biggest, so removing the others seemed superfluous from a DB-size point of view. But if it could significantly accelerate the indexing process, I'll check that too.
Which ones do you need, btw?
We need:
@klefevre take a look at https://github.com/near/near-microindexers
They have an improved structure and additional columns.
@telezhnaya That's awesome! How did i not see this repo before!? If I follow you correctly, If I want all tables I indicated I'll need to merge logics from both Also, you didn't answer my question regarding potentially dropping foreign keys and indexes before starting indexing. Once finished, I would then add them. As I don't need to perform queries during initial indexation, this would speed up the process like crazy. However if indexers somehow use these constraints to dedup stuff then I'll keep them. WDYT? |
Not even merge, just run them in parallel :)
You need to keep primary keys and unique constraints/indexes. All the others (including all non-unique indexes, btree or whatever) you can drop and re-create at the end.
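A hedged sketch of that drop/re-create cycle, using sqlx against Postgres purely for illustration (the project itself uses Diesel; the constraint name comes from the error log above, but the index and column names here are assumptions, not necessarily the real schema):

```rust
use sqlx::PgPool;

// Before the bulk catch-up: drop FKs and non-unique indexes.
// Primary keys and UNIQUE constraints must stay, since the indexer
// relies on them for deduplication. Names below are illustrative.
async fn drop_bulk_load_overhead(pool: &PgPool) -> sqlx::Result<()> {
    sqlx::query("ALTER TABLE receipts DROP CONSTRAINT IF EXISTS tx_receipt_fk")
        .execute(pool)
        .await?;
    sqlx::query("DROP INDEX IF EXISTS receipts_timestamp_idx")
        .execute(pool)
        .await?;
    Ok(())
}

// After catching up: re-create them so consistency is enforced and
// ordinary queries are fast again.
async fn restore_constraints(pool: &PgPool) -> sqlx::Result<()> {
    sqlx::query(
        "ALTER TABLE receipts ADD CONSTRAINT tx_receipt_fk \
         FOREIGN KEY (originated_from_transaction_hash) \
         REFERENCES transactions (transaction_hash)",
    )
    .execute(pool)
    .await?;
    sqlx::query(
        "CREATE INDEX receipts_timestamp_idx \
         ON receipts (included_in_block_timestamp)",
    )
    .execute(pool)
    .await?;
    Ok(())
}
```

Note that re-adding a foreign key forces Postgres to validate every existing row, so the final step can itself take a while on tens of millions of rows.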
@klefevre upd: be careful with dropping indexes: without them, it's hard to read the data back.
Hi all,
After some quirks, we finally started an indexation from block 1 (actually 9,820,210, but you get my point) last week, and it started quite gracefully. As I mentioned in a previous issue, we saw ForeignKeyViolation errors from time to time, which the internal retry handles correctly:
ERROR explorer_database:Error occurred during DatabaseError(ForeignKeyViolation, "insert or update on table \"receipts\" violates foreign key constraint \"tx_receipt_fk\""):
However, these errors got worse day after day. Currently we're approaching 40M blocks, and we see these errors every ~5 blocks, which dramatically slows down the entire process. We estimate that if it continues like this, we won't finish indexing for months. I don't know what the average time to fully index NEAR is, but I believe something must be wrong on our side.
Is it the intended behavior or did we miss something?
Note that, as suggested, we run the indexer with --concurrency 1 in strict mode to avoid any issues with the data. Please forgive me, but I didn't (yet) dig into the code to understand how it works, and therefore I don't have all the ins and outs of why it works like this.
While in sync mode it makes perfect sense, could you explain why we need to use strict mode to index "old" data that, I presume, should be fetched from S3? If anybody has any hints that would allow us to speed up the indexation while keeping data integrity, that would be awesome!
Best,