Improve header chain extension performance #1891
Conversation
eth/db/chain.py
Outdated
```python
if current_canonical_head and header.parent_hash == current_canonical_head:
    cls._add_block_number_to_hash_lookup(db, header)
    db.set(SchemaV1.make_canonical_head_hash_lookup_key(), header.hash)
    return (header,), tuple()
```
Cool, doing this optimization makes sense, but the logic duplication (adding the block number lookup, and setting the canonical head hash) concerns me. If something changes further down in the method, it would be really easy to forget to mirror it here.

One alternative:

- In the `if` block, only set `new_canonical_headers, old_canonical_headers = (header,), ()`
- Move the following 3 lines of `{old,new}_canonical_headers` generation into the `else` block

It costs one extra db lookup at the `_get_canonical_block_hash` call to check for an old header. If you think that extra lookup is critical, and worth some more PR thrash, then you could move the `for h in new_canonical_headers` block into a `cls._decanonicalize_old_headers(db, new_headers)` call, inside the `else` block.
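A minimal sketch of the suggested shape, using a plain dict as a stand-in for the database and simplified helper names (none of this is the actual py-evm API):

```python
from typing import Dict, Tuple


def set_as_canonical_chain_head(
    db: Dict[bytes, bytes],
    header_hash: bytes,
    parent_hash: bytes,
    canonical_head: bytes,
) -> Tuple[Tuple[bytes, ...], Tuple[bytes, ...]]:
    if parent_hash == canonical_head:
        # Fast path: extending the tip, so nothing gets de-canonicalized.
        new_canonical, old_canonical = (header_hash,), ()
    else:
        # Slow path: walk back to find the fork point (elided here).
        new_canonical, old_canonical = find_new_ancestors(db, header_hash)

    # Shared bookkeeping -- runs once, below the branch, with no duplication.
    for h in new_canonical:
        db[b"number:" + h] = h  # block-number -> hash lookup (simplified)
    db[b"canonical_head"] = header_hash
    return new_canonical, old_canonical


def find_new_ancestors(db, header_hash):
    # Placeholder for the expensive ancestor walk in the real code.
    return (header_hash,), ()
```

The point is that the two branches differ only in how the header tuples are produced; all persistence happens once after the branch, so a future change can't be forgotten in one path.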
Yes, great point! I don't think the extra lookup matters, I'll profile it to make sure.
Surprisingly enough, the extra lookup added ~9% to import time. I pushed a change which keeps it out but also removes some duplicated code.
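For anyone curious how a number like that might be sanity-checked, here is a rough micro-benchmark pattern with `timeit`; the dict is only a stand-in for the real key-value store, so the absolute figures mean little:

```python
import timeit

db = {b"canonical_head": b"\x00" * 32}  # stand-in for the real key-value store

def with_extra_lookup() -> bytes:
    db.get(b"canonical_head")  # the extra hash lookup under discussion
    return b"\x00" * 32

def without_extra_lookup() -> bytes:
    return b"\x00" * 32

t_with = timeit.timeit(with_extra_lookup, number=100_000)
t_without = timeit.timeit(without_extra_lookup, number=100_000)
print(f"with: {t_with:.4f}s  without: {t_without:.4f}s (per 100k calls)")
```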
eth/db/header.py
Outdated
```diff
 try:
-    canonical_head_hash = db[SchemaV1.make_canonical_head_hash_lookup_key()]
+    return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()])
```
Christoph was recommending doing the following instead. I guess type-checking is supposed to notice if you send a non-bytes value this way (but not with `cast`):

```diff
-return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()])
+return Hash32(db[SchemaV1.make_canonical_head_hash_lookup_key()])
```

YMMV, do what you like :)
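To illustrate the distinction: calling a `NewType` constructor is itself type-checked, while `cast` is an unchecked assertion that mypy simply trusts. `Hash32` is redefined locally below so the snippet stands alone (in the real code it comes from `eth_typing`):

```python
from typing import NewType, cast

Hash32 = NewType("Hash32", bytes)  # local stand-in for eth_typing.Hash32

raw: bytes = b"\x00" * 32

checked = Hash32(raw)          # mypy verifies the argument is bytes
asserted = cast(Hash32, raw)   # mypy accepts this unconditionally

# Hash32(123) would be flagged by mypy; cast(Hash32, 123) would not.
# At runtime, both are identity operations on the underlying bytes.
assert checked == asserted == raw
```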
eth/db/header.py
Outdated
```diff
@@ -83,11 +83,15 @@ def get_canonical_head(self) -> BlockHeaderAPI:

     @classmethod
     def _get_canonical_head(cls, db: DatabaseAPI) -> BlockHeaderAPI:
         canonical_head_hash = cls._get_canonical_head_hash(db)
         return cls._get_block_header_by_hash(db, Hash32(canonical_head_hash))
```
This `Hash32` wrapper looks redundant.
```diff
@@ -173,7 +177,7 @@ def _persist_checkpoint_header(
         )
         previous_score = score - header.difficulty
         cls._set_hash_scores_to_db(db, header, previous_score)
-        cls._set_as_canonical_chain_head(db, header.hash, header.parent_hash)
+        cls._set_as_canonical_chain_head(db, header, header.parent_hash)
```
Yay that this change was so straightforward and didn't require a new header lookup anywhere!
Just addressed the rest of the outstanding PR feedback, ready for another review :)
The lint failure is... interesting. My first guess is that mypy just upgraded, and circleci doesn't pin the version? EDIT: nope, mypy is on the right version, 0.701
@lithp interesting indeed! I looked into it. When installed with … But when we install with … So, my guess is something in …
Ahah! I was wondering why I could only reproduce linting issues with local tox runs, on certain issues.
Sweet, yeah the refactor looks good. I think it leans a bit further into subclassing vs. component architecture, but fixing that would be a giant change unrelated to this one.
👍 when linting passes
```diff
@@ -118,19 +108,16 @@ def _set_as_canonical_chain_head(
         old_header = cls._get_block_header_by_hash(db, old_hash)
         old_canonical_headers.append(old_header)
         try:
-            for transaction_hash in cls._get_block_transaction_hashes(db, old_header):
+            transaction_hashes = cls._get_block_transaction_hashes(db, old_header)
+            for transaction_hash in transaction_hashes:
```
This split was just to do fewer things on one line, right? (I'm fine with it, just wondering if there was something subtle I was missing)
Nope, no subtle reason; the change was to get us under the 90-character line-width limit.
Woah, thanks for looking into it @cburgdorf, that helped a lot!
Also make an unrelated change which makes circleci happy: it uses an updated version of eth-utils.
What was wrong? / How was it fixed?
Breaking this out from #1888; it improves performance when the parent of the header being imported is the canonical chain tip.
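To picture the saving, here is a toy model (JSON in place of RLP, sha256 in place of the real header hash) of why re-deriving the hash via the slow path is wasteful: it pays for a fetch, a decode, a re-encode, and a hash, all to recover a value the importer already holds:

```python
import hashlib
import json

def encode(header: dict) -> bytes:
    # Toy stand-in for RLP serialization.
    return json.dumps(header, sort_keys=True).encode()

def header_hash(encoded: bytes) -> bytes:
    # Toy stand-in for the real header hash function.
    return hashlib.sha256(encoded).digest()

db = {}
header = {"number": 100, "parent": "aa" * 32}
encoded = encode(header)
known_hash = header_hash(encoded)
db[known_hash] = encoded

# Slow path: fetch + decode + re-encode + re-hash, just to get the hash back.
fetched = json.loads(db[known_hash])
recovered = header_hash(encode(fetched))
assert recovered == known_hash  # the importer knew this value all along
```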
`_find_new_ancestors` is relatively expensive; if I remember right, a lot of the time is spent fetching and de-serializing the header that was just imported, when all that's wanted is its hash.

To-Do
Cute Animal Picture