Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve header chain extension performance #1891

Merged

Conversation

lithp
Copy link
Contributor

@lithp lithp commented Dec 6, 2019

What was wrong? / How was it fixed?

Breaking this out from #1888, improves performance if the parent of the header being imported is the canonical chain tip. _find_new_ancestors is relatively expensive; if I remember right a lot of time is spent fetching and de-serializing the header which was just imported, when all that's wanted is its hash.

To-Do

  • Clean up commit history

Cute Animal Picture

panda-tree

@lithp lithp requested a review from carver December 6, 2019 01:43
@lithp lithp mentioned this pull request Dec 6, 2019
15 tasks
@lithp lithp changed the title Lithp/improve chain extension performance Improve header chain extension performance Dec 6, 2019
eth/db/chain.py Outdated
if current_canonical_head and header.parent_hash == current_canonical_head:
cls._add_block_number_to_hash_lookup(db, header)
db.set(SchemaV1.make_canonical_head_hash_lookup_key(), header.hash)
return (header,), tuple()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, doing this optimization makes sense, but the logic duplication (adding block number lookup, and setting the canonical head hash) concerns me. If something updates further down in the method, it would be really easy to forget to do it here.

One alternative:

  • In the if block, only set new_canonical_headers, old_canonical_headers = (header,), ()
  • Move the following 3 lines of {old,new}_canonical_headers generation into the else block

It does one extra db lookup at the _get_canonical_block_hash to check for an old header.

If you think that extra lookup is critical, and worth some more PR thrash, then you could move the for h in new_canonical_headers block into a cls._decanonicalize_old_headers(db, new_headers) call, inside the else block.

Copy link
Contributor Author

@lithp lithp Dec 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, great point! I don't think the extra lookup matters, I'll profile it to make sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly enough the extra lookup added ~9% to import time. I pushed a change which keeps it out but also removes some duplicated code.

eth/db/header.py Outdated
try:
canonical_head_hash = db[SchemaV1.make_canonical_head_hash_lookup_key()]
return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()])
Copy link
Contributor

@carver carver Dec 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Christoph was recommending doing the following instead. I guess type-checking is supposed to notice if you send a non-bytes value this way (but not with cast):

Suggested change
return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()])
return Hash32(db[SchemaV1.make_canonical_head_hash_lookup_key()])

YMMV, do what you like :)

eth/db/header.py Outdated
@@ -83,11 +83,15 @@ def get_canonical_head(self) -> BlockHeaderAPI:

@classmethod
def _get_canonical_head(cls, db: DatabaseAPI) -> BlockHeaderAPI:
canonical_head_hash = cls._get_canonical_head_hash(db)
return cls._get_block_header_by_hash(db, Hash32(canonical_head_hash))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Hash32 wrapper looks redundant.

@@ -173,7 +177,7 @@ def _persist_checkpoint_header(
)
previous_score = score - header.difficulty
cls._set_hash_scores_to_db(db, header, previous_score)
cls._set_as_canonical_chain_head(db, header.hash, header.parent_hash)
cls._set_as_canonical_chain_head(db, header, header.parent_hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay that this change has so straightforward, and didn't require a new header lookup anywhere!

eth/db/header.py Outdated Show resolved Hide resolved
@lithp
Copy link
Contributor Author

lithp commented Dec 10, 2019

Just addressed the rest of the outstanding PR feedback, ready for another review :)

@lithp
Copy link
Contributor Author

lithp commented Dec 10, 2019

The lint failure is... interesting. My first guess is that mypy just upgraded, and circleci doesn't pin the version?

EDIT: nope, mypy is on the right version, 0.701

@cburgdorf
Copy link
Contributor

cburgdorf commented Dec 10, 2019

@lithp interesting indeed! I looked into it.

When installed with .[dev] we end up with eth-utils version 1.7.0 which has some decorators that do not play nice with types.

But when we install with .[eth, lint] like the lint job does we end up with eth-utils version 1.8.4 which does play nicely with types (thanks @njgheorghita 👍 ) and hence the info comes up that the ignore isn't needed anymore.

So, my guess is something in dev forces us to a lower version of eth-utils.

@carver
Copy link
Contributor

carver commented Dec 10, 2019

When installed with .[dev] we end up with eth-utils version 1.7.0 which has some decorators that do not play nice with types.

But when we install with .[eth, lint] like the lint job does we end up with eth-utils version 1.8.4 which does play nicely with types (thanks @njgheorghita ) and hence the info comes up that the ignore isn't needed anymore.

Ahah! I was wondering why I could only reproduce linting issues with local tox runs, on certain issues.

Copy link
Contributor

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, yeah the refactor looks good. I think it leans a but further into subclassing vs component architecture, but fixing that would be a giant change unrelated to this one.

👍 when linting passes

@@ -118,19 +108,16 @@ def _set_as_canonical_chain_head(
old_header = cls._get_block_header_by_hash(db, old_hash)
old_canonical_headers.append(old_header)
try:
for transaction_hash in cls._get_block_transaction_hashes(db, old_header):
transaction_hashes = cls._get_block_transaction_hashes(db, old_header)
for transaction_hash in transaction_hashes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This split was just to do fewer things on one line, right? (I'm fine with it, just wondering if there was something subtle I was missing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, no subtle reason, the change was to get us under the 90-character line width limit

@lithp
Copy link
Contributor Author

lithp commented Dec 10, 2019

Woah, thanks for looking into it @cburgdorf, that helped a lot!

Also make an unrelated change which makes circleci happy, it uses an
updated version of eth-utils.
@lithp lithp force-pushed the lithp/improve-chain-extension-performance branch from 6392266 to 92e575b Compare December 10, 2019 23:46
@lithp lithp merged commit bfdcf12 into ethereum:master Dec 10, 2019
@lithp lithp deleted the lithp/improve-chain-extension-performance branch December 10, 2019 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants