Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some further cleanup in WriteBatchWithIndex::MultiGetFromBatchAndDB #12143

Closed
wants to merge 1 commit into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented Dec 13, 2023

Summary: #11982 changed WriteBatchWithIndex::MultiGetFromBatchDB to preallocate space in the autovectors key_contexts and merges in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the PinnableWideColumns object into MergeTuple.

Differential Revision: D52136513

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52136513

@ltamasi ltamasi requested a review from jaykorean December 13, 2023 21:21
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Dec 13, 2023
…acebook#12143)

Summary:

facebook#11982 changed `WriteBatchWithIndex::MultiGetFromBatchDB` to preallocate space in the `autovector`s `key_contexts` and `merges` in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the `PinnableWideColumns` object into `MergeTuple`.

Differential Revision: D52136513
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52136513

…acebook#12143)

Summary:

facebook#11982 changed `WriteBatchWithIndex::MultiGetFromBatchDB` to preallocate space in the `autovector`s `key_contexts` and `merges` in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the `PinnableWideColumns` object into `MergeTuple`.

Differential Revision: D52136513
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52136513

@@ -708,27 +722,35 @@ void WriteBatchWithIndex::MultiGetFromBatchAndDB(

// Note: we have to retrieve all columns if we have to merge KVs from the
// batch and the DB; otherwise, the default column is sufficient.
// The columns field will be populated by the loop below to prevent issues
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for my own learning. I'm trying to understand how columns field being populated here would cause dangling pointer issue.

I was wondering why not just adding the key_context to sorted_keys below line 731 and 738 to avoid extra loop of key_contexts in line 744. We then won't preallocate size for sorted_keys like the other two, though.

Copy link
Contributor Author

@ltamasi ltamasi Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for my own learning. I'm trying to understand how columns field being populated here would cause dangling pointer issue.

Right. So std::vector internally uses a contiguous heap-allocated buffer. As more and more elements are added to the vector, eventually we exceed the capacity of this buffer, at which point the vector has to allocate a bigger buffer (typical implementations double the capacity), and copy/move every item into the new buffer. This invalidates any pointer or iterator that points to the old buffer.

I was wondering why not just adding the key_context to sorted_keys below line 731 and 738 to avoid extra loop of key_contexts in line 744. We then won't preallocate size for sorted_keys like the other two, though.

It's actually another manifestation of the above issue; the internal buffer of key_contexts may get reallocated as further KeyContexts are added, which would render any pointer already added to sorted_keys invalid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so if the pointer to the tuple's columns (in the merges vector) is set in key_context here, then as soon as merges vector gets resized, that pointer is invalidated because the items are copied/moved to the new vector and the key_context no longer has the right value for the columns pointer. The same problem can happen if we make the sorted_key reallocated during population.

Please let me know if my understanding is correct. I think the change now makes perfect sense to me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly!

Copy link
Contributor

@jaykorean jaykorean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in cd21e4e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants