-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some further cleanup in WriteBatchWithIndex::MultiGetFromBatchAndDB #12143
Conversation
This pull request was exported from Phabricator. Differential Revision: D52136513 |
0aa17f4
to
6fda6a0
Compare
…acebook#12143) Summary: facebook#11982 changed `WriteBatchWithIndex::MultiGetFromBatchDB` to preallocate space in the `autovector`s `key_contexts` and `merges` in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the `PinnableWideColumns` object into `MergeTuple`. Differential Revision: D52136513
This pull request was exported from Phabricator. Differential Revision: D52136513 |
…acebook#12143) Summary: facebook#11982 changed `WriteBatchWithIndex::MultiGetFromBatchDB` to preallocate space in the `autovector`s `key_contexts` and `merges` in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the `PinnableWideColumns` object into `MergeTuple`. Differential Revision: D52136513
6fda6a0
to
9fe7637
Compare
This pull request was exported from Phabricator. Differential Revision: D52136513 |
@@ -708,27 +722,35 @@ void WriteBatchWithIndex::MultiGetFromBatchAndDB( | |||
|
|||
// Note: we have to retrieve all columns if we have to merge KVs from the | |||
// batch and the DB; otherwise, the default column is sufficient. | |||
// The columns field will be populated by the loop below to prevent issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for my own learning. I'm trying to understand how columns field being populated here would cause dangling pointer issue.
I was wondering why not just adding the key_context
to sorted_keys
below line 731 and 738 to avoid extra loop of key_contexts
in line 744. We then won't preallocate size for sorted_keys
like the other two, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for my own learning. I'm trying to understand how columns field being populated here would cause dangling pointer issue.
Right. So std::vector
internally uses a contiguous heap-allocated buffer. As more and more elements are added to the vector, eventually we exceed the capacity of this buffer, at which point the vector has to allocate a bigger buffer (typical implementations double the capacity), and copy/move every item into the new buffer. This invalidates any pointer or iterator that points to the old buffer.
I was wondering why not just adding the
key_context
tosorted_keys
below line 731 and 738 to avoid extra loop ofkey_contexts
in line 744. We then won't preallocate size forsorted_keys
like the other two, though.
It's actually another manifestation of the above issue; the internal buffer of key_contexts
may get reallocated as further KeyContext
s are added, which would render any pointer already added to sorted_keys
invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so if the pointer to the tuple's columns
(in the merges
vector) is set in key_context
here, then as soon as merges
vector gets resized, that pointer is invalidated because the items are copied/moved to the new vector and the key_context
no longer has the right value for the columns pointer. The same problem can happen if we make the sorted_key
reallocated during population.
Please let me know if my understanding is correct. I think the change now makes perfect sense to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exactly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This pull request has been merged in cd21e4e. |
Summary: #11982 changed
WriteBatchWithIndex::MultiGetFromBatchDB
to preallocate space in theautovector
skey_contexts
andmerges
in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys require querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating thePinnableWideColumns
object intoMergeTuple
.Differential Revision: D52136513