Some further cleanup in WriteBatchWithIndex::MultiGetFromBatchAndDB #12143

ltamasi · 2023-12-13T21:21:29Z

Summary: #11982 changed WriteBatchWithIndex::MultiGetFromBatchAndDB to preallocate space in the autovectors key_contexts and merges in order to prevent any reallocations, both as an optimization and in order to prevent pointers into the container from being invalidated during subsequent insertions. On second thought, this preallocation can actually be a pessimization in cases when only a small subset of keys requires querying the underlying database. To prevent any memory regressions, the PR reverts this preallocation. In addition, it makes some small code hygiene improvements like incorporating the PinnableWideColumns object into MergeTuple.

Differential Revision: D52136513
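To illustrate the memory trade-off described in the summary, here is a minimal sketch. It uses std::vector as a stand-in for RocksDB's autovector (whose inline storage behaves differently), and the type name, function names, and sizes are all hypothetical. It compares the capacity held when space is reserved for every key up front with the capacity held when only the keys that need a DB lookup are appended.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-key context; not the RocksDB KeyContext.
struct ContextSketch {
  int key_index = 0;
};

// Reserves room for every key up front, then appends only the keys that
// need a DB lookup. Assumes num_db_lookups <= num_keys.
// Returns the capacity the vector ends up holding.
std::size_t CapacityWithPreallocation(std::size_t num_keys,
                                      std::size_t num_db_lookups) {
  std::vector<ContextSketch> contexts;
  contexts.reserve(num_keys);  // pays for num_keys slots regardless of use
  for (std::size_t i = 0; i < num_db_lookups; ++i) {
    contexts.push_back(ContextSketch{static_cast<int>(i)});
  }
  return contexts.capacity();
}

// Appends only the keys that need a DB lookup and lets the vector grow
// on demand. Returns the capacity the vector ends up holding.
std::size_t CapacityWithoutPreallocation(std::size_t num_db_lookups) {
  std::vector<ContextSketch> contexts;
  for (std::size_t i = 0; i < num_db_lookups; ++i) {
    contexts.push_back(ContextSketch{static_cast<int>(i)});
  }
  return contexts.capacity();
}
```

With, say, 1000 keys of which only 3 need a DB lookup, the preallocating variant holds at least 1000 slots while the on-demand variant holds only a handful. The cost of growing on demand is that pointers into the container can no longer be taken while it is still being filled.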

facebook-github-bot · 2023-12-13T21:21:38Z

This pull request was exported from Phabricator. Differential Revision: D52136513

jaykorean · 2023-12-13T22:38:17Z

utilities/write_batch_with_index/write_batch_with_index.cc

@@ -708,27 +722,35 @@ void WriteBatchWithIndex::MultiGetFromBatchAndDB(

    // Note: we have to retrieve all columns if we have to merge KVs from the
    // batch and the DB; otherwise, the default column is sufficient.
+    // The columns field will be populated by the loop below to prevent issues


Question for my own learning: I'm trying to understand how populating the columns field here would cause a dangling pointer issue.

I was also wondering why we don't just add the key_context to sorted_keys below lines 731 and 738 to avoid the extra loop over key_contexts at line 744. We then wouldn't preallocate space for sorted_keys like the other two, though.

ltamasi · 2023-12-13 (edited)

> Question for my own learning: I'm trying to understand how populating the columns field here would cause a dangling pointer issue.

Right. So std::vector internally uses a contiguous heap-allocated buffer. As more and more elements are added to the vector, it eventually exceeds the capacity of this buffer, at which point it has to allocate a bigger buffer (typical implementations grow the capacity geometrically, for example by doubling it) and copy or move every item into the new buffer. This invalidates any pointer or iterator that points into the old buffer.
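The reallocation behavior described above can be observed directly. The following is a minimal sketch with hypothetical names (GrowthResult, GrowPastCapacity) that uses a plain std::vector rather than RocksDB's autovector. It records the buffer address as an integer before growth, so the check itself never reads through a dangling pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this illustration.
struct GrowthResult {
  bool capacity_grew;
  bool buffer_moved;
};

// Appends elements until the vector exceeds its initial capacity and
// reports whether the underlying buffer was replaced.
GrowthResult GrowPastCapacity() {
  std::vector<int> values;
  values.push_back(42);

  const std::size_t old_capacity = values.capacity();
  // Store the address as an integer instead of keeping a raw pointer.
  const auto old_address = reinterpret_cast<std::uintptr_t>(values.data());

  // Any pointer taken to values[0] before this loop is dangling afterwards.
  while (values.size() <= old_capacity) {
    values.push_back(0);
  }

  const auto new_address = reinterpret_cast<std::uintptr_t>(values.data());
  return GrowthResult{values.capacity() > old_capacity,
                      new_address != old_address};
}
```

Once the size exceeds the old capacity, the elements live in a new buffer, which is why a pointer stored earlier (such as a columns pointer held by a key context) would no longer be safe to dereference.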

> I was also wondering why we don't just add the key_context to sorted_keys below lines 731 and 738 to avoid the extra loop over key_contexts at line 744. We then wouldn't preallocate space for sorted_keys like the other two, though.

It's actually another manifestation of the above issue; the internal buffer of key_contexts may get reallocated as further KeyContexts are added, which would render any pointer already added to sorted_keys invalid.
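One way to picture why the separate loop is needed is a two-phase pattern: fill the container first, and take addresses only once it has reached its final size. The sketch below is an illustration under assumptions, not the code from the PR; KeyContextSketch, CollectContexts, and CollectPointers are hypothetical names, and std::vector stands in for autovector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-key context; not the RocksDB KeyContext.
struct KeyContextSketch {
  std::string key;
};

// Phase 1: collect the contexts. The vector may reallocate freely here
// because nothing points into it yet.
std::vector<KeyContextSketch> CollectContexts(
    const std::vector<std::string>& keys) {
  std::vector<KeyContextSketch> contexts;
  for (const auto& key : keys) {
    contexts.push_back(KeyContextSketch{key});
  }
  return contexts;
}

// Phase 2: take addresses only after `contexts` has its final size.
// The pointers stay valid as long as `contexts` is not modified afterwards.
std::vector<const KeyContextSketch*> CollectPointers(
    const std::vector<KeyContextSketch>& contexts) {
  std::vector<const KeyContextSketch*> pointers;
  pointers.reserve(contexts.size());
  for (const auto& context : contexts) {
    pointers.push_back(&context);
  }
  return pointers;
}
```

Interleaving the two phases, that is, pushing a pointer right after each context is appended, would only be safe if the container were guaranteed never to reallocate, which is the guarantee the reverted preallocation used to provide.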

jaykorean · 2023-12-14

Ah, so if the pointer to the tuple's columns (in the merges vector) is set in key_context here, then as soon as the merges vector gets resized, that pointer is invalidated because the items are copied or moved to the new buffer, and the key_context no longer holds a valid columns pointer. The same problem can happen with the pointers in sorted_keys if key_contexts gets reallocated during population.

Please let me know if my understanding is correct. I think the change now makes perfect sense to me :)

ltamasi · 2023-12-14

Yep, exactly!

jaykorean

Thank you!

facebook-github-bot · 2023-12-14T01:40:02Z

This pull request has been merged in cd21e4e.

facebook-github-bot added the CLA Signed label Dec 13, 2023

facebook-github-bot added the fb-exported label Dec 13, 2023

ltamasi requested a review from jaykorean December 13, 2023 21:21

ltamasi force-pushed the export-D52136513 branch from 0aa17f4 to 6fda6a0 Compare December 13, 2023 21:21

ltamasi force-pushed the export-D52136513 branch from 6fda6a0 to 9fe7637 Compare December 13, 2023 21:38

jaykorean reviewed Dec 13, 2023

jaykorean approved these changes Dec 14, 2023

facebook-github-bot closed this in cd21e4e Dec 14, 2023

facebook-github-bot added the Merged label Dec 14, 2023
