feat(storage): refactor commit_epoch code #17235

Merged
merged 26 commits into from
Jul 11, 2024

Conversation

Contributor

@Li0k Li0k commented Jun 13, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR removes the group payload from SharedBuffer compaction. It also stops computing the mapping from compaction groups to SSTs on the CN side and defers it until commit_epoch, which reduces the number of CN upload operations. Finally, it simplifies and refactors the commit_epoch code to make it more readable, removing useless and hard-to-understand code.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@Li0k Li0k changed the title feat(storage): remove group payload WIP: feat(storage): remove group payload Jun 14, 2024
@Li0k Li0k marked this pull request as ready for review June 14, 2024 16:47

gitguardian bot commented Jun 17, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id | GitGuardian status | Secret | Commit | Filename
9425213 | Triggered | Generic Password | 5599167 | ci/scripts/e2e-sink-test.sh
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@Li0k Li0k changed the title WIP: feat(storage): remove group payload feat(storage): remove group payload and refactor commit_epoch Jun 17, 2024
Contributor

@wenym1 wenym1 left a comment

Rest LGTM

src/meta/src/hummock/manager/commit_epoch.rs (outdated)
// Don't trigger compactions if we enable deterministic compaction
if !self.env.opts.compaction_deterministic_test {
    // commit_epoch may contain SSTs from any compaction group
    for id in &modified_compaction_groups {
        self.try_send_compaction_request(*id, compact_task::TaskType::Dynamic);
    }
    if !table_stats_change.is_empty() {
Contributor

Where will we do this stats cleaning logic after this PR?

Contributor Author

purge_prost_table_stats

src/meta/src/hummock/manager/commit_epoch.rs

for (mut sst, group_table_ids) in sst_to_cg_vec {
    for (group_id, _match_ids) in group_table_ids {
        let branch_sst = split_sst(&mut sst.sst_info, &mut new_sst_id);
Contributor

I think we should also modify the table_ids of the SstableInfo, but _match_ids is not passed to the split_sst method.
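To illustrate the suggestion, here is a minimal sketch of a split that also narrows table_ids to the ids matched for one group. It is not the real split_sst: the `SstableInfo` struct is a simplified stand-in, and `branch_for_group` and its parameters are hypothetical names introduced only for this example.

```rust
type TableId = u32;

/// Hypothetical, simplified stand-in for the real `SstableInfo`.
#[derive(Clone, Debug, PartialEq)]
struct SstableInfo {
    sst_id: u64,
    table_ids: Vec<TableId>,
}

/// Builds the branch SST for one compaction group (hypothetical helper):
/// it gets a fresh id and keeps only the table ids matched for that group.
fn branch_for_group(sst: &SstableInfo, matched_ids: &[TableId], new_sst_id: u64) -> SstableInfo {
    SstableInfo {
        sst_id: new_sst_id,
        table_ids: sst
            .table_ids
            .iter()
            .copied()
            .filter(|id| matched_ids.contains(id))
            .collect(),
    }
}

fn main() {
    let sst = SstableInfo { sst_id: 1, table_ids: vec![1, 2, 3] };
    // Tables 1 and 3 belong to the group this branch is created for.
    let branch = branch_for_group(&sst, &[1, 3], 7);
    println!("{branch:?}");
}
```

The original SST is left untouched here, whereas the snippet under review mutates it in place, so this only shows the table_ids narrowing.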

Contributor Author

This may be historical legacy; I'd like to deal with it in a separate PR.

) -> Result<BTreeMap<u64, Vec<SstableInfo>>> {
    let mut new_sst_id_number = 0;
    let mut sst_to_cg_vec = Vec::with_capacity(sstables.len());
    for commit_sst in sstables {
Contributor

In this for loop, we can collect the following per-group information:

HashMap<
    CompactionGroupId,
    (
        Vec<(
            SstableInfo,
            // sstable_id_offset from new_sst_id, so that we don't have to
            // increment new_sst_id, but instead get the sstable id by
            // new_sst_id + sstable_id_offset
            u64,
        )>,
        HashSet<TableId>,
    ),
>

And then in the next round of processing, we can handle the ssts of a compaction group all at once.
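A rough sketch of this idea follows, under several assumptions: `SstableInfo` is a simplified stand-in, `group_ssts` and `GroupedSsts` are hypothetical names, every (SST, group) pair is given an offset, and BTreeMap/BTreeSet replace the suggested HashMap/HashSet only to keep the output order deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

type CompactionGroupId = u64;
type TableId = u32;

/// Hypothetical, simplified stand-in for the real `SstableInfo`.
#[derive(Clone, Debug, PartialEq)]
struct SstableInfo {
    sst_id: u64,
    table_ids: Vec<TableId>,
}

/// Per group: (SST, offset from the first reserved id) pairs plus the matched table ids.
type GroupedSsts = BTreeMap<CompactionGroupId, (Vec<(SstableInfo, u64)>, BTreeSet<TableId>)>;

/// Groups SSTs by compaction group in one loop and returns how many new ids are needed.
fn group_ssts(
    sstables: &[SstableInfo],
    table_to_group: &HashMap<TableId, CompactionGroupId>,
) -> (GroupedSsts, u64) {
    let mut grouped = GroupedSsts::new();
    let mut next_offset = 0u64;
    for sst in sstables {
        // Collect the table ids of this SST per compaction group.
        let mut per_group: BTreeMap<CompactionGroupId, BTreeSet<TableId>> = BTreeMap::new();
        for table_id in &sst.table_ids {
            if let Some(group_id) = table_to_group.get(table_id) {
                per_group.entry(*group_id).or_default().insert(*table_id);
            }
        }
        // Record the SST under each group with its own id offset.
        for (group_id, matched) in per_group {
            let entry = grouped.entry(group_id).or_default();
            entry.0.push((sst.clone(), next_offset));
            entry.1.extend(matched);
            next_offset += 1;
        }
    }
    (grouped, next_offset)
}

fn main() {
    let table_to_group: HashMap<TableId, CompactionGroupId> =
        HashMap::from([(1, 2), (2, 2), (3, 3)]);
    let sstables = vec![
        SstableInfo { sst_id: 10, table_ids: vec![1, 3] },
        SstableInfo { sst_id: 11, table_ids: vec![2] },
    ];
    let (grouped, needed) = group_ssts(&sstables, &table_to_group);
    // One reservation of `needed` ids suffices; a final id is first_reserved_id + offset.
    println!("ids needed: {needed}, groups: {:?}", grouped.keys().collect::<Vec<_>>());
}
```

Because the loop returns the total count, the ids could be reserved once afterwards and each SST id computed from its offset, with no counter incremented during the split.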

Contributor Author

I tried refactoring the code with this more complex structure, but it did not make the code simpler.

Since next_sstable_object_id does not implement prefetching under the SQL backend, this may become burdensome when we can't estimate the total number of new SSTs (which can't be done in a single loop). I'm inclined to leave this code untouched for now. What do you think?

Collaborator

@hzxa21 hzxa21 left a comment

Rest LGTM

epoch,
commit_sstables,
new_table_ids,
&mut modified_compaction_groups,
Collaborator

We don't need to explicitly pass modified_compaction_groups and update it inside pre_commit_sst, because it can be obtained from commit_sstables. Example:

let commit_sstables = self
    .correct_commit_ssts(sstables, &table_compaction_group_mapping)
    .await?;
let modified_compaction_groups = commit_sstables.keys();

version.pre_commit_sst(...);
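As a self-contained sketch of the same idea, the following derives the modified groups from the keys of the grouped map. `SstableInfo` is a simplified stand-in and `modified_groups` is a hypothetical helper. It copies the keys into a Vec on the assumption that the map is later moved into a consumer; if the real pre_commit_sst only borrows the map, the borrowed `keys()` iterator from the example above would be enough.

```rust
use std::collections::BTreeMap;

type CompactionGroupId = u64;

/// Hypothetical, simplified stand-in for the real `SstableInfo`.
#[derive(Debug)]
struct SstableInfo {
    sst_id: u64,
}

/// Derives the modified compaction groups from the keys of the grouped SSTs,
/// so no separate mutable set has to be threaded through the commit path.
fn modified_groups(
    commit_sstables: &BTreeMap<CompactionGroupId, Vec<SstableInfo>>,
) -> Vec<CompactionGroupId> {
    commit_sstables.keys().copied().collect()
}

fn main() {
    let mut commit_sstables = BTreeMap::new();
    commit_sstables.insert(2, vec![SstableInfo { sst_id: 10 }]);
    commit_sstables.insert(3, vec![SstableInfo { sst_id: 11 }, SstableInfo { sst_id: 12 }]);

    let groups = modified_groups(&commit_sstables);
    println!("SSTs to commit: {commit_sstables:?}");

    // The map can now be moved into a consumer while `groups` stays usable.
    let total: usize = commit_sstables.into_values().map(|ssts| ssts.len()).sum();
    println!("groups {groups:?} received {total} SSTs");
}
```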

Contributor Author

done

@Li0k Li0k changed the title feat(storage): remove group payload and refactor commit_epoch feat(storage): refactor commit_epoch code Jul 9, 2024
Contributor Author

Li0k commented Jul 9, 2024

In this PR, we can't accurately determine the per-table size from the commit epoch info, so we can't assign a specific size to each SST when splitting it, which leads to compaction exceptions. Therefore, I decided to keep only the commit_epoch code refactor in this PR.

Contributor

@wenym1 wenym1 left a comment

Rest LGTM!

// Generate new SST IDs for each compaction group
// `next_sstable_object_id` will update the global SST ID and reserve the new SST IDs
// So we need to get the new SST ID first and then split the SSTs
let mut new_sst_id = next_sstable_object_id(&self.env, new_sst_id_number).await?;
Contributor

Maybe add a sanity check that new_sst_id is never incremented beyond the maximum prefetched SST id.
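One possible shape for such a check, as a sketch only: the ids reserved up front are wrapped in a small range type that refuses to hand out anything past the reservation. `ReservedSstIds` and its methods are hypothetical and not part of the codebase.

```rust
/// Hypothetical helper: a half-open range `[next, end)` of SST ids reserved up front.
#[derive(Debug)]
struct ReservedSstIds {
    next: u64,
    end: u64,
}

impl ReservedSstIds {
    /// `start` is the first reserved id and `count` is how many ids were reserved.
    fn new(start: u64, count: u64) -> Self {
        Self { next: start, end: start + count }
    }

    /// Hands out the next id, or `None` once the reserved range is exhausted.
    fn take(&mut self) -> Option<u64> {
        if self.next >= self.end {
            return None;
        }
        let id = self.next;
        self.next += 1;
        Some(id)
    }
}

fn main() {
    // Assume two ids starting at 100 were reserved before splitting.
    let mut ids = ReservedSstIds::new(100, 2);
    let first = ids.take().expect("within reserved range");
    let second = ids.take().expect("within reserved range");
    println!("{first} {second}");
    // A third request exceeds the reservation and is reported instead of
    // silently producing an id that was never reserved.
    assert!(ids.take().is_none());
}
```

Returning `None` leaves the choice to the caller: the commit path could turn it into an error, or assert if exceeding the reservation is considered a bug.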

@Li0k Li0k added this pull request to the merge queue Jul 11, 2024
Merged via the queue into main with commit df5cc5f Jul 11, 2024
31 of 33 checks passed
@Li0k Li0k deleted the li0k/storage_commit_epoch branch July 11, 2024 07:36