feat(storage): refactor commit_epoch code #17235
…nto li0k/storage_commit_epoch
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 9425213 | Triggered | Generic Password | 5599167 | ci/scripts/e2e-sink-test.sh |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Rest LGTM
// Don't trigger compactions if we enable deterministic compaction
if !self.env.opts.compaction_deterministic_test {
    // commit_epoch may contains SSTs from any compaction group
    for id in &modified_compaction_groups {
        self.try_send_compaction_request(*id, compact_task::TaskType::Dynamic);
    }
    if !table_stats_change.is_empty() {
Where will we do this stats cleaning logic after this PR?
purge_prost_table_stats
for (mut sst, group_table_ids) in sst_to_cg_vec {
    for (group_id, _match_ids) in group_table_ids {
        let branch_sst = split_sst(&mut sst.sst_info, &mut new_sst_id);
I think we should also modify the `table_ids` of the `SstableInfo`, but this `matched_ids` is not passed to the `split_sst` method.
This may be historical legacy, and I'd like to deal with it in a separate PR.
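To make the suggestion above concrete, here is a minimal sketch of a split helper that also narrows the branch's table IDs. It is not the PR's `split_sst`: the function name `split_sst_with_table_ids`, the simplified `SstableInfo` struct, and the choice to leave the parent SST untouched are all assumptions made for illustration.

```rust
type TableId = u32;

// Hypothetical, simplified stand-in for the real `SstableInfo`.
#[derive(Clone, Debug, PartialEq)]
struct SstableInfo {
    sst_id: u64,
    table_ids: Vec<TableId>,
}

/// Hypothetical variant of `split_sst` that also receives the matched table IDs.
/// The branch takes the next free SST ID and keeps only the tables that belong
/// to the target compaction group. The parent is left unchanged in this sketch.
fn split_sst_with_table_ids(
    sst: &SstableInfo,
    new_sst_id: &mut u64,
    matched_ids: &[TableId],
) -> SstableInfo {
    let branch = SstableInfo {
        sst_id: *new_sst_id,
        table_ids: sst
            .table_ids
            .iter()
            .copied()
            .filter(|table_id| matched_ids.contains(table_id))
            .collect(),
    };
    // Consume one ID from the caller's counter.
    *new_sst_id += 1;
    branch
}
```

Whether the parent's `table_ids` should shrink as well depends on the real split semantics, which this thread does not spell out.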
) -> Result<BTreeMap<u64, Vec<SstableInfo>>> {
    let mut new_sst_id_number = 0;
    let mut sst_to_cg_vec = Vec::with_capacity(sstables.len());
    for commit_sst in sstables {
In this for loop, we can collect these per group information
HashMap<
CompactionGroupId,
(
Vec<(
SstableInfo,
u64 /* sstable_id_offset from new_sst_id,
so that we don't have to increment new_sst_id,
but instead get the sstable id by new_sst_id + sstable_id_offset */
)>,
HashSet<TableId>
)
>
And then in the next round of processing, we can handle the ssts of a compaction group all at once.
I tried refactoring the code with this more complex structure, but the code did not get any simpler.
Since `next_sstable_object_id` does not implement prefetching under the SQL backend, this may become burdensome when we can't estimate the total SST count (which can't be done using only one loop), so I lean towards leaving this code untouched for now. What do you think?
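For reference, the per-group collection proposed in this thread could look roughly like the sketch below. The types are simplified stand-ins, `group_ssts` is a hypothetical name, and for simplicity every (SST, group) placement consumes one offset, which may differ from how the real code counts new SST IDs.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

type CompactionGroupId = u64;
type TableId = u32;

// Hypothetical, simplified stand-in for the real `SstableInfo`.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
struct SstableInfo {
    sst_id: u64,
    table_ids: Vec<TableId>,
}

/// Per group: SSTs paired with an offset from a base ID that is reserved
/// later, plus the set of table IDs seen in that group.
type PerGroup = HashMap<CompactionGroupId, (Vec<(SstableInfo, u64)>, HashSet<TableId>)>;

/// Single pass over the committed SSTs. Returns the per-group map and the
/// total number of offsets handed out, i.e. how many IDs must be reserved.
fn group_ssts(
    ssts: Vec<SstableInfo>,
    table_to_group: &HashMap<TableId, CompactionGroupId>,
) -> (PerGroup, u64) {
    let mut per_group: PerGroup = HashMap::new();
    let mut next_offset = 0u64;
    for sst in ssts {
        // BTreeMap keeps the group order deterministic within one SST.
        let mut by_group: BTreeMap<CompactionGroupId, Vec<TableId>> = BTreeMap::new();
        for table_id in &sst.table_ids {
            if let Some(group_id) = table_to_group.get(table_id) {
                by_group.entry(*group_id).or_default().push(*table_id);
            }
        }
        for (group_id, table_ids) in by_group {
            let entry = per_group.entry(group_id).or_default();
            entry.1.extend(table_ids);
            entry.0.push((sst.clone(), next_offset));
            next_offset += 1;
        }
    }
    (per_group, next_offset)
}
```

With this shape, the final SST ID of a placement is the reserved base ID plus its offset, and the second return value is the count to reserve in one call. It does not address the concern raised in the reply about the SQL backend.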
Rest LGTM
epoch,
commit_sstables,
new_table_ids,
&mut modified_compaction_groups,
We don't need to explicitly pass `modified_compaction_groups` and update it inside `pre_commit_sst`, because it can be obtained from `commit_sstables`. Example:
let commit_sstables = self
.correct_commit_ssts(sstables, &table_compaction_group_mapping)
.await?;
let modified_compaction_groups = commit_sstables.keys();
version.pre_commit_sst(...);
done
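A standalone sketch of the idea in this thread, assuming the per-group map only contains entries for groups that actually received SSTs. `SstableInfo` is a placeholder struct here and `modified_groups` is a hypothetical helper name, not a function from the PR.

```rust
use std::collections::BTreeMap;

type CompactionGroupId = u64;

// Hypothetical placeholder for the real `SstableInfo`.
#[allow(dead_code)]
#[derive(Clone, Debug, Default)]
struct SstableInfo {
    sst_id: u64,
}

/// The groups touched by a commit are the keys of the per-group map, so no
/// separate `&mut` accumulator has to be threaded through the call chain.
fn modified_groups(
    commit_sstables: &BTreeMap<CompactionGroupId, Vec<SstableInfo>>,
) -> Vec<CompactionGroupId> {
    commit_sstables.keys().copied().collect()
}
```

Because the map is a `BTreeMap`, the resulting group IDs come out in ascending order.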
5af37b8 to c0e1dc0
In this PR, we can't accurately identify the per-table size in the commit epoch info, so we can't assign a specific size to each SST when splitting it, which leads to compaction exceptions. Therefore, I decided to keep only the …
Rest LGTM!
// Generate new SST IDs for each compaction group
// `next_sstable_object_id` will update the global SST ID and reserve the new SST IDs
// So we need to get the new SST ID first and then split the SSTs
let mut new_sst_id = next_sstable_object_id(&self.env, new_sst_id_number).await?;
We may want to add a sanity check that `new_sst_id` is never incremented past the largest prefetched SST ID.
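One way such a check could be expressed is a small allocator over the reserved range. This is a sketch only: `ReservedSstIds` is a hypothetical type, and it assumes the reservation yields a contiguous range starting at the returned ID.

```rust
/// Hypothetical allocator over a reserved, half-open ID range `[next, end)`.
struct ReservedSstIds {
    next: u64,
    end: u64,
}

impl ReservedSstIds {
    /// `start` is the first reserved ID, `count` the number of reserved IDs.
    fn new(start: u64, count: u64) -> Self {
        Self { next: start, end: start + count }
    }

    /// Hands out the next ID, or `None` once the reserved range is exhausted,
    /// so callers cannot silently run past the reservation.
    fn allocate(&mut self) -> Option<u64> {
        if self.next < self.end {
            let id = self.next;
            self.next += 1;
            Some(id)
        } else {
            None
        }
    }
}
```

A caller could turn the `None` case into an error or an assertion failure, depending on how strict the check should be.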
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
- This PR removes the group payload of SharedBuffer compaction. It also stops the CN side from computing the mapping of compaction groups to SSTs and defers that step until commit_epoch, thus reducing CN upload ops.
- Simplified and refactored the commit_epoch code to make it more readable, and removed useless and hard-to-understand code.
Checklist
- ./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.