[CORE-8161] storage: add min.cleanable.dirty.ratio, schedule compaction by dirty_ratio #24991
Conversation
Adds the bookkeeping variables `_dirty/closed_segment_bytes` to `disk_log_impl`, as well as some getter/setter functions. These functions will be used throughout `disk_log_impl` wherever required (segment rolling, compaction, segment eviction) to track the bytes contained in dirty and closed segments.
Uses the added functions `update_dirty/closed_segment_bytes()` in the required locations within `disk_log_impl` to bookkeep the dirty ratio. Bytes can be added or removed by rolling new segments, compaction, and retention enforcement.
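For illustration, here is a minimal sketch of what this bookkeeping could look like; the member names and signatures are assumptions based on the description above, not the PR's actual code:

```cpp
#include <cstdint>

// Hypothetical sketch of the dirty/closed byte bookkeeping described above.
class disk_log_impl_sketch {
public:
    // Called when segments roll, are compacted, or are removed by retention.
    void update_dirty_segment_bytes(int64_t delta) { _dirty_segment_bytes += delta; }
    void update_closed_segment_bytes(int64_t delta) { _closed_segment_bytes += delta; }

    // dirty_ratio = bytes in dirty (not yet compacted) segments divided by the
    // total bytes in closed segments; zero while nothing has been closed yet.
    double dirty_ratio() const {
        if (_closed_segment_bytes <= 0) {
            return 0.0;
        }
        return static_cast<double>(_dirty_segment_bytes)
               / static_cast<double>(_closed_segment_bytes);
    }

private:
    int64_t _dirty_segment_bytes{0};
    int64_t _closed_segment_bytes{0};
};
```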
std::optional<double>
metadata_cache::get_default_min_cleanable_dirty_ratio() const {
    return config::shard_local_cfg().min_cleanable_dirty_ratio();
}
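For context, a hypothetical helper (not from the PR) showing how such a cluster default might be combined with a topic-level override; only `get_default_min_cleanable_dirty_ratio()` above is from the diff, everything else is assumed:

```cpp
#include <optional>

// Assumed resolution order: prefer the topic-level override when present,
// otherwise fall back to the cluster default (e.g. the value returned by
// metadata_cache::get_default_min_cleanable_dirty_ratio()).
std::optional<double> effective_min_cleanable_dirty_ratio(
  std::optional<double> topic_override, std::optional<double> cluster_default) {
    return topic_override.has_value() ? topic_override : cluster_default;
}
```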
i'm wondering if this is something that we want to expose as a topic config right now, or we keep it internal for a while?
Is there a reason you would like to keep this config internal for the time being?
I could see an argument for releasing all of `min.cleanable.dirty.ratio`/`min.compaction.lag.ms`/`max.compaction.lag.ms` as configs all at once, but I also think there is use for `min.cleanable.dirty.ratio` as a standalone property for now.
Is this your reasoning as well?
Is there a reason you would like to keep this config internal for the time being?
just that no one is asking for it
There have been some internal discussions around the benefits of `min.cleanable.dirty.ratio` reducing read/write amplification, but fair point.
got it, makes sense. presumably you could bail out of the loop early based on some condition. for example, the while loop already does this in the `while` condition. maybe there isn't a natural way to do that, i'm not sure.
Yeah, we can just check again for the existence of the ntp in the `ntp_by_compaction_heuristic` map, shift the list, and continue without compacting, so we eventually hit the "front" of the list again. Sub-optimal but least intrusive change.
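A toy illustration of this rotate-and-skip approach, using `std::` containers and made-up names in place of the real intrusive list and heuristic map:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_set>

// Toy stand-in for the housekeeping loop: entries not present in the
// heuristic map are rotated to the back without being compacted, so the
// original "front" of the list is eventually reached again.
void compact_eligible_logs(
  std::list<std::string>& logs_list,
  const std::unordered_set<std::string>& ntp_by_compaction_heuristic) {
    // Visit each entry at most once per housekeeping round.
    for (std::size_t visited = 0, n = logs_list.size(); visited < n; ++visited) {
        std::string ntp = logs_list.front();
        logs_list.pop_front();
        logs_list.push_back(ntp);
        if (ntp_by_compaction_heuristic.count(ntp) == 0) {
            // Below the dirty-ratio threshold: skip without compacting.
            continue;
        }
        // ... compact the log identified by `ntp` ...
    }
}
```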
@@ -260,29 +293,28 @@ log_manager::housekeeping_scan(model::timestamp collection_threshold) {
        co_await compaction_map->initialize(compaction_mem_bytes);
        _compaction_hash_key_map = std::move(compaction_map);
    }
    while (!_logs_list.empty()
           && is_not_set(_logs_list.front().flags, bflags::compacted)) {
i don't see us checking/changing the compacted flags now. is that changed?
alternatively, did you consider just sorting _logs_list? you could leave this while loop unchanged, and before it, iterate over the btree, use get() to access the intrusive hook, and then in O(1) move the entries to the head of the list.
This is a good idea, thanks. Wasn't sure of the best way to manage both the housekeeping meta and the sorted ntps. This sounds like a good solution!
your changes look ok too, but we've had a bunch of concurrency bugs related to that intrusive list in the past. i think it'd be nice to just sort it how we want, and otherwise leave it untouched. wdyt?
i was under the impression that it would be fine to compact any partition in that loop, we were only interested in which order compaction occurred
With this PR, the changes (as they stand) do two things:
- The housekeeping scan no longer indiscriminately compacts every partition, but instead only compacts partitions with a `dirty_ratio` above `min.cleanable.dirty.ratio` (if it is set at the cluster or topic level).
- The ordering of compaction for partitions is decided using the `dirty_ratio` as a heuristic.

So, we aren't looking to compact every partition in that loop.
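A minimal sketch of that eligibility rule (the function and parameter names are hypothetical): when no threshold is configured the old always-compact behaviour is preserved, otherwise the log's dirty ratio must meet the threshold.

```cpp
#include <optional>

// Hypothetical predicate: a log is eligible for compaction only if its dirty
// ratio meets the configured threshold; with no threshold set, always compact.
bool eligible_for_compaction(
  double dirty_ratio, std::optional<double> min_cleanable_dirty_ratio) {
    if (!min_cleanable_dirty_ratio.has_value()) {
        return true;
    }
    return dirty_ratio >= *min_cleanable_dirty_ratio;
}
```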
Sorry, it looks like my original comment got deleted accidentally; for context, I wrote:
I agree with your proposed changes, I'm just wondering how the while loop would remain unchanged when we are no longer looking to compact every log in `_logs_list` (i.e., the logic of `is_not_set()` for the front entry of `_logs_list` returning `false` no longer applies.)
yeh i think we could bail out of the loop (or skip items)?
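For reference, a standalone sketch of the reorder-then-scan idea discussed in this thread, using `boost::intrusive` directly; the entry type, hook, and heuristic map are assumptions rather than the PR's code. Iterating the heuristic map in ascending dirty-ratio order and moving each entry to the head leaves the list sorted descending, so the existing while loop would compact the dirtiest logs first without further changes.

```cpp
#include <boost/intrusive/list.hpp>
#include <map>

namespace bi = boost::intrusive;

// Assumed housekeeping entry carrying an intrusive list hook.
struct housekeeping_entry : bi::list_base_hook<> {
    double dirty_ratio{0.0};
};

using logs_list_t = bi::list<housekeeping_entry>;

// Reorder logs_list in place so that entries with the highest dirty ratio end
// up at the front; each move is O(1) via the intrusive hook.
void order_by_dirty_ratio(
  logs_list_t& logs_list,
  const std::multimap<double, housekeeping_entry*>& by_ratio) {
    // Ascending iteration + move-to-front == descending final order.
    for (auto& [ratio, entry] : by_ratio) {
        auto it = logs_list.iterator_to(*entry); // locate via the hook
        logs_list.erase(it);                     // O(1) unlink
        logs_list.push_front(*entry);            // O(1) relink at the head
    }
}
```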
, min_cleanable_dirty_ratio(
    *this,
    "min_cleanable_dirty_ratio",
    "The minimum ratio between between the number of bytes in \"dirty\" "
"The minimum ratio between between the number of bytes in \"dirty\" " | |
"The minimum ratio between the number of bytes in \"dirty\" " |
"min_cleanable_dirty_ratio", | ||
"The minimum ratio between between the number of bytes in \"dirty\" " | ||
"segments and the total number of bytes in closed segments that must be " | ||
"reached before a partition's log is considered eligible for compaction, " |
"reached before a partition's log is considered eligible for compaction, " | |
"reached before a partition's log is eligible for compaction " |
config_type="DOUBLE",
value="-1",
doc_string=
"The minimum ratio between between the number of bytes in \"dirty\" "
"The minimum ratio between between the number of bytes in \"dirty\" " | |
"The minimum ratio between the number of bytes in \"dirty\" " |
doc_string=
"The minimum ratio between between the number of bytes in \"dirty\" "
"segments and the total number of bytes in closed segments that must be "
"reached before a partition's log is considered eligible for compaction, "
"reached before a partition's log is considered eligible for compaction, " | |
"reached before a partition's log is eligible for compaction, " |
"reached before a partition's log is considered eligible for compaction, " | |
"reached before a partition's log is considered eligible for compaction " |
Based on PR #24649.
WIP, tests to be added.
Instead of unconditionally compacting all logs during a round of housekeeping, users may now optionally schedule log compaction in the `log_manager` using the cluster/topic property `min_cleanable_dirty_ratio`/`min.cleanable.dirty.ratio`.
As mentioned in the above PR, by setting `min.cleanable.dirty.ratio` on a per-topic basis, users can avoid unnecessary read/write amplification during compaction as the log grows in size.
A housekeeping scan will still be performed every `log_compaction_interval_ms`, and the log's `dirty_ratio` will be tested against `min.cleanable.dirty.ratio` in determining its eligibility for compaction. Additionally, logs are now compacted in descending order according to their dirty ratio, offering a better "bang for buck" heuristic for compaction scheduling.
Backports Required
Release Notes
Improvements
- `min_cleanable_dirty_ratio`