-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse entry indices #309
Comments
It's a good idea. I already have a PR that merge the file location field of multiple entries into one u32 (#208). We just need to move the entry offset into it as well. The reason I didn't merge that PR is because it apparently causes regression to both memory usage and performance in some corner cases (e.g. user writes only one entry in a log batch). I prefer we can at least make it gated by a feature. |
Glad to see that feature is on the road map. Maybe we can list the work needs to do and I'd be happy if I can help. We're going to build a quorum-based log storage like Aurora's storage service. I think raft-engine should be suitable as the local storage. |
@v0y4g3r Sorry for the delay. To move forward this feature, I'll first need to (1) polish #208 with feature gate, then (2) you can propose a change that fully embeds entry offset as well. Because we don't currently need this feature in TiKV, I would anticipate the review process could take a fair amount of time before we reach step 2. I'm curious though, is this a real problem you encountered while using the engine? For the case you mentioned ("The smaller the avg entry size is, the more entry indices will reside in memory."), we can also add another memory metrics other than the existing file size metrics |
In #303 we can see some users complain about the memory usage.
The memory footprint of entry indices largely depends on the average size of entries. File purge and compaction are based on log file total size. The smaller the avg entry size is, the more entry indices will reside in memory.
In some cases, point-get of log entries is less frequent than bulk loads that reads a range of entries and applies those entries to some state machine. So we can use sparse indices to further reduce the memory footprint of raft-engine, for example, we can bookkeep the entry id to offset mapping every N entries thus the memory consumption can be reduced to 1/N.
Sparse indices means we can no longer get the exact offset of some arbitrary entry id, the overhead of reading those needless entries can be amortized in bulk load cases.
The text was updated successfully, but these errors were encountered: