Leveled compaction support for heavy DGM #35

Open
hisundar opened this issue Jun 5, 2017 · 2 comments

@hisundar
Contributor

hisundar commented Jun 5, 2017

The Data Greater than Memory (DGM) performance of moss store's Log-Structured Merge Arrays is limited by the speed of the merge operation during compaction.
Currently this compaction happens at a single level, because the compaction threshold defaults to 0. This can result in heavy write amplification.
Based on discussion with @steveyen, moss store persistence can mitigate this with the following simple approach.
Take the limits maxSmallSegments=3 and maxBigSegments=2 as an example.
Initially the file holds no segments (0 small, 0 big).
Persistence appends small segments to the end of the file:
|-seg0-||-seg1-||-seg2-| (3 small segments, which reaches maxSmallSegments)
On the next round of persistence, these 3 segments can be compacted into a new file:
|====seg0====| (0 small segments, 1 big segment)
After this, further persistence rounds simply append small segments:
|====seg0====||-seg0-||-seg1-||-seg2-|
The next round of persistence then compacts only the small segments, making the file look as follows:
|====seg0====||...seg0...||...seg1...||...seg2...||=====seg1=====|
The rationale is that compacting fewer segments is faster than rewriting the whole file on every delta.
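
To make the write-amplification argument concrete, here is a rough sketch that counts units written under both schemes. It is only a toy model, not moss code: the function names are hypothetical, every delta is assumed to be 1 unit with no overwrites, and merging all big segments into one once there are more than maxBigSegments is my assumption about what that limit would trigger.

```go
package main

import "fmt"

// fullRewriteCost models single-level compaction: round n appends one
// 1-unit delta and rewrites everything persisted so far (n units).
func fullRewriteCost(rounds int) int {
	written := 0
	for n := 1; n <= rounds; n++ {
		written += n
	}
	return written
}

// tieredCost models the small/big segment scheme. Each round appends a
// 1-unit small segment. When maxSmall small segments exist they are merged
// into one big segment. ASSUMPTION: when more than maxBig big segments
// exist, all big segments are merged into a single one.
func tieredCost(rounds, maxSmall, maxBig int) int {
	written := 0
	small := 0    // live small segments, 1 unit each
	var big []int // sizes of live big segments
	for i := 0; i < rounds; i++ {
		small++
		written++ // append the delta
		if small == maxSmall {
			big = append(big, small)
			written += small // rewrite only the small segments
			small = 0
		}
		if len(big) > maxBig {
			total := 0
			for _, b := range big {
				total += b
			}
			written += total // rewrite all big segments into one
			big = []int{total}
		}
	}
	return written
}

func main() {
	fmt.Println("full rewrite:", fullRewriteCost(12))
	fmt.Println("tiered:      ", tieredCost(12, 3, 2))
}
```

With 12 rounds and the example limits, this model writes 78 units for the full rewrite and 33 units for the tiered scheme.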

Later, to persist to disk more efficiently, mossStore can adopt simple size-tiered leveled compaction by splitting the Footer across multiple levels:
data-L0-0000xx.moss: most recent segments
data-L1-0000xx.moss: segments merged from L0
data-L2-0000xx.moss: segments merged from L1
We can then size-tier these levels to achieve a good tradeoff between space, read, and write efficiency.
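
One possible way to decide when a level should be merged into the next is a per-level size budget that grows geometrically. The sketch below is an assumption for illustration only: levelToCompact, base, and ratio are hypothetical names and do not exist in moss.

```go
package main

import "fmt"

// levelToCompact returns the lowest level whose total size exceeds its
// budget, or -1 if every level fits. ASSUMPTION: the budget for level i
// is base * ratio^i, which is one common size-tiering rule.
func levelToCompact(sizes []int64, base, ratio int64) int {
	budget := base
	for level, size := range sizes {
		if size > budget {
			return level
		}
		budget *= ratio
	}
	return -1
}

func main() {
	// Sizes in bytes for L0, L1, L2 (made-up numbers).
	sizes := []int64{6 << 20, 30 << 20, 200 << 20}
	fmt.Println(levelToCompact(sizes, 4<<20, 10))
}
```

A larger ratio means fewer, bigger merges (less write amplification, more space held in upper levels); a smaller ratio does the opposite.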

@steveyen
Member

steveyen commented Jun 5, 2017

Hi @hisundar,
One thought is how to have the configuration more general than maxSmallSegments and maxBigSegments, as it leads me to think you'd end up one future day adding things like maxBiggerSegments, maxBiggerThanBiggerSegments, etc. (Unless I misunderstand.)

Also, I took a look at RocksDB, and they seem to have multiple, concurrent files per level, so the pros/cons of that are something to consider.

@hisundar
Contributor Author

hisundar commented Jun 5, 2017

I agree @steveyen, the above is just an example. We should definitely have something like an array representing each level for a footer.
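
As a rough illustration of the array idea, the named settings could collapse into one slice indexed by level. This is only a sketch under my own assumptions: compactionPolicy and firstFullLevel are hypothetical names, not part of moss.

```go
package main

import "fmt"

// compactionPolicy is a hypothetical replacement for the named
// maxSmallSegments/maxBigSegments settings: index i holds the segment
// limit for level i, so deeper levels need no new option names.
type compactionPolicy struct {
	maxSegments []int
}

// firstFullLevel returns the lowest level whose segment count has reached
// its limit, or -1 if none has. Levels without a configured limit are
// treated as unbounded.
func (p compactionPolicy) firstFullLevel(counts []int) int {
	for level, n := range counts {
		if level < len(p.maxSegments) && n >= p.maxSegments[level] {
			return level
		}
	}
	return -1
}

func main() {
	// {3, 2} reproduces maxSmallSegments=3, maxBigSegments=2.
	p := compactionPolicy{maxSegments: []int{3, 2}}
	fmt.Println(p.firstFullLevel([]int{3, 1}))
	fmt.Println(p.firstFullLevel([]int{1, 1}))
}
```

Adding a third level would then just mean appending another entry to the slice.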
