WIP: serialise write buffer when creating a snapshot #478
base: main
Looks promising!
I think this works for writing out the write buffer. It would be good to add a test that exercises writeWriteBuffer directly. Maybe we could test that the WriteBufferWriter and RunBuilder produce the same keyops and blob files?
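To make the suggested test concrete, here is a minimal, self-contained sketch of the property "two writers produce the same keyops and blob files". It does not use the lsm-tree API: `Entry`, `Files`, `encodeEntry`, `viaIncrementalBuilder` and `viaBulkWriter` are all hypothetical stand-ins, and the text encoding is invented purely for illustration.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for a write buffer entry: key, value, optional blob.
data Entry = Entry
  { entryKey   :: String
  , entryValue :: String
  , entryBlob  :: Maybe String
  }

-- Hypothetical stand-in for the two output files.
data Files = Files
  { keyOpsFile :: String
  , blobFile   :: String
  } deriving (Eq, Show)

-- Invented encoding: "key=value", plus "@offset+length" if there is a blob.
encodeEntry :: Int -> Entry -> String
encodeEntry offset (Entry k v mblob) =
  k ++ "=" ++ v ++ blobRef ++ "\n"
  where
    blobRef = case mblob of
      Nothing -> ""
      Just b  -> "@" ++ show offset ++ "+" ++ show (length b)

-- Stand-in for an incremental builder: appends one entry at a time and
-- uses the current blob file length as the next blob offset.
viaIncrementalBuilder :: [Entry] -> Files
viaIncrementalBuilder = foldl step (Files "" "")
  where
    step (Files keyOps blobs) e =
      Files (keyOps ++ encodeEntry (length blobs) e)
            (blobs ++ fromMaybe "" (entryBlob e))

-- Stand-in for a bulk writer: computes all blob offsets up front.
viaBulkWriter :: [Entry] -> Files
viaBulkWriter es =
  Files (concat (zipWith encodeEntry offsets es)) (concat blobs)
  where
    blobs   = map (fromMaybe "" . entryBlob) es
    offsets = scanl (+) 0 (map length blobs)

sample :: [Entry]
sample =
  [ Entry "a" "1" Nothing
  , Entry "b" "2" (Just "blob-b")
  , Entry "c" "3" (Just "x")
  ]

main :: IO ()
main = print (viaIncrementalBuilder sample == viaBulkWriter sample)
```

In the real test, the two functions would presumably be replaced by the actual writers and the comparison would be made on the files they leave on the (mock) file system.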
Since the write buffer writer only has two files, I don't think we need the abstractions that we have for runs, like ForRunFiles. To give you an idea of what I mean, I made a small patch that you can download and apply locally using git apply write-buffer-paths.patch, and go from there (if you agree with my assessment).
Gist: https://gist.github.com/jorisdral/cae5737e7b201ed3372d27145d5d7d73
Fair enough, I just liked the consistency.
data WriteBufferWriter m h = WriteBufferWriter
  { -- | The file system paths for all the files used by the serialised write buffer.
    writeBufferWriterFsPaths :: !WriteBufferFsPaths,
    -- | The page accumulator.
    writeBufferWriterPageAcc :: !(PageAcc (PrimState m)),
    -- | The byte offset within the blob file for the next blob to be written.
    writeBufferWriterBlobOffset :: !(PrimVar (PrimState m) Word64),
    -- | The (write mode) file handles.
    writeBufferWriterHandles :: !(ForWriteBufferFiles (ChecksumHandle (PrimState m) h)),
    writeBufferWriterHasFS :: !(HasFS m h),
    writeBufferWriterHasBlockIO :: !(HasBlockIO m h)
  }
I'd pick a shorter name prefix to make these fields less verbose.
Is there any testing infrastructure that would allow me to easily set up the required inputs?
Do you have any initial feelings regarding the amount of duplication?
^ contains some definitions for generating data for runs, which is equivalent to data for write buffers. It's just sorted key/value data without duplicate keys. (lsm-tree/test/Test/Database/LSMTree/Internal/Run.hs, lines 179 to 212 in d11c709)
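As a rough illustration of "sorted key/value data without duplicate keys", the sketch below normalises an arbitrary list of pairs into that shape and checks the invariant. The names `toRunData` and `isRunData` are hypothetical and are not the definitions in the referenced test module; keeping the first value for a repeated key is an arbitrary choice made here.

```haskell
module Main (main) where

import Data.List (sortOn)

-- Sort by key and drop later duplicates of the same key.
-- sortOn is stable, so the first value given for a key is the one kept.
toRunData :: Ord k => [(k, v)] -> [(k, v)]
toRunData = dedup . sortOn fst
  where
    dedup (x : y : rest)
      | fst x == fst y = dedup (x : rest)
      | otherwise      = x : dedup (y : rest)
    dedup xs = xs

-- The invariant: keys are strictly increasing (sorted, no duplicates).
isRunData :: Ord k => [(k, v)] -> Bool
isRunData kvs = and (zipWith (<) ks (drop 1 ks))
  where
    ks = map fst kvs

main :: IO ()
main = do
  let raw = [(3 :: Int, "c"), (1, "a"), (3, "z"), (2, "b")]
  print (toRunData raw)
  print (isRunData (toRunData raw))
```

With a property-testing library, a generator for such data could be built by applying a normalisation like this to arbitrary lists.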
^ this test could give you an idea of how to use the run data.
Oops, yes, I meant to write about that but I forgot. I think it's okay to have the duplication for now, as long as we
Fair! I still think the generalisation of the run infrastructure to permit empty filters and indices might be worthwhile, especially with SPJ's recent "specialise with value" extension.
True, though we wouldn't be able to take advantage of that on older GHCs... unless that stuff is backported.
We could do something a bit unhinged and encode the flag using instances of a type class. That way it's easily backwards compatible.
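A minimal sketch of what encoding the flag with type class instances could look like, using only `base`. The tag types, the class `BuildsFilterAndIndex`, the function `outputFiles` and the file names are all hypothetical and are not taken from the lsm-tree code base.

```haskell
module Main (main) where

import Data.Proxy (Proxy (..))

-- Hypothetical type-level tags standing in for the boolean flag.
data WithFilterAndIndex
data WithoutFilterAndIndex

-- Each tag has an instance that reflects the flag back to a value.
class BuildsFilterAndIndex tag where
  buildsFilterAndIndex :: Proxy tag -> Bool

instance BuildsFilterAndIndex WithFilterAndIndex where
  buildsFilterAndIndex _ = True

instance BuildsFilterAndIndex WithoutFilterAndIndex where
  buildsFilterAndIndex _ = False

-- A hypothetical generalised builder that decides which files to write
-- based on the tag. The file names are placeholders.
outputFiles :: BuildsFilterAndIndex tag => Proxy tag -> [String]
outputFiles p
  | buildsFilterAndIndex p = ["keyops", "blobs", "filter", "index"]
  | otherwise              = ["keyops", "blobs"]

main :: IO ()
main = do
  print (outputFiles (Proxy :: Proxy WithFilterAndIndex))
  print (outputFiles (Proxy :: Proxy WithoutFilterAndIndex))
```

Because the flag is determined by the instance chosen at each call site, the usual class specialisation machinery can, in principle, remove the branch without requiring a newer GHC extension.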
Added the test
The assertion at the specified line of the code asserts that keys should be longer than 8 bytes. The
bd27cdb to 8e1f2bf
8e1f2bf to 6a7eee5
This is a WIP PR that serialises the write buffer when creating a snapshot.
I'd love some feedback :)