Fast parallel compressed wksp checkpt/restore #3034
Open: kbowers-jump wants to merge 7 commits into main from kbowers-jump/wksp-checkpt
Commits on Oct 3, 2024
- Type and comment cleanup in fd_checkpt.h.
- Eliminated a redundant test in fd_restore.c.
Commit a8c9de5
- fd_wksp.h brings in fd_checkpt.h in anticipation of checkpt based wksp checkpointing.
- Swept through and cleaned up other util includes in the process.
Commit 07c45ea
Added pipelined alloc/free test coverage
Provides coverage of a case that has long been missing from test_alloc (though it was already well covered in application level testing). Not run by default. This was written a few months ago to help FD devs isolate an allocation issue (which was not in fd_alloc, alas). It doesn't really belong in this PR but also isn't really worth a separate PR, and I'm tired of it lying around in my local copy. It probably shouldn't be thrown away, as it is a very stringent stress tester for the case where the free matching an alloc happens on a different thread (e.g. pipelining with alloc on the "source" thread and the matching free on the "sink" thread, potentially in a different process). So here it is.
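The source / sink pattern described above can be sketched as follows. This is not the test added in this commit: malloc/free stand in for the allocator under test, and the toy_* names, the ring depth and the size pattern are all assumptions made for illustration. It only shows the shape of the stress, with every allocation made on one thread and freed on another.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DEPTH 64UL /* hypothetical ring depth, must be a power of two */

typedef struct {
  void *        slot[ TOY_DEPTH ]; /* allocations in flight from source to sink */
  atomic_size_t head;              /* number of slots consumed by the sink */
  atomic_size_t tail;              /* number of slots published by the source */
  size_t        iter_cnt;          /* total allocations to pipeline */
  size_t        freed_cnt;         /* written by the sink, read after join */
} toy_pipe_t;

/* Sink thread: frees every allocation made by the source thread. */
void * toy_sink( void * arg ) {
  toy_pipe_t * p = (toy_pipe_t *)arg;
  assert( p );
  for( size_t i=0UL; i<p->iter_cnt; i++ ) {
    /* Wait until the source has published slot i */
    while( atomic_load_explicit( &p->tail, memory_order_acquire )==i ) sched_yield();
    void * mem = p->slot[ i & (TOY_DEPTH-1UL) ];
    /* Release the slot back to the source before doing the free */
    atomic_store_explicit( &p->head, i+1UL, memory_order_release );
    free( mem ); /* free happens on a different thread than the alloc */
    p->freed_cnt++;
  }
  return NULL;
}

/* Source side: allocates iter_cnt buffers of varying size and hands them
   to the sink.  Returns the number of frees the sink performed (0 if the
   sink thread could not be started). */
size_t toy_pipelined_alloc_free( size_t iter_cnt ) {
  toy_pipe_t p[1];
  memset( p->slot, 0, sizeof(p->slot) );
  atomic_init( &p->head, 0UL );
  atomic_init( &p->tail, 0UL );
  p->iter_cnt  = iter_cnt;
  p->freed_cnt = 0UL;

  pthread_t sink;
  if( pthread_create( &sink, NULL, toy_sink, p ) ) return 0UL;

  for( size_t i=0UL; i<iter_cnt; i++ ) {
    size_t sz  = 1UL + ((i*2654435761UL) % 1024UL); /* arbitrary size pattern */
    void * mem = malloc( sz );
    if( mem ) memset( mem, (int)(i & 0xffUL), sz ); /* touch the memory */
    /* Wait for a free slot in the ring */
    while( (i - atomic_load_explicit( &p->head, memory_order_acquire ))>=TOY_DEPTH ) sched_yield();
    p->slot[ i & (TOY_DEPTH-1UL) ] = mem;
    atomic_store_explicit( &p->tail, i+1UL, memory_order_release );
  }

  pthread_join( sink, NULL );
  return p->freed_cnt;
}
```

Calling toy_pipelined_alloc_free( 10000UL ) should return 10000 once every buffer has crossed the pipeline. Depending on the toolchain, the program may need to be built with -pthread.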
Commit df124a3
Needed for writing parallel compressed restore from a file descriptor.
Commit b8b0230
Low level portable memory mapped I/O API
Useful for all sorts of things, including parallel wksp checkpt/restore implementations.
Commit f558a0d
fd_checkpt API improvements and cleanups
Useful for writing robust high level functionality.

- Split fd_checkpt_buf into two functions, fd_checkpt_{meta,data}, and similarly for fd_restore_buf. The meta functions are optimized for metadata / control while the data functions are optimized for bulk data.

  That is, fd_{checkpt,restore}_meta are meant for small, often temporary buffers formed on the fly when creating a checkpt and that are needed immediately when executing a restore (e.g. the byte size of the next data buffer in a checkpt frame, a control signal to tell the restore there are no more data buffers in the current frame, ...). Accordingly, the size of these buffers is limited to at most FD_{CHECKPT,RESTORE}_META_MAX (64 KiB) and these buffers can be read / written / freed immediately on return.

  Conversely, fd_{checkpt,restore}_data are meant for large persistent buffers used after the restore completes. These can have (practically) arbitrary size. Buffers passed to these cannot be read / written / freed until the corresponding frame is closed.

  Splitting these functions makes it much simpler to implement non-trivial object level checkpt/restore functions while retaining zero copy efficiency and high compression ratio. (E.g. it is much easier to write an optimized parallel compressed wksp checkpt/restore with these semantics.)

  Under the hood, this piggybacks on the small buffer gather/scatter optimizations already done to improve the LZ4 compression ratio when checkpointing a lot of tiny metadata buffers. Other frame styles are free to use this distinction as they wish (they just have to respect the buffer lifetime rules).
- Renamed frame_{open,close} to just {open,close} to make the API easier to call.
- Added fd_restore_sz and fd_restore_seek to help with parallel checkpt/restore.
- Added fd_restore_{open,close}_advanced APIs that mirror the existing checkpt advanced APIs. These expose the restore frame offsets to support better high level validation of restores. As part of this, restore tracks offsets under the hood and has strict semantics about the meaning of the offset between mmio, streaming mode with seekable files and streaming mode with streams / pipes.
- Added a frame_style_is_supported API to help with cross-platform restores.
- Added is_mmio and various accessors to make it easier to clone checkpt and restore objects for thread parallelization.
- Made can_open and in_frame public to help with cleaning up after a deeply nested error.
- Other minor cleanups (gbuf_cursor init, checkpt/restore public APIs grouped together).
- Updated unit test coverage accordingly and also added tests that stress the compressor with non-trivial gather/scatter operations (e.g. contiguous regions on checkpt to discontiguous regions on restore and vice versa) and that use the new-fangled fd_io_seek API.
- Updated documentation (typo corrections, etc).
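The meta / data buffer lifetime rules described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the toy_* names, the staging buffer and the gather list are hypothetical and are not the fd_checkpt API or its implementation. Meta buffers are copied into a staging area on the call (so the caller may reuse them immediately), while data buffers are only referenced until the frame is closed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_META_MAX (64UL<<10) /* staging capacity, echoing the 64 KiB meta limit */
#define TOY_IOV_MAX  32UL       /* hypothetical cap on buffers per frame */

typedef struct { void const * ptr; size_t sz; } toy_iov_t;

typedef struct {
  unsigned char stage[ TOY_META_MAX ]; /* copies of meta buffers live here */
  size_t        stage_used;
  toy_iov_t     iov[ TOY_IOV_MAX ];    /* gather list in call order */
  size_t        iov_cnt;
  int           in_frame;
} toy_checkpt_t;

void toy_open( toy_checkpt_t * c ) {
  assert( c );
  c->stage_used = 0UL; c->iov_cnt = 0UL; c->in_frame = 1;
}

/* Meta: copied now, so buf can be modified or freed right after return.
   Returns 0 on success, -1 on failure. */
int toy_meta( toy_checkpt_t * c, void const * buf, size_t sz ) {
  if( !c->in_frame || c->iov_cnt>=TOY_IOV_MAX ) return -1;
  if( sz>(TOY_META_MAX - c->stage_used) )       return -1; /* a real writer would flush here */
  unsigned char * dst = c->stage + c->stage_used;
  memcpy( dst, buf, sz );
  c->stage_used += sz;
  c->iov[ c->iov_cnt ].ptr = dst;
  c->iov[ c->iov_cnt ].sz  = sz;
  c->iov_cnt++;
  return 0;
}

/* Data: zero copy, only the pointer is recorded.  buf must stay untouched
   until toy_close returns.  Returns 0 on success, -1 on failure. */
int toy_data( toy_checkpt_t * c, void const * buf, size_t sz ) {
  if( !c->in_frame || c->iov_cnt>=TOY_IOV_MAX ) return -1;
  c->iov[ c->iov_cnt ].ptr = buf;
  c->iov[ c->iov_cnt ].sz  = sz;
  c->iov_cnt++;
  return 0;
}

/* Close: walks the gather list and emits the frame into out.  Returns the
   frame size, or 0 if not in a frame or out is too small (frame stays open). */
size_t toy_close( toy_checkpt_t * c, unsigned char * out, size_t out_max ) {
  if( !c->in_frame ) return 0UL;
  size_t tot = 0UL;
  for( size_t i=0UL; i<c->iov_cnt; i++ ) {
    if( c->iov[i].sz>(out_max - tot) ) return 0UL;
    tot += c->iov[i].sz;
  }
  size_t off = 0UL;
  for( size_t i=0UL; i<c->iov_cnt; i++ ) {
    memcpy( out+off, c->iov[i].ptr, c->iov[i].sz );
    off += c->iov[i].sz;
  }
  c->in_frame = 0;
  return off;
}
```

In this model a caller can write a size header from a stack temporary with toy_meta, overwrite that temporary right away, and then pass the bulk payload to toy_data, keeping the payload intact until toy_close returns.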
Commit 9346e8e
Commits on Oct 4, 2024
Fast parallel compressed wksp checkpt/restore
Very few top level changes:

- The raw style is now called the v1 style (the raw style macro still exists for backward compat) but is otherwise unchanged (i.e. it should be backward compatible with existing wksp checkpts).
- Added v2 (uncompressed) and v3 (compressed) styles.
- Preview function API refined for more general usage across all versions (this required minor changes to the places outside wksp where preview was getting called and tweaking the minimal part_max used by topo). Updated fd_firedancer.c accordingly.

Under the hood, v2 and v3 formats have many useful properties for fast checkpt / restore performance and for long term archival purposes (these semantics are also usable for ultra high performance snapshot distribution and recovery).

- v2/v3 support writing a checkpt with an arbitrary number of parallel threads and restoring with an arbitrary and potentially different number of parallel threads. Thus performance can be scaled out to theoretical memory or network bandwidth (v2) and compression library (v3) limits. (Currently only thread parallelization of v2 restore is implemented but that is by far the most important case practically.)
- The v2 and v3 wksp allocation data frames will further be bit level identical regardless of the number of threads used on checkpt / restore.
- While v2/v3 metadata (which stores information about the environment in which the checkpt was made, among other things) obviously can vary from run-to-run and host-to-host, this information can be quickly identified and ignored without having to process the whole checkpt.
- These two features make it much easier to have multiple hosts create what should be bit-level identical checkpt files and then distribute them torrent style from multiple servers to multiple clients concurrently (and thus avoid having a network hot spot on a single server with the "blessed" checkpt).
- Thus, at one extreme, a huge v3 (compressed) checkpoint can be written directly out to a network socket zero copy / single pass / single threaded and read from an archival copy of the checkpoint in DRAM via zero copy memory mapped I/O with as many parallel threads as it takes to restore. And similarly for the other extreme (and all intermediate combinations).
- Current checkpt implementation load balances over multiple parallel restore threads via a high performance approximation to a greedy load balance algorithm. Current restore uses a task queue to dynamically load balance further. Note that parallelization is at partition granularity. If an application just allocates the entire wksp, all checkpt/restore will behave single threaded regardless of the number of threads available to checkpt/restore.
- Tweaked wksp allocation to always use fully trimmed partitions for allocations such that there is minimal waste in a wksp checkpt. (Previous behavior would allow an allocation request to use an untrimmed or partially trimmed partition if part_max was inadequate. But these can be arbitrarily sized, which can then bloat a checkpt when checkpointing a completely full wksp.)
- Added a supported-styles command to fd_wksp_ctl to identify which styles are supported on the target.
- Based on the recent checkpt API additions.
- Minor whitespace cleanups and fixed a missing return in fd_wksp_usage.
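For readers unfamiliar with greedy load balancing at partition granularity, here is a minimal sketch of the plain textbook version: sort partitions largest first, then give each one to the currently least loaded thread. It is not the high performance approximation used in this commit, and the toy_* names and the signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { size_t idx; size_t sz; } toy_part_t;

/* Sort by size descending; break ties by original index for determinism. */
int toy_part_cmp_desc( void const * _a, void const * _b ) {
  toy_part_t const * a = (toy_part_t const *)_a;
  toy_part_t const * b = (toy_part_t const *)_b;
  if( a->sz!=b->sz ) return (a->sz<b->sz) - (a->sz>b->sz);
  return (a->idx>b->idx) - (a->idx<b->idx);
}

/* Assigns part_cnt partitions with byte sizes part_sz[i] to thread_cnt
   threads.  On success, owner[i] is the thread assigned to partition i and
   load[t] is the total bytes assigned to thread t.  Returns 0 on success
   and -1 on failure (no threads or out of memory). */
int toy_greedy_balance( size_t const * part_sz, size_t part_cnt,
                        size_t thread_cnt, size_t * owner, size_t * load ) {
  if( !thread_cnt ) return -1;

  toy_part_t * part = (toy_part_t *)malloc( (part_cnt ? part_cnt : 1UL)*sizeof(toy_part_t) );
  if( !part ) return -1;
  for( size_t i=0UL; i<part_cnt; i++ ) { part[i].idx = i; part[i].sz = part_sz[i]; }
  qsort( part, part_cnt, sizeof(toy_part_t), toy_part_cmp_desc );

  for( size_t t=0UL; t<thread_cnt; t++ ) load[t] = 0UL;

  for( size_t i=0UL; i<part_cnt; i++ ) {
    /* Linear scan for the least loaded thread (lowest index wins ties).
       A heap would be faster for large thread counts. */
    size_t best = 0UL;
    for( size_t t=1UL; t<thread_cnt; t++ ) if( load[t]<load[best] ) best = t;
    assert( best<thread_cnt );
    owner[ part[i].idx ] = best;
    load[ best ] += part[i].sz;
  }

  free( part );
  return 0;
}
```

Because the unit of work is a whole partition, a wksp holding one giant allocation leaves every thread but one idle under this scheme, which matches the single threaded behavior noted above.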
Commit 4924d6e