Triedent Refactor #533
Open
bytemaster wants to merge 22 commits into main from triedent-refactor
1. use alignas() to prevent false sharing
2. use a stack-allocated buffer for the temp key6 during lookups (13% perf gain)
3. updated big test to support read-only mode
4. updated big test to support reads
5. increased the ring buffer space from 32M to 128M
6. added some comments for review
1. put the temp base6 key on the stack instead of the heap
2. disable copy-to-hot
1. new block allocator doesn't require remapping the entire range to grow
2. new id allocator that *should be* thread safe for multiple writers by treating the ID space as a hash table and growing it when the collision rate starts to slow down alloc (to be changed in the future, as it consumes 25% of the write thread)
3. new database API abstraction on top of database
4. replaced the global/generalized GC with one based upon the seg manager
5. enforce that the session lock is in place by putting the necessary function calls on the "lock object" so it is impossible to use the API without maintaining the invariants

Currently maintains 6M reads/sec across 10 threads while writing 185k items per second with 280M items in the database, and with no mlocking on the database except for the object id table.
Used Thread Sanitizer to remove all detected data races. Uses fetch_or/fetch_and for locking and fetch_add/fetch_sub for retaining/releasing (a minimal sketch of this pattern follows the commit list below).
updated triedent db (tdb.cpp) to have more options to configure how aggressively data is synced, cached, etc.
- added % free to db dump
- fixed double-checked lock on object id
Allocate the object id before allocating space. Set the object_header before advancing the alloc_ptr. Change alloc_ptr to 32-bit.
- fixing bugs in alloc
- making compact optional / a manual call from the main thread for deterministic testing
updated release() to not require a lock by having the compactor check whether the object was released after it was moved.
- add release() background thread
- fixed bugs with the compactor moving objects
git add include/triedent/xxhash.h
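Several of the commits above lean on atomic bit operations: fetch_or/fetch_and for a lock flag, fetch_add/fetch_sub for retain/release, and alignas() to avoid false sharing. The following is a minimal sketch of that pattern; the struct name, bit layout, and memory orders are illustrative assumptions, not the actual triedent types.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch (not the actual triedent code): a cache-line-aligned
// control word where the low bit is a lock flag driven by fetch_or/fetch_and
// and the remaining bits hold a reference count driven by fetch_add/fetch_sub.
// alignas(64) mirrors the false-sharing fix mentioned in the first commit.
struct alignas(64) control_word
{
   static constexpr uint32_t lock_bit = 1u;
   static constexpr uint32_t ref_unit = 2u;  // refcount occupies the bits above the lock bit

   std::atomic<uint32_t> bits{ref_unit};  // starts with one reference, unlocked

   // Returns true if we acquired the lock (the bit was previously clear).
   bool try_lock() { return (bits.fetch_or(lock_bit, std::memory_order_acquire) & lock_bit) == 0; }
   void unlock()   { bits.fetch_and(~lock_bit, std::memory_order_release); }

   void retain()   { bits.fetch_add(ref_unit, std::memory_order_relaxed); }
   // Returns true when the last reference was dropped.
   bool release()  { return bits.fetch_sub(ref_unit, std::memory_order_acq_rel) < 2 * ref_unit; }
};
```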
This pull request is intended to start the code review process; it is not yet ready to be merged because it has not been tested with psibase integration.
The primary changes are in the following areas:
It maintains the existing database API, so it shouldn't require any major changes to the rest of psibase.
Motivation
The ring buffer system was a fixed-size cache that required a lot of pinned memory. Under heavy load, especially once data no longer fit in RAM, the old system would have the write thread waiting on the background thread, which in turn was waiting on the read threads. Transaction rates fell very low and the majority of the time was spent waiting on mutexes. There was no good way to know how to size the ring buffers, which meant that the region allocator did most of the heavy lifting.
The old system was fragile, requiring sessions to unlock on certain allocations and invalidating cached reads. Aside from the pinning of Hot/Warm, there was no good way to tell the OS how to page. To make matters worse, the hot rings were filled with mostly dead data caused by the churn of allocating and freeing, and it took a long time for the ring allocator to get around to reusing that RAM, wasting scarce pinned pages.
Results
The code in this branch can sustain 2M random reads per second from 4 threads while doing 200k random writes per second on a 272GB database with 22GB of IDs holding 338M records. The vast majority of segments end up 99.9% full, with limited wasted space. At the end of inserting the 272GB there were only 6GB of segments ready to be reused, and a large part of that was each of the 6 threads' personal 128MB write segments. Overall, less than 5% of space was wasted. Future updates could easily trim the database down in size if there were too many empty segments. This was on an M3 MacBook Pro with 128GB of RAM.
After creating that large database, I was able to perform 3.8M sequential inserts per second from a single thread, followed by 6M sequential queries per second. I could update sequential keys at 5M keys per second, and single-threaded random inserts achieved over 350k per second.
Block Allocator
Allocates data in chunks of 128MB (configurable at compile time)
Chunks have independent mmap address ranges so new chunks can be allocated without having to remap the entire file
Responsible for converting a "location" in a logical range into a segment/offset and resolving the pointer
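The location-to-segment/offset conversion in the last bullet can be illustrated with a small sketch. This is a hypothetical block_allocator_sketch, not the actual triedent interface; the 128MB segment size follows the text above, and the anonymous mmap merely stands in for mapping the next region of the backing file.

```cpp
#include <cstdint>
#include <sys/mman.h>
#include <vector>

// Hypothetical sketch of the location -> segment/offset mapping described in
// the bullets above; not the actual triedent block_allocator interface.
class block_allocator_sketch
{
  public:
   static constexpr uint64_t segment_size = 128ull * 1024 * 1024;  // 128MB chunks

   struct location
   {
      uint64_t offset;  // logical offset across all chunks
   };

   // Each chunk gets its own mapping, so growing adds one new mmap range
   // instead of remapping the entire file.
   void grow()
   {
      // Stand-in: anonymous mapping; the real allocator would map the next
      // 128MB region of the backing file.
      void* p = ::mmap(nullptr, segment_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      segments_.push_back(p);
   }

   // Resolve a logical location to (segment, offset) and then to a pointer.
   void* resolve(location loc) const
   {
      uint64_t seg = loc.offset / segment_size;
      uint64_t off = loc.offset % segment_size;
      return static_cast<char*>(segments_[seg]) + off;
   }

  private:
   std::vector<void*> segments_;
};
```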
ID Allocator
Uses the block allocator to reserve space for a growing ID database
mlocks the blocks provided
Responsible for allocating new IDs in a thread-safe manner and recycling unused IDs using a linked list similar to the old version (sketched below)
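Below is a minimal sketch of the thread-safe allocate/recycle idea, assuming a simple CAS-based free list plus an atomic bump counter. The real allocator treats the ID space as a hash table (per the commit notes), backs it with mlock'd blocks, and would also need to address the ABA problem; the names and signatures here are illustrative assumptions only.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical sketch of thread-safe ID allocation with a recycled free list;
// capacity is fixed for simplicity and this is not the actual triedent id_allocator.
class id_allocator_sketch
{
  public:
   explicit id_allocator_sketch(uint64_t capacity) : slots_(capacity) {}

   uint64_t alloc()
   {
      // First try to pop a recycled ID from the lock-free free list.
      // (A production version must guard against the ABA problem.)
      uint64_t head = free_head_.load(std::memory_order_acquire);
      while (head != 0)
      {
         uint64_t next = slots_[head].load(std::memory_order_relaxed);
         if (free_head_.compare_exchange_weak(head, next, std::memory_order_acq_rel))
            return head;
      }
      // Otherwise hand out a fresh ID with a simple atomic bump.
      return next_id_.fetch_add(1, std::memory_order_relaxed);
   }

   void release(uint64_t id)
   {
      // Push onto the free list: the slot of a freed ID stores the next free ID.
      uint64_t head = free_head_.load(std::memory_order_relaxed);
      do
      {
         slots_[id].store(head, std::memory_order_relaxed);
      } while (!free_head_.compare_exchange_weak(head, id, std::memory_order_acq_rel));
   }

  private:
   std::atomic<uint64_t>              free_head_{0};  // 0 means "empty"
   std::atomic<uint64_t>              next_id_{1};    // ID 0 reserved as null
   std::vector<std::atomic<uint64_t>> slots_;         // backed by mlock'd blocks in the real allocator
};
```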
Seg Allocator
This is the workhorse that builds on the block allocator and id allocator to allocate large segments when any thread needs a new place to write. The segments do not use mlock; instead, madvise is used to tune paging based on whether a segment is being used for allocation or being compacted, and it can factor in other things such as object density.
The seg_allocator implements sessions, which allow a thread to request a read_lock that prevents the allocator from reusing a segment. Requests to access data can only be made via the read_lock, which returns an object_ref (sketched below).
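A self-contained sketch of the session/read_lock invariant follows. The names mirror the PR text, but the structure is an assumption rather than the actual seg_allocator API; the point is only that object data can be reached solely through a lock object whose lifetime keeps the reader count non-zero, so a compactor can never recycle a segment out from under a reader.

```cpp
#include <atomic>
#include <cstdint>

// Minimal sketch (not the actual triedent API): data access requires holding a
// read_lock obtained from a session, so the "no recycling while readers exist"
// invariant cannot be bypassed.
class session_sketch
{
  public:
   class read_lock
   {
     public:
      read_lock(const read_lock&) = delete;
      ~read_lock() { readers_.fetch_sub(1, std::memory_order_release); }

      // The only way to reach object data is through the lock, mirroring the
      // rule that an object_ref is only obtainable via read_lock.
      const char* get(const char* segment_base, uint64_t offset) const
      {
         return segment_base + offset;
      }

     private:
      friend class session_sketch;
      explicit read_lock(std::atomic<uint32_t>& readers) : readers_(readers)
      {
         readers_.fetch_add(1, std::memory_order_acquire);
      }
      std::atomic<uint32_t>& readers_;
   };

   read_lock lock() { return read_lock(active_readers_); }

   // The compactor would consult this before recycling a segment that a
   // reader of this session might still reference.
   bool can_recycle() const { return active_readers_.load(std::memory_order_acquire) == 0; }

  private:
   std::atomic<uint32_t> active_readers_{0};
};
```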
Testing
The code was mostly tested via programs/tdb.cpp, and it was built with Thread Sanitizer to remove all detectable data races.
Design