-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify tracking of deleted objects #64
Conversation
DeleteTrackerCountVar: newMap("delete_tracker_count"), | ||
RevisionVar: newMap("revision"), | ||
LockContentionVar: newMap("lock_contention"), | ||
GraveyardLowWatermarkVar: newMap("graveyard_low_watermark"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping the graveyard naming makes sense here as deleted_object_count
would be rather confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be objects_pending_deletion_count
, but yeah, graveyard here is more concise.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of nits, but haven't spotted bugs so far. it seems pretty easy to miss a if obj.deleted
somewhere, but I'd hope tests catch it
) | ||
|
||
// object is the format in which data is stored in the tables. | ||
type object struct { | ||
revision uint64 | ||
data any | ||
deleted bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fieldalignment would probably enjoy an order of data
first so that GC doesn't have to scan further (as there can't be more pointers)
In general, can we avoid this bool in the struct somehow?
Alternatively, do you envision it makes sense to make this a bitmask for future flags on all objs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played a bit with having data
point to type struct deleted { data any }
, but that just caused a further allocation so was no good. If we mandated that T
is always a pointer then we could store data unsafe.Pointer
here and use the bytes saved from the not having the type information around, but that's probably a bit annoying to work with.
But yeah, probably worth checking the field alignments, especially with the object embedded into a part node. Will take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also halve the revision space and use the bit 😇
but I think you mentioned somewhere that the size didn't increase? that's surprising 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we could do that, but need to make sure the "deleted bit" is the least significant one which is a bit annoying (shift left by one and set). But yeah since the size didn't increase (most likely because there were padding in the node structs) not worth optimizing.
@@ -440,28 +435,44 @@ type indexEntry struct { | |||
unique bool | |||
} | |||
|
|||
type revisionRange struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on whether this is a half-open interval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah but it's not half-open, the end
is included. commenting :D
types.go
Outdated
// deleted objects. | ||
deletedRange revisionRange | ||
|
||
numDeletedObjects int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be uint, but probably makes arithmetic annoying because of casts
types.go
Outdated
// deletedRange holds the revisions of the oldest and | ||
// newest deleted objects. Used to short-cut GCing of | ||
// deleted objects. | ||
deletedRange revisionRange |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deletedRange
is somehow hard to understand, but I'm struggling to come up with a better name. gcRange
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like gcRange
iterator.go
Outdated
updateIter := &iterator[Obj]{indexTxn.LowerBound(index.Uint64(it.revision + 1))} | ||
deleteIter := it.dt.deleted(itxn, it.deleteRevision+1) | ||
it.iter = NewDualIterator(deleteIter, updateIter) | ||
it.iter = indexTxn.LowerBound(index.Uint64(it.revision + 1)) | ||
|
||
// It is enough to watch the revision index and not the graveyard since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no more graveyard, so comment can go
generally probably grep for graveyard; there's some more comment references to it which are stale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good point. bunch of outdated docs in type RWTable
...
iterator.go
Outdated
if obj.deleted { | ||
if rev <= it.startRevision { | ||
// Ignore objects that were marked deleted before this | ||
// changge iterator was created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changge typo
It's somehow asymmetric to me that the iterator has a startRevision, but only cares about it for deleted objects. It seems correct, but irks me for some reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let me rename that!
DeleteTrackerCountVar: newMap("delete_tracker_count"), | ||
RevisionVar: newMap("revision"), | ||
LockContentionVar: newMap("lock_contention"), | ||
GraveyardLowWatermarkVar: newMap("graveyard_low_watermark"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be objects_pending_deletion_count
, but yeah, graveyard here is more concise.
@@ -388,7 +391,7 @@ func (t *genTable[Obj]) ListWatch(txn ReadTxn, q Query[Obj]) (iter.Seq2[Obj, Rev | |||
// Doing a Get() is more efficient than constructing an iterator. | |||
value, watch, ok := indexTxn.Get(q.key) | |||
seq := func(yield func(Obj, Revision) bool) { | |||
if ok { | |||
if ok && !value.deleted { | |||
yield(value.data.(Obj), value.revision) | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in the CompareAndSwap impl; should the compare not first check that the compared to obj is not deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean. We can't look into value
unless ok
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhh, I think I was trying to comment on CompareAndDelete below but couldn't because GitHub UI
but the inner delete
handles the deleted
objects so I think this is just fine
txn.go
Outdated
if !hadOld { | ||
panic("BUG: Object to be deleted not found from primary index") | ||
} | ||
table.numDeletedObjects-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could call it numObjsPendingDeletion
instead, which would look more intuitive here, but it's fine either way
Test the memory usage overhead per object to make sure changes we make don't drastically increase our memory consumption. Signed-off-by: Jussi Maki <[email protected]>
Add a benchmark for Delete() to test impact of the graveyard index removal. Signed-off-by: Jussi Maki <[email protected]>
c127d1b
to
ac41f4c
Compare
The use of separate graveyard primary and revision indexes and a background GC job to remove observed objects from them was unnecessarily complicated. We can implement this in a much simpler way by keeping the "soft-deleted" objects in the normal primary & revision indexes, but just marking them as deleted. The garbage collection of observed dead objects can be performed as part of the Commit(), which has the benefit that we can operate on already cloned mutated nodes, allowing in-place modification of the trees. This way we also don't need any background jobs in StateDB. This does have the semantic difference that observed deleted objects are only dropped from indexes on a subsequent WriteTxn, but I don't we'd have an issue with the delayed deletions in practice. If needed we can add back a background job to essentially do a 'WriteTxn(allTables...).Commit()' periodically. There's not a huge difference in benchmarks in terms of time since we're essentially saving a lookup of the graveyard tree during inserts, but this does allow for more compact indexing due to fewer and more optimal radix tree nodes due to the merged indexes. The code for iterating over changes is also a bit more efficient and simpler as it's now just an iteration over a single index. Signed-off-by: Jussi Maki <[email protected]>
624c07d
to
5ee5d23
Compare
While I would've guessed that this approach would've performed better and used less memory the statistics from the load-balancer benchmark show otherwise: With this change:
Before this:
Because of that I won't move forward this at least in this form and will close this PR for now. |
This reworks how deleted objects are stored by removing the separate graveyard indexes and instead storing the deleted objects in the normal primary and revision indexes. This simplifies the code fair bit, especially for Changes(), and allows for more efficient cleanup of deleted objects as part of Commit() rather than in a separate graveyard worker.