Problem: has operation could be more efficient than get #7

Draft
wants to merge 2 commits into
base: main
14 changes: 12 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ func ExecuteBlock(

The main deviations from the paper are:

-### Optimisation
### Suspend On Estimate Mark

We applied the optimization described in section 4 of the paper:

@@ -27,11 +27,21 @@ Block-STM calls add_dependency from the VM itself, and can thus re-read and cont
When the VM execution reads an `ESTIMATE` mark, it blocks on a `CondVar` and resumes once the dependency is resolved,
which is much more efficient than aborting and re-running the transaction.
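
The suspend-and-resume idea can be illustrated with a minimal sketch built on Go's `sync.Cond`. The `estimateGate` type and its methods are hypothetical names invented for this example and are not taken from the repository; the sketch only shows a reader blocking until the transaction it depends on signals completion.

```go
package main

import (
	"fmt"
	"sync"
)

// estimateGate is a hypothetical helper (not from the repository): a reader
// that hits an ESTIMATE mark waits on it until the blocking txn resolves.
type estimateGate struct {
	mu   sync.Mutex
	cond *sync.Cond
	done bool
}

func newEstimateGate() *estimateGate {
	g := &estimateGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// Wait blocks the caller until Resolve has been called.
func (g *estimateGate) Wait() {
	g.mu.Lock()
	defer g.mu.Unlock()
	for !g.done {
		g.cond.Wait()
	}
}

// Resolve marks the dependency as resolved and wakes every waiter.
func (g *estimateGate) Resolve() {
	g.mu.Lock()
	g.done = true
	g.mu.Unlock()
	g.cond.Broadcast()
}

// readAfterResolve simulates a reader that suspends on the gate and reads
// the value only after the writer goroutine has published it.
func readAfterResolve() string {
	gate := newEstimateGate()
	var value string
	go func() {
		value = "written by the blocking txn"
		gate.Resolve()
	}()
	gate.Wait()
	return value
}

func main() {
	fmt.Println(readAfterResolve())
}
```

The reader keeps its execution state while it waits, so nothing has to be re-executed once the dependency is resolved.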

-### Support Deletion, Iteration, and MultiStore
### Support Deletion

cosmos-sdk doesn't allow setting a `nil` value, so we reuse `nil` as the tombstone value; `Delete` can then be implemented as a special case of `Set` with a `nil` value.
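
As a rough illustration of the tombstone approach, the sketch below uses a plain map in place of the real write set. `memStore` and its methods are hypothetical names for this example only, and the panic on a `nil` value is an assumption about how the restriction might be enforced.

```go
package main

import "fmt"

// memStore is a hypothetical, simplified write buffer used only for this
// sketch. A nil value stored under a key is the tombstone marking deletion.
type memStore struct {
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]byte)}
}

// set is the single internal write path; it accepts nil as the tombstone.
func (s *memStore) set(key string, value []byte) {
	s.data[key] = value
}

// Set rejects nil, mirroring the rule that callers may not store nil values.
func (s *memStore) Set(key string, value []byte) {
	if value == nil {
		panic("nil value is reserved for the tombstone")
	}
	s.set(key, value)
}

// Delete is Set with the tombstone value.
func (s *memStore) Delete(key string) {
	s.set(key, nil)
}

// Has reports whether the key holds a live (non-tombstone) value.
func (s *memStore) Has(key string) bool {
	value, found := s.data[key]
	return found && value != nil
}

func main() {
	s := newMemStore()
	s.Set("a", []byte("1"))
	fmt.Println(s.Has("a"))
	s.Delete("a")
	fmt.Println(s.Has("a"))
}
```

Because a deleted key still has an entry, a later read can tell "deleted by this transaction" apart from "never written", which a plain map deletion would not allow.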

### Support Iteration and MultiStore

These features are necessary for integration with cosmos-sdk.

The multi-version data structure is implemented with nested btrees for easier iteration support.
The `WriteSet` is also implemented with a btree, and it takes advantage of the key ordering to optimize some logic.

The internal data structures are also adapted with multiple stores in mind.

### Concurrency Friendly `Has` Operation

`Has(key)` is usually implemented naively as `Get(key) != nil`, but it can be made friendlier to concurrency than `Get`,
because it observes only whether the key exists, not the content of its value.
A `Get` operation is validated by checking whether the value was written by a different version, whereas a `Has` operation is validated by checking whether the existence of the key has changed. For example, if a key is rewritten by a different version but still exists, a transaction that only observed the key through `Has` is not aborted, because the existence status is unchanged.
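
The difference between the two validation rules can be shown with a small, self-contained sketch. The `entry` type and the `validateGet` / `validateHas` functions are simplified, hypothetical stand-ins and not the repository's actual types; versions are reduced to a single integer.

```go
package main

import "fmt"

// entry is a simplified stand-in for the latest record visible to a txn.
type entry struct {
	version int    // index of the txn that wrote the value (simplified)
	value   []byte // nil means the key is deleted or absent
}

// validateGet: a Get stays valid only if the same version is still visible.
func validateGet(observedVersion int, current entry) bool {
	return observedVersion == current.version
}

// validateHas: a Has stays valid as long as the existence status is unchanged.
func validateHas(observedExists bool, current entry) bool {
	return observedExists == (current.value != nil)
}

func main() {
	before := entry{version: 1, value: []byte("a")}
	after := entry{version: 2, value: []byte("b")} // rewritten, still exists
	deleted := entry{version: 3, value: nil}       // key removed

	fmt.Println(validateGet(before.version, after)) // version changed
	fmt.Println(validateHas(true, after))           // existence unchanged
	fmt.Println(validateHas(true, deleted))         // existence changed
}
```

In this sketch a rewrite of an existing key invalidates the `Get` observation but not the `Has` observation; only a change in existence invalidates the latter.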
13 changes: 13 additions & 0 deletions mvdata.go
@@ -105,6 +105,19 @@ func (d *GMVData[V]) Iterator(
// ValidateReadSet validates the read descriptors,
// returns true if valid.
func (d *GMVData[V]) ValidateReadSet(txn TxnIndex, rs *ReadSet) bool {
	for _, desc := range rs.Hases {
		value, _, estimate := d.Read(desc.Key, txn)
		if estimate {
			// previously read entry from data, now ESTIMATE
			return false
		}
		exists := !d.isZero(value)
		if exists != desc.Exists {
Review comment (Collaborator): not sure why we need to loop and compare `exists` here.

			// existence status changed
			return false
		}
	}

	for _, desc := range rs.Reads {
		_, version, estimate := d.Read(desc.Key, txn)
		if estimate {
if estimate {
33 changes: 32 additions & 1 deletion mvview.go
@@ -98,15 +98,46 @@ func (s *GMVMemoryView[V]) Get(key []byte) V {
		// record the read version, invalid version is ⊥.
		// if not found, record version ⊥ when reading from storage.
		s.readSet.Reads = append(s.readSet.Reads, ReadDescriptor{key, version})

		if !version.Valid() {
			return s.storage.Get(key)
		}

		return value
	}
}

func (s *GMVMemoryView[V]) Has(key []byte) bool {
-	return !s.mvData.isZero(s.Get(key))
	if s.writeSet != nil {
		if value, found := s.writeSet.OverlayGet(key); found {
			// value written by this txn
			// nil value means deleted
			return !s.mvData.isZero(value)
		}
	}

	for {
		value, version, estimate := s.mvData.Read(key, s.txn)
		if estimate {
			// read ESTIMATE mark, wait for the blocking txn to finish
			s.waitFor(version.Index)
			continue
		}

		var exists bool
		if !version.Valid() {
			exists = s.storage.Has(key)
		} else {
			exists = !s.mvData.isZero(value)
		}

		// record the has descriptor; a Has operation is validated by existence status rather than version.
		s.readSet.Hases = append(s.readSet.Hases, HasDescriptor{
Review comment (Collaborator): not sure why we record `Has`; won't we end up appending to a bigger `readSet` whenever a `Has` operation is called?

			Key: key, Exists: exists,
		})

		return exists
	}
}

func (s *GMVMemoryView[V]) Set(key []byte, value V) {
6 changes: 6 additions & 0 deletions types.go
@@ -30,6 +30,11 @@ type ReadDescriptor struct {
	Version TxnVersion
}

type HasDescriptor struct {
	Key    Key
	Exists bool
}

type IteratorOptions struct {
	// [Start, End) is the range of the iterator
	Start Key
@@ -49,6 +54,7 @@ type IteratorDescriptor struct {

type ReadSet struct {
	Reads     []ReadDescriptor
	Hases     []HasDescriptor
	Iterators []IteratorDescriptor
}
