Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stem based flat db for verkle #7778

Merged
merged 19 commits into from
Dec 20, 2024
Merged

Conversation

matkt
Copy link
Contributor

@matkt matkt commented Oct 17, 2024

PR description

This PR introduces a new strategy for managing Verkle accounts using a stem-based flat database structure. Instead of saving by account, slot, and code like in the legacy approach, this implementation stores data by stem and code. This shift allows for direct access to the leaves of the Verkle tree, significantly reducing data duplication and improving storage efficiency.

The stem-based flat database strategy is designed to align better with Verkle tree structures, enabling more streamlined storage and access patterns. This should make it more effective in scenarios that require quick access to account data without the overhead of managing redundant information.

Key Changes:

Introduces the VerkleStemFlatDbStrategy class, which handles storage using a stem-based approach.

Reduces duplication by focusing on storing data at the stem level, improving overall efficiency.

Next Steps:

Once this PR is merged, the optimization of Pedersen hash calculations will be handled in a subsequent PR.

Testing will be conducted after the pre-image is available to determine the usability of this implementation in its current state.

Considerations:

This new approach is expected to be necessary for stateless clients, but further verification will be needed for regular nodes to ensure that it meets the requirements effectively.

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

matkt added 9 commits June 19, 2024 10:46
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
matkt and others added 4 commits October 30, 2024 14:53
@matkt matkt changed the title Fork/stem based flat db stem based flat db for verkle Nov 27, 2024
@matkt matkt added the verkle label Nov 27, 2024
Signed-off-by: Karim Taam <[email protected]>
Signed-off-by: Karim Taam <[email protected]>
@matkt matkt marked this pull request as ready for review November 27, 2024 14:49
@matkt matkt deleted the branch hyperledger:verkle November 28, 2024 15:45
@matkt matkt closed this Nov 28, 2024
@matkt matkt reopened this Nov 28, 2024
@lu-pinto lu-pinto force-pushed the fork/stem-based-flat-db branch from df4ba5b to 5b452ae Compare December 6, 2024 10:40
@@ -76,6 +79,7 @@ public static DataStorageOptions create() {
*/
public void validate(final CommandLine commandLine) {
diffBasedSubStorageOptions.validate(commandLine, dataStorageFormat);
verkleSubStorageOptions.validate(commandLine, dataStorageFormat);
Copy link
Contributor

@lu-pinto lu-pinto Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you call validate? It's a noop, not even part of an interface or abstract

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just to ensure that all the logic is coded in the same way so that if tomorrow someone adds a flag that requires a check, they don’t have to wonder why it doesn’t work or how to do it. They just need to follow the same approach as in other parts of the code and simply implement the check in validate. But if you think it's better to remove why not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinions, you can leave it

interface VerkleUnstable {

VerkleSubStorageConfiguration.VerkleUnstable DEFAULT =
ImmutableVerkleSubStorageConfiguration.VerkleUnstable.builder().build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT is essentially == DISABLED right? So why have both? Just do:

  @Value.Default
  default VerkleUnstable getUnstable() {
    return VerkleUnstable.DISABLED;
  }

And remove the DEFAULT field.

@Override
public Optional<Long> getCodeSize() {
return Optional.empty();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can codeHash != Hash.EMPTY and code size be empty? It doesn't seem to make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional.empty is not == 0 it's a way to say that the codesize is not available but I modified the code and this method is now abstract and in PmtStateTrieAccountValue I throw an exception

@Override
  public Optional<Long> getCodeSize() {
    throw new UnsupportedOperationException("Retrieving code size from state trie account value is not possible with Patricia Merkle Trie");
  }

noOpSegmentedTx, noOpTx, worldStateKeyValueStorage.getFlatDbStrategy())),
noOpSegmentedTx,
noOpTx,
(BonsaiFlatDbStrategy) worldStateKeyValueStorage.getFlatDbStrategy())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need to cast it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that we might not need a specialized type. What about doing this?

diff --git a/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/bonsai/storage/BonsaiWorldStateKeyValueStorage.java b/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/bonsai/storage/BonsaiWorldStateKeyValueStorage.java
index e55bc7175b..605a8a50de 100644
--- a/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/bonsai/storage/BonsaiWorldStateKeyValueStorage.java
+++ b/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/bonsai/storage/BonsaiWorldStateKeyValueStorage.java
@@ -28,6 +28,7 @@ import org.hyperledger.besu.ethereum.trie.common.PmtStateTrieAccountValue;
 import org.hyperledger.besu.ethereum.trie.diffbased.bonsai.storage.flat.BonsaiFlatDbStrategy;
 import org.hyperledger.besu.ethereum.trie.diffbased.bonsai.storage.flat.BonsaiFlatDbStrategyProvider;
 import org.hyperledger.besu.ethereum.trie.diffbased.common.storage.DiffBasedWorldStateKeyValueStorage;
+import org.hyperledger.besu.ethereum.trie.diffbased.common.storage.flat.FlatDbStrategy;
 import org.hyperledger.besu.ethereum.worldstate.DataStorageConfiguration;
 import org.hyperledger.besu.ethereum.worldstate.FlatDbMode;
 import org.hyperledger.besu.ethereum.worldstate.WorldStateKeyValueStorage;
@@ -192,7 +193,7 @@ public class BonsaiWorldStateKeyValueStorage extends DiffBasedWorldStateKeyValue

     private final SegmentedKeyValueStorageTransaction composedWorldStateTransaction;
     private final KeyValueStorageTransaction trieLogStorageTransaction;
-    private final BonsaiFlatDbStrategy flatDbStrategy;
+    private final FlatDbStrategy flatDbStrategy;

     public Updater(
         final SegmentedKeyValueStorageTransaction composedWorldStateTransaction,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are you building this for the future, to use more specialized methods? I don't get any compilation error if I do that

Copy link
Contributor Author

@matkt matkt Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I prefer the current method. everything related to Diffbased or the Bonsai/Verkle world. if I know we're in a Bonsai class, I directly specify it and the subclasses should also be Bonsai as they will be required to follow that structure. This ensures that in the future, there's no need to question when a specific Bonsai method needs to be called, and it also avoids others having to question it later if needed. The right types are already in place.

* <p>This class uses a thread-safe {@link ConcurrentMap} to store data, allowing concurrent
* modifications.
*/
public class CodeConsumingMap extends ForwardingMap<Address, DiffBasedValue<Bytes>> {
Copy link
Contributor

@lu-pinto lu-pinto Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like AccountConsumingMap is almost a complete copy of this class and you can achieve the same reusing that class. Why not renaming AccountConsumingMap -> ConsumingMap and declare it like:

codeToUpdate = new ConsumingMap<DiffBasedValue<Bytes>>(new ConcurrentHashMap<>(), codePreloader);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not completly the same, there is a difference in the consumer.process part (put method)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah true! missed that, in CodeConsumingMap only the updated value is processed by the consumer. But is there any reason for the Consumer not to do that itself and only use the updated value, rather than doing it inside the CodeConsumingMap ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really , more because I didn't wanted to push the logic here

public DiffBasedWorldStateUpdateAccumulator(
      final DiffBasedWorldView world,
      final Consumer<DiffBasedValue<ACCOUNT>> accountPreloader,
      final Consumer<StorageSlotKey> storagePreloader,
      final Consumer<Bytes> codePreloader,
      final EvmConfiguration evmConfiguration) {
    super(world, evmConfiguration);
    this.accountsToUpdate = new AccountConsumingMap<>(new ConcurrentHashMap<>(), accountPreloader);
    this.codeToUpdate = new CodeConsumingMap(new ConcurrentHashMap<>(), codePreloader);
    this.accountPreloader = accountPreloader;
    this.storagePreloader = storagePreloader;
    this.codePreloader = codePreloader;
    this.isAccumulatorStateChanged = false;
    this.evmConfiguration = evmConfiguration;

but I don't have a strong opinion about that

out.writeUInt256Scalar(balance);
out.writeBytes(storageRoot);
out.writeLongScalar(nonce);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you swap nonce with balance for both reading and writing? Any reason in particular?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it was to be compatible with the VerkleStateTrieAccountValue encoding part as I decided to implement it like that. no more reason than that I think I can revert this change

} else {
this.codeSize = code.size();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is a kind of code smell to me here. When you construct VerkleAccount you have a codeSize != 0 but getCode() == null this is not clear at all unless you are totally familiar with Verkle. And even then...

I understand that code and codeSize are 2 different regions of the DB and so is codeHash and that performance wise we should avoid touching slots unnecessarily but why not having a code provider with a lambda function instead of the field Code code that only populates it and goes to the DB if someone calls getCode? That would be more intuitive to me and consistent. Don't you know at least a way to get the code from the DB when constructing a VerkleAccount ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already working like that, it's using the same code as Bonsai, VerkleAccount is calling the context (MutableWorldstate) to retrieve the code when we call getCode

See in DiffBasedAccount

 public Bytes getCode() {
    if (code == null) {
      code = context.getCode(address, codeHash).orElse(Bytes.EMPTY);
    }
    return code;
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I understand, this is not something just with Verkle but Bonsai also has the same problem. Maybe nothing to do for now, but it would be better to clarify.

Right now, there's so many ways you can set code in both: using a context, calling setCode directly, doing a copy constructor way. I feel this has potential to create problems and at least it's hard for me to deterministically say how the code is set into an account. Why can't you have code set always at construct time? Agree needs some thought though.

But maybe it's because we're saving on object allocation and reusing *Account objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an optmization, we don't have to read a code if we don't need it . the idea is to read in the database only when we really want to access the code

also I don't see a potential problem with that. The context is used when you read the code from the database, after when you modify the code you will always pass via the account. the worldstate (via accumulator) will check the code in the account and save in the database

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will have in the futur to explain all of this logic in a doc (accumulator, worldstate, updater etc)

@@ -440,7 +429,15 @@ public Map<Bytes32, Bytes> getAllAccountStorage(final Address address, final Has
}

private VerkleTrie createTrie(final NodeLoader nodeLoader, final Bytes32 rootHash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rootHash unused var, can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks done

this.dataStorageConfiguration = dataStorageConfiguration;
this.flatDbStrategyProvider =
new VerkleFlatDbStrategyProvider(metricsSystem, dataStorageConfiguration);
flatDbStrategyProvider.loadFlatDbStrategy(composedWorldStateStorage);
Copy link
Contributor

@lu-pinto lu-pinto Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that the FlatDbStrategyProvider constructor not calling loadFlatDbStrategy ? Is the object of any use without anything loaded on it?

Found out later that getFlatDbStategy needs the same argument as the load which looks weird. After checking the getter I can see that it calls loadFlatDbStrategy. Any particular reason for that? Why can't we call load in the constructor and make sure `flatDbStrategy is never null inside? Do we need lazy initialization? Only if it makes sense to create a provider and not use it...

BTW in the case of Bonsai we construct and load sequentially all the same: https://github.com/hyperledger/besu/blob/main/ethereum/core/src/main/java/org/hyperledger/besu/ethereum/trie/diffbased/bonsai/storage/BonsaiWorldStateKeyValueStorage.java#L64-L66

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it was needed before but I think we had some refactoring of this part and it's not needed anymore in bonsai and verkle, I updated the code

flatDbStrategy.removeFlatAccountStorageValueByStorageSlotHash(
composedWorldStateTransaction, accountHash, slotHash);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putStateTrieNode and removeStateTrieNode are unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -164,6 +207,19 @@ public Updater putAccountInfoState(final Hash accountHash, final Bytes accountVa
return this;
}

public synchronized Updater putStorageValueBySlotHash(
Copy link
Contributor

@lu-pinto lu-pinto Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these methods synchronized? Then others are not. Does it make sense to have synchronized at all? If so others are also writing to the same object as these.

Actually I suspect it's not needed since this is called to compute the root hash which should only happen at the end on a single thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just a copy/paste from Bonsai code, looking at the code I also don't see a need for a synchronized for the moment

@@ -107,7 +108,21 @@ public static void writeTo(final TrieLog layer, final RLPOutput output) {
if (accountChange == null || accountChange.isUnchanged()) {
output.writeNull();
} else {
writeRlp(accountChange, output, (o, sta) -> sta.writeTo(o));
writeRlp(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's 2 TrieLogFactoryImpl classes that share big part of the code. Maybe there's some code reuse we can do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do some refactoring and avoid as much as possible duplication for this class

noOpSegmentedTx, noOpTx, worldStateKeyValueStorage.getFlatDbStrategy())),
noOpSegmentedTx,
noOpTx,
(VerkleStemFlatDbStrategy) worldStateKeyValueStorage.getFlatDbStrategy())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to define frontierRootHash? From what I can see this is only used to define transaction receipts compliant with Frontier but Berlin is already overriding that. I can't see why we need to have verkle with Frontier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep it this way for now because I’d like to sync the mainnet with Verkle for testing purposes, and I might need this method. We can remove it later when it’s no longer useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that you want it for that?

public boolean accountExists(final Address address) {
return worldUpdater.get(address) != null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused, I would remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why it's here 🤔 I will remove it


protected final Counter getStorageValueNotFoundInFlatDatabaseCounter;

public VerkleLegacyFlatDbStrategy(
Copy link
Contributor

@lu-pinto lu-pinto Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use some unit tests to make sure metrics work as expected at least

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did

* and code. This allows direct access to the leaves of the Verkle trie for stems, reducing
* duplication of data.
*/
public class VerkleStemFlatDbStrategy extends FlatDbStrategy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use some unit tests to make sure metrics work as expected at least

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} else {
sta.writeTo(o);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests for that

ImmutableVerkleSubStorageConfiguration.VerkleUnstable.builder()
.stemFlatDbEnabled(true)
.build();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISABLED is not used and it seems DEFAULT == DISABLED. Maybe remove DEFAULT and just do:

@Value.Default
  default VerkleUnstable getUnstable() {
    return VerkleUnstable.DISABLED;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@lu-pinto lu-pinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers but it could use some cleanups

lu-pinto added a commit to lu-pinto/besu that referenced this pull request Dec 9, 2024
lu-pinto added a commit to lu-pinto/besu that referenced this pull request Dec 9, 2024
lu-pinto added a commit to lu-pinto/besu that referenced this pull request Dec 10, 2024
@lu-pinto lu-pinto mentioned this pull request Dec 10, 2024
8 tasks
lu-pinto added a commit to lu-pinto/besu that referenced this pull request Dec 10, 2024
Signed-off-by: Karim Taam <[email protected]>
@matkt matkt requested a review from lu-pinto December 19, 2024 08:54
Copy link
Contributor

@lu-pinto lu-pinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matkt matkt merged commit 1412aef into hyperledger:verkle Dec 20, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants