Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task]: endpoint for account state retrieval (FPI) #485

Closed
Mirko-von-Leipzig opened this issue Sep 6, 2024 · 11 comments
Closed

[Task]: endpoint for account state retrieval (FPI) #485

Mirko-von-Leipzig opened this issue Sep 6, 2024 · 11 comments
Assignees
Milestone

Comments

@Mirko-von-Leipzig
Copy link
Contributor

Mirko-von-Leipzig commented Sep 6, 2024

Node's components affected by this task

  • RPC
  • Store
  • Protobuf messages

What should be done?

Support FPI (foreign procedure invocation) in the node by adding a new endpoint which outputs the state of a given public account.

FPI is the ability to invoke procedures from other accounts. This requires knowledge of that accounts state to perform, so we need to provide access to this information.

Currently the only way to access random public account data is by staying in sync with the chain via diffs. This is impractical as it requires syncing with all public account (iiuc) and staying in sync. This endpoint will allow fetching the account state once-off.

How should it be done?

  • The endpoint should be added to rpc and store.
  • The endpoint should include the requested account data and the proof thereof.

What data to include is still up for discussion but a starting point might be to always include the flat state items and to only include the map items specified by the caller.

When is this task done?

New endpoint has been added.

Additional context

See miden-base#847 for the original foreign procedure invocation context.

@bobbinth
Copy link
Contributor

bobbinth commented Sep 6, 2024

This is impractical as it requires syncing with all public account (iiuc) and staying in sync.

We don't need to do this for all public accounts - only for ones we are interested in involving in out transactions. But we do have to stay in sync with them.

Overall, I think the endpoint could look something like this:

message GetAccountStateRequest {
    // ID of the account for which we'd like to retrieve the state.
    AccountId account_id = 1;
    // Keys of storage maps for which we'd like to get values.
    repeat StorageMapKey storage_map_keys = 2;
}

message StorageMapKey {
    // Index of the storage slot containing the storage map.
    uint32 slot_index = 1;
    // Key for which we want to request the value.
    Digest key = 2;
}

message GetAccountStateResponse {
    // Block number at which the state of the account was returned.
    fixed32 block_num = 1;
    // Account header consisting of account_id, vault_root, storage_root, code_root, and nonce.
    AccountHeader header = 2;
    // Authentication path from the account_root for the bloke header to the account.
    MerklePath account_proof = 3;
    /// Values of all account storage slots (max 256).
    repeat Digest storage_slots = 4;
    // A list of key-value pairs (and their corresponding proofs) for the requested keys.
    repeat StorageMapItem map_items = 5;
}

message StorageMapItem {
    // Index of the storage slot containing the storage map.
    uint32 slot_index = 1;
    // Opening containing key, value, and a proof attesting that the key opens to the value.
    SmtOpening opening = 2;
}

Some open questions:

  • We don't send any underlying info about the assets stored in the account's vault. Sending all assets may be prohibitively expensive for accounts which store lots of assets. But at the same time, it is not clear how we can request only a small subset of assets.
  • The client may want to requests states of multiple accounts (e.g., if they want to execute a transaction which reads states of multiple oracles). Doing this with the endpoint proposed above would be difficult because we return the latest known account states, and so, if the blockchain advances between between the request for account A and the request for account B, the client may end up with inconsistent data. One way to solve it is to request states of multiple accounts in a single request.
  • Should we include full block header instead of just block_num in the response? The assumption here was that the client would probably first sync to the tip and then get the account states it is interested in, and so, the client would already have the block header. But maybe that's not always the case?

@bobbinth bobbinth added this to the v0.6 milestone Sep 9, 2024
@polydez
Copy link
Contributor

polydez commented Sep 23, 2024

I propose the next interface:

import "smt.proto";
import "merkle.proto";

message GetAccountStateRequest {
    // List of account state requests.
    repeated AccountStateRequest account_state_requests = 1;
}

message AccountStateRequest {
    // ID of the account for which we'd like to retrieve the state.
    account.AccountId account_id = 1;
    // Keys of storage maps for which we'd like to get values.
    repeated StorageMapKey storage_map_keys = 2;
    // List of asset IDs (asset keys in vault SMT) for which we'd like to get values.
    // If empty, all assets will be returned.
    repeated digest.Digest asset_ids = 3;
}

message StorageMapKey {
    // Index of the storage slot containing the storage map.
    uint32 slot_index = 1;
    // Key for which we want to request the value.
    digest.Digest key = 2;
}

message GetAccountStateResponse {
    // Block number at which the state of the account was returned.
    fixed32 block_num = 1;
    // List of account state infos for the requested account keys.
    repeated AccountStateResponse account_state_infos = 2;
}

message AccountStateResponse {
    // Account header consisting of account_id, vault_root, storage_root, code_root, and nonce.
    /*account.*/AccountHeader header = 1;
    // Authentication path from the account_root for the block header to the account.
    merkle.MerklePath account_proof = 2;
    /// Values of all account storage slots (max 255).
    repeated digest.Digest storage_slots = 3;
    // A list of key-value pairs (and their corresponding proofs) for the requested keys.
    repeated StorageMapItem map_items = 4;
    // A list of assets (and their corresponding proofs) for the requested asset IDs.
    repeated smt.SmtOpening assets = 5;
}

message AccountHeader {
    // Account ID.
    account.AccountId account_id = 1;
    // Vault root hash.
    digest.Digest vault_root = 2;
    // Storage root hash.
    digest.Digest storage_root = 3;
    // Code root hash.
    digest.Digest code_root = 4;
    // Account nonce.
    uint32 nonce = 5;
}

message StorageMapItem {
    // Index of the storage slot containing the storage map.
    uint32 slot_index = 1;
    // Opening containing key, value, and a proof attesting that the key opens to the value.
    smt.SmtOpening opening = 2;
}

I took into account all the questions below:

  • We don't send any underlying info about the assets stored in the account's vault. Sending all assets may be prohibitively expensive for accounts which store lots of assets. But at the same time, it is not clear how we can request only a small subset of assets.

For each account requested I added optional list of asset IDs. The response contains information necessary to recreate vault SMT with only requested assets.

  • The client may want to requests states of multiple accounts (e.g., if they want to execute a transaction which reads states of multiple oracles). Doing this with the endpoint proposed above would be difficult because we return the latest known account states, and so, if the blockchain advances between between the request for account A and the request for account B, the client may end up with inconsistent data. One way to solve it is to request states of multiple accounts in a single request.

In the proposed solution we are able to request multiple accounts in a single request.

  • Should we include full block header instead of just block_num in the response? The assumption here was that the client would probably first sync to the tip and then get the account states it is interested in, and so, the client would already have the block header. But maybe that's not always the case?

I think, it's enough to return just block_num. If the client doesn't have the block referenced in response, it can request block header by using GetBlockHeaderByNumber endpoint.

Two things are unclear for me:

  1. What if the client wants to request not a value from the storage map, but just a scalar from the slot? Should we support this?
  2. In this proposal we return all account's assets if the assets' IDs list is empty. Maybe it would be better to return only requested assets and none of them if the list is empty?

What do you think, @bobbinth, @Mirko-von-Leipzig?

@Mirko-von-Leipzig
Copy link
Contributor Author

Mirko-von-Leipzig commented Sep 23, 2024

The interface looks sensible to me.

Two things are unclear for me:

  1. What if the client wants to request not a value from the storage map, but just a scalar from the slot? Should we support this?

I think one of the discussed options was that we always return all scalar values by default, and that non-scalar data must be requested by the interface you've defined. Alternatively we need additional params; but I think just always sending all of it is fine. This really depends on the usage patterns.

  1. In this proposal we return all account's assets if the assets' IDs list is empty. Maybe it would be better to return only requested assets and none of them if the list is empty?

I don't think we should default to returning all assets. This would end poorly for accounts with many assets (are these bounded in some way?).

These sorts of endpoints very quickly become DoS vectors - even by well-intentioned callers. We will likely have to limit the query amounts at some stage in the future. So we should definitely not begin with all (except scalars which are limited to 255 units and are relatively cheap).

Is it possible for a caller to not know about all assets/keys? i.e. is there a use case for "send me all of it" so that the user can learn about unknown data?

@bobbinth
Copy link
Contributor

For each account requested I added optional list of asset IDs.

This would be meaningful only for fungible assets - so, not sure we should do it this way. What I would probably do for now is add an optional flag - something like include_assets and have it be set to "false" by default.

What if the client wants to request not a value from the storage map, but just a scalar from the slot? Should we support this?

As @Mirko-von-Leipzig mentioned, we should always return the full list of scalar values for all storage slots. In the worst case, this would be 8KB of data - but in most cases will probably be less than a few hundred bytes.

I've also realized that for storage slots, we need to return also slot types. So, the response would look something like this:

message AccountStateResponse {
    // Account header consisting of account_id, vault_root, storage_root, code_root, and nonce.
    account.AccountHeader header = 1;
    // Authentication path from the account_root for the block header to the account.
    merkle.MerklePath account_proof = 2;
    /// Values of all account storage slots (max 255).
    StorageHeader storage_header = 3;
    // A list of key-value pairs (and their corresponding proofs) for the requested keys.
    repeated StorageMapItem map_items = 4;
    // An optional list of all assets in the account.
    repeated bytes assets = 5;
}

In the above, storage slots are defined as storage header - which would be similar to the StorageHeader struct we recently introduced in miden-base. We could also probably serialize storage header and send it as just bytes.

Also, for assets I used just bytes because describing them in protobuf format may be tricky because we have fungible and non-fungible assets.

These sorts of endpoints very quickly become DoS vectors - even by well-intentioned callers.

Totally agreed. We'll need to think of ways how to handle accounts with lots of storage and many assets. The main issue is how to return all desired data to client without locking the database for too long while keeping the data consistent. This is all due to the fact that we keep only the latest account states, and so, we can't return data "in chunks" against some old state.

I don't have a great solution to this yet.

@Mirko-von-Leipzig
Copy link
Contributor Author

These sorts of endpoints very quickly become DoS vectors - even by well-intentioned callers.

Totally agreed. We'll need to think of ways how to handle accounts with lots of storage and many assets. The main issue is how to return all desired data to client without locking the database for too long while keeping the data consistent. This is all due to the fact that we keep only the latest account states, and so, we can't return data "in chunks" against some old state.

I don't have a great solution to this yet.

If we store the account update diffs then we can do this by paginating. Some Ethereum client do their initial state sync like this, and starknet has similar plans for p2p.

Each page we return is a chunk of continuous storage. Because its a continuous chunk, the proof of the entire chunk consists of a proof of the left-most and right-most element. The chunks can originate from different blocks.

Once all chunks are received, we also need the account storage diff from the earliest to the latest block in the set. We apply these diffs and now we should be at the state of the latest block.

If we don't want to store all the account diffs (I'm still not familiar with what we do/don't store), we could do it on an adhoc basis. As in, when the node receives an account sync request it starts storing the updates to that account specifically, until the request chain completes or we determine lack of interest due to a timeout between requests.

This is fairly complex to implement though. And also requires the receiver to understand how to build the merkle trees. Storage proofs are merkle trees, right?

@bobbinth
Copy link
Contributor

If we store the account update diffs then we can do this by paginating.

We store the diffs, but currently not super efficiently. We can make it much more efficient though (I thought there was an issue for that, but I can't find it now).

Each page we return is a chunk of continuous storage. Because its a continuous chunk, the proof of the entire chunk consists of a proof of the left-most and right-most element. The chunks can originate from different blocks.

I think we an assume that if a user wants to retrieve storage, they ether know exactly what they need (e.g., they know the keys for the maps), or they need the whole thing. In the "whole think" case, we don't actually need to return the Merkle paths - the user will be able to instantiate account storage from raw data.

Once all chunks are received, we also need the account storage diff from the earliest to the latest block in the set. We apply these diffs and now we should be at the state of the latest block.

This could work, but seems complicated from the user's standpoint. Would be awesome to simplify it somehow - but if we are dealing with a truly big state (e.g., 100 MB), not sure if there is a simple solution.

@Mirko-von-Leipzig
Copy link
Contributor Author

Each page we return is a chunk of continuous storage. Because its a continuous chunk, the proof of the entire chunk consists of a proof of the left-most and right-most element. The chunks can originate from different blocks.

I think we an assume that if a user wants to retrieve storage, they ether know exactly what they need (e.g., they know the keys for the maps), or they need the whole thing. In the "whole think" case, we don't actually need to return the Merkle paths - the user will be able to instantiate account storage from raw data.

The proofs are protection against bad chunk data in a p2p context; so if the caller 100% trusts us then no need for the proofs; or they can do the final check at the very end.

Once all chunks are received, we also need the account storage diff from the earliest to the latest block in the set. We apply these diffs and now we should be at the state of the latest block.

This could work, but seems complicated from the user's standpoint. Would be awesome to simplify it somehow - but if we are dealing with a truly big state (e.g., 100 MB), not sure if there is a simple solution.

If we're storing the diffs then we can also paginate the data at a given block height by aggregating the diffs backwards from the latest. However this can get expensive.

@bobbinth
Copy link
Contributor

If we're storing the diffs then we can also paginate the data at a given block height by aggregating the diffs backwards from the latest. However this can get expensive.

I think we can limit the use case here. We only need to get this info close to the chain tip - so, maybe we say that this endpoint works as long as you want data within 100 blocks of the chain tip. This way, there won't ever be more than 100 deltas to apply to w/e diff we get.

@polydez
Copy link
Contributor

polydez commented Sep 25, 2024

@bobbinth

If we store the account update diffs then we can do this by paginating.

We store the diffs, but currently not super efficiently. We can make it much more efficient though (I thought there was an issue for that, but I can't find it now).

I guess you mean this issue: #290

@bobbinth
Copy link
Contributor

I guess you mean this issue: #290

I was thinking more about the ideas described in #290 (comment) (i.e., having dedicated tables to store account deltas).

@polydez
Copy link
Contributor

polydez commented Oct 4, 2024

Resolved by #506

@polydez polydez closed this as completed Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

3 participants