RPC context slot does not match the actual observed slot, getMultipleAccounts is not atomic #4807

Open
ruuda opened this issue Feb 5, 2025 · 1 comment

ruuda commented Feb 5, 2025

Problem

RPC responses include a context field, which is documented as:

"An RpcResponseContext JSON structure including a slot field at which the operation was evaluated."

I thought that this meant that when I call getAccountInfo or getMultipleAccounts, I get the account as it was at the end of the block in slot context.slot, but this is not the case.

Below is a minimal reproducer based on the clock sysvar. I believe that the problem is in context.slot, and not in the clock sysvar, because I first discovered this bug when fetching vote accounts, where I observed the vote credits increasing by more than should have been possible given how many slots apart the observations were (according to context.slot). When I changed that program to fetch the vote accounts as well as the clock sysvar in a single getMultipleAccounts call, I observed fewer impossible vote credit amounts (based on how many slots apart the observations were according to clock.slot).

Repro boilerplate

rust-toolchain.toml:

[toolchain]
channel = "1.79"

Cargo.toml:

[package]
name = "solana-context-slot-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
bincode = "1.3.3"
solana-sdk = "2.1.13"
solana-client = "2.1.13"

I’ll omit the Cargo.lock here, as the exact versions of the client libraries are not really relevant: the problem is server-side, and I reproduced it against multiple RPCs.

src/main.rs:

use solana_sdk::commitment_config::CommitmentConfig;
use solana_sdk::sysvar;
use solana_client::rpc_client::RpcClient;

fn main() {
    let client = RpcClient::new("https://api.mainnet-beta.solana.com");
    let commitment = CommitmentConfig::finalized();

    loop {
        let response = client
            .get_account_with_commitment(&sysvar::clock::ID, commitment)
            .expect("Failed to fetch clock account from RPC");

        let data = &response
            .value
            .expect("The clock sysvar exists.")
            .data[..];

        let clock: sysvar::clock::Clock = bincode::deserialize(data)
            .expect("The clock sysvar is well-formed.");

        println!(
            "Context slot: {}, clock slot: {}, diff: {}",
            response.context.slot,
            clock.slot,
            response.context.slot as i64 - clock.slot as i64,
        );
    }
}

One run produced the following output:

Context slot: 318679832, clock slot: 318679833, diff: -1
Context slot: 318679832, clock slot: 318679833, diff: -1
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679876, clock slot: 318679878, diff: -2
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679901, clock slot: 318679908, diff: -7
Context slot: 318679959, clock slot: 318679960, diff: -1
Context slot: 318679959, clock slot: 318679960, diff: -1
Context slot: 318679959, clock slot: 318679960, diff: -1
Context slot: 318679959, clock slot: 318679960, diff: -1
Context slot: 318679959, clock slot: 318679961, diff: -2
Context slot: 318679960, clock slot: 318679961, diff: -1
Context slot: 318679959, clock slot: 318679960, diff: -1

Note that not only is the clock sysvar ahead of the context slot returned by the RPC (which I think should not be possible if my understanding of the docs is correct, but that could be a simple off-by-one), the difference fluctuates, and it fluctuates by many slots.

In particular, I got two responses that claimed to be for slot 318679959, but one had the clock sysvar at 318679960, and the other at 318679961. If I read the same account multiple times at finalized commitment level, and the RPC says it read it at slot 318679959, then I expect the account data to be the same in all responses, but that is not the case here.
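
The check described above can be automated. The following is a minimal sketch, not part of the original reproducer: the `Observation` type and the `find_inconsistent_slots` helper are hypothetical names introduced here, and the sample data is copied from the last seven lines of the output above. It groups the observed clock slots by context slot and reports every context slot that was served with more than one clock slot.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One observation from the reproducer: the slot reported in the RPC
/// context and the slot stored in the clock sysvar of the same response.
/// (Hypothetical helper type, introduced for this sketch only.)
#[derive(Clone, Copy, Debug)]
struct Observation {
    context_slot: u64,
    clock_slot: u64,
}

/// Returns every context slot that was observed with more than one distinct
/// clock slot, together with the set of clock slots seen for it.
fn find_inconsistent_slots(observations: &[Observation]) -> BTreeMap<u64, BTreeSet<u64>> {
    let mut seen: BTreeMap<u64, BTreeSet<u64>> = BTreeMap::new();
    for obs in observations {
        seen.entry(obs.context_slot).or_default().insert(obs.clock_slot);
    }
    // Keep only the context slots for which the clock sysvar disagreed.
    seen.retain(|_, clock_slots| clock_slots.len() > 1);
    seen
}

/// The last seven (context slot, clock slot) pairs from the output above.
fn sample_observations() -> Vec<Observation> {
    [
        (318679959, 318679960),
        (318679959, 318679960),
        (318679959, 318679960),
        (318679959, 318679960),
        (318679959, 318679961),
        (318679960, 318679961),
        (318679959, 318679960),
    ]
    .iter()
    .map(|&(context_slot, clock_slot)| Observation { context_slot, clock_slot })
    .collect()
}

fn main() {
    let inconsistent = find_inconsistent_slots(&sample_observations());
    for (context_slot, clock_slots) in &inconsistent {
        println!("context slot {context_slot} was served with clock slots {clock_slots:?}");
    }
}
```

On the sample data this flags only context slot 318679959, which was returned with clock slots 318679960 and 318679961. If the context slot really identified the bank that was read, the map would always be empty.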

I can reproduce this behavior against internal RPC nodes at Chorus One running Agave 2.0.21 and 2.1.12, and against api.mainnet-beta.solana.com which at this time returns context.apiVersion: 2.0.21.

This looks like a pretty serious issue to me, because it means that indexers and other tooling that trust the RPC’s context slot may all have bogus data. If the problem is in the accounts db, then there might be correctness issues in other places too. Update: The problem also surfaces as getMultipleAccounts returning different data for the same account in a single call.

Proposed Solution

When using getAccountInfo or getMultipleAccounts to fetch the clock sysvar, the slot returned in the RPC context and the slot in the clock sysvar should be the same.

I skimmed through the implementation of those two calls and they are using the same bank for reading the account and providing the context slot, and getting the account eventually calls bank.get_account, which looks correct to me if that method returns the account as it was at the bank’s slot. That method has some comments that worry me though:

agave/runtime/src/bank.rs

Lines 4967 to 4984 in bd6e9f9

    // Hi! leaky abstraction here....
    // try to use get_account_with_fixed_root() if it's called ONLY from on-chain runtime account
    // processing. That alternative fn provides more safety.
    pub fn get_account(&self, pubkey: &Pubkey) -> Option<AccountSharedData> {
        self.get_account_modified_slot(pubkey)
            .map(|(acc, _slot)| acc)
    }

    // Hi! leaky abstraction here....
    // use this over get_account() if it's called ONLY from on-chain runtime account
    // processing (i.e. from in-band replay/banking stage; that ensures root is *fixed* while
    // running).
    // pro: safer assertion can be enabled inside AccountsDb
    // con: panics!() if called from off-chain processing
    pub fn get_account_with_fixed_root(&self, pubkey: &Pubkey) -> Option<AccountSharedData> {
        self.get_account_modified_slot_with_fixed_root(pubkey)
            .map(|(acc, _slot)| acc)
    }

I dove into the accounts db code, but at this point it’s getting more subtle than what I can investigate in one evening. Probably somebody who is more familiar with the accounts db code can diagnose this faster.

ruuda added the community label Feb 5, 2025

ruuda commented Feb 7, 2025

After further testing, even getMultipleAccounts does not atomically fetch multiple accounts. Here’s a repro:

use solana_sdk::commitment_config::CommitmentConfig;
use solana_sdk::sysvar;
use solana_client::rpc_client::RpcClient;

fn main() {
    let client = RpcClient::new("https://api.mainnet-beta.solana.com");
    let commitment = CommitmentConfig::finalized();

    let addrs = vec![sysvar::clock::ID; 100];

    loop {
        let response = client
            .get_multiple_accounts_with_commitment(&addrs[..], commitment)
            .expect("Failed to fetch clock accounts from RPC")
            .value;

        for account in &response[1..] {
            assert_eq!(response[0], *account);
        }

        println!("ok");
    }
}

Output:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
thread 'main' panicked at src/main.rs:19:13:
assertion `left == right` failed
  left: Some(Account { lamports: 1169280, data.len: 40, owner: Sysvar1111111111111111111111111111111111111, executable: false, rent_epoch: 18446744073709551615, data: ad950413000000001170a46700000000e202000000000000e3020000000000005df5a56700000000 })
 right: Some(Account { lamports: 1169280, data.len: 40, owner: Sysvar1111111111111111111111111111111111111, executable: false, rent_epoch: 18446744073709551615, data: ae950413000000001170a46700000000e202000000000000e3020000000000005df5a56700000000 })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
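
To make the difference between the two dumps easier to see, here is a minimal std-only sketch that decodes the `data` hex strings from the panic message. It assumes the serialized clock sysvar starts with the slot as a little-endian u64 (consistent with the bincode deserialization in the first reproducer). The helpers `decode_hex` and `clock_slot` are hypothetical names introduced for this sketch.

```rust
/// Hex dumps of the clock sysvar data, copied from the panic message above.
const LEFT: &str =
    "ad950413000000001170a46700000000e202000000000000e3020000000000005df5a56700000000";
const RIGHT: &str =
    "ae950413000000001170a46700000000e202000000000000e3020000000000005df5a56700000000";

/// Decodes a hex string into bytes.
/// Returns None if the length is odd or a non-hex character is present.
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Reads the slot from serialized clock sysvar data.
/// Assumption: the slot is the first field, a little-endian u64 in bytes 0..8.
fn clock_slot(data: &[u8]) -> Option<u64> {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(data.get(0..8)?);
    Some(u64::from_le_bytes(bytes))
}

fn main() {
    for (label, hex) in [("left", LEFT), ("right", RIGHT)] {
        let data = decode_hex(hex).expect("the dump is valid hex");
        let slot = clock_slot(&data).expect("the dump has at least 8 bytes");
        println!("{label}: clock slot {slot}");
    }
}
```

Under that assumption, the left account decodes to clock slot 319067565 and the right one to 319067566, while the remaining 32 bytes are identical. A single getMultipleAccounts response therefore contained the same account as of two consecutive slots.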

ruuda changed the title from "RPC context slot does not match the actual observed slot" to "RPC context slot does not match the actual observed slot, getMultipleAccounts is not atomic" on Feb 7, 2025