
accounts-db: relax intrabatch account locks #4253

Open
wants to merge 3 commits into
base: master
from

Conversation

2501babe
Member

@2501babe 2501babe commented Jan 3, 2025

Problem

simd83 proposes removing the constraint that transactions in the same entry cannot have read/write or write/write contention on any account. a previous pr modified svm to be able to carry over state changes between such transactions while processing a transaction batch. this pr modifies consensus to remove the account locking rules that prevent such batches from being created or replayed

Summary of Changes

add a new function to AccountLocks, try_lock_transaction_batch, which only checks for locking conflicts with other batches, allowing any account overlap within the batch itself. modify Accounts and Bank to use it instead when the feature gate relax_intrabatch_account_locks is activated. also modify prepare_sanitized_batch_with_results to deduplicate transactions within a batch by signature to prevent replay attacks, such that two instances of the same transaction cause the first to lock out the second, in a similar manner to the non-simd83 behavior for this special case
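The batch-level rule described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this pr: `Key` stands in for `Pubkey`, `Locks`, `LockError` and `try_lock_batch` are hypothetical names, and the check-then-lock structure is only one way such a function could be written.

```rust
use std::collections::HashMap;

// Stand-in for Pubkey; any hashable key works for this sketch.
type Key = &'static str;

#[derive(Debug, PartialEq)]
enum LockError {
    AccountInUse,
}

// Hypothetical model: both tables count holders, so several transactions
// in one batch can hold the same write lock at once.
#[derive(Default)]
struct Locks {
    write: HashMap<Key, u64>,
    read: HashMap<Key, u64>,
}

impl Locks {
    // A write conflicts with any existing lock; a read only with a write.
    fn conflicts(&self, key: Key, writable: bool) -> bool {
        self.write.contains_key(&key) || (writable && self.read.contains_key(&key))
    }

    // Each transaction is a list of (key, is_writable) pairs.
    fn try_lock_batch(&mut self, batch: &[Vec<(Key, bool)>]) -> Vec<Result<(), LockError>> {
        // Phase 1: check every transaction against the locks held before
        // this batch started. Overlap inside the batch is never examined.
        let results: Vec<Result<(), LockError>> = batch
            .iter()
            .map(|tx| {
                if tx.iter().any(|&(key, writable)| self.conflicts(key, writable)) {
                    Err(LockError::AccountInUse)
                } else {
                    Ok(())
                }
            })
            .collect();

        // Phase 2: take locks for the transactions that passed.
        for (tx, result) in batch.iter().zip(&results) {
            if result.is_ok() {
                for &(key, writable) in tx {
                    let table = if writable { &mut self.write } else { &mut self.read };
                    *table.entry(key).or_insert(0) += 1;
                }
            }
        }
        results
    }
}
```

In this model two transactions in one batch that both write `"a"` are both accepted and leave a holder count of 2, while a later batch touching `"a"` is rejected until those locks are released.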

since transaction results are used more extensively than before, some functions with *_with_results variants have been collapsed into wrappers around that variant. we also refactor several things to favor iterators over vectors, to avoid places where iters are collected and transformed back into iters

important code changes are confined to accounts-db/src/accounts.rs, accounts-db/src/account_locks.rs, and runtime/src/bank.rs. changes in core, ledger, and runtime transaction batch only affect tests. overall the large majority of changes are fixes or improvements to tests

Feature Gate Issue: https://github.com/anza-xyz/feature-gate-tracker/issues/76

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 3e016b3 to f748a5f Compare January 3, 2025 10:46
@2501babe 2501babe self-assigned this Jan 3, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 4 times, most recently from 2fb8527 to 11d1dcb Compare January 7, 2025 11:23
@2501babe 2501babe changed the title accounts-db: only lock accounts across threads accounts-db: disable intrabatch account locks Jan 7, 2025
accounts-db/src/accounts.rs Outdated Show resolved Hide resolved
// HANA TODO the vec allocation here is unfortunate but hard to avoid
// we cannot do this in one closure because of borrow rules
// play around with alternate strategies, according to benches this may be up to
// 50% slower for small batches and few locks, but for large batches and many locks

bench using jemalloc? i'd think it would do a reasonable job of just keeping the mem in thread-local cache for re-use

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 95b17b3 to d1ec289 Compare January 13, 2025 12:13
@2501babe 2501babe changed the title accounts-db: disable intrabatch account locks accounts-db: relax intrabatch account locks Jan 13, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from a11ef4d to c234cc6 Compare January 17, 2025 14:23
Comment on lines 13 to 17
 #[derive(Debug, Default)]
 pub struct AccountLocks {
-    write_locks: AHashSet<Pubkey>,
+    write_locks: AHashMap<Pubkey, u64>,
     readonly_locks: AHashMap<Pubkey, u64>,
 }
Member Author

@2501babe 2501babe Jan 23, 2025


because the read- and write-lock hashmaps have the same type now, all the functions that change them are basically the same. we could use an enum or hashmap reference to discriminate and delete half of the functions, but i left it like this for your review before butchering it
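One way the enum idea could look, as a rough sketch only: `LockKind`, `table_mut`, `lock` and `unlock` are hypothetical names that do not come from the pr, `Key` stands in for `Pubkey`, and std `HashMap` replaces `AHashMap` so the block compiles on its own.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Stand-in for Pubkey in this sketch.
type Key = u8;

// Hypothetical discriminator; not a type from the pr.
#[derive(Clone, Copy)]
enum LockKind {
    Read,
    Write,
}

#[derive(Default)]
struct AccountLocks {
    write_locks: HashMap<Key, u64>,
    readonly_locks: HashMap<Key, u64>,
}

impl AccountLocks {
    // The only place that knows which table belongs to which kind.
    fn table_mut(&mut self, kind: LockKind) -> &mut HashMap<Key, u64> {
        match kind {
            LockKind::Read => &mut self.readonly_locks,
            LockKind::Write => &mut self.write_locks,
        }
    }

    // Increment the holder count, inserting the key on first use.
    fn lock(&mut self, kind: LockKind, key: Key) {
        *self.table_mut(kind).entry(key).or_insert(0) += 1;
    }

    // Decrement the holder count and drop the entry when it reaches zero.
    // Unlocking a key that is not held is a no-op in this sketch.
    fn unlock(&mut self, kind: LockKind, key: Key) {
        if let Entry::Occupied(mut entry) = self.table_mut(kind).entry(key) {
            *entry.get_mut() -= 1;
            if *entry.get() == 0 {
                entry.remove();
            }
        }
    }
}
```

The tradeoff is one extra match per call in exchange for a single copy of the counting logic, which may or may not be worth it next to the existing paired functions.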

Comment on lines +565 to -567
    relax_intrabatch_account_locks: bool,
) -> Vec<Result<()>> {
    // Validate the account locks, then get iterator if successful validation.
    let tx_account_locks_results: Vec<Result<_>> = txs
Member Author

@2501babe 2501babe Jan 23, 2025


Accounts::lock_accounts() could be reimplemented as a wrapper on lock_accounts_with_results() or possibly deleted, it isnt really required anymore since all batch-building needs to have results. but we could leave it as-is, or could do some kind of refactor with TransactionAccountLocksIterator
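A minimal sketch of the wrapper shape, assuming toy types: a transaction here is just a slice of key bytes, and `TxError`, `MAX_LOCKS` and `validate` are hypothetical stand-ins rather than anything in `Accounts`. Only the relationship between the two functions is the point.

```rust
#[derive(Debug, Clone, PartialEq)]
enum TxError {
    AccountInUse,
    TooManyAccountLocks,
}

// Hypothetical limit, chosen only to make the sketch do something.
const MAX_LOCKS: usize = 4;

// Stand-in for lock validation: a transaction is just its list of keys.
fn validate(tx: &[u8]) -> Result<(), TxError> {
    if tx.len() > MAX_LOCKS {
        Err(TxError::TooManyAccountLocks)
    } else {
        Ok(())
    }
}

// The general variant: a transaction that already failed upstream keeps
// its error and is never validated.
fn lock_accounts_with_results<'a>(
    txs: impl Iterator<Item = &'a [u8]>,
    results: impl Iterator<Item = Result<(), TxError>>,
) -> Vec<Result<(), TxError>> {
    txs.zip(results)
        .map(|(tx, prior)| prior.and_then(|()| validate(tx)))
        .collect()
}

// The plain variant becomes a wrapper that supplies all-Ok prior results.
fn lock_accounts<'a>(txs: impl Iterator<Item = &'a [u8]>) -> Vec<Result<(), TxError>> {
    lock_accounts_with_results(txs, std::iter::repeat(Ok(())))
}
```

Zipping the two iterators also means the transactions and their results travel together, so nothing has to assume the two sequences are the same length.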

Comment on lines -596 to +600
-    fn lock_accounts_inner(
+    fn lock_accounts_inner<'a>(
         &self,
-        tx_account_locks_results: Vec<Result<TransactionAccountLocksIterator<impl SVMMessage>>>,
+        tx_account_locks_results: impl Iterator<
+            Item = Result<TransactionAccountLocksIterator<'a, impl SVMMessage + 'a>>,
+        >,
+        relax_intrabatch_account_locks: bool,
Member Author


in general where possible i changed vecs to iters, we eliminate several uses of collect() and just pass around the closure chains. this makes the type signatures look kind of stupid tho and there are possibly things we could refactor to be better (like maybe combining transactions with the transaction results instead of taking for granted they always have the same length). im undecided about style tho

Comment on lines +603 to +618
if relax_intrabatch_account_locks {
    let validated_batch_keys = tx_account_locks_results.map(|tx_account_locks_result| {
        tx_account_locks_result
            .map(|tx_account_locks| tx_account_locks.accounts_with_is_writable())
    });

    account_locks.try_lock_transaction_batch(validated_batch_keys)
} else {
    tx_account_locks_results
        .map(|tx_account_locks_result| match tx_account_locks_result {
            Ok(tx_account_locks) => account_locks
                .try_lock_accounts(tx_account_locks.accounts_with_is_writable()),
            Err(err) => Err(err),
        })
        .collect()
}
Member Author


this is the main branch that the feature controls. we pass it in as a param so Accounts doesnt need FeatureSet, only Bank. the only other place we use the feature is to enable signature-based transaction deduplication

Comment on lines 3183 to 3199
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());
let transaction_results =
    transaction_results
        .enumerate()
        .map(|(i, tx_result)| match tx_result {
            Ok(())
                if relax_intrabatch_account_locks
                    && !batch_signatures.insert(transactions[i].signature()) =>
            {
                Err(TransactionError::AccountInUse)
            }
            Ok(()) => Ok(()),
            Err(e) => Err(e),
        });
Member Author


this is the dedupe step mentioned in a comment above. we could use a double for loop instead of a hashset but this seemed much more straightforward since the inner loop would have to abort based on the outer loop index


I'd guess it's almost certainly faster to just do a brute-force since our batches are small (replay will be size 1 for unified-scheduler); but might get complicated with the current iterator interface. Fine to leave it as is.
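For comparison, here is a rough sketch of both approaches over plain slices. It assumes toy types: `ids` stands in for whatever identifies a transaction (signature or message hash), and `TxError`, `dedup_hashset` and `dedup_brute_force` are hypothetical names. It makes no claim about which is faster in practice.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
enum TxError {
    AccountInUse,
    Other,
}

type TxResult = Result<(), TxError>;

// Hash-set version, mirroring the shape of the snippet under review:
// only transactions that are still Ok get recorded and checked.
fn dedup_hashset(ids: &[u64], results: Vec<TxResult>) -> Vec<TxResult> {
    let mut seen = HashSet::with_capacity(ids.len());
    results
        .into_iter()
        .enumerate()
        .map(|(i, result)| match result {
            Ok(()) if !seen.insert(ids[i]) => Err(TxError::AccountInUse),
            other => other,
        })
        .collect()
}

// Brute-force version: compare each Ok transaction against every earlier
// Ok transaction. Quadratic in batch size, with no set to allocate.
fn dedup_brute_force(ids: &[u64], results: &[TxResult]) -> Vec<TxResult> {
    (0..results.len())
        .map(|i| {
            let duplicate = results[i].is_ok()
                && (0..i).any(|j| results[j].is_ok() && ids[j] == ids[i]);
            if duplicate {
                Err(TxError::AccountInUse)
            } else {
                results[i].clone()
            }
        })
        .collect()
}
```

The brute-force version needs random access to earlier results, which is the complication with a lazy iterator interface: the inputs would have to be collected first.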

@2501babe 2501babe added the feature-gate Pull Request adds or modifies a runtime feature gate label Jan 23, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from d378d3a to c6d7105 Compare January 24, 2025 09:04
@2501babe
Member Author

2501babe commented Jan 24, 2025

recent sample of bench comparisons, master vs this branch with simd83 enabled

bench_lock_accounts/batch_size_1_locks_count_2
                        time:   [188.03 µs 188.11 µs 188.19 µs]
                        thrpt:  [5.4414 Melem/s 5.4438 Melem/s 5.4461 Melem/s]
bench_lock_accounts/batch_size_1_locks_count_2_old
                        time:   [194.10 µs 194.20 µs 194.32 µs]
                        thrpt:  [5.2696 Melem/s 5.2729 Melem/s 5.2757 Melem/s]

bench_lock_accounts/batch_size_32_locks_count_64
                        time:   [5.3985 ms 5.6560 ms 5.9096 ms]
                        thrpt:  [173.28 Kelem/s 181.05 Kelem/s 189.68 Kelem/s]
bench_lock_accounts/batch_size_32_locks_count_64_simd83
                        time:   [3.0115 ms 3.0123 ms 3.0132 ms]
                        thrpt:  [339.84 Kelem/s 339.94 Kelem/s 340.03 Kelem/s]

bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts
                        time:   [2.8057 ms 2.8066 ms 2.8075 ms]
                        thrpt:  [364.74 Kelem/s 364.86 Kelem/s 364.97 Kelem/s]
bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts_simd83
                        time:   [2.8175 ms 2.8181 ms 2.8188 ms]
                        thrpt:  [363.28 Kelem/s 363.36 Kelem/s 363.44 Kelem/s]

in general we perform slightly worse for tiny batches and as well or better for large batches. note these benches call code in Accounts and AccountLocks but not Bank

@2501babe 2501babe marked this pull request as ready for review January 24, 2025 12:19
@2501babe 2501babe requested a review from apfitzge January 24, 2025 12:19
Comment on lines 3179 to 3180
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect


we must dedup because we check for already_processed in a batch, then process, then add to the status_cache.

Is that a correct summary of why we need to do this now?

Member Author

@2501babe 2501babe Jan 30, 2025


im not fully confident in my understanding of the status cache but i believe the way it works is you execute the batch and the signatures of processed (non-dropped) transactions go in the status cache only after theyve all been run. there are no checks in svm (nor should there be) for duplicate transactions within a batch

if replay is going to single-batch everything i guess it would enforce this as a side effect but it seemed good to do it here, since this code already did enforce this constraint (a malicious block that put the same transaction in one entry multiple times would fail locking in replay, without anything involving status cache, because the transactions would take the same write lock on the fee-payer)
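The pre-simd83 side effect described above can be illustrated with a toy model of per-transaction locking. This is a sketch under assumptions, not the real `AccountLocks`: `Key` stands in for `Pubkey`, and `Locks`, `LockError` and `lock_batch` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for Pubkey in this sketch.
type Key = &'static str;

#[derive(Debug, PartialEq)]
enum LockError {
    AccountInUse,
}

// Hypothetical model of per-transaction locking: write locks are exclusive.
#[derive(Default)]
struct Locks {
    write: HashSet<Key>,
    read: HashMap<Key, u64>,
}

impl Locks {
    // Each transaction is a list of (key, is_writable) pairs.
    fn try_lock_accounts(&mut self, tx: &[(Key, bool)]) -> Result<(), LockError> {
        // Check everything first so a failed transaction takes no locks.
        for &(key, writable) in tx {
            let conflict =
                self.write.contains(&key) || (writable && self.read.contains_key(&key));
            if conflict {
                return Err(LockError::AccountInUse);
            }
        }
        for &(key, writable) in tx {
            if writable {
                self.write.insert(key);
            } else {
                *self.read.entry(key).or_insert(0) += 1;
            }
        }
        Ok(())
    }
}

// Lock each transaction in order; earlier transactions in the same batch
// can lock out later ones.
fn lock_batch(locks: &mut Locks, batch: &[Vec<(Key, bool)>]) -> Vec<Result<(), LockError>> {
    batch.iter().map(|tx| locks.try_lock_accounts(tx)).collect()
}
```

Because every transaction write-locks its fee payer, submitting the same transaction twice in one batch makes the second copy fail with `AccountInUse` in this model, with no signature check involved.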


Yeah we should enforce it for sure


// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());


let's use message_hash here instead of signature.

status-cache uses both, but signature is only necessary for RPC operation for fast signature lookup.
iirc reason to use message hash is because of signature malleability.

Member Author

@2501babe 2501babe Jan 30, 2025


i used it because message_hash isnt provided by SVMMessage or SVMTransaction. would you like me to add it to SVMTransaction? its available on SanitizedTransaction but providing it from SanitizedMessage would require us to add it to LegacyMessage and v0::LoadedMessage


We can probably just change the trait bound to TransactionWithMeta on this function, that trait should provide it, and I'm fairly certain the things we actually call this with impl it.

Member Author


had to squash the past because rebasing was getting ugly but this is bae70c0


We have the function directly as part of TransactionWithMeta, just call message_hash() on the TransactionWithMeta tx.

as_sanitized_transaction (unless it IS a sanitized transaction) will create a sanitized transaction and possibly do 100s of allocations.
That fn is only there because of legacy interfaces - we shouldn't use it anywhere it's not strictly necessary (should be only geyser rn)

Member Author


gotcha, i see it now. i only looked at TransactionWithMeta rather than StaticMeta


There's a comment on that fn. I'll update it to be more clear that basically no one should be using that, except for the couple places we call into geyser.


Made comments on that trait better here - #4827

@joncinque

This PR contains changes to the solana sdk, which will be moved to a new repo within the next week, when v2.2 is branched from master.

Please merge or close this PR as soon as possible, or re-create the sdk changes when the new repository is ready at https://github.com/anza-xyz/solana-sdk

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from d2cf375 to bae70c0 Compare February 5, 2025 15:03
@2501babe
Member Author

2501babe commented Feb 5, 2025

@joncinque this is a core runtime pr, the only sdk change is adding the feature gate. i assume when the new repo is created the procedure is going to be like:

  • pr the new feature gate to sdk which will be accepted and merged without needing to see any outside code
  • keep this pr, rebase on new sdk dependency and remove changes to sdk/feature-set/ which will no longer exist
  • continue with code review as normal

right?

@joncinque

Yep that sounds right. I'm also adding a new FeatureSet constructor so that agave can define its own features in this repo and avoid that annoying back and forth. 90% of PRs with changes to the sdk were just feature additions.


4 participants