-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: introduce mutex for state and lastCommitInfo to avoid race conditions #22692
base: main
Are you sure you want to change the base?
Conversation
…betwwen Commit and CreateQueryContext
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces several enhancements and bug fixes across multiple modules within the Cosmos SDK. Key changes include the addition of features for simulating nested messages, a new Linux-only backend for the crypto/keyring module, and improvements to integration tests. Concurrency control is enhanced through the introduction of mutex locks to prevent race conditions in state management. The changelog has been updated to reflect these changes, and new test cases have been added to ensure the stability of the application under concurrent conditions. Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (13)
go.mod (1)
218-219
: Consider documenting the temporary nature of this replace directive.Since this replace directive is part of the race condition fix, consider moving it to the temporary replace section with a comment explaining its purpose.
// Here are the short-lived replace from the Cosmos SDK // Replace here are pending PRs, or version to be tagged -// replace ( -// <temporary replace> -// ) +replace ( + // Temporary: Using local store module for race condition fix (#22650) + cosmossdk.io/store => ./store +)CHANGELOG.md (4)
Line range hint
1-1
: Add version table at the top of changelogConsider adding a version table at the top that summarizes all v0.46.x releases with their dates and key changes for easier reference.
Line range hint
11-13
: Fix inconsistent bullet point formattingThe bullet points use a mix of
*
and-
. Standardize on one format throughout the document for consistency.
59-61
: Add impact details for breaking changesFor breaking changes like the Dragonberry security fix, consider adding more details about potential impact on users and required actions.
Line range hint
1-1000
: Fix typos and grammatical errorsSeveral typos and grammatical errors found throughout the document:
- "typographical" misspelled as "typograhical"
- Missing periods at end of some bullet points
- Inconsistent capitalization in section headers
store/rootmulti/store.go (1)
63-63
: Naming convention for mutex variables.Consider renaming
lastCommitInfoMut
tolastCommitInfoMu
to align with Go conventions, where mutex variables are typically suffixed withMu
.baseapp/baseapp.go (2)
127-127
: Consider renamingstateMut
tostateMutex
for clarity.According to the Uber Go Style Guide, abbreviations in variable names should be avoided to enhance readability. Renaming
stateMut
tostateMutex
makes the purpose of the mutex clearer.Apply this diff to improve clarity:
- stateMut sync.RWMutex + stateMutex sync.RWMutex
Line range hint
583-586
: Potential deadlock due to inconsistent lock ordering betweenapp.mu
andapp.stateMut
.In
getContextForTx
,app.mu
is locked before callingapp.getState(mode)
, which in turn locksapp.stateMut
. If elsewhereapp.stateMut
is locked before attempting to acquireapp.mu
, this could lead to a deadlock due to inconsistent lock ordering.Recommend reviewing the locking order to ensure that
app.mu
andapp.stateMut
are always acquired in a consistent order throughout the codebase to prevent potential deadlocks.baseapp/abci.go (3)
609-609
: Handle possible errors fromCacheContext
In the assignments:
ctx, _ = app.getState(execModeFinalize).Context().CacheContext()Ignoring the second return value without handling could overlook potential issues. Although
CacheContext()
may not currently return an error, future changes could introduce errors.Consider capturing both return values explicitly for clarity:
ctx, ms := app.getState(execModeFinalize).Context().CacheContext() // use ms if neededAlso applies to: 689-689
747-747
: Clarify the comment for better understandingThe comment at line 747 is unclear:
// only used to handle early cancellation, for anything related to state app.getState(execModeFinalize).Context()
Consider rephrasing for clarity:
// Use ctx only for early cancellation handling. For state-related operations, use app.getState(execModeFinalize).Context()
1194-1194
: Review the use ofCacheContext
In:
ctx, _ = app.getState(execModeFinalize).Context().CacheContext()Ensure that caching the context here is intended and that any modifications do not affect the main context unintentionally.
baseapp/abci_test.go (2)
2783-2830
: Add a function comment to describe the purpose of the testTo improve code readability and maintainability, consider adding a comment before the
TestABCI_Race_Commit_Query
function to explain its purpose and what it is testing.
2796-2797
: Simplify atomic counter initializationSince the zero value of
atomic.Uint64
is zero, the explicit call tocounter.Store(0)
is unnecessary. You can simplify the code by removing this call.Apply the following diff:
-counter := atomic.Uint64{} -counter.Store(0) +var counter atomic.Uint64
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
📒 Files selected for processing (7)
CHANGELOG.md
(1 hunks)baseapp/abci.go
(21 hunks)baseapp/abci_test.go
(2 hunks)baseapp/baseapp.go
(7 hunks)baseapp/test_helpers.go
(1 hunks)go.mod
(1 hunks)store/rootmulti/store.go
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
CHANGELOG.md (1)
Pattern **/*.md
: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness"
baseapp/abci.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
baseapp/abci_test.go (2)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
Pattern **/*_test.go
: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"
baseapp/baseapp.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
baseapp/test_helpers.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
store/rootmulti/store.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
🔇 Additional comments (20)
baseapp/test_helpers.go (2)
47-47
: LGTM: Thread-safe state access implemented
The change from direct state access to using getState
aligns with the PR's objective of preventing race conditions. This modification ensures thread-safe access to the state during simulations.
Let's verify the thread safety implementation:
✅ Verification successful
Thread-safe state access confirmed via RLock/RUnlock
The verification confirms that getState
method properly implements thread-safety using RLock
/RUnlock
from stateMut
. The implementation correctly protects state access across different execution modes, including the execModeFinalize
used in the test helper.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that getState method uses proper synchronization
# Expected: Find mutex usage in getState implementation
ast-grep --pattern 'func (app *BaseApp) getState($_) $_ {
$$$
}'
Length of output: 831
53-58
: Thread-safe state access implemented, verify performance impact
The changes correctly implement thread-safe state access using getState
for both check and finalize modes. This aligns with fixing the race conditions identified in issue #22650.
Let's verify the usage patterns and potential performance impact:
✅ Verification successful
Based on the search results, I can now provide a final response since I have enough information about the usage patterns of NewContextLegacy
:
Thread safety changes are safe with minimal performance impact
The changes to NewContextLegacy
using getState
are safe because:
- The method is primarily used in test code and initialization paths, not in hot execution paths
- Main usage patterns are in:
- Test helpers and test suites
- App initialization and genesis setup
- Export functionality
- Simulation framework
The added synchronization through getState
won't cause performance issues since this method isn't called in performance-critical paths or high-frequency operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for high-frequency calls to NewContextLegacy that might be affected by the added synchronization
# Expected: Identify critical paths that might need optimization
# Find direct calls to NewContextLegacy
rg "NewContextLegacy\(" -A 2
# Find potential high-frequency callers through NewContext
rg "NewContext\(" -A 2
Length of output: 40662
go.mod (1)
219-219
: LGTM! Verify version consistency with other local modules.
The addition of the store module replace directive aligns with the existing pattern and is necessary for implementing the race condition fix. However, we should ensure version consistency.
✅ Verification successful
Let me verify the version consistency further by checking other local module replacements.
Version consistency verified across local module replacements
The replace directive for the store module follows the same pattern as other local module replacements (api, bank, staking, tx). All local modules use Go 1.23, and the version specified in the main go.mod (v1.1.1-0.20240418092142-896cdf1971bc) is consistent with the development nature of these changes.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check version consistency between go.mod and local store module
# Verify the store module's go.mod version matches the version specified in the main go.mod
# Get the version from main go.mod
echo "Main module store version:"
grep "cosmossdk.io/store" go.mod | head -n 1
# Get the version from local store module
echo "Local store module version:"
if [ -f "store/go.mod" ]; then
grep "module cosmossdk.io/store" store/go.mod
fi
Length of output: 329
Script:
#!/bin/bash
# Check all local module replacements and their versions
echo "Main go.mod versions:"
grep "cosmossdk.io/" go.mod | grep -v "replace" | sort
echo -e "\nReplace directives:"
grep "replace cosmossdk.io/" go.mod | sort
echo -e "\nLocal module versions:"
for dir in api bank staking tx store; do
if [ -f "$dir/go.mod" ]; then
echo -e "\n$dir/go.mod:"
grep "^module" "$dir/go.mod"
grep "^go 1." "$dir/go.mod"
fi
done
Length of output: 1573
store/rootmulti/store.go (7)
63-63
: Introduction of lastCommitInfoMut
for concurrency control.
Adding lastCommitInfoMut sync.RWMutex
to the Store
struct appropriately introduces a mutex to prevent race conditions when accessing lastCommitInfo
.
292-294
: Proper locking around lastCommitInfo
assignment in loadVersion()
.
Locking lastCommitInfoMut
before updating rs.lastCommitInfo
ensures thread-safe write operations.
440-446
: Consistent use of LastCommitInfo()
in LatestVersion()
.
Updating LatestVersion()
to use rs.LastCommitInfo()
ensures safe concurrent access to lastCommitInfo
.
448-451
: Thread-safe access to lastCommitInfo
via LastCommitInfo()
method.
The new LastCommitInfo()
method correctly implements read locking to allow safe concurrent reads of lastCommitInfo
.
456-474
: Safe access to lastCommitInfo
in LastCommitID()
.
Using rs.LastCommitInfo()
in LastCommitID()
prevents data races during read operations of lastCommitInfo
.
518-521
: Ensure minimal locking when updating lastCommitInfo.Timestamp
.
Holding the lock only during the assignment to lastCommitInfo.Timestamp
is acceptable. Confirm that no other operations within the lock could cause delays or potential deadlocks.
800-802
: Thread-safe retrieval of lastCommitInfo
in Query()
.
Using rs.LastCommitInfo()
in the Query()
method ensures safe concurrent access and prevents race conditions.
baseapp/abci.go (9)
72-72
: Good use of encapsulation with app.getState
The introduction of app.getState(execModeFinalize)
enhances state management by encapsulating state retrieval, improving code maintainability and readability.
78-78
: Ensure req.ConsensusParams
is validated
While storing consensus parameters, make sure that req.ConsensusParams
is not nil and contains valid data to prevent potential runtime errors.
90-97
: Consistent Context Update for States
Updating both checkState
and finalizeState
contexts ensures consistency across different execution modes. This is a good practice for maintaining state integrity.
110-110
: Initialization of Infinite Gas Meter
Setting an infinite gas meter during InitChain
is appropriate to allow unrestricted operations during genesis. Ensure that this does not inadvertently persist into states where gas limits should be enforced.
936-939
: Ensure proper mutex usage when modifying shared state
Locking app.stateMut
before modifying app.finalizeBlockState
is correct. Ensure that all other accesses to shared state variables are similarly protected to prevent race conditions.
1024-1026
: Mutex locking and unlocking
Proper use of app.stateMut.Lock()
and app.stateMut.Unlock()
ensures thread-safe operations when modifying shared states.
1047-1047
: Ensure ms.Write()
is safe and necessary
Calling ms.Write()
writes pending changes to the parent store. Confirm that this call is necessary and that any potential errors are handled appropriately.
1409-1409
: Verify consensus parameters retrieval
Retrieving consensus parameters using app.getState(execModeFinalize).Context()
should be thread-safe and consistent with the application's state at this point.
978-979
:
Accessing finalizeState
without synchronization
Accessing finalizeState
here may lead to race conditions if not properly synchronized. Verify that app.stateMut
is held during this access.
Run the following script to identify unsynchronized accesses to app.getState
:
Also applies to: 983-983
baseapp/abci_test.go (1)
2796-2797
: Verify Go version compatibility for atomic.Uint64
The atomic.Uint64
type is available from Go 1.19 onwards. Ensure that the project specifies Go 1.19 or newer in its configuration to avoid compatibility issues.
Run the following script to check the Go version specified in the go.mod
file:
✅ Verification successful
Go version compatibility verified for atomic.Uint64
The project uses Go 1.23.2, which fully supports atomic.Uint64
(introduced in Go 1.19). No compatibility issues exist.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check that the project requires Go 1.19 or newer.
# Look for the 'go' directive in the go.mod file
grep "^go " go.mod
Length of output: 30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
store/rootmulti/store.go (1)
514-522
: Consider combining the mutex locks.The code currently acquires and releases the mutex twice in quick succession. Consider combining these operations to reduce lock overhead:
- rs.lastCommitInfoMut.Lock() - rs.lastCommitInfo = cInfo - rs.lastCommitInfoMut.Unlock() - - rs.lastCommitInfoMut.Lock() - rs.lastCommitInfo.Timestamp = rs.commitHeader.Time - rs.lastCommitInfoMut.Unlock() + rs.lastCommitInfoMut.Lock() + rs.lastCommitInfo = cInfo + rs.lastCommitInfo.Timestamp = rs.commitHeader.Time + rs.lastCommitInfoMut.Unlock()
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
📒 Files selected for processing (1)
store/rootmulti/store.go
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
store/rootmulti/store.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
🔇 Additional comments (4)
store/rootmulti/store.go (4)
63-63
: LGTM: Good choice of RWMutex for lastCommitInfo synchronization.
Using sync.RWMutex is appropriate here as it allows concurrent reads while ensuring exclusive writes, which aligns with the access patterns of lastCommitInfo.
292-294
: LGTM: Appropriate lock scope in LoadVersion.
The mutex lock is correctly scoped to only protect the critical section (lastCommitInfo assignment), minimizing lock contention.
448-451
: LGTM: Well-implemented thread-safe accessor.
The method correctly uses RLock/RUnlock for concurrent read access to lastCommitInfo. The defer statement ensures proper unlock even in case of panics.
440-446
: LGTM: Consistent usage of thread-safe accessor.
All methods consistently use the thread-safe LastCommitInfo accessor, properly addressing the race condition between Commit and CreateQueryContext operations.
Also applies to: 456-474, 802-804
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
store/rootmulti/store.go (4)
63-63
: Excellent use of atomic.Pointer for thread safety.Using atomic.Pointer instead of mutex locks is a more efficient solution for this use case since the CommitInfo content itself is not modified concurrently, only the pointer is updated atomically.
This approach provides better performance than mutex locks as it:
- Avoids lock contention
- Allows for lock-free reads
- Provides atomic pointer updates
438-443
: Consider caching the atomic load result.Multiple accesses to lastCommitInfo.Load() could be optimized by storing the result in a local variable.
-lastCommitInfo := rs.lastCommitInfo.Load() -if lastCommitInfo == nil { - return GetLatestVersion(rs.db) -} - -return lastCommitInfo.Version +if info := rs.lastCommitInfo.Load(); info == nil { + return GetLatestVersion(rs.db) +} else { + return info.Version +}
510-512
: Optimize multiple atomic loads.Multiple consecutive calls to lastCommitInfo.Load() should be consolidated to a single load.
-cInfo := rs.lastCommitInfo.Load() -cInfo.Timestamp = rs.commitHeader.Time - +if cInfo := rs.lastCommitInfo.Load(); cInfo != nil { + cInfo.Timestamp = rs.commitHeader.Time +}
791-793
: Consolidate atomic load operation.The atomic load result should be cached to avoid multiple loads and null checks.
-lastCommitInfo := rs.lastCommitInfo.Load() -if res.Height == lastCommitInfo.Version { - commitInfo = lastCommitInfo +if info := rs.lastCommitInfo.Load(); info != nil && res.Height == info.Version { + commitInfo = info
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
📒 Files selected for processing (1)
store/rootmulti/store.go
(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
store/rootmulti/store.go (1)
Pattern **/*.go
: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
🔇 Additional comments (1)
store/rootmulti/store.go (1)
12-12
: LGTM: Required import for atomic operations.
The addition of "sync/atomic" import is necessary for the atomic pointer implementation.
@@ -124,6 +124,7 @@ type BaseApp struct { | |||
prepareProposalState *state | |||
processProposalState *state | |||
finalizeBlockState *state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make notes on these *state
variables that they should only be accessed and updated via the set
and get
methods instead of directly?
Alternatively creating a wrapper that does not actually expose the *state
unless these getters or setters are called
Description
Closes: #22650
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
in the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.
I have...
Summary by CodeRabbit
New Features
Improvements
Bug Fixes