Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ChainHelper] reduce complexity, pool based noncemanager #2332

Merged
merged 17 commits into from
Dec 8, 2024

Conversation

nick-bisonai
Copy link
Collaborator

@nick-bisonai nick-bisonai commented Dec 7, 2024

Description

Remove additional json rpc usage from the logic
use nonce pool with pool reset on regular interval

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Deployment

  • Should publish npm package
  • Should publish Docker image

@nick-bisonai nick-bisonai self-assigned this Dec 7, 2024
Copy link
Contributor

coderabbitai bot commented Dec 7, 2024

Caution

Review failed

The pull request is closed.

📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes in this pull request primarily focus on refactoring the ChainHelper struct and its associated methods within the helper package. Key modifications include the transition from multiple clients to a single primary client, the introduction of a new nonce manager, and the removal of the maxTxSubmissionRetries parameter from various method signatures. Additionally, several test functions have been updated to reflect these changes. The overall structure aims to enhance clarity and maintainability while simplifying nonce management and transaction submission processes.

Changes

File Path Change Summary
node/pkg/chain/helper/helper.go Updated import for nonce manager to noncemanagerv2. Simplified client management by removing the clients slice and using a single primaryClient. Refactored methods to utilize the new nonce manager and streamlined transaction submission logic. Added new method SubmitDirect.
node/pkg/chain/helper/signer.go Modified signerUpdate method to remove maxTxSubmissionRetries parameter from SubmitDelegatedFallbackDirect call.
node/pkg/chain/helper/types.go Removed clients field and added client and noncemanager fields in ChainHelper struct. Removed WithoutAdditionalProviderUrls function.
node/pkg/chain/noncemanagerv2/noncemanagerv2.go Introduced NonceManagerV2 struct with methods for nonce management, including GetNonce, Reset, fillPool, and flushPool.
node/pkg/chain/tests/chain_test.go Renamed TestNewChainHelper to TestNewSigner. Updated test cases to reflect changes in method signatures, particularly removing maxTxSubmissionRetries from submissions. Added logging for better traceability.
node/pkg/checker/inspect/app.go Removed WithoutAdditionalProviderUrls from NewChainHelper instantiation in Setup function.
node/pkg/por/app.go Removed WithoutAdditionalProviderUrls option from NewChainHelper call.
node/pkg/reporter/app.go Removed WithoutAdditionalProviderUrls from setReporters method. Added a 2-second sleep between reporter initializations in startReporters.
node/pkg/reporter/reporter.go Updated report method to remove maxTxSubmissionRetries from SubmitDelegatedFallbackDirect call.
node/script/test_dal_consumer/main.go Removed maxTxSubmissionRetries from SubmitDelegatedFallbackDirect method calls.
node/script/test_submission/main.go Removed maxTxSubmissionRetries from SubmitDelegatedFallbackDirect call in testContractFeeDelegatedCall function.

Possibly related PRs

Suggested reviewers

  • Intizar-T

Poem

In the code where bunnies hop,
A nonce manager takes the top.
With clients few and logic clear,
We streamline tasks, bring cheer!
Transactions dance, no retries in sight,
Hopping along, everything feels right! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0f1926d and 06040a4.

📒 Files selected for processing (2)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
  • node/pkg/reporter/app.go (1 hunks)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nick-bisonai nick-bisonai changed the title [ChainHelper] reduce complicity, pool based noncemanager [ChainHelper] reduce complexity, pool based noncemanager Dec 7, 2024
@nick-bisonai nick-bisonai marked this pull request as ready for review December 7, 2024 13:34
@nick-bisonai nick-bisonai requested a review from a team as a code owner December 7, 2024 13:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (2)

63-69: Inefficient Channel Draining in unsafeFlushPool

The unsafeFlushPool method drains the nonce pool by reading from the channel until it's empty:

for len(pool) > 0 {
	<-pool
}

This approach can be inefficient. Consider replacing the existing channel with a new one to clear the pool more efficiently.

Apply this diff to replace the channel:

func (m *NonceManagerV2) unsafeFlushPool(address string) {
+   if _, exists := m.noncePool[address]; exists {
+       m.noncePool[address] = make(chan uint64, noncePoolBufferSize)
+   }
}

37-38: Use Constants for Channel Buffer Size

The channel buffer size is hardcoded as 30:

m.noncePool[address] = make(chan uint64, 30)

Consider defining a constant for the buffer size to improve readability and maintainability.

Apply this diff to introduce a constant:

 const (
     poolSize               = 15 // expect maximum 15 submission per minute
     minimumNoncePoolSize   = 3
     poolAutoRefillInterval = time.Minute
+    noncePoolBufferSize    = 30
 )

...

 m.noncePool[address] = make(chan uint64, noncePoolBufferSize)
node/pkg/chain/helper/helper.go (1)

187-210: Improved Error Handling in SubmitDelegatedFallbackDirect

In SubmitDelegatedFallbackDirect, if GetSignedFromDelegator fails, the error is not logged before falling back to a direct transaction:

if err != nil {
	tx, err = utils.MakeDirectTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)

Consider logging the error from GetSignedFromDelegator to aid in debugging.

Apply this diff to log the error:

if err != nil {
+   log.Error().Err(err).Msg("Failed to get signed transaction from delegator, falling back to direct transaction")
    tx, err = utils.MakeDirectTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
    if err != nil {
        return err
    }
    return utils.SubmitRawTx(ctx, t.client, tx)
}
node/pkg/reporter/app.go (1)

127-127: Consider a more robust solution for managing reporter concurrency

While adding sleep reduces concurrent starts, it might not be the most efficient solution. Consider alternatives:

  1. Using a semaphore to control concurrent reporters
  2. Implementing a proper rate limiter
  3. Using worker pools with controlled concurrency

Example implementation using a semaphore:

-		time.Sleep(2 * time.Second) // sleep for less concurrency
+		// Create a semaphore with desired concurrency limit
+		sem := make(chan struct{}, 3) // Allow 3 concurrent reporters
+		sem <- struct{}{} // Acquire
+		go func(reporter *Reporter) {
+			defer func() { <-sem }() // Release
+			reporter.Run(ctx)
+		}(reporter)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cd5b9cc and ec5cafa.

⛔ Files ignored due to path filters (1)
  • .github/workflows/dal.test.yaml is excluded by !**/*.yaml
📒 Files selected for processing (11)
  • node/pkg/chain/helper/helper.go (6 hunks)
  • node/pkg/chain/helper/signer.go (1 hunks)
  • node/pkg/chain/helper/types.go (1 hunks)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
  • node/pkg/chain/tests/chain_test.go (7 hunks)
  • node/pkg/checker/inspect/app.go (1 hunks)
  • node/pkg/por/app.go (0 hunks)
  • node/pkg/reporter/app.go (3 hunks)
  • node/pkg/reporter/reporter.go (1 hunks)
  • node/script/test_dal_consumer/main.go (2 hunks)
  • node/script/test_submission/main.go (1 hunks)
💤 Files with no reviewable changes (1)
  • node/pkg/por/app.go
🔇 Additional comments (15)
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1)

85-95: Potential Unbounded Goroutine in StartAutoRefill

The StartAutoRefill method runs indefinitely in a goroutine without a proper shutdown mechanism:

for {
	select {
	case <-ctx.Done():
		return
	case <-time.After(poolAutoRefillInterval):
		err := m.refillAll(ctx)
		if err != nil {
			log.Error().Err(err).Msg("Failed to refill nonce pool")
		}
	}
}

Ensure that the context passed to StartAutoRefill is appropriately canceled to prevent goroutine leaks when the application shuts down.

node/pkg/chain/helper/helper.go (1)

92-96: Initialize nonceManager Before Starting Auto Refill

The nonceManager is initialized and StartAutoRefill is called in the background:

nonceManager := noncemanagerv2.New(primaryClient)
go nonceManager.StartAutoRefill(ctx)

Ensure that nonceManager is fully initialized before starting the auto-refill routine to prevent race conditions.

node/script/test_submission/main.go (1)

22-22: Update Function Call to Match New Method Signature

The call to SubmitDelegatedFallbackDirect has been updated to remove the maxTxSubmissionRetries parameter:

return kaiaHelper.SubmitDelegatedFallbackDirect(ctx, contractAddress, contractFunction, args...)

This change aligns with the updated method signature. Ensure that any necessary retry logic is handled appropriately elsewhere if required.

node/pkg/chain/helper/types.go (2)

16-20: ChainHelper Struct Refactored for Simplification

The ChainHelper struct now uses a single client instead of a slice of clients, and a noncemanager has been added:

type ChainHelper struct {
	client       utils.ClientInterface
	wallet       string
	chainID      *big.Int
	delegatorUrl string
	noncemanager *noncemanagerv2.NonceManagerV2
}

This simplification improves code maintainability and aligns with the updated nonce management approach.


Line range hint 34-36: Removal of WithoutAdditionalProviderUrls Function

The WithoutAdditionalProviderUrls function has been removed:

// func WithoutAdditionalProviderUrls() ChainHelperOption {
// 	return func(c *ChainHelperConfig) {
// 		c.UseAdditionalProviderUrls = false
// 	}
// }

Verify that any code dependent on this option has been updated accordingly to prevent unintended behavior.

node/script/test_dal_consumer/main.go (2)

Line range hint 77-84: Update Method Call to Reflect Updated Signature

The SubmitDelegatedFallbackDirect method is called without the maxTxSubmissionRetries parameter:

err = kaiaHelper.SubmitDelegatedFallbackDirect(ctx, contractAddr, SUBMIT_STRICT, feedHashes, values, timestamps, proofs)

This update aligns with the new method signature. Ensure that the removal of the retry parameter does not affect the expected behavior.


Line range hint 90-95: Consistent Update of Method Call Signature

Similarly, the second call to SubmitDelegatedFallbackDirect has been updated:

err = kaiaHelper.SubmitDelegatedFallbackDirect(ctx, contractAddr, SUBMIT_STRICT, feedHashes, values, timestamps, proofs)

Confirm that this change is consistent throughout the codebase and that any additional retry logic is implemented if necessary.

node/pkg/reporter/reporter.go (1)

139-139: Verify error handling for transaction submissions

With the removal of explicit retry control, we should ensure that transaction failures are still properly handled by the internal nonce management system.

✅ Verification successful

Let me gather more information about the error handling implementation in the KaiaHelper.


Based on the results, I can now provide a final response. The code shows a robust nonce management system with the following key aspects:

Transaction submission is well-handled with improved nonce management

The removal of maxTxSubmissionRetries is justified as the system now uses a sophisticated nonce management approach:

  • Uses NonceManagerV2 with automatic nonce pool refilling
  • Implements thread-safe nonce allocation with proper mutex locks
  • Handles nonce errors and maintains a pool of available nonces
  • Includes fallback to direct transaction submission if delegated transaction fails

The error handling is actually more robust now with the nonce pool system that:

  1. Automatically refills when running low (minimumNoncePoolSize check)
  2. Maintains thread-safety for concurrent submissions
  3. Includes proper error propagation and logging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling patterns around SubmitDelegatedFallbackDirect
rg -A 5 "SubmitDelegatedFallbackDirect.*err\s*:=|err\s*=.*SubmitDelegatedFallbackDirect" 

# Look for any retry-related code in the new implementation
ast-grep --pattern 'func $_ ($_, $_) {
  $$$
  retry $$$
  $$$
}'

Length of output: 2282


Script:

#!/bin/bash
# Look for the implementation of SubmitDelegatedFallbackDirect
ast-grep --pattern 'func $_ SubmitDelegatedFallbackDirect($$$) {
  $$$
}'

# Search for any transaction retry or nonce management related code
rg -B 2 -A 5 "nonce|retry|backoff" node/pkg/chain/

Length of output: 26546

node/pkg/reporter/app.go (1)

54-54: LGTM: Simplified chain helper initialization

The removal of additional provider URLs aligns with the PR's goal of reducing complexity.

node/pkg/chain/helper/signer.go (1)

243-243: Verify signer renewal reliability

The removal of explicit retry control in signer updates is a critical change. We need to ensure that signer renewal remains reliable, as failed renewals could impact system security.

Consider implementing a dead man's switch or monitoring system to detect and alert on signer renewal failures, as this is a critical security operation.

✅ Verification successful

Let me gather more information about the transaction submission and error handling mechanisms.


Based on the gathered information, I can now provide the final response:

Signer renewal remains reliable despite retry removal

The signer renewal system has multiple layers of protection that ensure reliability:

  • Automatic renewal checks at regular intervals via autoRenew
  • Comprehensive error logging for all failure scenarios
  • Admin endpoints for manual intervention if needed
  • Early renewal threshold to provide time buffer for retries

The removal of explicit retry control is acceptable because:

  • The underlying transaction system already handles basic retries
  • The periodic auto-renewal provides implicit retry capability
  • Failed renewals are prominently logged for operational visibility
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for signer renewal error handling patterns
rg -A 10 "func.*Renew.*|func.*signerUpdate.*" 

# Look for any recovery mechanisms in signer-related code
ast-grep --pattern 'func $_ ($_, $_) {
  $$$
  defer $$$
  $$$
}'

Length of output: 4984


Script:

#!/bin/bash
# Check the implementation of SubmitDelegatedFallbackDirect for retry/error handling
ast-grep --pattern 'func $_ SubmitDelegatedFallbackDirect($$$) {
  $$$
}'

# Look for error handling patterns in chain helper
rg -A 5 "failed to.*signer" 

# Check for monitoring or alerting mechanisms
rg -A 5 "monitoring|alert.*signer"

Length of output: 6188

node/pkg/checker/inspect/app.go (1)

99-99: LGTM: Simplified chain helper initialization

The removal of WithoutAdditionalProviderUrls() aligns with the PR's objective of reducing complexity in JSON RPC usage.

Let's verify that this change is consistent across the codebase:

✅ Verification successful

Based on the search results, I can now provide a definitive response about the removal of WithoutAdditionalProviderUrls(). The codebase search shows that:

  1. There are no occurrences of WithoutAdditionalProviderUrls or WithAdditionalProviderUrls in the codebase
  2. The NewChainHelper function is used consistently across the codebase with minimal options, primarily WithReporterPk and WithBlockchainType
  3. The test files confirm that the chain helper initialization is working correctly with the simplified approach

LGTM: Simplified chain helper initialization is consistent

The removal of WithoutAdditionalProviderUrls() is safe and aligns with the codebase's current patterns. All other usages of NewChainHelper are already using the simplified initialization approach.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining usage of WithoutAdditionalProviderUrls
# Expect: No occurrences as this option should be removed

rg "WithoutAdditionalProviderUrls"

Length of output: 34


Script:

#!/bin/bash
# Let's verify the chain helper initialization patterns across the codebase
# First, find all usages of NewChainHelper
rg "NewChainHelper" -A 3

# Also check for any WithAdditionalProviderUrls to ensure we're not missing any variants
rg "WithAdditionalProviderUrls"

# Let's also check the helper package to understand the available options
fd "chain.*\.go" --exec cat {} \;

Length of output: 27589

node/pkg/chain/tests/chain_test.go (4)

Line range hint 54-60: LGTM: Improved test focus

The rename from TestNewChainHelper to TestNewSigner better reflects the test's purpose and aligns with the codebase's evolution towards simplified functionality.


242-254: LGTM: Simplified transaction submission API

The removal of maxTxSubmissionRetries parameter from SubmitDelegatedFallbackDirect aligns with the PR's objective of reducing complexity. The nonce pool-based approach should handle retries more efficiently.


95-95: LGTM: Enhanced test observability

Added logging statements improve test debugging and error tracing capabilities.

Also applies to: 124-124


751-756: Verify cleanup transaction submission

The cleanup operations in the test are critical. Ensure proper error handling and verification of transaction success.

Let's verify the transaction submission cleanup:

✅ Verification successful

Proper error handling is implemented for cleanup transactions

The verification shows that:

  • The cleanup operations in the test file properly check for errors after each SubmitDelegatedFallbackDirect call
  • The implementation in helper.go includes error handling for nonce management
  • All other usages across the codebase consistently handle errors, either through error returns or panic (in test scripts)
  • The specific cleanup transactions in the review are properly wrapped with error checks that fail the test if an error occurs
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for proper error handling in cleanup operations
# Expect: Error handling around SubmitDelegatedFallbackDirect calls

ast-grep --pattern 'SubmitDelegatedFallbackDirect($$$) {
  $$$
  if err != nil {
    $$$
  }
  $$$
}'

Length of output: 103


Script:

#!/bin/bash
# Let's try a different approach to check the implementation and usage of SubmitDelegatedFallbackDirect

# First, let's find the implementation
ast-grep --pattern 'func $_ SubmitDelegatedFallbackDirect($$$) $_ {
  $$$
}'

# Also check all usages with context
rg -A 5 "SubmitDelegatedFallbackDirect" 

Length of output: 4016

node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
node/pkg/chain/helper/helper.go Show resolved Hide resolved
node/pkg/chain/helper/helper.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
node/pkg/chain/helper/helper.go (2)

187-210: Improve error handling in transaction submission.

The fallback mechanism could benefit from more specific error types to distinguish between different failure scenarios (e.g., delegation failure vs. transaction creation failure).

+type DelegationError struct {
+    Err error
+}
+
+func (e *DelegationError) Error() string {
+    return fmt.Sprintf("delegation failed: %v", e.Err)
+}

 func (t *ChainHelper) SubmitDelegatedFallbackDirect(ctx context.Context, contractAddress, functionString string, args ...interface{}) error {
     // ... existing code ...
     tx, err = t.GetSignedFromDelegator(tx)
     if err != nil {
+        if _, ok := err.(*DelegationError); !ok {
+            return fmt.Errorf("unexpected error during delegation: %w", err)
+        }
         tx, err = utils.MakeDirectTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
         // ... rest of the code ...
     }

222-224: Add documentation for FlushNoncePool method.

The new method lacks documentation explaining its purpose, when to use it, and potential side effects.

+// FlushNoncePool forces a refresh of the nonce pool, clearing any cached nonces
+// and fetching the latest nonce from the blockchain. This should be used with
+// caution as it may impact ongoing transactions.
 func (t *ChainHelper) FlushNoncePool(ctx context.Context) error {
     return t.noncemanager.Refill(ctx, t.wallet)
 }
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1)

89-101: Optimize Periodic Refill with time.Ticker Instead of time.After

In StartAutoRefill, using time.After within a loop creates a new timer each iteration, which can be inefficient:

case <-time.After(poolAutoRefillInterval):

Using time.Ticker is more efficient for periodic execution as it avoids creating new timers repeatedly.

Modify the loop to utilize time.Ticker:

func (m *NonceManagerV2) StartAutoRefill(ctx context.Context) {
+	ticker := time.NewTicker(poolAutoRefillInterval)
+	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
-		case <-time.After(poolAutoRefillInterval):
+		case <-ticker.C:
			err := m.refillAll(ctx)
			if err != nil {
				log.Error().Err(err).Msg("Failed to refill nonce pool")
			}
		}
	}
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec5cafa and 58a7350.

📒 Files selected for processing (3)
  • node/pkg/chain/helper/helper.go (6 hunks)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
  • node/pkg/reporter/reporter.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/pkg/reporter/reporter.go
🔇 Additional comments (5)
node/pkg/chain/helper/helper.go (3)

85-85: Ensure secure storage of private keys.

The private key handling requires careful consideration for security.


193-193: Fix potential integer overflow in nonce logging.

The nonce value casting could cause overflow.


10-10: Verify the noncemanagerv2 implementation.

The switch to noncemanagerv2 represents a significant architectural change.

✅ Verification successful

NonceManagerV2 implementation is thread-safe and well-structured

The new nonce manager implementation is properly designed with:

  • Thread-safe operations using mutex locks
  • Buffered channels for nonce pooling (capacity: 30)
  • Automatic pool refill mechanism with proper error handling
  • Clean separation between safe (locked) and unsafe operations
  • Context-aware operations for proper cancellation

The implementation shows good practices for concurrent access and resource management, making it a solid improvement over the previous version.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the noncemanagerv2 implementation and its usage
# Check for any potential race conditions or concurrency issues in the new implementation

# Search for concurrent access patterns in noncemanagerv2
ast-grep --pattern 'type NonceManagerV2 struct {
  $$$
}'

# Look for mutex usage and goroutine safety
rg -A 5 "sync\." "pkg/chain/noncemanagerv2"

Length of output: 576


Script:

#!/bin/bash
# Let's examine the implementation details of NonceManagerV2 more thoroughly

# Check the implementation of methods
rg -A 10 "func.*NonceManagerV2" node/pkg/chain/noncemanagerv2/

# Look for mutex usage patterns
rg -A 5 "mu\." node/pkg/chain/noncemanagerv2/

# Check how nonce pool is managed
rg -A 5 "noncePool" node/pkg/chain/noncemanagerv2/

# Look for potential initialization and cleanup
rg -A 5 "New.*NonceManagerV2" node/pkg/chain/noncemanagerv2/

Length of output: 10274

node/pkg/chain/noncemanagerv2/noncemanagerv2.go (2)

32-48: ⚠️ Potential issue

Potential Deadlock in GetNonce Due to Channel Read While Holding Mutex

In GetNonce, the mutex is held while reading from the nonce channel:

nonce := <-m.noncePool[address]

If the channel is empty, the read operation will block while the mutex is still held, preventing other goroutines from accessing noncePool to refill it. This can lead to a deadlock.

Apply this diff to release the mutex before reading from the channel:

func (m *NonceManagerV2) GetNonce(ctx context.Context, address string) (uint64, error) {
	m.mu.Lock()
	if _, ok := m.noncePool[address]; !ok {
		m.noncePool[address] = make(chan uint64, poolSize)
	}
	if len(m.noncePool[address]) < minimumNoncePoolSize {
		if err := m.unsafeRefill(ctx, address); err != nil {
			m.mu.Unlock()
			return 0, fmt.Errorf("failed to refill nonce pool: %w", err)
		}
	}
	m.mu.Unlock()

	nonce := <-m.noncePool[address]
	return nonce, nil
}

77-87: 🛠️ Refactor suggestion

Ensure All Nonce Pools Are Refilled Even If Some Refill Operations Fail

In the refillAll method, if an error occurs while refilling the nonce pool for one address, the function returns immediately:

if err := m.unsafeRefill(ctx, address); err != nil {
	return err
}

This means that nonce pools for other addresses will not be refilled, which could impact transaction submissions for those addresses.

Modify the function to continue refilling other addresses and collect any errors:

func (m *NonceManagerV2) refillAll(ctx context.Context) error {
	var errs []error
	for address := range m.noncePool {
		if err := m.unsafeRefill(ctx, address); err != nil {
			errs = append(errs, fmt.Errorf("failed to refill nonce pool for address %s: %w", address, err))
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("refillAll encountered errors: %v", errs)
	}
	return nil
}

node/pkg/chain/helper/helper.go Outdated Show resolved Hide resolved
node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
node/pkg/chain/helper/helper.go (4)

187-209: Improve error handling in SubmitDelegatedFallbackDirect

The error handling could be enhanced to provide more context about fallback scenarios.

 func (t *ChainHelper) SubmitDelegatedFallbackDirect(ctx context.Context, contractAddress, functionString string, args ...interface{}) error {
     nonce, err := t.noncemanager.GetNonce(ctx, t.wallet)
     if err != nil {
-        return err
+        return fmt.Errorf("failed to get nonce: %w", err)
     }
     log.Debug().Uint64("nonce", nonce).Msg("nonce")

     tx, err := utils.MakeFeeDelegatedTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
     if err != nil {
-        return err
+        return fmt.Errorf("failed to make fee delegated tx: %w", err)
     }

     tx, err = t.GetSignedFromDelegator(tx)
     if err != nil {
+        log.Debug().Err(err).Msg("delegator signing failed, falling back to direct tx")
         tx, err = utils.MakeDirectTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
         if err != nil {
-            return err
+            return fmt.Errorf("failed to make direct tx after delegator failure: %w", err)
         }
         return utils.SubmitRawTx(ctx, t.client, tx)
     }

     return utils.SubmitRawTx(ctx, t.client, tx)
 }

212-219: Add debug logging for transaction submission

Consider adding debug logging to track transaction submission status.

 func (t *ChainHelper) SubmitDirect(ctx context.Context, contractAddress, functionString string, args ...interface{}) error {
+    log.Debug().Str("contract", contractAddress).Str("function", functionString).Msg("submitting direct transaction")
     tx, err := t.MakeDirectTx(ctx, contractAddress, functionString, args...)
     if err != nil {
+        log.Debug().Err(err).Msg("failed to make direct transaction")
         return err
     }

     return utils.SubmitRawTx(ctx, t.client, tx)
 }

221-223: Add documentation for FlushNoncePool

Please add documentation explaining:

  • When this method should be called
  • What are the implications of flushing the nonce pool
  • Any potential side effects
+// FlushNoncePool forces a refresh of the nonce pool for the given wallet.
+// This should be called when nonce synchronization issues are detected or
+// when manual intervention in nonce management is required.
 func (t *ChainHelper) FlushNoncePool(ctx context.Context) error {
     return t.noncemanager.Refill(ctx, t.wallet)
 }

Line range hint 1-223: Document architectural changes and migration path

This PR represents significant architectural changes:

  1. Transition from multiple clients to single client
  2. Introduction of pool-based nonce management
  3. Simplified transaction submission logic

Please ensure:

  1. Update main documentation to reflect these changes
  2. Provide migration guide for users
  3. Add integration tests for the new nonce management system
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 58a7350 and 7918d95.

⛔ Files ignored due to path filters (1)
  • .github/workflows/reporter.test.yaml is excluded by !**/*.yaml
📒 Files selected for processing (2)
  • node/pkg/chain/helper/helper.go (6 hunks)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go
🔇 Additional comments (2)
node/pkg/chain/helper/helper.go (2)

86-87: Ensure proper goroutine cleanup.

The StartAutoRefill goroutine lacks a proper cleanup mechanism.


101-101: 🛠️ Refactor suggestion

Add cleanup for noncemanager

The Close method should also clean up the noncemanager to prevent resource leaks.

 func (t *ChainHelper) Close() {
+    if t.noncemanager != nil {
+        t.noncemanager.Close()
+    }
     t.client.Close()
 }

Likely invalid or redundant comment.

node/pkg/chain/helper/helper.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (3)

76-82: Inefficient Flushing of Nonce Pool in unsafeFlushPool

In unsafeFlushPool, the method flushes the nonce pool by consuming all elements:

for len(pool) > 0 {
	<-pool
}

This loop can be inefficient and can cause unnecessary blocking if the channel is large or if other goroutines are interacting with the channel.

Consider resetting the channel by recreating it, which is more efficient and thread-safe.

Apply this diff to reinitialize the channel:

func (m *NonceManagerV2) unsafeFlushPool(address string) {
	if _, exists := m.noncePool[address]; exists {
-		for len(m.noncePool[address]) > 0 {
-			<-m.noncePool[address]
-		}
+		m.noncePool[address] = make(chan uint64, poolSize)
	}
}

64-67: Error Wrapping for Better Context in unsafeInitPool

In unsafeInitPool, when an error occurs fetching the current nonce, it returns the error directly:

if err != nil {
	return err
}

Wrap the error with additional context to improve debugability.

Apply this diff to wrap the error:

if err != nil {
-	return err
+	return fmt.Errorf("failed to get current nonce for address %s: %w", address, err)
}

19-22: Clarify Constants with Descriptive Names

The constants poolSize, minimumNoncePoolSize, and poolAutoRefillInterval could be more descriptive to improve code readability.

Consider renaming the constants:

const (
-	poolSize               = 30 // expect maximum 15 submission per minute
-	minimumNoncePoolSize   = 5
-	poolAutoRefillInterval = time.Minute
+	maxNoncePoolSize       = 30 // expected max 15 submissions per minute
+	minNoncePoolSize       = 5
+	noncePoolRefillInterval = time.Minute
)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7918d95 and 0312a52.

📒 Files selected for processing (2)
  • node/pkg/chain/helper/helper.go (6 hunks)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • node/pkg/chain/helper/helper.go
🔇 Additional comments (2)
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (2)

51-54: Ensure Mutex Protection in Reset Method

In the Reset method, after acquiring the mutex, it calls unsafeInitPool:

return m.unsafeInitPool(ctx, address)

Ensure that unsafeInitPool does not attempt to lock the mutex again to prevent potential deadlocks.

Review the unsafeInitPool method to confirm it does not lock the mutex.


31-48: ⚠️ Potential issue

Potential Deadlock in GetNonce Due to Blocking Channel Read While Holding Mutex

In the GetNonce method, the mutex mu is held throughout the function, including during the read from the nonce channel:

nonce := <-m.noncePool[address]

If the channel is empty, this read operation will block while holding the mutex, potentially causing a deadlock and preventing other goroutines from accessing noncePool. Consider releasing the mutex before reading from the channel to avoid blocking other operations.

Apply this diff to release the mutex before reading from the channel:

func (m *NonceManagerV2) GetNonce(ctx context.Context, address string) (uint64, error) {
	m.mu.Lock()
	if _, ok := m.noncePool[address]; !ok {
		m.noncePool[address] = make(chan uint64, poolSize)
		if err := m.unsafeInitPool(ctx, address); err != nil {
			m.mu.Unlock()
			return 0, fmt.Errorf("failed to refill nonce pool: %w", err)
		}
	}
	if len(m.noncePool[address]) < minimumNoncePoolSize {
		m.unsafeFillPool(address)
	}
	m.mu.Unlock()

	nonce := <-m.noncePool[address]
	return nonce, nil
}

node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
node/pkg/chain/helper/helper.go (2)

189-208: Consider adding retry mechanism for transaction submission

While the fallback logic from delegated to direct transactions is well implemented, consider adding a retry mechanism for network-related failures during transaction submission.

 func (t *ChainHelper) SubmitDelegatedFallbackDirect(ctx context.Context, contractAddress, functionString string, args ...interface{}) error {
     nonce := t.noncemanager.GetNonce()
     log.Debug().Uint64("nonce", nonce).Msg("nonce")
 
     tx, err := utils.MakeFeeDelegatedTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
     if err != nil {
         return err
     }
 
     tx, err = t.GetSignedFromDelegator(tx)
     if err != nil {
         tx, err = utils.MakeDirectTx(ctx, t.client, contractAddress, t.wallet, functionString, t.chainID, nonce, args...)
         if err != nil {
             return err
         }
-        return utils.SubmitRawTx(ctx, t.client, tx)
+        return utils.RetrySubmitRawTx(ctx, t.client, tx, maxRetries)
     }
 
-    return utils.SubmitRawTx(ctx, t.client, tx)
+    return utils.RetrySubmitRawTx(ctx, t.client, tx, maxRetries)
 }

220-222: Document FlushNoncePool usage guidelines

The FlushNoncePool method is a critical operation that resets the nonce state. Please add documentation to clarify:

  1. When this method should be called
  2. Potential side effects
  3. Recovery mechanisms if called inappropriately
node/pkg/chain/noncemanagerv2/noncemanagerv2.go (2)

18-22: Consider optimizing pool size parameters

The current poolSize of 100 might be excessive compared to the minimumNoncePoolSize of 5. This large gap could lead to unnecessary memory allocation. Consider reducing poolSize or making it configurable based on transaction volume.

const (
-	poolSize               = 100
+	poolSize               = 30  // or make it configurable
	minimumNoncePoolSize   = 5
	poolAutoRefillInterval = time.Minute
)

11-82: Consider architectural improvements for production readiness

The current implementation could benefit from the following architectural improvements:

  1. Add logging for debugging concurrent operations
  2. Implement metrics for monitoring pool usage and refill operations
  3. Add recovery mechanisms for channel panics
  4. Consider implementing a circuit breaker for RPC calls

Would you like me to provide specific implementation details for any of these improvements?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0312a52 and 0f1926d.

📒 Files selected for processing (2)
  • node/pkg/chain/helper/helper.go (6 hunks)
  • node/pkg/chain/noncemanagerv2/noncemanagerv2.go (1 hunks)
🔇 Additional comments (5)
node/pkg/chain/helper/helper.go (3)

10-10: Verify noncemanagerv2 migration impact

The transition from noncemanager to noncemanagerv2 represents a breaking change. Please ensure:

  1. All dependent code is updated to handle any API changes
  2. Migration documentation is available for users

85-85: Ensure secure handling of private keys

The private key wallet is obtained from config.ReporterPk. Ensure secure storage and access of private keys.


86-98: LGTM: Proper error handling in constructor

The initialization of noncemanager and error handling is well implemented. The code properly checks for errors and returns early if any initialization fails.

node/pkg/chain/noncemanagerv2/noncemanagerv2.go (2)

42-52: ⚠️ Potential issue

Fix potential deadlock in GetNonce

The mutex is held while reading from the channel, which could lead to a deadlock if the channel is empty. This blocks other goroutines from accessing the noncePool.

func (m *NonceManagerV2) GetNonce() uint64 {
	m.mu.Lock()
	if len(m.noncePool) < minimumNoncePoolSize {
		m.fillPool()
	}
+	m.mu.Unlock()

	nonce := <-m.noncePool
-	m.mu.Unlock()
	return nonce
}

71-76: ⚠️ Potential issue

Fix blocking issue in fillPool

The method blocks on channel read without checking channel state, which could lead to deadlock.

-func (m *NonceManagerV2) fillPool() {
-	nonce := <-m.noncePool
-	for i := 0; i < poolSize-len(m.noncePool); i++ {
-		m.noncePool <- nonce + uint64(i)
-	}
-}
+func (m *NonceManagerV2) fillPool(lastNonce uint64) {
+	remaining := poolSize - len(m.noncePool)
+	for i := 0; i < remaining; i++ {
+		m.noncePool <- lastNonce + uint64(i+1)
+	}
+}

node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
node/pkg/chain/noncemanagerv2/noncemanagerv2.go Outdated Show resolved Hide resolved
@nick-bisonai nick-bisonai merged commit da98ae7 into master Dec 8, 2024
1 check passed
@nick-bisonai nick-bisonai deleted the feat/remove-manual-nonce-management branch December 8, 2024 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant