
chainntfns: Panics during initial sync blow up consumers. #827

Closed
JoeGruffins opened this issue Nov 4, 2022 · 4 comments · Fixed by #870

@JoeGruffins
Contributor

Specifically, if the wallet is locked during recovery, btcwallet panics. Because the panic originates inside btcwallet, it takes down whatever app is using it when a user simply locks the wallet at the wrong time. I also don't think there is currently a way to check whether btcwallet is doing recovery.

I wonder if it's possible to remove the panic logic in *Wallet.handleChainNotifications altogether and handle errors more gracefully?

The panic call that is causing the pain:

err = w.syncWithChain(birthdayBlock)
if err != nil && !w.ShuttingDown() {
	panic(fmt.Errorf("unable to synchronize "+
		"wallet to chain: %v", err))
}

The problematic recovery loop, which we currently cannot escape without shutting down. Locking the wallet while it runs triggers the panic above.

btcwallet/wallet/wallet.go

Lines 665 to 792 in 27f244e

// recovery attempts to recover any unspent outputs that pay to any of our
// addresses starting from our birthday, or the wallet's tip (if higher), which
// would indicate resuming a recovery after a restart.
func (w *Wallet) recovery(chainClient chain.Interface,
	birthdayBlock *waddrmgr.BlockStamp) error {

	log.Infof("RECOVERY MODE ENABLED -- rescanning for used addresses "+
		"with recovery_window=%d", w.recoveryWindow)

	// Wallet locking must synchronize with the end of recovery, since use of
	// keys in recovery is racy with manager IsLocked checks, which could
	// result in encrypting data with a zeroed key.
	syncer := &recoverySyncer{done: make(chan struct{})}
	w.recovering.Store(syncer)
	defer close(syncer.done)

	// We'll initialize the recovery manager with a default batch size of
	// 2000.
	recoveryMgr := NewRecoveryManager(
		w.recoveryWindow, recoveryBatchSize, w.chainParams,
	)

	// In the event that this recovery is being resumed, we will need to
	// repopulate all found addresses from the database. Ideally, for basic
	// recovery, we would only do so for the default scopes, but due to a
	// bug in which the wallet would create change addresses outside of the
	// default scopes, it's necessary to attempt all registered key scopes.
	scopedMgrs := make(map[waddrmgr.KeyScope]*waddrmgr.ScopedKeyManager)
	for _, scopedMgr := range w.Manager.ActiveScopedKeyManagers() {
		scopedMgrs[scopedMgr.Scope()] = scopedMgr
	}
	err := walletdb.View(w.db, func(tx walletdb.ReadTx) error {
		txMgrNS := tx.ReadBucket(wtxmgrNamespaceKey)
		credits, err := w.TxStore.UnspentOutputs(txMgrNS)
		if err != nil {
			return err
		}
		addrMgrNS := tx.ReadBucket(waddrmgrNamespaceKey)
		return recoveryMgr.Resurrect(addrMgrNS, scopedMgrs, credits)
	})
	if err != nil {
		return err
	}

	// Fetch the best height from the backend to determine when we should
	// stop.
	_, bestHeight, err := chainClient.GetBestBlock()
	if err != nil {
		return err
	}

	// Now we can begin scanning the chain from the wallet's current tip to
	// ensure we properly handle restarts. Since the recovery process itself
	// acts as rescan, we'll also update our wallet's synced state along the
	// way to reflect the blocks we process and prevent rescanning them
	// later on.
	//
	// NOTE: We purposefully don't update our best height since we assume
	// that a wallet rescan will be performed from the wallet's tip, which
	// will be of bestHeight after completing the recovery process.
	var blocks []*waddrmgr.BlockStamp
	startHeight := w.Manager.SyncedTo().Height + 1
	for height := startHeight; height <= bestHeight; height++ {
		if atomic.LoadUint32(&syncer.quit) == 1 {
			return errors.New("recovery: forced shutdown")
		}

		hash, err := chainClient.GetBlockHash(int64(height))
		if err != nil {
			return err
		}
		header, err := chainClient.GetBlockHeader(hash)
		if err != nil {
			return err
		}
		blocks = append(blocks, &waddrmgr.BlockStamp{
			Hash:      *hash,
			Height:    height,
			Timestamp: header.Timestamp,
		})

		// It's possible for us to run into blocks before our birthday
		// if our birthday is after our reorg safe height, so we'll make
		// sure to not add those to the batch.
		if height >= birthdayBlock.Height {
			recoveryMgr.AddToBlockBatch(
				hash, height, header.Timestamp,
			)
		}

		// We'll perform our recovery in batches of 2000 blocks. It's
		// possible for us to reach our best height without exceeding
		// the recovery batch size, so we can proceed to commit our
		// state to disk.
		recoveryBatch := recoveryMgr.BlockBatch()
		if len(recoveryBatch) == recoveryBatchSize || height == bestHeight {
			err := walletdb.Update(w.db, func(tx walletdb.ReadWriteTx) error {
				ns := tx.ReadWriteBucket(waddrmgrNamespaceKey)
				for _, block := range blocks {
					err := w.Manager.SetSyncedTo(ns, block)
					if err != nil {
						return err
					}
				}
				return w.recoverScopedAddresses(
					chainClient, tx, ns, recoveryBatch,
					recoveryMgr.State(), scopedMgrs,
				)
			})
			if err != nil {
				return err
			}

			if len(recoveryBatch) > 0 {
				log.Infof("Recovered addresses from blocks "+
					"%d-%d", recoveryBatch[0].Height,
					recoveryBatch[len(recoveryBatch)-1].Height)
			}

			// Clear the batch of all processed blocks to reuse the
			// same memory for future batches.
			blocks = blocks[:0]
			recoveryMgr.ResetBlockBatch()
		}
	}

	return nil
}
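
For context on why this blows up consumers: handleChainNotifications runs in a goroutine spawned by btcwallet itself, and in Go a panic in one goroutine cannot be recovered from another, so the whole process exits no matter what the embedding app does. A minimal standalone sketch (hypothetical consumer code, not from btcwallet) illustrating that:

// Hypothetical standalone program illustrating the failure mode: a panic
// raised in a goroutine started by a library cannot be caught by the
// consumer, so the process dies despite the recover() below.
package main

import (
	"fmt"
	"time"
)

// startSync stands in for wallet.SynchronizeRPC, which spawns the
// notification handler in its own goroutine.
func startSync() {
	go func() {
		panic(fmt.Errorf("unable to synchronize wallet to chain: %v",
			"wallet is locked during recovery"))
	}()
}

func main() {
	// This recover only protects the main goroutine; it never sees the
	// panic from the library's goroutine.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // never reached
		}
	}()

	startSync()
	time.Sleep(time.Second) // the process crashes before this returns
}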

@Roasbeef
Member

Roasbeef commented Nov 4, 2022

Ah, this makes sense; much of the recent work we've done in btcwallet was motivated by lnd. For lnd, the wallet is unlocked once at startup and never locked again.

What's missing here is that we properly thread through a context.Context (or something like that), so it can be cancelled to ensure we can do things like unsafe shutdown.

I agree that the panic doesn't really need to be there either; it should be handled gracefully. So we'd either need to retry once the wallet is unlocked again, or move it to something periodic that attempts to sync again once things are clear.
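
A rough sketch of that context threading (the ctx parameter and the select check are assumptions about one possible shape, not the current btcwallet API): the recovery loop would watch ctx.Done() instead of polling the atomic quit flag, so callers could cancel it on lock or shutdown.

// Sketch only: a cancellable recovery loop, assuming a context.Context is
// threaded through from the caller. The signature is hypothetical; the
// surrounding identifiers follow the existing function quoted above.
func (w *Wallet) recovery(ctx context.Context, chainClient chain.Interface,
	birthdayBlock *waddrmgr.BlockStamp) error {

	_, bestHeight, err := chainClient.GetBestBlock()
	if err != nil {
		return err
	}

	startHeight := w.Manager.SyncedTo().Height + 1
	for height := startHeight; height <= bestHeight; height++ {
		// Honor cancellation from the caller (shutdown, wallet lock,
		// etc.) instead of checking an atomic quit flag.
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		// ... fetch the block, add it to the batch, and commit the
		// batch exactly as in the current implementation ...
	}

	return nil
}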

@chappjc
Contributor

chappjc commented Nov 11, 2022

I wonder if it's possible to remove the panic logic in *Wallet.handleChainNotifications altogether and handle errors more gracefully?

This would be good. There are numerous ways this explicit panic in handleChainNotifications can be triggered:

@chappjc
Contributor

chappjc commented Nov 14, 2022

Just got this recovering a mainnet wallet:

panic: unable to synchronize wallet to chain: unable to perform wallet recovery: couldn't retrieve block 00000000000000000004d901bfc6973cad6a39d65f88ac6af77c2c212d1635fd from network

goroutine 4830583 [running]:
github.com/btcsuite/btcwallet/wallet.(*Wallet).handleChainNotifications(0xc0050d6900)
	github.com/btcsuite/[email protected]/wallet/chainntfns.go:117 +0x1065
created by github.com/btcsuite/btcwallet/wallet.(*Wallet).SynchronizeRPC
	github.com/btcsuite/[email protected]/wallet/wallet.go:213 +0x25c

The neutrino log had no direct error at the time, but shortly before, the node to which that getdata block request was made had dropped and reconnected a couple of times.
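
One way to keep a transient fetch failure like that from becoming a fatal panic would be to retry the lookup a few times before surfacing the error. A small sketch (the helper name, retry count, and backoff are made up; it assumes the chain and chainhash packages the wallet already imports):

// Hypothetical helper: retry a block hash lookup a few times with a short
// backoff before treating the error as fatal, since the backend peer may
// briefly drop and reconnect during recovery.
func getBlockHashWithRetry(chainClient chain.Interface,
	height int64) (*chainhash.Hash, error) {

	const maxAttempts = 3

	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		hash, err := chainClient.GetBlockHash(height)
		if err == nil {
			return hash, nil
		}
		lastErr = err
		time.Sleep(time.Duration(attempt) * time.Second)
	}

	return nil, lastErr
}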

@JoeGruffins
Contributor Author

This seems to come up a bit while testing dcrdex. Will attempt to stop the panics as simply as possible if that's alright. Will look through the code some more and come up with a plan.
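
For illustration, the simplest shape of that change might be something like the following (a sketch against the snippet quoted at the top of this issue, not necessarily how #870 resolved it): log the sync error and return from the notification handler instead of panicking.

// Sketch: surface the sync error instead of crashing the consumer. Assumes
// the wallet package's existing log instance.
err = w.syncWithChain(birthdayBlock)
if err != nil && !w.ShuttingDown() {
	log.Errorf("Unable to synchronize wallet to chain: %v", err)
	return
}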
