v2.60.6 sync from scratch after restart and very slow sync rate #12055

arashsari · 2024-09-21T03:40:56Z

System information

Erigon version: 2.60.6-d24e5d45

OS & Version: Ubuntu 20.04.6 LTS

Erigon Command (with flags/config):
/home/user/erigon/build/bin/erigon --chain=mainnet --snap.stop --datadir="/home/user/erigon-data" --private.api.addr=0.0.0.0:9090 --http.addr="0.0.0.0" --http.port=8545 --http.vhosts="" --http.corsdomain="" --db.size.limit=8TB --http.api="eth,debug,net,trace,web3,erigon" --ws --authrpc.jwtsecret="/home/user/erigon-data/jwt.hex"

Consensus Layer:

Consensus Layer Command (with flags/config):

Chain/Network: mainnet

Expected behaviour:

After 4 weeks, the sync progressed to block 19 million in Stage 4 (Execution). However, three main issues occurred:

Issue 1 - Slow Syncing: Initially, the sync was progressing at around 1 million blocks/day during the Execution stage. Eventually, it slowed down to 300k blocks/day and kept getting slower.
Issue 2 - Restart Issue: After a restart, the sync started again from scratch, and after 3 days it had only reached block 4,828,671, as shown below: [4/12 Execution] Executed blocks number=4828671

The df -h command showed a disk usage drop from 3.8 TB to 1.9 TB after the restart.
Issue 3 - Latest Block Issue: The latest block continues to show as zero despite sync progress.

curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://localhost:8545
{"jsonrpc":"2.0","id":1,"result":"0x0"}
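For reference, the result field of eth_blockNumber is a hex-encoded quantity, so "0x0" decodes to block 0. Below is a minimal sketch for decoding such a response in the shell. The function names rpc_result and hex_to_dec are illustrative, and the sed pattern assumes the compact single-line response shown above rather than arbitrary JSON.

```shell
# Extract the "result" field from a single-line JSON-RPC response.
# Assumes the compact formatting shown above; this is not a general JSON parser.
rpc_result() {
    printf '%s\n' "$1" | sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

response='{"jsonrpc":"2.0","id":1,"result":"0x0"}'
hex_to_dec "$(rpc_result "$response")"
```

The same helper turns 0x1F into 31, which matches the "31 (1f hex)" wording in the health check output.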

Additionally, running the health check:
curl --location --request GET 'http://localhost:8545/health' --header 'X-ERIGON-HEALTHCHECK: min_peer_count1' --header 'X-ERIGON-HEALTHCHECK: synced' --header 'X-ERIGON-HEALTHCHECK: max_seconds_behind600'

{"check_block":"DISABLED","max_seconds_behind":"ERROR: timestamp too old: got ts: 0, need: 1726888120","min_peer_count":"HEALTHY","synced":"ERROR: not synced"}

Another health check:
curl -X POST http://localhost:8545/health --data '{"min_peer_count": 3, "known_block": "0x1F"}'
{"check_block":"ERROR: no known block with number 31 (1f hex)","healthcheck_query":"HEALTHY","min_peer_count":"HEALTHY"}

Actual behaviour

Issue 1: Given the below specifications and our previous experience, I expected a faster sync. In the older version (v2.58.0), sync performance was significantly faster on our previous server with the same configuration.
free -h
              total        used        free      shared  buff/cache   available
Mem:          503Gi        18Gi       6.4Gi       0.0Ki       478Gi       480Gi
Swap:          57Gi        28Mi        57Gi

Issue 2: In v2.58.0, we did not experience the restart issue.
Issue 3: Even though we had synced up to block 19 million in Stage 4, the latest block still shows as zero.

Questions:

Could you recommend any tested versions that may avoid these issues?
Do you have any suggestions to address these problems, particularly the restart issue?
Does the absence of a latest block indicate that the sync process is in trouble?

Steps to reproduce the behaviour

Use version 2.60.6-d24e5d45

Backtrace

[backtrace]


AskAlexSharov · 2024-09-21T05:48:53Z

"After restart": do you mean a restart of Erigon or a restart of the server? If it was a server restart, it may be due to losing the PageCache (exec is single-threaded, so there is not much warmup/readahead). You have 500 GB of RAM, so try: ./build/bin/integration warmup --datadir=/your --bucket=PlainState
Which disk do you use?
Please show the output of make db-tools followed by ./build/bin/mdbx_stat -ef /erigon-data/chaindata/ (run as two separate commands).

arashsari · 2024-09-21T07:13:36Z

Thanks for your response. It wasn't a server restart. We run Erigon in screen, and it was a service restart. Should I run warmup? If so, should I stop Erigon before running the warmup command?

It is an SSD with 7 TB in a single partition, as shown below.
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       7.0T  2.0T  4.7T  30% /home

I couldn't run the mdbx_stat command: make db-tools ./build/bin/mdbx_stat -ef /home/user/erigon-data/chaindata/
It returns: make: *** /home/user/erigon-data/chaindata/: Is a directory. Stop.
It does not show any DB stats.
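The make error above most likely arises because both commands were passed to make in a single invocation, so make parsed -ef as its own -e and -f options and tried to read the chaindata directory as a makefile. Below is a minimal sketch of running the two steps separately. The function name show_db_stats is hypothetical, and the checkout location /home/user/erigon in the usage line is inferred from the binary path in the original report.

```shell
# Build the MDBX tools, then run mdbx_stat as a separate command.
# $1: path to the Erigon source checkout (assumed to contain the Makefile)
# $2: path to the chaindata directory inside the datadir
show_db_stats() {
    erigon_dir="$1"
    chaindata="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$erigon_dir" && make db-tools && ./build/bin/mdbx_stat -ef "$chaindata" )
}
```

Usage, with the paths from this report: show_db_stats /home/user/erigon /home/user/erigon-data/chaindata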

arashsari · 2024-09-23T12:39:46Z

@AskAlexSharov any other suggestions?
Especially about issue 3: the latest block continues to show as zero despite sync progress. The commands and results were shared earlier. The sync has now reached 12 million blocks in the Execution stage. Shouldn't it return a block by now?
[4/12 Execution] Executed blocks number=12244915
The sync rate is the same as before the service restart.
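To put numbers on the sync rate, the block height can be read from the Execution log lines and divided by the elapsed time. Below is a minimal sketch using the figure from the original report (block 4,828,671 reached three days after the restart). The function names log_block and blocks_per_day are illustrative, and the three-day interval is the approximate value stated in the report, not a measured one.

```shell
# Extract the block number from an Execution-stage log line.
log_block() {
    printf '%s\n' "$1" | sed -n 's/.*Executed blocks number=\([0-9][0-9]*\).*/\1/p'
}

# Average blocks per day between two samples (integer arithmetic).
# Arguments: start_block end_block elapsed_seconds
blocks_per_day() {
    echo $(( ($2 - $1) * 86400 / $3 ))
}

reached=$(log_block '[4/12 Execution] Executed blocks number=4828671')
# 259200 seconds = 3 days, the interval stated in the report
blocks_per_day 0 "$reached" 259200   # prints 1609557
```

Taking two log samples a known number of seconds apart and passing both heights to blocks_per_day gives a rate for any later stretch of the sync, which makes before/after comparisons less dependent on rough estimates.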
