v2.60.6 sync from scratch after restart and very slow sync rate #12055

Open
arashsari opened this issue Sep 21, 2024 · 3 comments

@arashsari

System information

Erigon version: 2.60.6-d24e5d45

OS & Version: Ubuntu 20.04.6 LTS

Erigon Command (with flags/config):
/home/user/erigon/build/bin/erigon --chain=mainnet --snap.stop --datadir="/home/user/erigon-data" --private.api.addr=0.0.0.0:9090 --http.addr="0.0.0.0" --http.port=8545 --http.vhosts="" --http.corsdomain="" --db.size.limit=8TB --http.api="eth,debug,net,trace,web3,erigon" --ws --authrpc.jwtsecret="/home/user/erigon-data/jwt.hex"

Consensus Layer:

Consensus Layer Command (with flags/config):

Chain/Network: mainnet

Expected behaviour:

After 4 weeks, the sync progressed to block 19 million in Stage 4 (Execution). However, three main issues occurred:

  • Issue 1 - Slow Syncing: Initially, the sync was progressing at around 1 million blocks/day during the Execution stage. Eventually, it slowed down to 300k blocks/day and kept getting slower.

  • Issue 2 - Restart Issue: After a restart, the sync started again from scratch, and after 3 days it had only reached block 4,828,671, as shown below: [4/12 Execution] Executed blocks number=4828671

    The df -h command showed a disk usage drop from 3.8 TB to 1.9 TB after the restart.

  • Issue 3 - Latest Block Issue: The latest block continues to show as zero despite sync progress.

curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://localhost:8545
{"jsonrpc":"2.0","id":1,"result":"0x0"}

Additionally, running the health check:
curl --location --request GET 'http://localhost:8545/health' --header 'X-ERIGON-HEALTHCHECK: min_peer_count1' --header 'X-ERIGON-HEALTHCHECK: synced' --header 'X-ERIGON-HEALTHCHECK: max_seconds_behind600'

{"check_block":"DISABLED","max_seconds_behind":"ERROR: timestamp too old: got ts: 0, need: 1726888120","min_peer_count":"HEALTHY","synced":"ERROR: not synced"}

Another health check:
curl -X POST http://localhost:8545/health --data '{"min_peer_count": 3, "known_block": "0x1F"}'
{"check_block":"ERROR: no known block with number 31 (1f hex)","healthcheck_query":"HEALTHY","min_peer_count":"HEALTHY"}
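The hex quantities in the responses above (for example "0x0" from eth_blockNumber and "0x1F" in the known_block check) can be read as decimal with a small shell helper. This is only a minimal sketch: hex_to_dec and extract_result are hypothetical names, not part of Erigon, and the sed pattern assumes a flat single-line response like the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading JSON-RPC responses; not part of Erigon.

# hex_to_dec HEX
# Converts a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# extract_result JSON
# Pulls the "result" string out of a flat, single-line JSON-RPC response
# without requiring jq. Prints nothing if no string result is present.
extract_result() {
  printf '%s\n' "$1" | sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

response='{"jsonrpc":"2.0","id":1,"result":"0x0"}'
hex_to_dec "$(extract_result "$response")"   # prints 0
hex_to_dec 0x1F                               # prints 31
```

The second call matches the error text "no known block with number 31 (1f hex)" returned by the health check.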

Actual behaviour

  • Issue 1: Given the below specifications and our previous experience, I expected a faster sync. In the older version (v2.58.0), sync performance was significantly faster on our previous server with the same configuration.
    free -h
                  total        used        free      shared  buff/cache   available
    Mem:          503Gi        18Gi       6.4Gi       0.0Ki       478Gi       480Gi
    Swap:          57Gi        28Mi        57Gi

[Screenshot attached: 2024-09-21, 1:22:06 PM]

  • Issue 2: In v2.58.0, we did not experience the restart issue.

  • Issue 3: Even though we had synced up to block 19 million in Stage 4, the latest block still shows as zero.

Questions:

  • Could you recommend any tested versions that may avoid these issues?
  • Do you have any suggestions to address these problems, particularly the restart issue?
  • Does the absence of a latest block indicate that the sync process is in trouble?

Steps to reproduce the behaviour

Use version 2.60.6-d24e5d45

Backtrace

[backtrace]
@AskAlexSharov
Collaborator

  • "after restart" - a restart of Erigon or a restart of the server? If it was a server restart, it may be due to losing the page cache (exec is single-threaded, so there is not much warmup/readahead). You have 500 GB of RAM, so try: ./build/bin/integration warmup --datadir=/your --bucket=PlainState

  • What disk do you use?

  • Run make db-tools, then show the output of: ./build/bin/mdbx_stat -ef /erigon-data/chaindata/

@arashsari
Author

Thanks for your response. It wasn't a server restart. We run Erigon in screen, and it was a service restart. Should I run warmup? If yes, should I stop Erigon before running the warmup command?

It is an SSD with 7 TB in a single partition, as shown below.
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       7.0T  2.0T  4.7T  30% /home

I couldn't run the mdbx_stat command: make db-tools ./build/bin/mdbx_stat -ef /home/user/erigon-data/chaindata/
It returns: make: *** /home/user/erigon-data/chaindata/: Is a directory. Stop.
It does not show any DB stats.

@arashsari
Author

@AskAlexSharov any other suggestions?
Especially about issue 3, where the latest block continues to show as zero despite sync progress. The commands and results were shared earlier. It has now synced 12 million blocks in the Execution stage. Shouldn't it return a block by now?
[4/12 Execution] Executed blocks number=12244915
The sync rate is the same as before the service restart.
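
The rates quoted in this thread can be checked with simple integer arithmetic. The sketch below is illustrative only: blocks_per_day and days_to_reach are hypothetical helpers, not Erigon tools, "from scratch" is assumed to mean block 0, and the 300000 blocks/day figure is the slowed rate reported in Issue 1 rather than a measured current value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for back-of-the-envelope sync estimates; not part of Erigon.

# blocks_per_day START_BLOCK END_BLOCK ELAPSED_SECONDS
# Average execution rate between two log samples (integer division).
blocks_per_day() {
  echo $(( ($2 - $1) * 86400 / $3 ))
}

# days_to_reach CURRENT_BLOCK TARGET_BLOCK BLOCKS_PER_DAY
# Days needed to reach TARGET_BLOCK at a constant rate, rounded up.
days_to_reach() {
  echo $(( ($2 - $1 + $3 - 1) / $3 ))
}

# 3 days (259200 s) from an assumed block 0 to the reported block 4828671
blocks_per_day 0 4828671 259200          # prints 1609557

# From the reported block 12244915 back to block 19000000 at 300000 blocks/day
days_to_reach 12244915 19000000 300000   # prints 23
```

Feeding the helpers two "Executed blocks number=" log lines taken a known interval apart gives a comparable figure for the current run.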
