Leveldb exception handle #3356

vncoelho · 2024-06-24T13:03:22Z

Describe the bug
Run a setup with 4 nodes running a private net.

To Reproduce
Steps to reproduce the behavior:
Start the nodes; they crash almost instantaneously.

Error

dotnet: ./db/dbformat.cc:16: uint64_t leveldb::PackSequenceAndType(uint64_t, leveldb::ValueType): Assertion `seq <= kMaxSequenceNumber' failed.

cschuchardt88 · 2024-06-24T18:19:28Z

Need more information.

vncoelho · 2024-06-24T18:26:22Z

> Need more information.

description updated

shargon · 2024-06-25T08:35:22Z

It seems the data is corrupted. Is it a fresh installation?

vncoelho · 2024-06-25T10:59:25Z

fresh with master

vncoelho · 2024-06-25T11:00:02Z

> It seems the data is corrupted. Is it a fresh installation?

Probably due to the unhandled exception management feature, but I have not investigated further yet.
It is easy to reproduce. Just run a node.

Hecate2 · 2024-06-26T03:56:05Z

Is it because the 4 nodes are using the same directory for leveldb?
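
One quick way to rule this out is to compare the resolved storage paths of all nodes. This is a minimal sketch, assuming each node's storage path is known; the node names and paths below are hypothetical and not taken from any setup in this thread.

```python
from collections import defaultdict
from pathlib import Path


def find_shared_store_paths(node_paths):
    """Return {resolved_path: [node names]} for paths used by more than one node."""
    by_path = defaultdict(list)
    for node, raw_path in node_paths.items():
        # resolve() normalizes relative segments, so "./data" and "data" compare equal.
        by_path[Path(raw_path).resolve()].append(node)
    return {path: sorted(nodes) for path, nodes in by_path.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical node names and storage paths, for illustration only.
    nodes = {
        "node1": "./chain/node1",
        "node2": "./chain/node2",
        "node3": "chain/node1",
        "node4": "./chain/node4",
    }
    for path, users in find_shared_store_paths(nodes).items():
        print(f"{path} is shared by {', '.join(users)}")
```

An empty result means no two nodes point at the same directory, which would rule out this particular explanation.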

cschuchardt88 · 2024-06-26T05:27:26Z

Based on the source code referenced in your error, it looks like your database is corrupt. Try deleting it to see if the problem goes away.

It has to do with seeking with the KeyComparator. The source code says:

// User key has become shorter physically, but larger logically.
// Tack on the earliest possible number to the shortened user key.
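
For context, the failing assertion guards how leveldb packs a sequence number and a value type into one 64-bit tag. The sketch below is a Python rendering of that check, based on a reading of leveldb's `dbformat` sources (56-bit sequence number, 8-bit type); the Python names are illustrative, and the sketch does not reproduce the crash reported here.

```python
K_MAX_SEQUENCE_NUMBER = (1 << 56) - 1  # sequence numbers must fit in 56 bits
K_VALUE_TYPE_FOR_SEEK = 0x1            # highest value type (same as "value")


def pack_sequence_and_type(seq, value_type):
    """Pack a sequence number and a value type into a single 64-bit tag."""
    # This is the condition behind `seq <= kMaxSequenceNumber` in the error message.
    if seq > K_MAX_SEQUENCE_NUMBER:
        raise ValueError(f"sequence number {seq} exceeds 56 bits")
    if value_type > K_VALUE_TYPE_FOR_SEEK:
        raise ValueError(f"unknown value type {value_type}")
    # Upper 56 bits hold the sequence number, the low 8 bits hold the type.
    return (seq << 8) | value_type


def unpack_sequence_and_type(tag):
    """Split a 64-bit tag back into (sequence number, value type)."""
    return tag >> 8, tag & 0xFF


if __name__ == "__main__":
    tag = pack_sequence_and_type(1, 1)
    print(hex(tag), unpack_sequence_and_type(tag))
```

A sequence number above the 56-bit limit can only come from a bad in-memory value or from bytes read back from disk, which is why the assertion is usually read as a sign of corrupted data.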

vncoelho · 2024-06-28T16:05:44Z

> Based on the source code referenced in your error, it looks like your database is corrupt. Try deleting it to see if the problem goes away.

> It has to do with seeking with the KeyComparator. The source code says:
> // User key has become shorter physically, but larger logically.
> // Tack on the earliest possible number to the shortened user key.

No, @cschuchardt88, it is a recently introduced problem.

Jim8y · 2024-06-28T16:17:14Z

It's because you run too many nodes on the same machine that all use leveldb. Not a core problem. This happens every time you run multiple nodes on the same machine.

vncoelho · 2024-06-28T17:15:16Z

> It's because you run too many nodes on the same machine that all use leveldb. Not a core problem. This happens every time you run multiple nodes on the same machine.

No. This is not true in my setup.

vncoelho · 2024-06-28T17:22:41Z

Too many complaints and no real investigation of a simple scenario.
The cause is that we now crash the clients on an unhandled exception.

Without minimal tests, neo-cli will be unusable until we implement the exception handling and find the BASIC problems.

vncoelho · 2024-06-28T17:30:23Z

#3366 (comment)

Jim8y · 2024-06-29T01:19:21Z

> Too many complaints and no real investigation of a simple scenario.

You can say this when you locate the real problem.

We have been working like this for many years, and all of a sudden it's all wrong and we all become complainers? And our work is a product of lacking investigation? But we definitely have tested it and checked it everywhere, and for this one, I have run the node. I have also asked NGD for help testing it.

But the code was there, the PR was there, and you were able to test, review, and comment. We followed your suggestion to leave it for a while for review. Actually, that PR was open for a week before I collected sufficient review approvals.

Before we release any new version, we can still correct any problem, so chill. A team means that even if someone causes a problem, someone else can correct it, doesn't it?

> The cause is that we now crash the clients on an unhandled exception.

The funny part is that we should have crashed on an unhandled exception, unless we have set plugins to ignore unhandled exceptions. I would say that PR has found an issue, if any, rather than introduced one.

BTW, I admit that even when I run the test on my machine, I run at most a single node. I don't have a 4-node private net test environment. I will create one.
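
The difference between ignoring a plugin's unhandled exception and stopping the node on it can be sketched as follows. This is an illustration only: `Policy`, `NodeStopped`, and `run_handler` are hypothetical names and are not the neo plugin API.

```python
from enum import Enum


class Policy(Enum):
    IGNORE = "ignore"        # log the failure and keep the node running
    STOP_NODE = "stop_node"  # surface the failure and stop the node


class NodeStopped(Exception):
    """Raised when a plugin failure is configured to stop the node."""


def run_handler(handler, policy, log):
    """Run a plugin handler; return True on success, False if a failure was ignored."""
    try:
        handler()
        return True
    except Exception as exc:
        log.append(f"plugin handler failed: {exc!r}")
        if policy is Policy.STOP_NODE:
            raise NodeStopped("stopping node after plugin failure") from exc
        return False


if __name__ == "__main__":
    def failing_handler():
        raise RuntimeError("storage error")

    log = []
    print(run_handler(failing_handler, Policy.IGNORE, log))  # failure is only logged
    try:
        run_handler(failing_handler, Policy.STOP_NODE, log)
    except NodeStopped as stopped:
        print(f"node stopped: {stopped}")
```

Under the ignore policy the same underlying error still occurs; it is just hidden from the operator, which is the point being argued above.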

AnnaShaleva · 2024-07-01T18:23:14Z

> It's because you run too many nodes on the same machine that all use leveldb

It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.

> I don't have a 4-node private net test environment.

I'd suggest using NeoBench, but it's not yet updated to use the fresh monorepo; we have nspcc-dev/neo-bench#175 for that.

vncoelho · 2024-07-01T22:02:24Z

> > It's because you run too many nodes on the same machine that all use leveldb

> It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.

> > I don't have a 4-node private net test environment.

> I'd suggest using NeoBench, but it's not yet updated to use the fresh monorepo; we have nspcc-dev/neo-bench#175 for that.

Are you using leveldb? Maybe it was rocksdb instead.

Were your experiments with the master branch?
Mine only runs now after reverting the exception-handling crash.

cschuchardt88 · 2024-07-02T04:44:39Z

@vncoelho
Are you sure you didn't run out of storage (disk space)? Why not give #3355 a try?

cschuchardt88 · 2024-07-02T18:05:25Z

Try doing ./neo-cli /repair or neo-cli.exe /repair

vncoelho · 2024-07-02T19:32:37Z

> Try doing ./neo-cli /repair or neo-cli.exe /repair

This is not the case, @cschuchardt88.

The testing environment is the same for testing with and without the PR being reverted.
The problem is that leveldb probably recovers from the crash, but the PR that handles exceptions detects it and then crashes the client.

The behavior may not be wrong. But this should have been tested before merging that PR, because the problem is easy to see.
Can you verify that, @superboyiii?

cschuchardt88 · 2024-07-02T20:15:47Z

Try with this version of LevelDbStore #3274

Jim8y · 2024-07-04T00:43:52Z

> > It's because you run too many nodes on the same machine that all use leveldb

> It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.

> > I don't have a 4-node private net test environment.

I would love to argue, but I am not an expert on leveldb. All I can say is that now it happened, and it is apparently a leveldb exception, not related to the core.

Possible reasons could be: platform, OS, version, dependencies. I would suggest trying rocksdb and memorydb as well.

vncoelho · 2024-07-11T00:08:56Z

> > > It's because you run too many nodes on the same machine that all use leveldb

> > It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.

> > > I don't have a 4-node private net test environment.

> I would love to argue, but I am not an expert on leveldb. All I can say is that now it happened, and it is apparently a leveldb exception, not related to the core.

> Possible reasons could be: platform, OS, version, dependencies. I would suggest trying rocksdb and memorydb as well.

So, with this error and without the exception handling, it was good and safe to run a node?
Now, after the PR, the node is broken, right? Is it not a core problem?

cschuchardt88 · 2024-07-11T00:18:33Z

It's a corruption problem.

We need more information on your setup:

1. Are you using a `container`?
2. What `version` of `leveldb` do you have?
3. Which CI build are you using?
4. What filesystem?
5. What operating system?
6. What CPU arch?
7. Have you tried leveldb `repair`?
8. What thread limit does your OS impose?
9. Have you run a filesystem repair tool?
10. Does this happen on other setups?
11. What's your node setup?
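
Several of these answers (operating system, CPU architecture, free disk space) can be collected with a short script. This is a minimal sketch using only the Python standard library; the `data_dir` argument is an assumed location for the node's data directory, and filesystem type, thread limits, and leveldb version are not covered.

```python
import platform
import shutil


def environment_report(data_dir="."):
    """Collect basic host facts that are useful in a storage bug report."""
    usage = shutil.disk_usage(data_dir)  # disk holding the (assumed) data directory
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "cpu_arch": platform.machine(),
        "disk_total_gib": round(usage.total / 2**30, 2),
        "disk_free_gib": round(usage.free / 2**30, 2),
    }


if __name__ == "__main__":
    for key, value in environment_report(".").items():
        print(f"{key}: {value}")
```

Run inside the container, the output describes the container's view of the host, which is the view the node itself has.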

vncoelho · 2024-07-11T00:55:14Z

> 1. Are you using a `container`?

Yes

> 2. What `version` of `leveldb` do you have?

The plugin compiled from master, and libleveldb-dev from apt-get on mcr.microsoft.com/dotnet/aspnet:8.0.3-jammy.

It is all defined in a Dockerfile, in a container with the number of threads necessary for it to run safely.
It usually runs a mainnet node with the resources it has available.
It runs perfectly without the commit that I said should be reverted until fixed.

The problem could be due to some limitation in leveldb, of course. But that should have been handled before the PR was merged.
Furthermore, in my last tests rocksdb was also broken.

The only way to run a node nowadays is memorystore.

vncoelho · 2024-07-15T17:14:28Z

Still crashing. I thought it was solved, but my config was using "MemoryStore" instead.

The problem persists even after updating all dotnet libraries during build and run.

RocksDb is also corrupted, but perhaps for a different reason.

Jim8y · 2024-07-15T22:52:28Z

I will set up a multi-node environment on my machine and check it.

gsmachado · 2024-07-16T22:13:58Z

not entirely related, but see neo-project/neo-express#455

cschuchardt88 · 2024-09-05T18:26:16Z

fixed

vncoelho mentioned this issue Jun 28, 2024

Revert "Plugin unhandled exception (#3349)" #3366

Merged

vncoelho closed this as completed Jul 15, 2024

vncoelho reopened this Jul 15, 2024

Jim8y self-assigned this Jul 15, 2024

vncoelho changed the title ~~Leveldb crash~~ Leveldb exception handle Jul 18, 2024

Jim8y mentioned this issue Jul 18, 2024

Fix plugin exception #3426

Merged

15 tasks

cschuchardt88 closed this as completed Sep 5, 2024
