fix(block): da unavailability fixes #1215

mtsitrin · 2024-11-11T09:40:24Z

If ApplyBatchFromSL fails, don't go unhealthy (the failure might be caused by DA unavailability).
Avoid getting stuck in an infinite loop when DA data is missing.

srene

Not sure why the error log was removed when applying from local.

omritoptix · 2024-11-11T10:57:07Z

block/sync.go

@@ -75,7 +70,14 @@ func (m *Manager) SettlementSyncLoop(ctx context.Context) error {

 				err = m.ApplyBatchFromSL(settlementBatch.Batch)
 				if err != nil {
-					return fmt.Errorf("process next DA batch. err:%w", err)
+					m.logger.Error("Apply batch from SL", "err", err)


The problem I see with logging and breaking here is that, as a node operator, you'll never know you have an issue.

Imagine you use a wrong Celestia RPC, or the RPC is down and you need to change it.
In that case you simply break, and the node operator has no idea that they need to change it. That's why we prefer to emit health issues.

The DA RPC retry loop should be long enough to indicate that there is an RPC problem and that the operator should now do something about it (the node operator doesn't look at the logs but has alerts on health status).

ok makes sense
I'll revert

I think what we need to do here is log and break only when the error is a DA error (ErrRetrieval or ErrBlobNotfound), and return the error as usual in all other cases. This way, if DA fails, the node will not stop and will retry on the next state update, but it will still fail if applyblock actually fails for a reason that is not DA related.
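A minimal sketch of the split suggested above, assuming the DA failures can be recognized with `errors.Is`. The sentinel errors declared here are local stand-ins that only mirror the names mentioned in the comment, and `ErrApplyBlock`, `isDAError`, and `handleApplyError` are hypothetical names for illustration, not the project's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins (assumptions) for the DA sentinel errors named in the review.
var (
	ErrRetrieval    = errors.New("da: retrieval failed")
	ErrBlobNotfound = errors.New("da: blob not found")
	// ErrApplyBlock is a hypothetical non-DA failure used for illustration.
	ErrApplyBlock = errors.New("apply block failed")
)

// isDAError reports whether err is, or wraps, one of the DA sentinel errors.
func isDAError(err error) bool {
	return errors.Is(err, ErrRetrieval) || errors.Is(err, ErrBlobNotfound)
}

// handleApplyError decides what the sync loop does with an apply failure.
// A nil result means "log, break, and retry on the next state update";
// a non-nil result means "propagate the failure to the caller".
func handleApplyError(err error) error {
	if err == nil || isDAError(err) {
		return nil
	}
	return fmt.Errorf("apply batch from SL: %w", err)
}

func main() {
	// A DA error wrapped with extra context is still recognized.
	daErr := fmt.Errorf("fetch batch: %w", ErrBlobNotfound)
	fmt.Println("DA error propagated:", handleApplyError(daErr) != nil)
	fmt.Println("apply error propagated:", handleApplyError(ErrApplyBlock) != nil)
}
```

Matching with `errors.Is` rather than `==` keeps the check working when the DA layer wraps its errors with `%w`. Whether the tolerated DA branch should also raise a health alert, as requested earlier in this thread, is left out of the sketch.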

omritoptix · 2024-11-11T11:03:23Z

block/sync.go

+
+				// if height havent been updated, we are stuck
+				// this covers the scenario where no applicable blocks were found in the DA
+				if m.State.NextHeight() == currH {


Can you elaborate on how this can happen? AFAIU we got into the loop because we have a new settlement height, which means the blocks are in the DA. Assuming no fraud (as that is handled by blocks being unavailable in the DA), when will this case happen?

> which means the blocks are in the da

How do you know that? If the data is not there, you'll be stuck.
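The stuck case discussed here can be sketched as a loop that bails out as soon as an iteration fails to advance the height. This is only an illustration under assumptions: `syncTo`, `applyNext`, and `errNoProgress` are hypothetical names, and the real loop reads the height from `m.State.NextHeight()` instead of a plain counter:

```go
package main

import (
	"errors"
	"fmt"
)

// errNoProgress is a hypothetical error for "no applicable blocks were found".
var errNoProgress = errors.New("height did not advance")

// syncTo calls applyNext until next passes target. applyNext stands in for
// "fetch and apply whatever the DA has" and returns the new next height.
// If an iteration leaves the height unchanged, syncTo returns an error
// instead of looping forever.
func syncTo(next, target uint64, applyNext func(from uint64) uint64) (uint64, error) {
	for next <= target {
		before := next
		next = applyNext(next)
		if next == before {
			return next, fmt.Errorf("stuck at height %d: %w", next, errNoProgress)
		}
	}
	return next, nil
}

func main() {
	// The DA has the data: every call applies exactly one block.
	h, err := syncTo(5, 7, func(from uint64) uint64 { return from + 1 })
	fmt.Println(h, err)

	// The DA data is missing: nothing is applied, so the loop must stop.
	h, err = syncTo(5, 7, func(from uint64) uint64 { return from })
	fmt.Println(h, err)
}
```

Comparing the height before and after each iteration gives the loop an exit even when the settlement layer reports a new height but the corresponding data cannot be found in the DA.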

allowing for DA unavalability. checking for DA data missing

dc61392

mtsitrin requested a review from a team as a code owner November 11, 2024 09:40

github-actions bot added the dym-internal label Nov 11, 2024

mtsitrin changed the title ~~fix(block): allowing for DA unavalability. checking for DA data missing~~ fix(block): da unavailability fixes Nov 11, 2024

srene previously approved these changes Nov 11, 2024


omritoptix reviewed Nov 11, 2024


return error on DA sync

9f690dd

mtsitrin dismissed srene’s stale review via 9f690dd November 11, 2024 12:04

pretify

eaf2549

mtsitrin requested review from omritoptix and srene November 11, 2024 12:20

mtsitrin closed this Nov 11, 2024
