
fix: retry Filecoin.StateMinerInfo requests #96

Merged · bajtos merged 3 commits into main from retry-miner-info-request on Sep 24, 2024
Conversation

@bajtos (Member) commented on Sep 23, 2024

  • deps: add retry from Deno stdlib
  • fix: retry Filecoin.StateMinerInfo requests

I noticed that my Station Desktop sometimes goes offline. The module logs contain the following messages:

[2024-09-23T12:43:03Z INFO  module:spark/main] Calling Filecoin JSON-RPC to get PeerId of miner f02085652
{"type":"activity:error","module":"spark/main","message":"SPARK failed reporting retrieval"}
[2024-09-23T12:43:04Z ERROR module:spark/main] Error: Cannot obtain miner info for f02085652: error sending request for url (https://api.node.glif.io/): connection closed before message completed
        at async mainFetch (ext:deno_fetch/26_fetch.js:277:12)
        at async fetch (ext:deno_fetch/26_fetch.js:504:7)
        at async rpc (file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/lib/miner-info.js:36:15)
        at async Spark.getMinerPeerId (file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/lib/miner-info.js:9:17)
        at async Spark.executeRetrievalCheck (file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/lib/spark.js:50:22)
        at async Spark.nextRetrieval (file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/lib/spark.js:197:5)
        at async Spark.run (file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/lib/spark.js:208:9)
        at async file:///Users/bajtos/Library/Caches/app.filstation.desktop/sources/spark/main.js:4:1
[2024-09-23T12:43:04Z INFO  module:spark/main] Sleeping for 75 seconds before starting the next task...

In other words, when the RPC API call fails, Spark waits more than a minute before starting another check. I believe the next check will pick the next task from the list, meaning that Spark effectively skips a task whenever this error happens.

In this pull request, I wrap the RPC request in retry logic.
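
For illustration, here is a minimal sketch of the approach (not the exact PR code), assuming the `retry` helper from the Deno standard library; the `rpc()` helper below and the `maxAttempts` value are illustrative choices:

```js
// Minimal sketch, assuming Deno std's retry helper; the rpc() helper and
// the maxAttempts value are illustrative, not the exact code from this PR.
import { retry } from 'https://deno.land/std@0.224.0/async/retry.ts'

async function rpc (method, params) {
  const res = await fetch('https://api.node.glif.io/', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', id: 1, method, params })
  })
  const body = await res.json()
  if (body.error) throw new Error(body.error.message)
  return body.result
}

// retry() re-invokes the callback with exponential backoff; once the attempts
// are exhausted, it throws a RetryError whose `cause` is the last error.
const minerInfo = await retry(
  () => rpc('Filecoin.StateMinerInfo', ['f02085652', null]),
  { maxAttempts: 5 }
)
console.log(minerInfo.PeerId)
```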

@bajtos requested a review from juliangruber on September 23, 2024 at 12:54
@juliangruber (Member) left a comment


nice!

@bajtos (Member, Author) commented on Sep 23, 2024

A test was failing, so I had to tweak this change. PTAL again.

@bajtos requested a review from juliangruber on September 23, 2024 at 14:53
    return res.PeerId
  } catch (err) {
    if (err.name === 'RetryError' && err.cause) {
      // eslint-disable-next-line no-ex-assign
      err = err.cause
@juliangruber (Member):

Why not keep err with err.cause?

@bajtos (Member, Author):

I feel the caller of this function does not care that we are retrying the requests; they are interested in the details about why the request failed.

@juliangruber (Member):

Good point!
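
To make the pattern under discussion concrete, here is a hedged sketch of how the unwrapping can look in context; apart from the `RetryError`/`err.cause` handling shown in the diff above, the surrounding code is illustrative:

```js
// Sketch of the unwrapping pattern; only the RetryError/err.cause handling
// mirrors the actual diff, the surrounding code is illustrative.
async function getMinerPeerId (minerId) {
  try {
    const res = await retry(() => rpc('Filecoin.StateMinerInfo', [minerId, null]))
    return res.PeerId
  } catch (err) {
    // The caller does not care that we retried; report why the request failed.
    const reason = err.name === 'RetryError' && err.cause ? err.cause : err
    throw new Error(`Cannot obtain miner info for ${minerId}`, { cause: reason })
  }
}
```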

@bajtos merged commit 995e0bd into main on Sep 24, 2024
2 checks passed
@bajtos deleted the retry-miner-info-request branch on September 24, 2024 at 11:00