Poor performance of Filecoin.StateGetBeaconEntry for old beacons
#12527
Comments
OK, wow, that's worth a look, but I suppose it's to do with tipset lookup
I can't replicate this unfortunately @LesnyRumcajs, first run or otherwise:
In fact, I can run it with any random epoch and it's all pretty quick:
I had to upgrade to rc1 to get this going, so it's fairly soon after running. And I'm running splitstore on this so only have a ~170G blockstore. But it looks like you're running calibnet so I imagine it's not huge, or is it a non-splitstore calibnet node perhaps? Or is this a particularly slow machine? What might be the difference here?
Ahh, if I restart my node and call it as soon as it's live:
I must have waited too long after the restart the first time and something had the chance to prime the tipset cache.
Yeah, it's the issue after startup. For context, the setup in Forest's parity tests is:
Given that, Lotus started to time out (over 120s on a regular GHA runner) for this method since
The conclusion from a brief investigation and internal chat about this is that it comes down to hydrating the tipset cache, which is only done on demand. So if your call to fetch tipset

A possible solution might be to lump this into the ChainIndexer work, where we're rolling up the 3 (optional) sqlite indexes of chain data into a single entity, which can be extended to index other parts of the chain, or state. Having to re-parse the entire chain each restart isn't ideal, but having a trusted index that could be used immediately and checked occasionally might be a good approach here.

For now, could you cheat a bit on this one? If you're doing a whole bunch of compatibility checks, could you send a request for the epoch 1 beacon at the beginning of your checks and let it run in the background (ignoring the result but letting it process) while you do the rest of the checks, and then do the beacon checks right at the end? That way Lotus starts on that tipset walk process in parallel with the other tests you're running.
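A minimal sketch of that workaround, written as a Python test-harness helper. The endpoint URL, the timeout, and the helper names (`beacon_request`, `call_rpc`, `run_with_warmup`) are assumptions for illustration and are not taken from the Forest or Lotus code; only the method name `Filecoin.StateGetBeaconEntry` and its single epoch parameter come from the API under discussion.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local Lotus node on its default RPC port; adjust for your setup.
RPC_URL = "http://127.0.0.1:1234/rpc/v1"


def beacon_request(epoch, request_id=1):
    """Build the JSON-RPC payload for Filecoin.StateGetBeaconEntry."""
    return {
        "jsonrpc": "2.0",
        "method": "Filecoin.StateGetBeaconEntry",
        "params": [epoch],
        "id": request_id,
    }


def call_rpc(payload, url=RPC_URL, timeout=600):
    """POST a JSON-RPC payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_with_warmup(warmup, other_checks, beacon_checks):
    """Start `warmup` in the background, run the other checks, then the beacon checks."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        warm = pool.submit(warmup)
        results = [check() for check in other_checks]
        try:
            warm.result()  # wait for the warm-up to finish; its value is ignored
        except Exception:
            pass  # a failed warm-up should not fail the whole suite
        results += [check() for check in beacon_checks]
    return results
```

In a real harness the warm-up would be something like `lambda: call_rpc(beacon_request(1))`, with the old-epoch beacon comparisons passed in `beacon_checks` so they only run once the slow first lookup has completed.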
I mean, it's fine, our current workaround is to just check a more recent epoch. Could you please help me understand why the current solution takes so much time, whereas the previous one (v1.28.0) didn't? I'm curious, because a similar change might need to be implemented in Forest as well (or not), which does this, currently, relatively fast (code for reference)
I think you're doing a call out to drand here, right? https://github.com/ChainSafe/forest/blob/5fd1ec506ee7db4d83fa8aca14338429bf11938c/src/rpc/methods/state.rs#L1755

Previously Lotus did this, but the client configs were removed for mainnet, so we only have client configs for quicknet. There are a couple of issues with this:

It doesn't scale, because it requires drand to be maintaining those networks in perpetuity, which isn't the case. Prior to Filecoin mainnet launch (proper), between epochs 0 and

The other problem is that we have a mismatch on some epochs between the drand epoch and the Filecoin epoch. It only got super strict by NV15, and during NV14 it was off-by-one. So we've had to decide what

So, if you want to match Lotus precisely on this API, you're going to have to fish around in tipsets to find beacon entries using that same algorithm for historic look-ups. Then, if you want to be even more comprehensive in testing compatibility, you could look at null and non-null rounds within each of the main periods of beacon history: pre-nv13, nv13, nv14, post-nv14, and then post-quicknet, which was 120 epochs after nv23.

That's probably unnecessary; however, given FIP-0095, it's now possible for smart contracts to reach back in history for these beacons, so it'll end up being a consensus issue if Forest is providing different entries to Lotus and someone writes a contract that goes back into these periods. (Also depends on how you're exposing the beacon lookups to that syscall, maybe that's different to how you're serving
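To make "fishing around in tipsets" concrete, here is a rough sketch of a bounded walk back through parent tipsets looking for a given drand round. It is not Lotus's actual algorithm, whose exact rules are not reproduced in this thread: the `Tipset` and `BeaconEntry` types, the `parent_epoch` link, and the `max_lookback` bound are all hypothetical, and `target_round` is taken as an input because the epoch-to-round mapping differs between the network-version periods mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BeaconEntry:
    round: int
    data: bytes


@dataclass
class Tipset:
    epoch: int
    parent_epoch: Optional[int]  # epoch of the parent tipset; None at genesis
    beacon_entries: List[BeaconEntry] = field(default_factory=list)


def find_beacon_entry(
    tipsets: Dict[int, Tipset],
    start_epoch: int,
    target_round: int,
    max_lookback: int = 20,
) -> Optional[BeaconEntry]:
    """Search tipsets at or before `start_epoch` for the entry with `target_round`.

    `tipsets` holds only non-null rounds, keyed by epoch (hypothetical layout).
    """
    epoch = start_epoch
    # Null rounds have no tipset, so step back to the nearest one that exists.
    while epoch >= 0 and epoch not in tipsets:
        epoch -= 1
    if epoch < 0:
        return None

    ts = tipsets[epoch]
    for _ in range(max_lookback):
        for entry in ts.beacon_entries:
            if entry.round == target_round:
                return entry
        if ts.parent_epoch is None:
            return None  # reached genesis without finding the round
        ts = tipsets[ts.parent_epoch]
    return None  # gave up after inspecting max_lookback tipsets
```

A compatibility test built on this shape would pick epochs on both null and non-null rounds in each period and compare the returned entry against what the other implementation serves.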
I'd be interested to know if you come up with an efficient and dependable way to establish the epoch->tipset mapping, both on startup and continuously.
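One possible shape for such a mapping, sketched with Python's standard `sqlite3` module purely as an illustration: a persisted epoch->tipset table that is available immediately after a restart and is updated as new tipsets arrive. The table name, columns, and function names are hypothetical and do not describe the ChainIndexer schema.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical epoch->tipset index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS epoch_tipset ("
        "epoch INTEGER PRIMARY KEY, tipset_key TEXT NOT NULL)"
    )
    return db


def record_tipset(db, epoch, tipset_key):
    """Store the tipset for an epoch, overwriting any previous entry (e.g. after a reorg)."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO epoch_tipset (epoch, tipset_key) VALUES (?, ?)",
            (epoch, tipset_key),
        )


def tipset_at_or_before(db, epoch):
    """Return (epoch, tipset_key) for the nearest indexed epoch <= `epoch`, or None.

    Null rounds are simply absent from the table, so the lookup falls back
    to the closest earlier tipset.
    """
    return db.execute(
        "SELECT epoch, tipset_key FROM epoch_tipset "
        "WHERE epoch <= ? ORDER BY epoch DESC LIMIT 1",
        (epoch,),
    ).fetchone()
```

Because the index lives on disk when `path` is a file, a lookup for an old epoch would not need to walk the chain after a restart; how such an index is verified against the chain ("checked occasionally") is left open here, as it is in the discussion.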
@rvagg It seems like you have addressed the reason why the performance of Filecoin.StateGetBeaconEntry might be slow. Do we still want this ticket to be held open, or should we open another ticket for exploring the epoch->tipset mapping and link to this for context?
This ticket is going to be the best to track the resolution for the concern here: #12568
Checklist
Latest release, the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Repro Steps
First run takes well over a minute (91s on my machine)
Describe the Bug
It is a regression from the old performance (pre #12414 and pre the change that led to the creation of this issue), where it took significantly less time to get that beacon information (the results are identical).
Logging Information
Lotus v1.29.2-rc1
Lotus v1.28.0 in comparison: