Investigate performance of proving for padded height $2^{22}$ and $2^{23}$ #346

Open
Sword-Smith opened this issue Mar 2, 2025 · 3 comments
Labels
🔴 prio: high Pretty urgent ⏩ speedup Makes stuff go faster.

Comments

@Sword-Smith
Collaborator

Sword-Smith commented Mar 2, 2025

Running both with and without TVM_LDE_TRACE="no_cache", I've seen some quite troubling performance numbers for proving when the padded height is $2^{23}$. We should get performance numbers and RAM consumption for these three padded heights and investigate: $2^{21}$ (for reference), $2^{22}$, and $2^{23}$.
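As a starting point for collecting these numbers, here is a minimal measurement sketch. It assumes Linux (it reads `VmHWM` from `/proc/self/status`), and the helper names `measure` and `parse_peak_rss_kib` are hypothetical, not part of Triton VM. The workload in `main` is a placeholder for the actual proving call.

```rust
use std::fs;
use std::time::{Duration, Instant};

/// Parse the peak resident set size (`VmHWM`, in kiB) from the contents of
/// `/proc/<pid>/status`. Returns `None` if the field is missing or malformed.
fn parse_peak_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|kib| kib.parse().ok())
}

/// Run `workload` and return its result, its wall-clock time, and the
/// process's peak RSS read afterwards (`None` if /proc is unavailable).
fn measure<T>(workload: impl FnOnce() -> T) -> (T, Duration, Option<u64>) {
    let start = Instant::now();
    let result = workload();
    let elapsed = start.elapsed();
    let peak = fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|status| parse_peak_rss_kib(&status));
    (result, elapsed, peak)
}

fn main() {
    // Placeholder workload; substitute the actual proving call here.
    let (sum, elapsed, peak) = measure(|| (0..1_000_000u64).sum::<u64>());
    println!("result={sum}, elapsed={elapsed:?}, peak_rss_kib={peak:?}");
}
```

`VmHWM` is a high-water mark for the whole process, so each padded height should be measured in a separate process, once with and once without TVM_LDE_TRACE="no_cache".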

Specifically, a user with 2TB RAM and 256 cores reports that proving for a padded height of $2^{23}$ takes two hours, whereas by my calculations it should take less than 600 seconds. I also ran out of RAM when attempting to create these proofs, even with TVM_LDE_TRACE="no_cache" and 768GB RAM, which should be plenty for this configuration.
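To make the gap concrete, here is a rough extrapolation sketch. It is not necessarily the calculation referred to above: it assumes proving cost scales like $n \log_2 n$ in the padded height $n$, and the reference timing of 100 s at $2^{21}$ is a hypothetical placeholder, not a measurement.

```rust
/// Extrapolate proving time from a reference measurement, assuming cost
/// proportional to n * log2(n), where n = 2^log2_height.
fn extrapolate_secs(ref_log2_height: u32, ref_secs: f64, target_log2_height: u32) -> f64 {
    let cost = |log2_height: u32| (log2_height as f64) * 2f64.powi(log2_height as i32);
    ref_secs * cost(target_log2_height) / cost(ref_log2_height)
}

fn main() {
    // Hypothetical reference: 100 s at padded height 2^21. Not a measured value.
    let ref_secs = 100.0;
    for log2_height in [21, 22, 23] {
        let estimate = extrapolate_secs(21, ref_secs, log2_height);
        println!("2^{log2_height}: ~{estimate:.0} s");
    }
}
```

Under this assumption, going from $2^{21}$ to $2^{23}$ multiplies the time by $4 \cdot 23/21 \approx 4.4$. A jump to two hours would therefore point at something other than plain scaling, for example memory pressure or swapping.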

@Sword-Smith Sword-Smith added ⏩ speedup Makes stuff go faster. 🔴 prio: high Pretty urgent 🤔 question More information is needed and removed 🤔 question More information is needed labels Mar 2, 2025
@jan-ferdinand
Member

Do you have insight into the various tables' heights, the program being run, etc.?

@Sword-Smith
Collaborator Author

Sword-Smith commented Mar 3, 2025

The behavior was observed when running a SingleProof program with many inputs, which translates to the verification of many correct executions of RemovalRecordIntegrity.

This piece of code exhibits the problem, for a padded height of $2^{23}$:
Neptune-Crypto/neptune-core@ec9e693#diff-b03414229c3f4a14c7601dfee113a82c91419c0d1f5fe4181835d6609650f2edR1587-R1600

Edit: The function call marked above crashes on our hardware, though, even with 768GB RAM and TVM_LDE_TRACE="no_cache". A way forward would be to run it with live time reporting activated to see where the problems occur.

@jan-ferdinand
Copy link
Member

Can you extract the VM's initial state or the (program, input, non-determinism)-triple from this function? That would allow analysis in Triton CLI.
