[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221
Comments
Thanks @ValarDragon!
This is sorta moot, no? Since it's a single fixed cost.
Yes, totally agree regex is not ideal for matching in terms of perf in hot paths of the code, which coins are! When you say dead simple loop, what do you mean exactly?
It increases the cost of every CLI open, or the opening time of any library importing the SDK. It's moot for state machine performance, but not for other contexts. (I think this is still a minor cost in the grand scheme of things, FWIW.)
The simplest looks something like this:
But on looking at it, it's probably better to just code generate something faster with ragel.
Yeah, perhaps not moot but certainly not the overwhelming culprit. LGTM!
We're seeing in Osmosis block sync that validate denom is taking about 1% of the time within block sync. (And this is after we've spent hours removing denom validation from hot loops.)
Can you provide a profile? I'm curious as to why it's so costly. Once the regex is compiled (a single fixed cost), matching should be efficient, no?
It's mentioned a bit in the linked blog post:
In general you want to avoid regexes in hot loops on really well structured data / for simple things. Here's a link to the profile for validate denom: This is in a 1000 block sync, with a not-that-high amount of swaps. In the state machine side of block processing, we took ~700 seconds. (We've reduced I/O demands a decent amount, which is why I was measuring this as out of 600 seconds when saying 1% before.) Each of our swaps does around 10 balance updates, in addition to using coins internally in operations. (There's also protorev increasing the number of swap attempts, but not simulating the balance movement, so those are just Coin calls within swaps.) We also have things like
I see, this makes a ton of sense given we know what the structure (denom) of the data looks like. Let's move to a more performant for loop.
Summary
Right now we use regular expressions for parsing coin denoms & decimal amounts. This has a couple drawbacks:
It would be nice to replace this with more direct methods that are functionally equivalent. This blog post describes a tool, "ragel", that could even code generate it: https://dgryski.medium.com/speeding-up-regexp-matching-with-ragel-4727f1c16027 , though honestly a dead simple for loop would probably perform better and be faster to write, since the matching we do needs very little state.
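For illustration, here is a minimal sketch of what such a for loop could look like for denoms. It assumes the denom pattern is `[a-zA-Z][a-zA-Z0-9/:._-]{2,127}` matched against the whole string, which should be checked against `types/coin.go` before relying on it. The names `isValidDenom`, `isASCIILetter`, and `isDenomBodyChar` are hypothetical and not part of the SDK.

```go
package main

// Length bounds implied by the ASSUMED pattern
// `[a-zA-Z][a-zA-Z0-9/:._-]{2,127}`: one leading letter plus 2..127 more bytes.
const (
	minDenomLen = 3
	maxDenomLen = 128
)

// isASCIILetter reports whether c is in [a-zA-Z].
func isASCIILetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// isDenomBodyChar reports whether c is allowed after the first character,
// i.e. whether it is in [a-zA-Z0-9/:._-].
func isDenomBodyChar(c byte) bool {
	switch {
	case isASCIILetter(c):
		return true
	case c >= '0' && c <= '9':
		return true
	case c == '/', c == ':', c == '.', c == '_', c == '-':
		return true
	}
	return false
}

// isValidDenom (hypothetical name) checks a denom byte by byte instead of
// running a regular expression. Every allowed character is ASCII, so any
// multi-byte UTF-8 sequence is rejected by the per-byte checks.
func isValidDenom(denom string) bool {
	if len(denom) < minDenomLen || len(denom) > maxDenomLen {
		return false
	}
	if !isASCIILetter(denom[0]) {
		return false
	}
	for i := 1; i < len(denom); i++ {
		if !isDenomBodyChar(denom[i]) {
			return false
		}
	}
	return true
}
```

Under that assumed pattern, `isValidDenom("uatom")` is accepted while `isValidDenom("ab")` (too short) and `isValidDenom("1abc")` (leading digit) are rejected. A fuzz or table test comparing this against the existing regexp would be the natural way to confirm the two are functionally equivalent.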
Problem Definition
Makes coin denom matching faster, and pushes more work to compile time rather than runtime initializations.
Proposal
Change the regexp usage in https://github.com/cosmos/cosmos-sdk/blob/main/types/coin.go#L833-L837 to direct matching methods.
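The decimal-amount side could be handled the same way. The sketch below assumes the amount pattern is `[[:digit:]]+(?:\.[[:digit:]]+)?|\.[[:digit:]]+` matched against the whole string (digits with an optional fractional part, or a bare fractional part); that pattern is an assumption to verify against the linked lines. `isValidDecAmount` and `isDigit` are hypothetical names.

```go
package main

// isDigit reports whether c is an ASCII digit.
func isDigit(c byte) bool {
	return c >= '0' && c <= '9'
}

// isValidDecAmount (hypothetical name) accepts strings of the form
// "123", "123.45", or ".45", per the ASSUMED pattern
// `[[:digit:]]+(?:\.[[:digit:]]+)?|\.[[:digit:]]+`.
func isValidDecAmount(s string) bool {
	i := 0

	// Consume the optional run of integer digits.
	for i < len(s) && isDigit(s[i]) {
		i++
	}
	intDigits := i

	// No fractional part: valid only if at least one digit was seen.
	if i == len(s) {
		return intDigits > 0
	}

	// Anything other than a decimal point here is invalid.
	if s[i] != '.' {
		return false
	}
	i++

	// The fractional part needs at least one digit and must end the string.
	fracStart := i
	for i < len(s) && isDigit(s[i]) {
		i++
	}
	return i == len(s) && i > fracStart
}
```

This only covers validating an amount that has already been isolated; splitting a full coin string such as an amount followed by a denom into its two parts would need a small amount of additional scanning.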