Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

Closed
ValarDragon opened this issue Jul 31, 2023 · 7 comments · Fixed by #19511
Closed

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

ValarDragon opened this issue Jul 31, 2023 · 7 comments · Fixed by #19511
Labels
T:feature-request T: Performance Performance improvements

Comments

@ValarDragon
Copy link
Contributor

Summary

Right now we use regular expressions for parsing coin denoms & decimal amounts. This has a couple drawbacks:

  • More expensive runtime initialization (compiling the regex at init)
  • Generally slower than other direct methods

Would be nice to replace this with more direct methods that are functionally equivalent. This blog post describes a tool "ragel" that could even code generate it: https://dgryski.medium.com/speeding-up-regexp-matching-with-ragel-4727f1c16027 , though honestly a dead simple for loop would probably perform better and be faster to write. (As we do very state-minimal matching)

Problem Definition

Makes coin denom matching faster, and pushes more work to compile time rather than runtime initializations.

Proposal

Change the regexp's usage in https://github.com/cosmos/cosmos-sdk/blob/main/types/coin.go#L833-L837 to direct matching methods.

@alexanderbez
Copy link
Contributor

Thanks @ValarDragon!

More expensive runtime initialization (compiling the regex at init)

This is sorta moot, no? Since it's a single fixed cost.

Generally slower than other direct methods

Yes, totally agree regex is not ideal for matching in terms of perf in hot paths of the code, which coins are! When you say dead simple loop, wdym exactly?

@ValarDragon
Copy link
Contributor Author

This is sorta moot, no? Since it's a single fixed cost.

It increases the cost of every CLI open, or opening time of any library importing the SDK. Its moot for state machine performance, not other contexts. (I think this is still a minor cost in the grand scheme of things FWIW)

Yes, totally agree regex is not ideal for matching in terms of perf in hot paths of the code, which coins are! When you say dead simple loop, wdym exactly?

The simplest looks something like this:


func isValidRune(r rune) bool {  
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '/' || r == ':' || r == '.' || r == '_' || r == '-'  
}  
  
func isValidDenomination(s string) bool {  
	length := len(s)  
	if length < 3 || length > 128 {  
		return false  
	}  
  
	firstRune := rune(s[0])  
	if !unicode.IsLetter(firstRune) {  
		return false  
	}  
  
	for _, r := range s[1:] {  
		if !isValidRune(r) {  
			return false  
		}  
	}  
  
	return true  
}

But on looking at it, it probably is better to just code generate something faster with ragel

@alexanderbez
Copy link
Contributor

It increases the cost of every CLI open, or opening time of any library importing the SDK. Its moot for state machine performance, not other contexts. (I think this is still a minor cost in the grand scheme of things FWIW)

Yeah, perhaps not moot but certainly not the overwhelming culprit.

LGTM!

@github-project-automation github-project-automation bot moved this to 👀 To Do in Cosmos-SDK Nov 17, 2023
@tac0turtle tac0turtle added the T: Performance Performance improvements label Nov 17, 2023
@ValarDragon
Copy link
Contributor Author

Were seeing in osmosis block sync, that validate denom is taking about 1% of time within block sync. (And this is after we've spent hours removing Denom validation from hot loops)

@alexanderbez
Copy link
Contributor

Can you provide a profile? I'm curious as to why it's so costly? Once the regex is compiled (single time fixed cost), matching should be efficient, no?

@ValarDragon
Copy link
Contributor Author

ValarDragon commented Feb 17, 2024

Its mentioned a bit in the linked blog post:

How can you speed up regular expression matching in Go? As Rob Pike points out, the benefit of regular expressions is their dynamic nature. If you know what you’re going to be matching, you can do better. Sometimes doing better just means a for loop with an if check. ( Rob’s comments on regular expressions goes into more detail: https://commandcenter.blogspot.ca/2011/08/regular-expressions-in-lexing-and.html )

In general you want to avoid regex's in hot loops on really well structured data / for simple things.

Heres a link to the profile for validate denom:
image

This is in a 1000 block sync, with a not-that-high amount of swaps. In the state machine side of block processing, we took ~700 seconds. (We've reduced I/O demands a decent amount, which is why I was measuring this as out of 600 seconds when saying 1% before)

Each of our swaps does around 10 balance updates, in addition to using coins internally in operations. (Theres also protorev increasing the number of swap attempts, but not simulating the balance movement, so those are just Coin calls within swaps)

We also have things like SubUnlockedCoins should never be calling regex's on the denoms, but is actually calling it 7 times.
image

@alexanderbez
Copy link
Contributor

I see, this makes a ton of sense given we know what the structure (denom) of the data looks like. Let's move to a more performant for loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T:feature-request T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants