[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

ValarDragon · 2023-07-31T13:12:28Z

Summary

Right now we use regular expressions for parsing coin denoms & decimal amounts. This has a couple drawbacks:

More expensive runtime initialization (compiling the regex at init)
Generally slower than other direct methods

Would be nice to replace this with more direct methods that are functionally equivalent. This blog post describes a tool "ragel" that could even code generate it: https://dgryski.medium.com/speeding-up-regexp-matching-with-ragel-4727f1c16027 , though honestly a dead simple for loop would probably perform better and be faster to write. (As we do very state-minimal matching)

Problem Definition

Makes coin denom matching faster, and pushes more work to compile time rather than runtime initializations.

Proposal

Change the regexp's usage in https://github.com/cosmos/cosmos-sdk/blob/main/types/coin.go#L833-L837 to direct matching methods.

alexanderbez · 2023-07-31T15:49:15Z

Thanks @ValarDragon!

More expensive runtime initialization (compiling the regex at init)

This is sorta moot, no? Since it's a single fixed cost.

Generally slower than other direct methods

Yes, totally agree regex is not ideal for matching in terms of perf in hot paths of the code, which coins are! When you say dead simple loop, wdym exactly?

ValarDragon · 2023-07-31T19:17:10Z

This is sorta moot, no? Since it's a single fixed cost.

It increases the cost of every CLI open, or opening time of any library importing the SDK. Its moot for state machine performance, not other contexts. (I think this is still a minor cost in the grand scheme of things FWIW)

Yes, totally agree regex is not ideal for matching in terms of perf in hot paths of the code, which coins are! When you say dead simple loop, wdym exactly?

The simplest looks something like this:


func isValidRune(r rune) bool {  
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '/' || r == ':' || r == '.' || r == '_' || r == '-'  
}  
  
func isValidDenomination(s string) bool {  
	length := len(s)  
	if length < 3 || length > 128 {  
		return false  
	}  
  
	firstRune := rune(s[0])  
	if !unicode.IsLetter(firstRune) {  
		return false  
	}  
  
	for _, r := range s[1:] {  
		if !isValidRune(r) {  
			return false  
		}  
	}  
  
	return true  
}

But on looking at it, it probably is better to just code generate something faster with ragel

alexanderbez · 2023-07-31T21:20:34Z

It increases the cost of every CLI open, or opening time of any library importing the SDK. Its moot for state machine performance, not other contexts. (I think this is still a minor cost in the grand scheme of things FWIW)

Yeah, perhaps not moot but certainly not the overwhelming culprit.

LGTM!

ValarDragon · 2024-02-17T14:03:25Z

Were seeing in osmosis block sync, that validate denom is taking about 1% of time within block sync. (And this is after we've spent hours removing Denom validation from hot loops)

alexanderbez · 2024-02-17T17:27:19Z

Can you provide a profile? I'm curious as to why it's so costly? Once the regex is compiled (single time fixed cost), matching should be efficient, no?

ValarDragon · 2024-02-17T17:43:12Z

Its mentioned a bit in the linked blog post:

How can you speed up regular expression matching in Go? As Rob Pike points out, the benefit of regular expressions is their dynamic nature. If you know what you’re going to be matching, you can do better. Sometimes doing better just means a for loop with an if check. ( Rob’s comments on regular expressions goes into more detail: https://commandcenter.blogspot.ca/2011/08/regular-expressions-in-lexing-and.html )

In general you want to avoid regex's in hot loops on really well structured data / for simple things.

Heres a link to the profile for validate denom:

This is in a 1000 block sync, with a not-that-high amount of swaps. In the state machine side of block processing, we took ~700 seconds. (We've reduced I/O demands a decent amount, which is why I was measuring this as out of 600 seconds when saying 1% before)

Each of our swaps does around 10 balance updates, in addition to using coins internally in operations. (Theres also protorev increasing the number of swap attempts, but not simulating the balance movement, so those are just Coin calls within swaps)

We also have things like SubUnlockedCoins should never be calling regex's on the denoms, but is actually calling it 7 times.

alexanderbez · 2024-02-17T17:54:38Z

I see, this makes a ton of sense given we know what the structure (denom) of the data looks like. Let's move to a more performant for loop.

ValarDragon added the T:feature-request label Jul 31, 2023

tac0turtle added this to Cosmos-SDK Nov 17, 2023

github-project-automation bot moved this to 👀 To Do in Cosmos-SDK Nov 17, 2023

tac0turtle added the T: Performance Performance improvements label Nov 17, 2023

mattverse mentioned this issue Feb 21, 2024

feat(x/bank): Replace regex parsing of denom validation to generated parsing #19511

Merged

12 tasks

facundomedica closed this as completed in #19511 Mar 7, 2024

github-project-automation bot moved this from 👀 To Do to 🥳 Done in Cosmos-SDK Mar 7, 2024

tac0turtle removed this from Cosmos-SDK Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

ValarDragon commented Jul 31, 2023

alexanderbez commented Jul 31, 2023

ValarDragon commented Jul 31, 2023

alexanderbez commented Jul 31, 2023

ValarDragon commented Feb 17, 2024

alexanderbez commented Feb 17, 2024

ValarDragon commented Feb 17, 2024 •

edited

Loading

alexanderbez commented Feb 17, 2024

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts #17221

Comments

ValarDragon commented Jul 31, 2023

Summary

Problem Definition

Proposal

alexanderbez commented Jul 31, 2023

ValarDragon commented Jul 31, 2023

alexanderbez commented Jul 31, 2023

ValarDragon commented Feb 17, 2024

alexanderbez commented Feb 17, 2024

ValarDragon commented Feb 17, 2024 • edited Loading

alexanderbez commented Feb 17, 2024

ValarDragon commented Feb 17, 2024 •

edited

Loading