regex very slow at CT #104
Comments
IIRC, about 400ms is the compilation itself. The culprit is
No, I don't think there is. Params/result are too complex. I'll keep this in mind when iterating on nregex, though. nregex improvements tend to be ported to nim-regex.
I'm not holding my breath ;-) ; maybe there's something that can be improved regardless of IC?
that seems exactly like the kind of thing where a vmhook would help, assuming it indeed is a bottleneck, eg
TIL about https://github.com/nitely/nregex ; btw, do you have a page (github issue / doc whatever) containing a comparison of nregex vs nim-regex, when to use which one, etc?
nregex is just a playground for my regex experiments, don't use it. I'm working on a TDFA, and I'll likely break all APIs. I think it only makes sense to use it for very simple regex matching. Find/findAll are slow. Captures/submatches, and assertions (
ok, I was more concerned about the 51 seconds for
you can also dlopen variables, so values would work too. If there's no obvious internal proc that's called by all others, we can also wrap user facing API example:
Are you concerned about the tests being slow, or about nim-regex being slow at CT? It's not the same, because the current APIs need to be tested at CT, so even if we implement option 1, it won't make the tests fast. The issue I see is there is no way to compile the pattern at compile time, at least not with option 1.
I'm not concerned about the tests' running time, since they don't affect user code; I'm concerned about nim-regex being slow at CT as it can affect user code.
i agree with that.
option 1 can work like this:
proc main =
  let a = replace(input, "\w+", "-")
static: main()
# this will be compiled into a dll and dlopened (once) via vmhook mechanism:
proc replaceVmHook(input, pattern, by): string =
  result = replace(input, re(pattern), by)
which runs at CT (from the perspective of client code) but using native code. Furthermore,
What I mean is that this API will compile the regex again on every call when it's called at runtime.
I think this can be worked around if we can do something like:
when nimvm:
  func replace(s: string; pattern: string; by: string; limit = 0): string {.vmhook.}
else:
  func replace(s: string; pattern: static string; by: string; limit = 0): string =
    replaceImpl # compile the regex at CT?
or even without the
something like this could work, overloading on whether pattern is a static param:
func replace(s: string; pattern: string; by: string; limit = 0): string {.vmhook.}
func replace(s: string; pattern: static string; by: string; limit = 0): string {.vmhook.}
or simply require the user to make this explicit, eg:
func replace(s: string; pattern: string; by: string; limit = 0): string {.vmhook.}
func replace(s: string; pattern: CtReg; by: string; limit = 0): string {.vmhook.}
const r: CtReg = re"\w+" # now pkg/regex doesn't need to deal with caching, it's the user's responsibility
let a = replace("ab", r, "cd")
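A minimal sketch of the explicit variant, using only the standard library so it compiles without the proposed vmhook pragma. Pattern, compile and replaceAll are hypothetical stand-ins (they are not nim-regex APIs); the point is only that the user builds the matcher once as a const and passes it to every call:

```nim
# Hypothetical stand-ins for illustration; none of these names come from nim-regex.
type Pattern = object
  chars: set[char]  # the "compiled" form: a set of characters to match

func compile(spec: string): Pattern =
  ## Builds the matcher; imagine this being the expensive step.
  for c in spec:
    result.chars.incl c

func replaceAll(s: string; p: Pattern; by: char): string =
  ## Replaces every matching character, reusing the already-built matcher.
  result = s
  for c in result.mitems:
    if c in p.chars:
      c = by

const vowels = compile("aeiou")  # built once, at compile time, by the user

let a = replaceAll("banana", vowels, '-')
doAssert a == "b-n-n-"
```

With this shape the library never has to cache anything: a const pays the build cost at CT, while a let inside a proc pays it once per call site at runtime.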
Is
This is a valid use case that cannot be supported unless the API takes a compiled regex:
func findInFiles(pattern: string, files: seq[File]): string =
  let reg = re(pattern) # compile pattern
  for file in files:
    if find(file.content, reg):
      return file.name
if we only have
example 1
nim c -d:danger tests/tests.nim
51 seconds
tests/tests
0.04 seconds
notes
I'm not sure whether it's due to VM code executing tests at CT or whether it's the compilation time itself, but the slow compilation time is noticeable.
profiling
nim c --profileVM tests/tests.nim
shows
but it's to be taken with a grain of salt due to implementation of
profileVM
example 2: self contained benchmark
this test shows a 20_000 X slowdown in VM
proposal
vmhook: user-defined vmops: proc fn(): int {.vmhook.} = impl, to JIT-compile and dlopen fn at CT (using the regular backend, not the VM), see timotheecour/Nim#598; this is analogous to how vmops.hashVmImpl speeds up a key part of hashing in std/hashes in the VM, by running native code instead of VM code
question
is there 1 or a few procs that could be run natively, so that the rest of code at CT would execute fast? In other words, is there an equivalent to hashes.hashVmImpl, hashes.hashVmImplByte in nim-regex?
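For context, the dispatch shape that question alludes to can be sketched with the existing when nimvm statement. This is only an illustration under assumptions: countWordChars and its two helpers are hypothetical names, and here both branches are ordinary Nim code, whereas in std/hashes the VM branch is the one the compiler backs with native code.

```nim
# Sketch only: hypothetical names, not nim-regex or std/hashes code.

proc countWordCharsVm(s: string): int =
  ## Path taken when the proc is evaluated by the compile-time VM.
  ## A vmop/vmhook-style override would replace this body with native code.
  for c in s:
    if c in {'a'..'z', 'A'..'Z', '0'..'9', '_'}:
      inc result

proc countWordCharsNative(s: string): int =
  ## Path taken in the compiled program.
  for c in s:
    if c in {'a'..'z', 'A'..'Z', '0'..'9', '_'}:
      inc result

proc countWordChars(s: string): int =
  # `when nimvm` must sit inside a proc body and needs an else branch.
  when nimvm:
    result = countWordCharsVm(s)
  else:
    result = countWordCharsNative(s)

const atCompileTime = countWordChars("ab_1 cd")  # evaluated in the VM
let atRunTime = countWordChars("ab_1 cd")        # evaluated natively
doAssert atCompileTime == atRunTime
```

A good candidate for such a split is a single hot inner proc with simple parameter and result types, since everything crossing the VM boundary has to be marshalled.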
ideally a proc that's: