D: use ctRegex #46

Draft
wants to merge 1 commit into base: optimized

@kubo39 kubo39 commented Apr 13, 2023

No description provided.

@kubo39 kubo39 marked this pull request as draft April 13, 2023 10:21
@mariomka
Owner

Thanks, my knowledge about D is limited, so I have some doubts.

  • What is the difference from the actual implementation? And why do you want to change it?
  • Does it make sense to keep only one implementation or both?

@kubo39
Author

kubo39 commented Apr 14, 2023

Hi,

ctRegex compiles the regular expression at compile time.

I expected three performance benefits:

  1. Avoid the runtime cost of regex construction, including for Unicode.
  2. Avoid heap allocations.
  3. Compile to native code, which the compiler could lower to a specialized instruction set.
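The difference between the two constructors can be sketched as follows. This is a minimal illustration, not the benchmark's code: the pattern, the input string, and the helper names `countRuntime` and `countCompileTime` are assumptions made up for the example.

```d
import std.regex : ctRegex, matchAll, regex;
import std.stdio : writeln;

// Hypothetical pattern, chosen only for illustration.
enum pattern = `[0-9]+`;

// Run-time construction: the pattern string is parsed when regex() is called.
size_t countRuntime(string data)
{
    auto re = regex(pattern);
    size_t n;
    foreach (m; data.matchAll(re))
        ++n;
    return n;
}

// Compile-time construction: the pattern is a template argument, so the
// matcher is generated while the program is being compiled.
size_t countCompileTime(string data)
{
    auto re = ctRegex!(pattern);
    size_t n;
    foreach (m; data.matchAll(re))
        ++n;
    return n;
}

void main()
{
    auto data = "a1 b22 c333"; // hypothetical input
    // Both engines implement the same pattern, so the counts should agree.
    assert(countRuntime(data) == countCompileTime(data));
    writeln(countRuntime(data), " ", countCompileTime(data));
}
```

Only the line that builds `re` differs, which is why the change is small in source terms even when the generated code is quite different.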

See also this cool article; it's about Rust's regex! macro (now deprecated, as described here, but still very useful!).

However, on my local machine the benchmark shows no difference. (Sorry, I should have checked before sending the PR!)

I'm digging into it, and will close this PR once I find the reason.

@mariomka
Owner

Thanks for the info!

I ran it on my computer, and there's a small change, not huge, but it's better.

@kubo39
Author

kubo39 commented Apr 14, 2023

DMD - v2.103.0

  • slower than optimized branch.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git branch
* d-compile-time-regex
  master
  optimized
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
307.404800 - 92
300.025700 - 5301
4.375800 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout optimized
Switched to branch 'optimized'
Your branch is up to date with 'upstream/optimized'.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
262.630300 - 92
269.145000 - 5301
5.823400 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
264.894800 - 92
268.622300 - 5301
5.635600 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout d-compile-time-regex
Switched to branch 'd-compile-time-regex'
Your branch is up to date with 'origin/d-compile-time-regex'.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
290.224200 - 92
283.388900 - 5301
4.662600 - 5

LDC - v1.32.0

  • much faster than optimized.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout optimized
Switched to branch 'optimized'
Your branch is up to date with 'upstream/optimized'.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git branch
  d-compile-time-regex
  master
* optimized
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
167.561100 - 92
163.916900 - 5301
4.397100 - 5
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout d-compile-time-regex
Switched to branch 'd-compile-time-regex'
Your branch is up to date with 'origin/d-compile-time-regex'.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
88.026900 - 92
88.755400 - 5301
3.594100 - 5
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
88.634500 - 92
88.571300 - 5301
3.624200 - 5

@cyrusmsk

cyrusmsk commented May 31, 2023

Just use:

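import std.array;
import std.regex; // provides ctRegex and matchAll

// Collect every match into an array and take its length as the count.
auto m = data.matchAll(ctRegex!(pattern));
count = cast(int) m.array.length;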

It is easy to read and runs faster than foreach.
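A self-contained sketch of that counting style is below. The pattern, the input, and the function names are assumptions for illustration only. It also shows `walkLength` from `std.range` as a possible variant that counts the lazy match range without building an array; whether that is faster than `.array.length` in this benchmark would need to be measured.

```d
import std.array : array;
import std.range : walkLength;
import std.regex : ctRegex, matchAll;
import std.stdio : writeln;

// Hypothetical pattern, chosen only for illustration.
enum pattern = `[0-9]+`;

// Count by collecting all matches into an array (the style suggested above).
int countViaArray(string data)
{
    auto m = data.matchAll(ctRegex!(pattern));
    return cast(int) m.array.length;
}

// Count by walking the lazy match range, without materializing an array.
int countViaWalk(string data)
{
    return cast(int) data.matchAll(ctRegex!(pattern)).walkLength;
}

void main()
{
    auto data = "a1 b22 c333"; // hypothetical input
    assert(countViaArray(data) == countViaWalk(data));
    writeln(countViaArray(data));
}
```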

@cyrusmsk

> Thanks, my knowledge about D is limited, so I have some doubts.
>
>   • What is the difference from the actual implementation? And why do you want to change it?
>   • Does it make sense to keep only one implementation or both?

I propose keeping both. Currently ctRegex should be faster, but many people in the D community dislike this approach because it increases compilation time significantly. There has even been some talk of removing ctRegex from the standard library. Those are just rumors, though, and it is better to have both. It will be easy to remove one of them in the future if something changes.
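One way to keep both implementations in a single source file is D's conditional compilation. The sketch below is only an assumption about how that could look: the version identifier `UseCtRegex`, the pattern, and the function name `countMatches` are hypothetical and do not come from this PR.

```d
import std.regex : ctRegex, matchAll, regex;
import std.stdio : writeln;

// Hypothetical pattern, chosen only for illustration.
enum pattern = `[0-9]+`;

size_t countMatches(string data)
{
    // UseCtRegex is a hypothetical version identifier. A version block does
    // not open a new scope, so `re` stays visible after the block.
    version (UseCtRegex)
    {
        auto re = ctRegex!(pattern); // matcher generated at compile time
    }
    else
    {
        auto re = regex(pattern);    // pattern parsed at run time
    }

    size_t n;
    foreach (m; data.matchAll(re))
        ++n;
    return n;
}

void main()
{
    writeln(countMatches("a1 b22 c333")); // hypothetical input
}
```

With this layout the compile-time variant would be selected by passing `-version=UseCtRegex` to dmd or `--d-version=UseCtRegex` to ldc2, and the default build keeps the faster compilation of the run-time engine.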

3 participants