VerbalExpression Performance Question #18

Open
ophirmi opened this issue Aug 20, 2013 · 13 comments

Comments

@ophirmi

ophirmi commented Aug 20, 2013

Hi all,
I ran the following test to check how well verbal expression performs compared to a regular expression:

    [Test]
    public void TestingIfWeHaveAValidURL()
    {
        var testMe = "https://www.google.com";

        var swVerb = new Stopwatch();
        swVerb.Start();
        verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();


        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex( @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$" );
        Assert.IsTrue( regex.IsMatch( testMe ) );
        swRegex.Stop();
        //Verb: 133 ms Regex: 4 ms
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }

I ran it a couple of times: the verbal expression takes about 130 ms, while the regular expression takes about 5 ms.

I got the same results in other tests.
I'm considering using verbal expressions in my indexing project, so this time gap is too big.
What do you think?

Thanks

@jwood803
Contributor

I'm guessing the extra time comes from the chained methods each building their own piece of the expression, rather than one big expression as in the second test. Would be interesting to hear other thoughts, though.

@ophirmi
Author

ophirmi commented Aug 20, 2013

Changing the test as follows gives almost the same results:

    [Test]
    public void TestingIfWeHaveAValidURL()
    {
        var testMe = "https://www.google.com";

        var swVerb = new Stopwatch();
        verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();

        swVerb.Start();
        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex( @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$" );
        Assert.IsTrue( regex.IsMatch( testMe ) );
        swRegex.Stop();
        //Verb: 133 ms Regex: 4 ms
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }

As I understand it, the chaining gets translated to the same regular expression here:

    public bool IsMatch(string toTest)
    {
        return PatternRegex.IsMatch(toTest);
    }

So why the big time difference?
Thanks :)

@jwood803
Contributor

Perhaps it's taking the extra time because it still has to chain everything together before it can get to the IsMatch() method?

I might run it under the profiler when I get a chance and see if I can find anything.

@alexpeta
Contributor

If I understand correctly, you are timing how long it takes to initialize a verbex and assert against it versus a regex, and I don't see the point of that.

To benchmark verbex against regex, I would initialize both first, run them against a very large input (a large text file) a couple of hundred times, and then compare the averages.

How does this sound?
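
A minimal sketch of that kind of benchmark, assuming the matcher is passed in as a delegate so the same harness can time either `regex.IsMatch` or `verbEx.Test`. The `MatchBenchmark` class and its method name are hypothetical and not part of the library:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of VerbalExpressions.
public static class MatchBenchmark
{
    // Returns the average milliseconds per pass over `inputs`,
    // measured after one untimed warm-up pass.
    public static double AverageMilliseconds(Func<string, bool> isMatch, string[] inputs, int passes)
    {
        if (isMatch == null) throw new ArgumentNullException(nameof(isMatch));
        if (inputs == null) throw new ArgumentNullException(nameof(inputs));
        if (passes <= 0) throw new ArgumentOutOfRangeException(nameof(passes));

        // Warm-up: pays one-time costs (JIT, first regex construction)
        // outside the timed region.
        foreach (var input in inputs)
        {
            isMatch(input);
        }

        var stopwatch = Stopwatch.StartNew();
        for (int pass = 0; pass < passes; pass++)
        {
            foreach (var input in inputs)
            {
                isMatch(input);
            }
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / passes;
    }
}
```

It could be called as `MatchBenchmark.AverageMilliseconds(regex.IsMatch, lines, 200)` and again with `verbEx.Test`, where `lines` is the content of the large text file, so both sides go through identical timing code.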

@ophirmi
Author

ophirmi commented Aug 20, 2013

Same big gap (135ms to 1ms) in the following example:

    [Test]
    public void Then_VerbalExpressionsEmail_DoesMatchEmail()
    {
        verbEx.StartOfLine().Then(CommonRegex.Email);

        var swVer = new Stopwatch();            
        swVer.Start();
        var isMatchVer = verbEx.IsMatch("[email protected]");
        Assert.IsTrue(isMatchVer, "Should match email address");
        swVer.Stop();

        Regex regex = verbEx.ToRegex();
        var swRegex = new Stopwatch();
        swRegex.Start();
        var isMatch = regex.IsMatch("[email protected]");
        Assert.IsTrue(isMatch, "Should match email address");
        swRegex.Stop();
        //Ver: 121 ms,    Regex: 0 ms  
        Console.Write( "Ver: {0}   Regex: {1}", swVer.ElapsedMilliseconds, swRegex.ElapsedMilliseconds );
    }

And here it's the same regular expression.

@ophirmi
Author

ophirmi commented Aug 20, 2013

alexpeta: I'll try that, thanks.

@psoholt
Member

psoholt commented Aug 20, 2013

It's just the initialization of Regex the first time. Try doing the regex timing first and then the verbex and you get exactly the opposite result:

    [Test]
    public void TestingIfWeHaveAValidURLTiming()
    {
        var testMe = "https://www.google.com";

        var swRegex = new Stopwatch();
        swRegex.Start();
        var regex = new Regex(@"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$");
        Assert.IsTrue(regex.IsMatch(testMe));
        swRegex.Stop();

        var swVerb = new Stopwatch();
        verbEx =
            VerbalExpressions.DefaultExpression.StartOfLine()
                             .Then("http")
                             .Maybe("s")
                             .Then("://")
                             .Maybe("www.")
                             .AnythingBut(" ")
                             .EndOfLine();

        swVerb.Start();
        Assert.IsTrue(verbEx.Test(testMe), "The URL is incorrect");
        swVerb.Stop();

        //Verb: 12   Regex: 161
        Console.WriteLine("Verb: {0}   Regex: {1}", swVerb.ElapsedMilliseconds, swRegex.ElapsedMilliseconds);
    }
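
The same effect can be seen without verbex at all. The sketch below times an identical "construct and match" step twice in one process. The `FirstUseCost` class and the pattern are illustrative assumptions, and the actual numbers will depend on the machine and runtime:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class FirstUseCost
{
    // Times one "construct a Regex and match once" round trip, in milliseconds.
    public static double TimeConstructAndMatch(string pattern, string input)
    {
        var stopwatch = Stopwatch.StartNew();
        var regex = new Regex(pattern);
        regex.IsMatch(input);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // Runs the same measurement twice so the two timings can be compared.
    public static void Report()
    {
        const string pattern = @"^http(s)?://\S+$"; // illustrative pattern, not the one from the tests
        const string input = "https://www.google.com";

        double first = TimeConstructAndMatch(pattern, input);
        double second = TimeConstructAndMatch(pattern, input);

        Console.WriteLine("First: {0:F3} ms   Second: {1:F3} ms", first, second);
    }
}
```

If the first figure is much larger than the second, the gap is most likely one-time startup work in the regex engine rather than anything specific to whichever expression happened to run first.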

@ophirmi
Author

ophirmi commented Aug 21, 2013

Hi,
I built this simple console application to compare the run time of regex and verbal expressions.
It goes over a file of 300,000 URLs and tests each URL with a regex or a verbal expression:

    private const string urlListFilePath = @"Data\urlList.txt";
    private const int testReturnTimes = 1;
    private static string[] urls;

    static void Main(string[] args)
    {
        urls = File.ReadAllLines(urlListFilePath);

        long totalTime = 0;
        for (int j = 0; j < testReturnTimes; j++)
        {
            totalTime += TestRegex();
        }
        //10 times test - 260 ms
        //20 times test - 250 ms
        long avgTimeRegex = totalTime / testReturnTimes;

        totalTime = 0;
        for(int i=0;i<testReturnTimes;i++)
        {
            totalTime += TestVer();
        }
        //10 times test - 4000 ms
        //20 times test -3900 ms
        long avgTimeVer = totalTime / testReturnTimes;
    }

    public static long TestVer()
    {
        var verbEx = VerbalExpressions.DefaultExpression
                    .StartOfLine()
                    .Then("http")
                    .Maybe("s")
                    .Then("://")
                    .Maybe("www.")
                    .AnythingBut(" ")
                    .EndOfLine();

        int urlsCount = 0;
        var swVerb = new Stopwatch();

        swVerb.Start();
        foreach (var url in urls)
        {
            if (verbEx.Test(url)) urlsCount++;
        }
        swVerb.Stop();

        return swVerb.ElapsedMilliseconds;
    }

    public static long TestRegex()
    {
        var regex = new Regex( @"^(http)(s)?(://)(www\.)?([^\ ]*)$", RegexOptions.Multiline );
        int urlsCount = 0;
        var swRegex = new Stopwatch();

        swRegex.Start();
        foreach (var url in urls)
        {
            if (regex.IsMatch(url)) urlsCount++;
        }
        swRegex.Stop();

        return swRegex.ElapsedMilliseconds;
    }

Averaged over 10 and over 20 test runs, the results are the same: the regular expression takes about 250 ms, while the verbal expression takes about 4000 ms.

What do you think?
Thanks

@ophirmi
Author

ophirmi commented Aug 21, 2013

Just to make the previous post clearer: on average, the regular expression takes about 250 ms per run on 300K URLs, while the verbal expression takes about 4000 ms per run. The results are the same for 100 runs.

@jwood803
Contributor

I totally forgot to profile it the other night, but I'll remember to do so tonight to see what it looks like from that perspective. I like messing with performance stuff, so thanks for posing these questions! :]

@jwood803
Contributor

Let me know if I'm totally wrong with this, everyone.

Did a small memory profile and it seems to indicate that the biggest offender may be the Test method. Looking at it, it calls the PatternRegex.IsMatch() method. The PatternRegex property seems to new up a Regex object each time it's accessed. I wonder if creating a new object on the heap on every call could cause the performance numbers you were encountering?
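
If that turns out to be the cause, one possible fix is to build the Regex once and reuse it. This is only a sketch: `CachedExpression` is a hypothetical stand-in, not the library's actual class, and it assumes the pattern no longer changes once matching starts. In the real fluent builder, the cached instance would have to be dropped whenever a chained call changes the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the library's expression class; not its actual code.
public sealed class CachedExpression
{
    private readonly string _pattern;
    private readonly RegexOptions _options;
    private Regex _cached; // built on first access, then reused

    public CachedExpression(string pattern, RegexOptions options)
    {
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        _pattern = pattern;
        _options = options;
    }

    // Returns the same Regex instance on every access instead of a new one.
    public Regex PatternRegex
    {
        get { return _cached ?? (_cached = new Regex(_pattern, _options)); }
    }

    public bool IsMatch(string toTest)
    {
        return PatternRegex.IsMatch(toTest);
    }
}
```

With this shape, a loop over 300,000 URLs constructs the Regex once rather than once per URL. The lazy field is not thread-safe as written, so a shared instance would need `Lazy<Regex>` or a lock.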

@psoholt
Member

psoholt commented Aug 27, 2013

Interesting, I'll have to look into this...

@Yousefjb

Hello?
Maybe someone can change IsMatch to return this:

    return Regex.IsMatch(toTest, RegexString, _modifiers);

and Capture to this:

    var match = Regex.Match(toTest, RegexString, _modifiers);
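
The reason this might help is that the static `Regex` methods keep an internal cache of recently used patterns (sized by `Regex.CacheSize`), so repeated calls with the same pattern and options should not rebuild the regex each time. A self-contained sketch of the idea follows, where `StaticRegexSketch` is a hypothetical wrapper and the `pattern` and `options` parameters stand in for `RegexString` and `_modifiers`:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; not the library's code.
public static class StaticRegexSketch
{
    // Uses the static overload, which looks the pattern up in Regex's internal cache.
    public static bool IsMatch(string toTest, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(toTest, pattern, options);
    }

    // Returns the text of the first match, or null when nothing matches.
    public static string FirstMatchOrNull(string toTest, string pattern, RegexOptions options)
    {
        Match match = Regex.Match(toTest, pattern, options);
        return match.Success ? match.Value : null;
    }
}
```

Whether this beats holding on to a single Regex instance would need to be measured, since each static call still pays for a cache lookup.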
