Precompile regexes #382

GioSensation · 2023-09-21T07:59:43Z

Reviewer: @shakyShane
Asana: https://app.asana.com/0/0/1205509987156723/f

Description

Precompiles the regexes at build time instead of at runtime. This avoids calling new RegExp up to several thousand times in certain edge cases. The performance improvement is negligible in most cases, but in certain edge cases it can reach 10-12% of total scan time. On the test suite, it shaves off around a second (on my machine). The idea originally came from Lucas.

Steps to test

All tests passing 🎉. CI should be good as well, because we just use the same old grunt workflow.

Signed-off-by: Emanuele Feliziani <[email protected]>

GioSensation · 2023-09-21T08:02:17Z

.gitattributes

@@ -1,2 +1,3 @@
 dist/** binary linguist-generated
 swift-package/Resources/assets/** binary linguist-generated
+src/Form/matching-config/__generated__/** binary linguist-generated


I've added this to the generated files so it doesn't pollute the diffs in reviews. A counterargument could be that manually reviewing it could ensure proper output, but the tests are in charge of that 💪. Let me know if you disagree.

In this instance, I think we shouldn't hide the diff, mostly because of what the Node docs say about inspect:

"The output of util.inspect may change at any time and should not be depended upon programmatically."

Likewise, I'm not 100% sure how it's handling unicode output. I think we should add a unit test to this PR: just a very simple one that snapshots the output of util.inspect on a handful of the more exotic regexes. In that same test file you can require the compiled output and test against a few strings.

^ None of that has to be exhaustive, but given the number of strings/arrays we're compressing here, I think a few tests would really help us understand what's happening. It would also be a great place to add regression tests if we need them.

Not hiding the diff makes sense. I'll do that.

On the testing, I'm a little unclear what exactly you would test. We have a lot of tests that include characters like メールアドレス, 電話, 姓, plus Russian characters and whatnot. I could snapshot the output on a few of these, but overall if inspect changes and starts failing on certain inputs, it would happen during a node upgrade and will break several of the existing tests. I guess a specific unit test could decrease debugging time. Is that your rationale?

I think that because we're checking in the file, it's much less risky. As you say, so much testing is done elsewhere too, so we could come back here if/when inspect changes its output. Happy with that approach :)

GioSensation · 2023-09-21T08:08:01Z

scripts/precompile-regexes.js

+    `/* DO NOT EDIT, this file was generated by scripts/precompile-regexes.js */\n\n`,
+    `/** @type {MatchingConfiguration} */\n`,
+    'const matchingConfiguration = ',
+    inspect(matchingConfiguration, {maxArrayLength: Infinity, depth: Infinity, maxStringLength: Infinity}),


inspect (from Node's util module) is a function that prints out objects as strings. The console uses it, for example, when you log objects. It works fine for our needs, in particular for outputting RegExp objects as literals. All the alternatives seemed much more effort-intensive for no apparent benefit.
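To illustrate the approach, here is a minimal sketch of generating a module's source with inspect. The sample configuration and the generateModuleSource helper are hypothetical stand-ins, not the actual contents of scripts/precompile-regexes.js; only the inspect options are taken from the diff above.

```javascript
const { inspect } = require('node:util')

// Hypothetical, trimmed-down stand-in for the real matching configuration.
const sampleConfiguration = {
    emailAddress: { match: /.mail\b|メールアドレス/iu },
    phone: { match: /phone|電話/iu }
}

// Builds the text of a CommonJS module whose regexes are plain literals,
// so nothing needs to call `new RegExp` when the module is loaded.
function generateModuleSource (config) {
    return [
        '/* DO NOT EDIT, this file was generated by a script */\n\n',
        'const matchingConfiguration = ',
        // Same options as in the diff: never truncate arrays, nesting or strings.
        inspect(config, { maxArrayLength: Infinity, depth: Infinity, maxStringLength: Infinity }),
        '\n\nmodule.exports = { matchingConfiguration }\n'
    ].join('')
}

const source = generateModuleSource(sampleConfiguration)
console.log(source)
```

In a real build step the resulting string would be written to the generated file (for example with fs.writeFileSync) rather than logged.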

GioSensation · 2023-09-21T08:13:37Z

src/Form/matching-config/matching-config-source.js

@@ -1007,4 +1006,4 @@ const matchingConfiguration = {
    }
 }

-export { matchingConfiguration }
+module.exports = { matchingConfiguration }


Using CommonJS here, since this file is handled by Node at build time.

GioSensation · 2023-09-21T08:14:22Z

src/Form/matching.js

@@ -128,15 +126,12 @@ class Matching {
            console.warn('CSS selector not found for %s, using a default value', selectorName)
            return ''
        }
-        if (Array.isArray(match)) {
-            return match.join(',')
-        }


The array join is now done at build time 🎉.
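A rough sketch of what moving the join to build time could look like follows. The selector names, values, and the precompileSelectors helper are hypothetical; the real config shape may differ.

```javascript
// Hypothetical source shape: a selector is either a string or an array of strings.
const selectorsSource = {
    email: ['input[type=email]', 'input[name=email i]'],
    username: 'input[autocomplete=username i]'
}

// Joins array selectors once, at build time, so the runtime lookup
// no longer needs the Array.isArray check removed in the diff above.
function precompileSelectors (selectors) {
    return Object.fromEntries(
        Object.entries(selectors).map(([name, value]) => [
            name,
            Array.isArray(value) ? value.join(',') : value
        ])
    )
}

const compiledSelectors = precompileSelectors(selectorsSource)
console.log(compiledSelectors.email) // input[type=email],input[name=email i]
```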

GioSensation · 2023-09-21T08:20:29Z

src/Form/matching.js

-        return undefined
-    }
-}
-


Regexes are sanitized at build time.
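As an illustration only, build-time sanitization could be sketched as below. The body of the removed helper is not shown in this thread, so the NFKC normalization, the 'ui' flags, and both function names are assumptions rather than the project's actual code.

```javascript
// Compiles a single pattern, failing the build instead of returning undefined at runtime.
function compileRegex (name, pattern) {
    // Assumption: normalize Unicode so visually identical strings compare equal.
    const normalized = String(pattern).normalize('NFKC')
    try {
        // Assumption: unicode-aware, case-insensitive matching.
        return new RegExp(normalized, 'ui')
    } catch (error) {
        throw new Error(`Invalid regex for "${name}": ${error.message}`)
    }
}

// Compiles every pattern in a name -> pattern map.
function precompileRegexes (patterns) {
    return Object.fromEntries(
        Object.entries(patterns).map(([name, pattern]) => [name, compileRegex(name, pattern)])
    )
}

const compiled = precompileRegexes({ phone: 'phone|電話', surname: 'surname|姓' })
console.log(compiled.phone.test('電話')) // true
```

Throwing at build time means a malformed pattern is caught before it ships, so the runtime code does not need a fallback path.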

shakyShane

@GioSensation this looks great - I like the idea of pre-compiling for sure.

I've just added one comment about how I think some basic unit-tests would help here, and how I don't think we should bury the diff output of this generated file - let me know what you think or if you have counter-ideas :)

shakyShane · 2023-09-21T10:05:02Z

.gitattributes

@@ -1,2 +1,3 @@
 dist/** binary linguist-generated
 swift-package/Resources/assets/** binary linguist-generated
+src/Form/matching-config/__generated__/** binary linguist-generated


In this instance, I think we shouldn't hide the diff, mostly because of what the Node docs say about inspect:

"The output of util.inspect may change at any time and should not be depended upon programmatically."

Likewise, I'm not 100% sure how it's handling unicode output. I think we should add a unit test to this PR: just a very simple one that snapshots the output of util.inspect on a handful of the more exotic regexes. In that same test file you can require the compiled output and test against a few strings.

^ None of that has to be exhaustive, but given the number of strings/arrays we're compressing here, I think a few tests would really help us understand what's happening. It would also be a great place to add regression tests if we need them.

GioSensation added 2 commits September 20, 2023 16:50

Precompile regexes

b8102f7

Signed-off-by: Emanuele Feliziani <[email protected]>

Commit compiled files

76c1598

Signed-off-by: Emanuele Feliziani <[email protected]>

GioSensation self-assigned this Sep 21, 2023

GioSensation added 2 commits September 21, 2023 10:00

Mark generated file as such in git

13f2af5

Signed-off-by: Emanuele Feliziani <[email protected]>

Fix unintended changes

39d0668

Signed-off-by: Emanuele Feliziani <[email protected]>

Fix spacing

06716bd

Signed-off-by: Emanuele Feliziani <[email protected]>

Remove safeRegex

3e39f94

Signed-off-by: Emanuele Feliziani <[email protected]>

GioSensation marked this pull request as ready for review September 21, 2023 08:35

shakyShane requested changes Sep 21, 2023

shakyShane approved these changes Sep 21, 2023

Remove compiled matching config from gitattributes

04ea0d9

Signed-off-by: Emanuele Feliziani <[email protected]>

GioSensation merged commit 4a0fd69 into main Sep 21, 2023
1 check passed

GioSensation deleted the ema/precompile-regexes branch September 21, 2023 10:48
