-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precompile regexes #382
Precompile regexes #382
Conversation
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
.gitattributes
Outdated
@@ -1,2 +1,3 @@ | |||
dist/** binary linguist-generated | |||
swift-package/Resources/assets/** binary linguist-generated | |||
src/Form/matching-config/__generated__/** binary linguist-generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this to the generated files so it doesn't pollute the diffs in reviews. A counterargument could be that manually reviewing it could ensure proper output, but the tests are in charge of that 💪. Let me know if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this instance, I think we shouldn't hide the diff - mostly just because of what the Node docs say about inspect
The output of util.inspect may change at any time and should not be depended upon programmatically
likewise I'm not 100% how it's handling unicode output - I think we should add a unit test to this PR - just a very simple one that snapshots the output of util.inspect on a handful of the more exotic regexes. In that same test file you can require the compiled output and test against a few strings.
^ none of that has to be exhaustive - but given the amount of strings/arrays we're compressing here, I think a few tests would really help us understand what's happening - not to mention it would be a great place to add regression tests if we need them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not hiding the diff makes sense. I'll do that.
On the testing, I'm a little unclear what exactly you would test. We have a lot of tests that include characters like メールアドレス, 電話, 姓, plus Russian characters and whatnot. I could snapshot the output on a few of these, but overall if inspect
changes and starts failing on certain inputs, it would happen during a node upgrade and will break several of the existing tests. I guess a specific unit test could decrease debugging time. Is that your rationale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that because we're checking in the file, it makes it much less risky. As you say, so much testing is done elsewhere too - so we could come back here if/when inspect
changes it's output. Happy with that approach :)
`/* DO NOT EDIT, this file was generated by scripts/precompile-regexes.js */\n\n`, | ||
`/** @type {MatchingConfiguration} */\n`, | ||
'const matchingConfiguration = ', | ||
inspect(matchingConfiguration, {maxArrayLength: Infinity, depth: Infinity, maxStringLength: Infinity}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a node function that prints out objects as strings. It's used by the console, for example, when you log objects and such. It works fine for our needs, in particular for outputting RegExp
objects as literals. All the alternatives seemed much more effort-intensive for no apparent benefit.
Signed-off-by: Emanuele Feliziani <[email protected]>
@@ -1007,4 +1006,4 @@ const matchingConfiguration = { | |||
} | |||
} | |||
|
|||
export { matchingConfiguration } | |||
module.exports = { matchingConfiguration } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using CommonJS since these are handled in node at build time.
@@ -128,15 +126,12 @@ class Matching { | |||
console.warn('CSS selector not found for %s, using a default value', selectorName) | |||
return '' | |||
} | |||
if (Array.isArray(match)) { | |||
return match.join(',') | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done at build time 🎉.
Signed-off-by: Emanuele Feliziani <[email protected]>
return undefined | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regexes are sanitized at build time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GioSensation this looks great - I like the idea of pre-compiling for sure.
I've just added one comment about how I think some basic unit-tests would help here, and how I don't think we should bury the diff output of this generated file - let me know what you think or if you have counter-ideas :)
.gitattributes
Outdated
@@ -1,2 +1,3 @@ | |||
dist/** binary linguist-generated | |||
swift-package/Resources/assets/** binary linguist-generated | |||
src/Form/matching-config/__generated__/** binary linguist-generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this instance, I think we shouldn't hide the diff - mostly just because of what the Node docs say about inspect
The output of util.inspect may change at any time and should not be depended upon programmatically
likewise I'm not 100% how it's handling unicode output - I think we should add a unit test to this PR - just a very simple one that snapshots the output of util.inspect on a handful of the more exotic regexes. In that same test file you can require the compiled output and test against a few strings.
^ none of that has to be exhaustive - but given the amount of strings/arrays we're compressing here, I think a few tests would really help us understand what's happening - not to mention it would be a great place to add regression tests if we need them
Signed-off-by: Emanuele Feliziani <[email protected]>
Task/Issue URL: https://app.asana.com/0/1205602928782192/1205602928782192 Autofill Release: https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.1 BSK PR: duckduckgo/BrowserServicesKit#517 ## Description Updates Autofill to version [8.4.1](https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.1). ### Autofill 8.4.1 release notes ## What's Changed * Precompile regexes by @GioSensation in duckduckgo/duckduckgo-autofill#382 * Add perf tests by @GioSensation in duckduckgo/duckduckgo-autofill#383 * deps: remove deprecated content-scope-utils by @shakyShane in duckduckgo/duckduckgo-autofill#384 * Update dependencies by @GioSensation in duckduckgo/duckduckgo-autofill#368 * Fix perf regressions by @GioSensation in duckduckgo/duckduckgo-autofill#385 **Full Changelog**: duckduckgo/duckduckgo-autofill@8.4.0...8.4.1 ## Steps to test This release has been tested during autofill development. For smoke test steps see [this task](https://app.asana.com/0/1198964220583541/1200583647142330/f). --------- Co-authored-by: GioSensation <[email protected]> Co-authored-by: Chris Brind <[email protected]>
Reviewer: @shakyShane
Asana: https://app.asana.com/0/0/1205509987156723/f
Description
Precompiles the regexes at build time instead of at runtime. This is to avoid calling
new RegExp
up to several thousand times in certain edge cases. The actual performance improvement is negligible on most cases, but in certain edge cases it can go up to 10-12% of total scan time. On the test suite, it shaves off around a second (on my machine). The idea came initially from Lucas.Steps to test
All tests passing 🎉. CI should be good as well, because we just use the same old grunt workflow.