-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precompile regexes #382
Precompile regexes #382
Changes from 5 commits
b8102f7
76c1598
13f2af5
39d0668
06716bd
3e39f94
04ea0d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
dist/** binary linguist-generated | ||
swift-package/Resources/assets/** binary linguist-generated | ||
src/Form/matching-config/__generated__/** binary linguist-generated | ||
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
const {matchingConfiguration} = require('../src/Form/matching-config/matching-config-source.js') | ||
const {writeFileSync} = require('fs') | ||
const {join} = require('path') | ||
const {inspect} = require('util') | ||
|
||
/** | ||
* DDGRegexes are stored as strings so we can annotate them with comments, here we transform them into RegExp | ||
*/ | ||
|
||
/** | ||
* Loop through Object.entries and transform all values to RegExp | ||
* @param {Object} obj | ||
*/ | ||
function convertAllValuesToRegex (obj) { | ||
for (const [key, value] of Object.entries(obj)) { | ||
const source = String(value).normalize('NFKC') | ||
obj[key] = new RegExp(source, 'ui') | ||
} | ||
return obj | ||
} | ||
for (const [key, value] of Object.entries(matchingConfiguration.strategies.ddgMatcher.matchers)) { | ||
matchingConfiguration.strategies.ddgMatcher.matchers[key] = convertAllValuesToRegex(value) | ||
} | ||
|
||
/** | ||
* Prepare CSS rules by concatenating arrays and removing whitespace | ||
*/ | ||
Object.entries(matchingConfiguration.strategies.cssSelector.selectors).forEach(([name, selector]) => { | ||
if (Array.isArray(selector)) { | ||
selector = selector.join(',') | ||
} | ||
matchingConfiguration.strategies.cssSelector.selectors[name] = selector.replace(/\n/g, ' ').replace(/\s{2,}/g, ' ').trim() | ||
}) | ||
|
||
/** | ||
* VendorRules come from different providers, here we merge them all together in one RegEx per inputType | ||
*/ | ||
|
||
/** | ||
* Merge our vendor rules into a single RegEx | ||
* @param {keyof VendorRegexRules} ruleName | ||
* @param {VendorRegexConfiguration["ruleSets"]} ruleSets | ||
* @return {{RULES: Record<keyof VendorRegexRules, RegExp | undefined>}} | ||
*/ | ||
function mergeVendorRules (ruleName, ruleSets) { | ||
let rules = [] | ||
ruleSets.forEach(set => { | ||
if (set[ruleName]) { | ||
rules.push(`(${set[ruleName]?.toLowerCase()})`.normalize('NFKC')) | ||
} | ||
}) | ||
return new RegExp(rules.join('|'), 'iu') | ||
} | ||
const ruleSets = matchingConfiguration.strategies.vendorRegex.ruleSets | ||
for (const ruleName of Object.keys(matchingConfiguration.strategies.vendorRegex.rules)) { | ||
matchingConfiguration.strategies.vendorRegex.rules[ruleName] = mergeVendorRules(ruleName, ruleSets) | ||
} | ||
|
||
/** | ||
* Build the file contents | ||
*/ | ||
const fileContents = [ | ||
`/* DO NOT EDIT, this file was generated by scripts/precompile-regexes.js */\n\n`, | ||
`/** @type {MatchingConfiguration} */\n`, | ||
'const matchingConfiguration = ', | ||
inspect(matchingConfiguration, {maxArrayLength: Infinity, depth: Infinity, maxStringLength: Infinity}), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a node function that prints out objects as strings. It's used by the console, for example, when you log objects and such. It works fine for our needs, in particular for outputting |
||
'\n\nexport { matchingConfiguration }\n' | ||
].join('') | ||
|
||
// Write to file | ||
const outputPath = join(__dirname, '../src/Form/matching-config/__generated__', '/compiled-matching-config.js') | ||
writeFileSync(outputPath, fileContents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this to the generated files so it doesn't pollute the diffs in reviews. A counterargument could be that manually reviewing it could ensure proper output, but the tests are in charge of that 💪. Let me know if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this instance, I think we shouldn't hide the diff - mostly just because of what the Node docs say about
inspect
likewise I'm not 100% how it's handling unicode output - I think we should add a unit test to this PR - just a very simple one that snapshots the output of util.inspect on a handful of the more exotic regexes. In that same test file you can require the compiled output and test against a few strings.
^ none of that has to be exhaustive - but given the amount of strings/arrays we're compressing here, I think a few tests would really help us understand what's happening - not to mention it would be a great place to add regression tests if we need them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not hiding the diff makes sense. I'll do that.
On the testing, I'm a little unclear what exactly you would test. We have a lot of tests that include characters like メールアドレス, 電話, 姓, plus Russian characters and whatnot. I could snapshot the output on a few of these, but overall if
inspect
changes and starts failing on certain inputs, it would happen during a node upgrade and will break several of the existing tests. I guess a specific unit test could decrease debugging time. Is that your rationale?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that because we're checking in the file, it makes it much less risky. As you say, so much testing is done elsewhere too - so we could come back here if/when
inspect
changes it's output. Happy with that approach :)