-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve non-English matching #374
Conversation
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
const conservativeSignupRegex = new RegExp(/sign.?up|join|register|enroll|newsletter|subscri(be|ption)|settings|preferences|profile|update/i) | ||
const strictSignupRegex = new RegExp(/sign.?up|join|register|(create|new).+account|enroll|settings|preferences|profile|update/i) | ||
const resetPasswordLink = new RegExp(/(forgot(ten)?|reset|don't remember) (your )?password|password forgotten/i) | ||
const loginProvidersRegex = new RegExp(/ with /i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved them all to the matching configuration, so they're centralized and easier to translated.
@@ -162,6 +153,24 @@ class FormAnalyzer { | |||
}) | |||
} | |||
|
|||
evaluateUrl () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just another signal.
@@ -207,12 +216,13 @@ class FormAnalyzer { | |||
strength: 5, | |||
signalType: `explicit: ${el.getAttribute('autocomplete')}` | |||
}) | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added these return
s, otherwise certain elements could be counted multiple times.
email: { | ||
type: 'email', | ||
unknown: { | ||
type: 'unknown', | ||
strategies: { | ||
cssSelector: 'email', | ||
ddgMatcher: 'email', | ||
ddgMatcher: 'unknown' | ||
} | ||
}, | ||
emailAddress: { | ||
type: 'emailAddress', | ||
strategies: { | ||
cssSelector: 'emailAddress', | ||
ddgMatcher: 'emailAddress', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simpler to read if you hide whitespace, but anyway all I'm doing is:
- Introduce an
unknown
matcher to ensure we return early when certain signals are obvious (like captchas). This has a minor performance impact, but most importantly avoids false-matching with the vendor strategy when we already know it should beunknwon
(vendor strategies don't have the concept offorceUnknown
). - Rename
email
toemailAddress
because that's how we call it in the rest of the codebase, so this inconsistency just added confusion. If we ever want to go in and simplify all this stuff to only have one name, this brings us closer.
birthdayYear: css.__secret_do_not_use.birthdayYear | ||
} | ||
}, | ||
cssSelector: {selectors}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now exporting the selectors directly with simpler naming.
src/Form/matching-configuration.js
Outdated
unknown: { | ||
match: 'search|filter|subject|title|captcha|mfa|2fa|two factor|one-time|otp' + | ||
// Italian | ||
'|cerca|filtr|oggetto|titolo|(due|più) fattori', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm copying this approach from the vendors. It's just breaking these strings with a +
and adding comments. We can add more complexity later if we ever need it.
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]> # Conflicts: # src/Scanner.js
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]> # Conflicts: # dist/autofill-debug.js # dist/autofill.js # src/Scanner.js # swift-package/Resources/assets/autofill-debug.js # swift-package/Resources/assets/autofill.js
Signed-off-by: Emanuele Feliziani <[email protected]>
… ema/improve-non-english-matching Signed-off-by: Emanuele Feliziani <[email protected]> # Conflicts: # dist/autofill-debug.js # dist/autofill.js # src/Scanner.js # src/autofill-utils.js # swift-package/Resources/assets/autofill-debug.js # swift-package/Resources/assets/autofill.js
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
I've pushed a bunch of commits, but most were actually only from pulling from the other branch (already merged). Not much has actually changed in the core of this PR. I'll go ahead and merge. |
Task/Issue URL: https://app.asana.com/0/1205538042573894/1205538042573894 Autofill Release: https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0 ## Description Updates Autofill to version [8.4.0](https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0). ### Autofill 8.4.0 release notes ## What's Changed * Improve non-English matching by @GioSensation in duckduckgo/duckduckgo-autofill#374 **Full Changelog**: duckduckgo/duckduckgo-autofill@8.3.0...8.4.0 ## Steps to test This release has been tested during autofill development. For smoke test steps see [this task](https://app.asana.com/0/1198964220583541/1200583647142330/f). Co-authored-by: GioSensation <[email protected]>
Task/Issue URL: https://app.asana.com/0/1205538042573894/1205538042573894 Autofill Release: https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0 ## Description Updates Autofill to version [8.4.0](https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0). ### Autofill 8.4.0 release notes ## What's Changed * Improve non-English matching by @GioSensation in duckduckgo/duckduckgo-autofill#374 **Full Changelog**: duckduckgo/duckduckgo-autofill@8.3.0...8.4.0 ## Steps to test This release has been tested during autofill development. For smoke test steps see [this task](https://app.asana.com/0/1198964220583541/1200583647142330/f). Co-authored-by: GioSensation <[email protected]>
Reviewer: https://app.asana.com/0/0/1205354344050186/f
Asana: https://app.asana.com/0/0/1204235865964101/f
Description
Improves the accuracy of our algo on non-English forms. Note: the diff looks huge, but 2900 lines is just for one test case file. The PR itself is not that complicated.
Steps to test
Will add some automated tests, but the bulk of the testing will be done by running the script using autofill-compare.