Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve non-English matching #374

Merged
merged 45 commits into from
Sep 19, 2023
Merged

Conversation

GioSensation
Copy link
Member

@GioSensation GioSensation commented Aug 28, 2023

Reviewer: https://app.asana.com/0/0/1205354344050186/f
Asana: https://app.asana.com/0/0/1204235865964101/f

Description

Improves the accuracy of our algo on non-English forms. Note: the diff looks huge, but 2900 lines is just for one test case file. The PR itself is not that complicated.

Steps to test

Will add some automated tests, but the bulk of the testing will be done by running the script using autofill-compare.

Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
@GioSensation GioSensation self-assigned this Aug 28, 2023
const conservativeSignupRegex = new RegExp(/sign.?up|join|register|enroll|newsletter|subscri(be|ption)|settings|preferences|profile|update/i)
const strictSignupRegex = new RegExp(/sign.?up|join|register|(create|new).+account|enroll|settings|preferences|profile|update/i)
const resetPasswordLink = new RegExp(/(forgot(ten)?|reset|don't remember) (your )?password|password forgotten/i)
const loginProvidersRegex = new RegExp(/ with /i)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved them all to the matching configuration, so they're centralized and easier to translated.

@@ -162,6 +153,24 @@ class FormAnalyzer {
})
}

evaluateUrl () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another signal.

@@ -207,12 +216,13 @@ class FormAnalyzer {
strength: 5,
signalType: `explicit: ${el.getAttribute('autocomplete')}`
})
return
Copy link
Member Author

@GioSensation GioSensation Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added these returns, otherwise certain elements could be counted multiple times.

Comment on lines -13 to +23
email: {
type: 'email',
unknown: {
type: 'unknown',
strategies: {
cssSelector: 'email',
ddgMatcher: 'email',
ddgMatcher: 'unknown'
}
},
emailAddress: {
type: 'emailAddress',
strategies: {
cssSelector: 'emailAddress',
ddgMatcher: 'emailAddress',
Copy link
Member Author

@GioSensation GioSensation Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simpler to read if you hide whitespace, but anyway all I'm doing is:

  • Introduce an unknown matcher to ensure we return early when certain signals are obvious (like captchas). This has a minor performance impact, but most importantly avoids false-matching with the vendor strategy when we already know it should be unknwon (vendor strategies don't have the concept of forceUnknown).
  • Rename email to emailAddress because that's how we call it in the rest of the codebase, so this inconsistency just added confusion. If we ever want to go in and simplify all this stuff to only have one name, this brings us closer.

birthdayYear: css.__secret_do_not_use.birthdayYear
}
},
cssSelector: {selectors},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now exporting the selectors directly with simpler naming.

unknown: {
match: 'search|filter|subject|title|captcha|mfa|2fa|two factor|one-time|otp' +
// Italian
'|cerca|filtr|oggetto|titolo|(due|più) fattori',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm copying this approach from the vendors. It's just breaking these strings with a + and adding comments. We can add more complexity later if we ever need it.

Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>

# Conflicts:
#	src/Scanner.js
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>

# Conflicts:
#	dist/autofill-debug.js
#	dist/autofill.js
#	src/Scanner.js
#	swift-package/Resources/assets/autofill-debug.js
#	swift-package/Resources/assets/autofill.js
… ema/improve-non-english-matching

Signed-off-by: Emanuele Feliziani <[email protected]>

# Conflicts:
#	dist/autofill-debug.js
#	dist/autofill.js
#	src/Scanner.js
#	src/autofill-utils.js
#	swift-package/Resources/assets/autofill-debug.js
#	swift-package/Resources/assets/autofill.js
Signed-off-by: Emanuele Feliziani <[email protected]>
Signed-off-by: Emanuele Feliziani <[email protected]>
@GioSensation
Copy link
Member Author

I've pushed a bunch of commits, but most were actually only from pulling from the other branch (already merged). Not much has actually changed in the core of this PR. I'll go ahead and merge.

@GioSensation GioSensation merged commit f3eccad into main Sep 19, 2023
1 check passed
@GioSensation GioSensation deleted the ema/improve-non-english-matching branch September 19, 2023 10:24
CDRussell pushed a commit to duckduckgo/Android that referenced this pull request Sep 25, 2023
Task/Issue URL:
https://app.asana.com/0/1205538042573894/1205538042573894
Autofill Release:
https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0


## Description
Updates Autofill to version
[8.4.0](https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0).

### Autofill 8.4.0 release notes
## What's Changed
* Improve non-English matching by @GioSensation in
duckduckgo/duckduckgo-autofill#374


**Full Changelog**:
duckduckgo/duckduckgo-autofill@8.3.0...8.4.0

## Steps to test
This release has been tested during autofill development. For smoke test
steps see [this
task](https://app.asana.com/0/1198964220583541/1200583647142330/f).

Co-authored-by: GioSensation <[email protected]>
joshliebe pushed a commit to duckduckgo/Android that referenced this pull request Nov 7, 2023
Task/Issue URL:
https://app.asana.com/0/1205538042573894/1205538042573894
Autofill Release:
https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0


## Description
Updates Autofill to version
[8.4.0](https://github.com/duckduckgo/duckduckgo-autofill/releases/tag/8.4.0).

### Autofill 8.4.0 release notes
## What's Changed
* Improve non-English matching by @GioSensation in
duckduckgo/duckduckgo-autofill#374


**Full Changelog**:
duckduckgo/duckduckgo-autofill@8.3.0...8.4.0

## Steps to test
This release has been tested during autofill development. For smoke test
steps see [this
task](https://app.asana.com/0/1198964220583541/1200583647142330/f).

Co-authored-by: GioSensation <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants