Skip to content

Commit

Permalink
Improve non-English matching (#374)
Browse files Browse the repository at this point in the history
* Fix CSS selector naming

Signed-off-by: Emanuele Feliziani <[email protected]>

* Move more regexes to config and centralise access

Signed-off-by: Emanuele Feliziani <[email protected]>

* Minor type fix

Signed-off-by: Emanuele Feliziani <[email protected]>

* Improve and rename getText to getTextShallow

Signed-off-by: Emanuele Feliziani <[email protected]>

* Avoid evaluating buttons twice in FormAnalyzer

Signed-off-by: Emanuele Feliziani <[email protected]>

* Evaluate URL when analyzing the form

Signed-off-by: Emanuele Feliziani <[email protected]>

* Centralise safeUniversalSelector

Signed-off-by: Emanuele Feliziani <[email protected]>

* Fix form scanning when markup is broken

Signed-off-by: Emanuele Feliziani <[email protected]>

* Simpler regex handling (no toLowerCase)

Signed-off-by: Emanuele Feliziani <[email protected]>

* Minor accuracy improvement

Signed-off-by: Emanuele Feliziani <[email protected]>

* Rename email strategy for consistency

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add/update test cases

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add performance logging capability

Signed-off-by: Emanuele Feliziani <[email protected]>

* Remove strict/conservative regex distinction

Signed-off-by: Emanuele Feliziani <[email protected]>

* Log when site is disabled remotely

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add the unknown matcher

Signed-off-by: Emanuele Feliziani <[email protected]>

* Update fixed test

Signed-off-by: Emanuele Feliziani <[email protected]>

* Fix comment and unused export

Signed-off-by: Emanuele Feliziani <[email protected]>

* Fix tests

Signed-off-by: Emanuele Feliziani <[email protected]>

* Initial regexes commit

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add missing regexes

Signed-off-by: Emanuele Feliziani <[email protected]>

* Update selectors

Signed-off-by: Emanuele Feliziani <[email protected]>

* Better safeguard for password type

Signed-off-by: Emanuele Feliziani <[email protected]>

* Improve matchers after first tests

Signed-off-by: Emanuele Feliziani <[email protected]>

* Improve submit button identification

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add test case

Signed-off-by: Emanuele Feliziani <[email protected]>

* Mock getComputedStyles to speed up tests

Signed-off-by: Emanuele Feliziani <[email protected]>

* Batch form file reading

Signed-off-by: Emanuele Feliziani <[email protected]>

* Memoize isCCForm

Signed-off-by: Emanuele Feliziani <[email protected]>

* Minimise impact of Array.from

Signed-off-by: Emanuele Feliziani <[email protected]>

* Limit test logs to surface failures

Signed-off-by: Emanuele Feliziani <[email protected]>

* Centralize scan stopping and disconnect observer

Signed-off-by: Emanuele Feliziani <[email protected]>

* Update snapshots and commit compiled files

Signed-off-by: Emanuele Feliziani <[email protected]>

* Commit compiled code

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add performance logging capability

Signed-off-by: Emanuele Feliziani <[email protected]>

# Conflicts:
#	src/Scanner.js

* Commit files

Signed-off-by: Emanuele Feliziani <[email protected]>

* Add comment

Signed-off-by: Emanuele Feliziani <[email protected]>

* Minor change

Signed-off-by: Emanuele Feliziani <[email protected]>

* Improve stopping the scanner

Signed-off-by: Emanuele Feliziani <[email protected]>

* More accuracy improvements after review

Signed-off-by: Emanuele Feliziani <[email protected]>

* Final improvements

Signed-off-by: Emanuele Feliziani <[email protected]>

* Fix test

Signed-off-by: Emanuele Feliziani <[email protected]>

---------

Signed-off-by: Emanuele Feliziani <[email protected]>
  • Loading branch information
GioSensation authored Sep 19, 2023
1 parent 21aa20e commit f3eccad
Show file tree
Hide file tree
Showing 31 changed files with 5,154 additions and 1,065 deletions.
578 changes: 380 additions & 198 deletions dist/autofill-debug.js

Large diffs are not rendered by default.

578 changes: 380 additions & 198 deletions dist/autofill.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration-test/pages/email-autofill.html
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
<p id="demo"></p>

<div>
Small inputs should be ignored: <input id="emailSmall" type="email" style="width: 30px;">
Small inputs should be ignored: <input id="emailSmall" type="email" style="width: 25px;">
</div>

<div class="dialog">
Expand Down
2 changes: 1 addition & 1 deletion src/DeviceInterface/InterfacePrototype.js
Original file line number Diff line number Diff line change
Expand Up @@ -278,7 +278,7 @@ class InterfacePrototype {
await this.postInit()

if (this.settings.featureToggles.credentials_saving) {
initFormSubmissionsApi(this.scanner.forms)
initFormSubmissionsApi(this.scanner.forms, this.scanner.matching)
}
}

Expand Down
13 changes: 7 additions & 6 deletions src/DeviceInterface/initFormSubmissionsApi.js
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
import {SUBMIT_BUTTON_SELECTOR} from '../Form/selectors-css.js'
import {buttonMatchesFormType, getText} from '../autofill-utils.js'
import {buttonMatchesFormType, getTextShallow} from '../autofill-utils.js'
import {extractElementStrings} from '../Form/label-util.js'

/**
* This is a single place to contain all functionality relating to form submission detection
*
* @param {Map<HTMLElement, import("../Form/Form").Form>} forms
* @param {import("../Form/matching").Matching} matching
*/
export function initFormSubmissionsApi (forms) {
export function initFormSubmissionsApi (forms, matching) {
/**
* Global submit events
*/
Expand Down Expand Up @@ -44,13 +45,13 @@ export function initFormSubmissionsApi (forms) {
matchingForm?.submitHandler('global pointerdown event + matching form')

if (!matchingForm) {
const selector = SUBMIT_BUTTON_SELECTOR + ', a[href="#"], a[href^=javascript], *[onclick]'
const selector = matching.cssSelector('submitButtonSelector') + ', a[href="#"], a[href^=javascript], *[onclick], [class*=button i]'
// check if the click happened on a button
const button = /** @type HTMLElement */(event.target)?.closest(selector)
if (!button) return

const text = getText(button)
const hasRelevantText = /(log|sign).?(in|up)|continue|next|submit/i.test(text)
const text = getTextShallow(button) || extractElementStrings(button).join(' ')
const hasRelevantText = matching.getDDGMatcherRegex('submitButtonRegex')?.test(text)
if (hasRelevantText && text.length < 25) {
// check if there's a form with values
const filledForm = [...forms.values()].find(form => form.hasValues())
Expand Down
18 changes: 11 additions & 7 deletions src/Form/Form.js
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ import {
isEventWithinDax,
isLikelyASubmitButton,
isPotentiallyViewable, buttonMatchesFormType,
safeExecute, getText, wasAutofilledByChrome, shouldLog
safeExecute, getTextShallow, wasAutofilledByChrome, shouldLog
} from '../autofill-utils.js'

import {getInputSubtype, getInputMainType, createMatching, safeRegex} from './matching.js'
Expand Down Expand Up @@ -219,8 +219,8 @@ class Form {
formValues.credentials.username = formValues.identities.phone
} else {
// If we still don't have a username, try scanning the form's text for an email address
this.form.querySelectorAll('*:not(select):not(option)').forEach((el) => {
const elText = getText(el)
this.form.querySelectorAll(this.matching.cssSelector('safeUniversalSelector')).forEach((el) => {
const elText = getTextShallow(el)
// Ignore long texts to avoid false positives
if (elText.length > 70) return

Expand Down Expand Up @@ -362,11 +362,15 @@ class Form {
}

categorizeInputs () {
const selector = this.matching.cssSelector('FORM_INPUTS_SELECTOR')
const selector = this.matching.cssSelector('formInputsSelector')
if (this.form.matches(selector)) {
this.addInput(this.form)
} else {
const foundInputs = this.form.querySelectorAll(selector)
let foundInputs = this.form.querySelectorAll(selector)
// If the markup is broken form.querySelectorAll may not return the fields, so we select from the parent
if (foundInputs.length === 0 && this.form instanceof HTMLFormElement && this.form.length > 0) {
foundInputs = this.form.parentElement?.querySelectorAll(selector) || foundInputs
}
if (foundInputs.length < MAX_INPUTS_PER_FORM) {
foundInputs.forEach(input => this.addInput(input))
} else {
Expand All @@ -379,12 +383,12 @@ class Form {
}

get submitButtons () {
const selector = this.matching.cssSelector('SUBMIT_BUTTON_SELECTOR')
const selector = this.matching.cssSelector('submitButtonSelector')
const allButtons = /** @type {HTMLElement[]} */([...this.form.querySelectorAll(selector)])

return allButtons
.filter((btn) =>
isPotentiallyViewable(btn) && isLikelyASubmitButton(btn) && buttonMatchesFormType(btn, this)
isPotentiallyViewable(btn) && isLikelyASubmitButton(btn, this.matching) && buttonMatchesFormType(btn, this)
)
}

Expand Down
69 changes: 43 additions & 26 deletions src/Form/FormAnalyzer.js
Original file line number Diff line number Diff line change
@@ -1,16 +1,7 @@
import { removeExcessWhitespace, Matching } from './matching.js'
import { constants } from '../constants.js'
import { matchingConfiguration } from './matching-configuration.js'
import { getText, isLikelyASubmitButton } from '../autofill-utils.js'

const loginRegex = new RegExp(/sign(ing)?.?in(?!g)|log.?(i|o)n|log.?out|unsubscri|(forgot(ten)?|reset) (your )?password|password (forgotten|lost)|unlock|logged in as|mfa-submit-form/i)
const signupRegex = new RegExp(
/sign(ing)?.?up|join|\bregist(er|ration)|newsletter|\bsubscri(be|ption)|contact|create|start|enroll|settings|preferences|profile|update|checkout|guest|purchase|buy|order|schedule|estimate|request|new.?customer|(confirm|retype|repeat) password|password confirm?/i
)
const conservativeSignupRegex = new RegExp(/sign.?up|join|register|enroll|newsletter|subscri(be|ption)|settings|preferences|profile|update/i)
const strictSignupRegex = new RegExp(/sign.?up|join|register|(create|new).+account|enroll|settings|preferences|profile|update/i)
const resetPasswordLink = new RegExp(/(forgot(ten)?|reset|don't remember) (your )?password|password forgotten/i)
const loginProvidersRegex = new RegExp(/ with /i)
import { getTextShallow, isLikelyASubmitButton } from '../autofill-utils.js'

class FormAnalyzer {
/** @type HTMLElement */
Expand Down Expand Up @@ -126,16 +117,16 @@ class FormAnalyzer {
shouldCheckUnifiedForm = false,
shouldBeConservative = false
}) {
const matchesLogin = /current.?password/i.test(string) || loginRegex.test(string) || resetPasswordLink.test(string)
const matchesLogin = /current.?password/i.test(string) || this.matching.getDDGMatcherRegex('loginRegex')?.test(string) || this.matching.getDDGMatcherRegex('resetPasswordLink')?.test(string)

// Check explicitly for unified login/signup forms
if (shouldCheckUnifiedForm && matchesLogin && strictSignupRegex.test(string)) {
if (shouldCheckUnifiedForm && matchesLogin && this.matching.getDDGMatcherRegex('conservativeSignupRegex')?.test(string)) {
this.increaseHybridSignal(strength, signalType)
return this
}

const signupRegexToUse = shouldBeConservative ? conservativeSignupRegex : signupRegex
const matchesSignup = /new.?password/i.test(string) || signupRegexToUse.test(string)
const signupRegexToUse = this.matching.getDDGMatcherRegex(shouldBeConservative ? 'conservativeSignupRegex' : 'signupRegex')
const matchesSignup = /new.?password/i.test(string) || signupRegexToUse?.test(string)

// In some cases a login match means the login is somewhere else, i.e. when a link points outside
if (shouldFlip) {
Expand All @@ -162,6 +153,24 @@ class FormAnalyzer {
})
}

evaluateUrl () {
const path = window.location.pathname

const matchesLogin = this.matching.getDDGMatcherRegex('loginRegex')?.test(path)
const matchesSignup = this.matching.getDDGMatcherRegex('conservativeSignupRegex')?.test(path)

// If the url matches both, do nothing: the signal is probably confounding
if (matchesLogin && matchesSignup) return

if (matchesLogin) {
this.decreaseSignalBy(1, 'url matches login')
}

if (matchesSignup) {
this.increaseSignalBy(1, 'url matches signup')
}
}

evaluatePageTitle () {
const pageTitle = document.title
this.updateSignal({string: pageTitle, strength: 2, signalType: `page title: ${pageTitle}`, shouldCheckUnifiedForm: true})
Expand All @@ -185,7 +194,7 @@ class FormAnalyzer {
this.evaluatePageTitle()
this.evaluatePageHeadings()
// Check for submit buttons
const buttons = document.querySelectorAll(this.matching.cssSelector('SUBMIT_BUTTON_SELECTOR'))
const buttons = document.querySelectorAll(this.matching.cssSelector('submitButtonSelector'))
buttons.forEach(button => {
// if the button has a form, it's not related to our input, because our input has no form here
if (button instanceof HTMLButtonElement) {
Expand All @@ -198,7 +207,7 @@ class FormAnalyzer {
}

evaluateElement (el) {
const string = getText(el)
const string = getTextShallow(el)

if (el.matches(this.matching.cssSelector('password'))) {
// These are explicit signals by the web author, so we weigh them heavily
Expand All @@ -207,12 +216,13 @@ class FormAnalyzer {
strength: 5,
signalType: `explicit: ${el.getAttribute('autocomplete')}`
})
return
}

// check button contents
if (el.matches(this.matching.cssSelector('SUBMIT_BUTTON_SELECTOR'))) {
if (el.matches(this.matching.cssSelector('submitButtonSelector') + ', *[class*=button]')) {
// If we're confident this is the submit button, it's a stronger signal
let likelyASubmit = isLikelyASubmitButton(el)
let likelyASubmit = isLikelyASubmitButton(el, this.matching)
if (likelyASubmit) {
this.form.querySelectorAll('input[type=submit], button[type=submit]').forEach(
(submit) => {
Expand All @@ -225,6 +235,7 @@ class FormAnalyzer {
}
const strength = likelyASubmit ? 20 : 2
this.updateSignal({string, strength, signalType: `submit: ${string}`})
return
}
// if an external link matches one of the regexes, we assume the match is not pertinent to the current form
if (
Expand All @@ -233,13 +244,16 @@ class FormAnalyzer {
el.matches('button[class*=secondary]')
) {
let shouldFlip = true
if (
resetPasswordLink.test(string) || // Don't flip forgotten password links
loginProvidersRegex.test(string) // Don't flip login providers links
) {
let strength = 1
// Don't flip forgotten password links
if (this.matching.getDDGMatcherRegex('resetPasswordLink')?.test(string)) {
shouldFlip = false
strength = 3
} else if (this.matching.getDDGMatcherRegex('loginProvidersRegex')?.test(string)) {
// Don't flip login providers links
shouldFlip = false
}
this.updateSignal({string, strength: 1, signalType: `external link: ${string}`, shouldFlip})
this.updateSignal({string, strength, signalType: `external link: ${string}`, shouldFlip})
} else {
// any other case
// only consider the el if it's a small text to avoid noisy disclaimers
Expand All @@ -250,14 +264,17 @@ class FormAnalyzer {
}

evaluateForm () {
// Check page url
this.evaluateUrl()

// Check page title
this.evaluatePageTitle()

// Check form attributes
this.evaluateElAttributes(this.form)

// Check form contents (skip select and option because they contain too much noise)
this.form.querySelectorAll('*:not(select):not(option):not(script)').forEach(el => {
// Check form contents (noisy elements are skipped with the safeUniversalSelector)
this.form.querySelectorAll(this.matching.cssSelector('safeUniversalSelector')).forEach(el => {
// Check if element is not hidden. Note that we can't use offsetHeight
// nor intersectionObserver, because the element could be outside the
// viewport or its parent hidden
Expand All @@ -266,7 +283,7 @@ class FormAnalyzer {
})

// A form with many fields is unlikely to be a login form
const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('GENERIC_TEXT_FIELD'))
const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('genericTextField'))
if (relevantFields.length >= 4) {
this.increaseSignalBy(relevantFields.length * 1.5, 'many fields: it is probably not a login')
}
Expand Down
4 changes: 2 additions & 2 deletions src/Form/formatters.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ const FOUR_DIGIT_YEAR_REGEX = /(\D)\1{3}|\d{4}/i
* @returns {string}
*/
const formatCCYear = (input, year, form) => {
const selector = form.matching.cssSelector('FORM_INPUTS_SELECTOR')
const selector = form.matching.cssSelector('formInputsSelector')
if (
input.maxLength === 4 ||
checkPlaceholderAndLabels(input, FOUR_DIGIT_YEAR_REGEX, form.form, selector)
Expand All @@ -34,7 +34,7 @@ const formatCCYear = (input, year, form) => {
const getUnifiedExpiryDate = (input, month, year, form) => {
const formattedYear = formatCCYear(input, year, form)
const paddedMonth = `${month}`.padStart(2, '0')
const cssSelector = form.matching.cssSelector('FORM_INPUTS_SELECTOR')
const cssSelector = form.matching.cssSelector('formInputsSelector')
const separator = matchInPlaceholderAndLabels(input, DATE_SEPARATOR_REGEX, form.form, cssSelector)?.groups?.separator || '/'

return `${paddedMonth}${separator}${formattedYear}`
Expand Down
4 changes: 2 additions & 2 deletions src/Form/input-classifiers.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@ import { getInputSubtype, createMatching } from './matching.js'
import { Form } from './Form.js'
import InterfacePrototype from '../DeviceInterface/InterfacePrototype.js'

import {SUBMIT_BUTTON_SELECTOR} from './selectors-css.js'
import {createAvailableInputTypes} from '../../integration-test/helpers/utils.js'

/**
Expand Down Expand Up @@ -171,7 +170,8 @@ describe.each(testCases)('Test $html fields', (testCase) => {
baseWrapper.innerHTML = testContent
document.title = title

const buttons = document.querySelectorAll(SUBMIT_BUTTON_SELECTOR)
const matching = createMatching()
const buttons = document.querySelectorAll(matching.cssSelector('submitButtonSelector'))
buttons.forEach((button) => {
// We're doing this so that isPotentiallyViewable(button) === true. See jest.setup.js for more info
// @ts-ignore
Expand Down
6 changes: 3 additions & 3 deletions src/Form/label-util.js
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
import {removeExcessWhitespace} from './matching.js'
const EXCLUDED_TAGS = ['SCRIPT', 'NOSCRIPT', 'OPTION', 'STYLE']
const EXCLUDED_TAGS = ['BR', 'SCRIPT', 'NOSCRIPT', 'OPTION', 'STYLE']

/**
* Extract all strings of an element's children to an array.
* "element.textContent" is a string which is merged of all children nodes,
* which can cause issues with things like script tags etc.
*
* @param {HTMLElement} element
* @param {Element} element
* A DOM element to be extracted.
* @returns {string[]}
* All strings in an element.
Expand Down Expand Up @@ -39,4 +39,4 @@ const extractElementStrings = (element) => {
return [...strings]
}

export {extractElementStrings}
export {extractElementStrings, EXCLUDED_TAGS}
Loading

0 comments on commit f3eccad

Please sign in to comment.