Name collision detection gives mostly false positives #378

shonfeder · 2024-10-03T11:42:29Z

The name collision detection lint mostly gives false positives. We should tune it down to be less eager.

As an example (from ocaml/opam-repository#26658), it gives:

Warning in dblp-api.0.1.1: Possible name collision with package 'bap-api'
Warning in dblp.0.1.1: Possible name collision with package 'rlp'
Warning in dblp.0.1.1: Possible name collision with package 'nlp'
Warning in dblp.0.1.1: Possible name collision with package 'lp'
Warning in dblp.0.1.1: Possible name collision with package 'dmap'
Warning in dblp.0.1.1: Possible name collision with package 'dlm'
Warning in dblp.0.1.1: Possible name collision with package 'dbm'
Warning in dblp.0.1.1: Possible name collision with package 'dbf'
Warning in dblp.0.1.1: Possible name collision with package 'dap'
Warning in dblp.0.1.1: Possible name collision with package 'bap'

Given that the intent is to alert to possible malicious punning, these warnings are totally off the mark.
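
To make the noise concrete, here is a minimal sketch that computes the Levenshtein distance between `dblp` and each package named in the warnings above. It is written in Python for illustration and is not the lint's actual implementation. The `levenshtein` helper and the `flagged` list are illustrative names, and the threshold the lint really uses is not stated in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Package names copied from the warnings reported for dblp.0.1.1.
flagged = ["rlp", "nlp", "lp", "dmap", "dlm", "dbm", "dbf", "dap", "bap"]
distances = {name: levenshtein("dblp", name) for name in flagged}

for name, dist in distances.items():
    print(f"dblp vs {name}: {dist}")
```

Every name in the list comes out at distance 2 from `dblp`, as does `bap-api` from `dblp-api`. For a four-letter name, two edits leave little of the original, which is consistent with the warnings being off the mark.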


punchagan · 2024-10-03T13:59:43Z

There's more context on when and how this functionality was introduced here and here.

Tuning it to make it less eager may be helpful, especially given that these checks run on newly published packages and we want to make the experience friendly for newcomers.

But, also, I wonder if preventing typo name squatting is a case where the cost of a false negative is much higher than some false positives. And we probably want to add a note (for the package authors) to convey that a failing lint for this check is not something to worry about.

punchagan · 2024-10-04T07:16:07Z

There's some prior work done on other package archives (like PyPI, npm and Rust's crates) in this [paper](https://arxiv.org/pdf/2003.03471), and the packages based on / related to it: typogard and typomania.

The paper (and the packages) primarily focus on malicious typosquatting, and the package repositories they cover are much larger than opam's. But we could adapt the Typosquatting Signals (Sec. 3.3) explored in the paper for our use case. They use a notion of popular (and unpopular) packages to detect malicious typosquatting, but we probably don't need that, given that we aren't checking strictly for malicious typosquatting, the size of our repository, and the manual approval process for package additions/updates.

shonfeder · 2024-10-17T01:03:18Z

Something like the typo checking seems useful, but I think it is high priority.

It looks like the main use case that motivated the addition of this check was to detect collisions between names that are identical modulo [-_].

I propose that, for the immediate term, we disable the Levenshtein distance check. AFAIK, this has not proven useful in any cases, and it seems to create constant noise.
CC @raphael-proust.
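
A minimal sketch of the narrower check described above, assuming "identical modulo [-_]" means two names are equal once `-` and `_` are removed. The function names are hypothetical, and this is Python for illustration rather than the project's code.

```python
def normalize(name: str) -> str:
    """Drop '-' and '_' so names differing only in those characters compare equal."""
    return name.replace("-", "").replace("_", "")


def separator_collisions(new_name: str, existing: list[str]) -> list[str]:
    """Return existing names that differ from new_name only by '-' or '_'."""
    key = normalize(new_name)
    return [n for n in existing if n != new_name and normalize(n) == key]


# A few of the package names mentioned in the warnings earlier in this issue.
existing_packages = ["bap", "bap-api", "rlp", "nlp", "lp", "dmap"]

print(separator_collisions("dblp", existing_packages))     # nothing flagged
print(separator_collisions("bap_api", existing_packages))  # flags 'bap-api'
```

With only this check enabled, `dblp` would raise no warnings against these packages, while a name like `bap_api` would still be flagged against `bap-api`.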

shonfeder mentioned this issue Oct 3, 2024

2 packages from smimram/ocaml-dblp at 0.1.1 ocaml/opam-repository#26658

Merged

punchagan changed the title ~~Name collision detection gives mosly false positives~~ Name collision detection gives mostly false positives Oct 6, 2024

punchagan added the linting label Oct 8, 2024
