Name collision detection gives mostly false positives #378

Open
shonfeder opened this issue Oct 3, 2024 · 3 comments

shonfeder commented Oct 3, 2024

The name collision detection lint mostly gives false positives. We should tune it down to be less eager.

As an example (from ocaml/opam-repository#26658), it is giving

Warning in dblp-api.0.1.1: Possible name collision with package 'bap-api'
Warning in dblp.0.1.1: Possible name collision with package 'rlp'
Warning in dblp.0.1.1: Possible name collision with package 'nlp'
Warning in dblp.0.1.1: Possible name collision with package 'lp'
Warning in dblp.0.1.1: Possible name collision with package 'dmap'
Warning in dblp.0.1.1: Possible name collision with package 'dlm'
Warning in dblp.0.1.1: Possible name collision with package 'dbm'
Warning in dblp.0.1.1: Possible name collision with package 'dbf'
Warning in dblp.0.1.1: Possible name collision with package 'dap'
Warning in dblp.0.1.1: Possible name collision with package 'bap'

Given that the intent is to alert to possible malicious punning, these warnings are totally off the mark.
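To make the noise concrete: every pair flagged above sits at edit distance 2, which for a four-letter name like `dblp` means half of the characters are allowed to differ. The sketch below assumes the lint uses a plain Levenshtein distance with a small fixed cutoff; the actual implementation and threshold are not stated in this thread, so `levenshtein` and `FLAGGED` here are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


# Names taken from the warnings quoted above.
FLAGGED = ["rlp", "nlp", "lp", "dmap", "dlm", "dbm", "dbf", "dap", "bap"]

distances = {name: levenshtein("dblp", name) for name in FLAGGED}
for name, d in distances.items():
    print(f"dblp vs {name}: {d}")
```

All nine names come out at distance 2, as does `dblp-api` vs `bap-api`, so any cutoff that accepts distance 2 will flag unrelated short names like these.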

@punchagan

There's more context on when and how this functionality was introduced here and here.

Tuning it to make it less eager may be helpful, especially given that these checks run on newly published packages and we want to make the experience friendly for newcomers.

But I also wonder whether preventing typo name squatting is a case where a false negative costs much more than a few false positives. We probably also want to add a note (for package authors) conveying that a failing lint for this check is not something to worry about.


punchagan commented Oct 4, 2024

There's some prior work on other package archives (like PyPI, npm and Rust's crates) in this paper (https://arxiv.org/pdf/2003.03471), and in the packages based on or related to it: typogard and typomania.

The paper (and the packages) primarily focus on malicious typo-squatting, and the package repositories they study are much larger than opam. But we could adapt the Typosquatting Signals (Sec 3.3) explored in the paper for our use case. They use a notion of popular (and unpopular) packages to detect malicious typosquatting, but we probably don't need that: we aren't checking strictly for malicious typosquatting, our repository is small, and package additions and updates go through manual approval.
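A rough sketch of what targeted checks could look like, as opposed to a blanket edit-distance cutoff. This does not reproduce the paper's signal list or definitions; the three checks (adjacent swap, one extra character, one omitted character) and all function names are an illustrative simplification.

```python
def swapped_adjacent(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of adjacent characters transposed."""
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def one_char_dropped(longer: str, shorter: str) -> bool:
    """True if deleting a single character from `longer` yields `shorter`."""
    if len(longer) != len(shorter) + 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def looks_like_typo(new: str, existing: str) -> bool:
    """Flag only single-slip typos, not arbitrary nearby names."""
    return (
        swapped_adjacent(new, existing)
        or one_char_dropped(new, existing)   # `new` has one extra character
        or one_char_dropped(existing, new)   # `new` omits one character
    )


# Names from the warnings in the original report.
reported = ["rlp", "nlp", "lp", "dmap", "dlm", "dbm", "dbf", "dap", "bap"]
still_flagged = [n for n in reported if looks_like_typo("dblp", n)]
print(still_flagged)
```

Under these checks none of the names reported for `dblp` is flagged, while a genuine slip such as `dbpl` or `dbl` still is. Single-character substitutions are deliberately left out here, which is a design choice that would need discussion.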

punchagan changed the title from "Name collision detection gives mosly false positives" to "Name collision detection gives mostly false positives" on Oct 6, 2024

shonfeder commented Oct 17, 2024

Something like the typo checking seems useful, but I don't think it is high priority.

It looks like the main use case that motivated the addition of this check was to detect collisions between names that are identical modulo [-_].
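A minimal sketch of that narrower check, assuming "identical modulo [-_]" means the names are equal once every `-` and `_` is dropped. If the intent is only to treat the two separators as interchangeable, `canonical` would map both to one character instead. The function names and the example packages `dblp_api` and `dblpapi` are hypothetical.

```python
import re


def canonical(name: str) -> str:
    """Drop '-' and '_' so names differing only in those characters compare equal."""
    return re.sub(r"[-_]", "", name)


def separator_collisions(new_name: str, existing: list[str]) -> list[str]:
    """Existing names that collide with `new_name` once separators are ignored."""
    key = canonical(new_name)
    return [
        other
        for other in existing
        if other != new_name and canonical(other) == key
    ]


# Hypothetical repository contents, for illustration only.
existing = ["dblp_api", "bap-api", "dblpapi", "dblp"]
print(separator_collisions("dblp-api", existing))
```

With this check `dblp-api` collides with `dblp_api` and `dblpapi` only; `bap-api` and `dblp` are left alone.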

I propose that, for the immediate term, we disable the Levenshtein distance check. AFAIK, this has not proven useful in any case, and it seems to create constant noise.
CC @raphael-proust.
