Allow querying for a partial number #263

MagiX13 · 2025-01-31T12:59:45Z

The feature

Hello,

I discovered SpamBlocker today and, through it, PhoneBlock. I wish I had discovered both earlier; it would have saved me some sanity and time.

With the PhoneBlock integration, I noticed that the full phone number is sent to the service. This has privacy implications for the caller (and potentially associated GDPR issues).

Would it be possible to send a partial request and then filter locally (see, for example, the Have I Been Pwned password check implementation)?

I could imagine a tag like {domestic:X} that passes along only the first X digits of the domestic number, with the results then filtered for the full {domestic} in the subsequent ParseQueryResult call. That way, the full number would not be passed to the respective service, preserving the privacy of the caller, while still allowing spammers to be blocked.
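A minimal sketch of the proposed flow, assuming a hypothetical service that returns every listed number starting with the given prefix. `query_prefix` and `is_spam` are illustrative names (not SpamBlocker or PhoneBlock APIs), and the sample numbers are made up.

```python
def query_prefix(prefix: str, listed: list[str]) -> list[str]:
    """Hypothetical service: return every listed number starting with `prefix`."""
    return [n for n in listed if n.startswith(prefix)]


def is_spam(domestic: str, x: int, listed: list[str]) -> bool:
    """Send only the first x digits, then filter for the full number locally."""
    candidates = query_prefix(domestic[:x], listed)  # the service only sees domestic[:x]
    return domestic in candidates


listed_numbers = ["1234567890", "1234500000", "9876543210"]  # made-up sample data
print(is_spam("1234567890", 5, listed_numbers))  # True
print(is_spam("1234511111", 5, listed_numbers))  # False
```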

I have also opened an issue on the PhoneBlock side to ask for querying with a prefix: haumacher/phoneblock#139


aj3423 · 2025-01-31T16:10:56Z

Interesting, thank you for the suggestion.

... passing {domestic:X} to only pass along the first X digits of the domestic number ...

There will be a problem if we only use the number prefix. Spammers usually "get" their numbers from the same number range (same prefix). For example, if you search for 123456***, the server will probably return 1000 numbers, which costs performance and bandwidth (for both the server and the user).

From your link, they use the prefix of the SHA1 hash, and

hashes are fairly uniformly distributed

I think using a hash prefix would be better than a number prefix. And since phone numbers and passwords are similar in that both are about 8~12 characters long, that algorithm should also work for us.
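To illustrate the hash-prefix (k-anonymity) idea borrowed from the password check: the client hashes the number with SHA1, sends only the first few hex characters, and compares full hashes locally. This is a sketch under assumptions: the server is simulated with a plain list, and the function names and the 5-character prefix length are illustrative.

```python
import hashlib


def sha1_hex(number: str) -> str:
    """SHA1 of the number string as 40 lowercase hex characters."""
    return hashlib.sha1(number.encode("utf-8")).hexdigest()


def lookup_by_hash_prefix(number: str, listed_hashes: list[str], k: int = 5) -> bool:
    full = sha1_hex(number)
    prefix = full[:k]  # only this part would leave the device
    # Simulated server: return every listed hash sharing the prefix.
    bucket = [h for h in listed_hashes if h.startswith(prefix)]
    # Client: check locally whether the full hash is among the candidates.
    return full in bucket


listed = [sha1_hex(n) for n in ["1234567890", "5550001111"]]  # made-up entries
print(lookup_by_hash_prefix("1234567890", listed))  # True
print(lookup_by_hash_prefix("1234567891", listed))  # False
```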

I just came up with another solution: unlike passwords, in our particular case there are only digits, so we can use both a prefix and a suffix. For example, for the number 1234567890, we query 123*890. I don't think many numbers will share the same prefix and suffix, and this might be easier for the server to implement. But there is a problem with short numbers.
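The prefix + suffix variant could be sketched like this, assuming a 3 + 3 split and the `*` notation from the `123*890` example; `masked_query` and `matches` are hypothetical names. The last line shows the short-number problem: the visible parts can cover every digit.

```python
def masked_query(number: str, head: int = 3, tail: int = 3) -> str:
    """Build a query such as '123*890' from '1234567890'."""
    return number[:head] + "*" + number[len(number) - tail:]


def matches(pattern: str, candidate: str) -> bool:
    """Server side: does a listed number fit the prefix*suffix pattern?"""
    head, tail = pattern.split("*")
    return (
        len(candidate) >= len(head) + len(tail)
        and candidate.startswith(head)
        and candidate.endswith(tail)
    )


listed = ["1234567890", "1230000890", "9876543210"]  # made-up sample data
query = masked_query("1234567890")
print(query)  # 123*890
print([n for n in listed if matches(query, n)])  # ['1234567890', '1230000890']
# Weakness with short numbers: for '112233' the query '112*233' reveals every digit.
print(masked_query("112233"))  # 112*233
```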

So I think the best solution is the hash prefix.

To support this, I just need to add some new tags, something like: {k_anonymity({domestic})} or {sha1_prefix_5({domestic})}. And I'd expect the API to return something like:

```
{
  "hash1": { "votes": 10, "rating": "C_POLL", ... },
  "hash2": { "votes": 20, "rating": "A_LEGITIMATE", ... }
}
```

It can be parsed using JsonPath in the ParseQueryResult.
I'll add this when the upstream API is ready.
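Outside of SpamBlocker's JsonPath-based ParseQueryResult, the lookup against a response of that shape boils down to a dictionary access by the full hash. A minimal Python sketch, assuming the proposed shape (keys are full SHA1 hex digests; the values below are made up):

```python
import hashlib
import json

own_hash = hashlib.sha1(b"1234567890").hexdigest()

# A response in the shape proposed above, keyed by full SHA1 hash (made-up values).
response_text = json.dumps({
    own_hash: {"votes": 10, "rating": "C_POLL"},
    "0" * 40: {"votes": 20, "rating": "A_LEGITIMATE"},
})


def find_entry(text: str, full_hash: str):
    """Return the entry for our own hash, or None if the number is not listed."""
    return json.loads(text).get(full_hash)


print(find_entry(response_text, own_hash))  # {'votes': 10, 'rating': 'C_POLL'}
print(find_entry(response_text, "f" * 40))  # None
```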

haumacher · 2025-01-31T16:32:58Z

I also prefer hashes over prefixes; the DB load and bandwidth of a prefix search are not acceptable.

But from PhoneBlock's perspective, measuring SPAM number activity is essential. Therefore, I'll only offer an API for querying with full hashes. From a full hash, it is sufficiently hard to guess the number if it is not yet in the DB. But it is possible to measure activity if the number is already listed (which means that there is a SPAM suspicion against that number).
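A rough sketch of how a full-hash lookup can still yield activity data, assuming the server keeps its blocklist as a dict keyed by SHA1 hex digest. `HashBlocklist` and its fields are hypothetical and not PhoneBlock's actual implementation; activity is only recorded for hashes that are already listed.

```python
import hashlib
from collections import Counter


class HashBlocklist:
    """Hypothetical server-side store: listed numbers are only looked up by SHA1 hash."""

    def __init__(self, numbers: dict[str, int]):
        # SHA1(number) -> votes
        self.votes = {hashlib.sha1(n.encode("utf-8")).hexdigest(): v for n, v in numbers.items()}
        self.activity = Counter()  # number of lookups per listed hash

    def check(self, sha1_hex: str):
        """Return the vote count for a listed hash, or None for unknown hashes."""
        votes = self.votes.get(sha1_hex)
        if votes is not None:
            self.activity[sha1_hex] += 1  # only suspected numbers produce activity data
        return votes


blocklist = HashBlocklist({"+491234567890": 12})  # made-up entry
h = hashlib.sha1(b"+491234567890").hexdigest()
print(blocklist.check(h), blocklist.activity[h])  # 12 1
print(blocklist.check("0" * 40))  # None
```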

aj3423 · 2025-01-31T17:30:28Z

But from PhoneBlock's perspective, measuring SPAM number activity is essential. Therefore, I'll only offer an API for querying with full hashes.

Got it, I agree.

it is sufficiently hard to guess the number, if it is not yet in the DB.

Maybe we can improve "sufficiently hard" to "almost impossible" by hashing more times. They hash the password only once because they send it partially. If we send the entire hash, we should hash it something like 100 times: many websites and hash databases can easily reverse single-round hashes, but not 100-round hashes.
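For comparison, "hashing more times" means iterated hashing. A minimal sketch with SHA1, where the round count is a free parameter (the thread ultimately settles on a single round). Whether each round feeds the raw digest, as here, or its hex string is an arbitrary choice that every API user would have to match exactly.

```python
import hashlib


def iterated_sha1(number: str, rounds: int) -> str:
    """Apply SHA1 `rounds` times, feeding each raw 20-byte digest into the next round."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = number.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha1(digest).digest()
    return digest.hex()


single = iterated_sha1("+491234567890", 1)
print(single == hashlib.sha1(b"+491234567890").hexdigest())  # True
print(len(iterated_sha1("+491234567890", 100)))  # 40
```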

haumacher · 2025-01-31T18:14:52Z

Multiple hash rounds and memory-hard algorithms such as Argon2 are a good choice for storing password hashes. But in my opinion, they are overkill for phone numbers. PhoneBlock was designed to serve 50,000 users from a Raspberry Pi. Even though it's no longer running on my desk, I'm not willing to waste resources for almost no benefit.

SHA1 is a good choice: it is resource-efficient and does the job of providing some privacy, assuming no malicious intent.

aj3423 · 2025-01-31T19:31:04Z

Maybe 3 rounds would suffice: the difference between 3 and 100 rounds is negligible, but between 1 and 3 it is significant, and it won't impact performance.

Just discussing the possibilities here; I'm fine with single-round SHA1.

haumacher · 2025-01-31T19:53:50Z

I think there is no benefit with a small number of rounds, but it makes the required hash computation harder to describe and implement, since it is non-standard then. This hash computation must be implemented by all API users.

I've got a test version up and running: https://phoneblock.net/pb-test/api/

The new API is /check and there is also a /hash API for testing and debugging the hashing algorithm.

aj3423 · 2025-01-31T20:29:17Z

I think there is no benefit with a small number of rounds, but it makes the required hash computation harder to describe and implement, since it is non-standard then. This hash computation must be implemented by all API users.

I agree. Actually, the + you added before the number is awesome; it reduces the chance of recovering the number.
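A sketch of hashing a number in international format with the leading `+`. The normalization rules here (a domestic leading `0` mapped to a default `+49`, `00` treated as the international prefix) and the function names are assumptions for illustration; the exact input format PhoneBlock hashes should be checked against its `/hash` test API.

```python
import hashlib


def to_international(domestic: str, country_code: str = "49") -> str:
    """Hypothetical normalization, e.g. '030 1234567' -> '+49301234567'."""
    digits = "".join(ch for ch in domestic if ch.isdigit())
    if domestic.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):  # international prefix
        return "+" + digits[2:]
    if digits.startswith("0"):  # domestic trunk prefix
        return "+" + country_code + digits[1:]
    return "+" + country_code + digits  # assume the trunk prefix was already dropped


def phone_hash(domestic: str) -> str:
    """SHA1 hex digest of the '+'-prefixed international number."""
    return hashlib.sha1(to_international(domestic).encode("utf-8")).hexdigest()


print(to_international("030 1234567"))  # +49301234567
print(phone_hash("030 1234567") == phone_hash("+49 30 1234567"))  # True
```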

I've got a test version up and running: https://phoneblock.net/pb-test/api/

That was quick... I wasn't expecting that. I'll test it tomorrow.

MagiX13 · 2025-01-31T22:39:51Z

That was indeed super quick! Many thanks.

On the activity info: I somewhat understand, and I hope that at some point the collection/project gets big enough that this will no longer be needed. In the end, some users will never be able to make use of such an approach (e.g. Fritzbox/ab), but those who rely on other tools could benefit.

On the DB load: the full blocklist is not that large, is it? I downloaded it from /api/blocklist today; it had ~20k entries and is only around 1 MB. This could easily be kept in memory: the DB would be queried only once when the application restarts, and the in-memory copy would then be kept in sync with all relevant incoming requests (e.g. new additions/votes/removals). Alternatively, the list could be cached on each /api/blocklist call. Even if the project grew significantly overnight, the memory footprint of the full blocklist wouldn't be that large or grow that fast.

If this is stored in memory, queries to /api/blocklist, /api/check or even a prefix call would be very fast, since they wouldn't need to query the database itself. I played around on a Raspberry Pi 4 and got ~150k individual lookups per second when querying a sparsely populated Python dict with 1M key-value pairs. I think that sort of performance would suffice for quite some time.
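A minimal sketch of the in-memory approach: load the blocklist once, mirror each update in memory, and answer hash lookups without the database. `InMemoryBlocklist`, its methods and the `load_all` callback are hypothetical, and dropping entries whose vote total falls to zero is an assumed rule.

```python
import hashlib
from typing import Callable, Dict


def sha1_hex(number: str) -> str:
    return hashlib.sha1(number.encode("utf-8")).hexdigest()


class InMemoryBlocklist:
    """Hypothetical cache: one DB read at startup, then kept in sync with every update."""

    def __init__(self, load_all: Callable[[], Dict[str, int]]):
        self.votes: Dict[str, int] = dict(load_all())  # number -> votes
        self.by_hash: Dict[str, str] = {sha1_hex(n): n for n in self.votes}

    def apply_vote(self, number: str, delta: int) -> None:
        """Mirror an incoming vote in memory (the DB write itself happens elsewhere)."""
        new_total = self.votes.get(number, 0) + delta
        if new_total > 0:
            self.votes[number] = new_total
            self.by_hash[sha1_hex(number)] = number
        else:  # assumption: entries without positive votes are dropped
            self.votes.pop(number, None)
            self.by_hash.pop(sha1_hex(number), None)

    def check_hash(self, hash_hex: str) -> int:
        """Answer a /check-style lookup without touching the database."""
        number = self.by_hash.get(hash_hex)
        return self.votes[number] if number is not None else 0


cache = InMemoryBlocklist(lambda: {"+491234567890": 3})  # stand-in for the DB query
cache.apply_vote("+495550001111", 1)
print(cache.check_hash(sha1_hex("+495550001111")))  # 1
cache.apply_vote("+491234567890", -3)
print(cache.check_hash(sha1_hex("+491234567890")))  # 0
```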

On the bandwidth: you could require a minimum prefix length. I took a quick look: with 4 phone-number digits, around 25% of all requests would return just one result, and with 5 digits around 40% would. With hash prefixes, requiring just the first three hex characters should give ~5 phone numbers on average; with 4, you would on average already get a unique result or none.
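The hash-prefix figures follow from simple arithmetic: with N listed numbers and a prefix of k hex characters there are 16^k possible prefixes, so the expected bucket size is N / 16^k. Using the ~20k entries of the downloaded blocklist:

```python
def expected_bucket_size(n_listed: int, k: int) -> float:
    """Average number of listed hashes sharing one k-character hex prefix."""
    return n_listed / 16 ** k


N = 20_000  # approximate blocklist size mentioned above
for k in (3, 4, 5):
    print(k, 16 ** k, round(expected_bucket_size(N, k), 2))
# 3 4096 4.88
# 4 65536 0.31
# 5 1048576 0.02
```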

aj3423 · 2025-02-01T08:12:02Z

It's done in the action build: https://github.com/aj3423/SpamBlocker/actions/runs/13086130495
The PhoneBlock API preset uses the SHA1 hash by default.

@haumacher The test API works fine; the action APK will work once the production API is ready.

with 4 (phone number) digits around 25% of all requests would give just one result and for 5 digits this would be 40%.

The average result size is meaningless for number prefixes: numbers are not uniformly distributed and are usually crowded around the same prefix. I'm also not sure whether it's a DDoS vulnerability: one could "report" lots of fake numbers with the same prefix (with long comments), then run a massive query.

MagiX13 · 2025-02-01T09:42:55Z

The average result for the number-prefix is pointless, they are not uniformly distributed, usually crowded with same prefix.

Hashing turns whatever non-random distribution phone numbers have into an essentially uniform one.
So let's look at the maximum result sizes for different hash prefix lengths. In the current database, the maximum number of phone numbers sharing a 5-character hash prefix is 3; for 4 characters it is 5, and for 3 characters it is 14. Even the medians over the prefixes that actually occur in the database are just 1, 1 and 5, respectively (longer prefixes are very sparsely populated, so the medians over all possible prefixes are probably 0, 0 and 5).
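Statistics like these can be reproduced by grouping hashes by prefix. A minimal sketch; the input below is synthetic (not PhoneBlock data), and the median is taken only over prefixes that actually occur, matching the first set of medians above.

```python
import hashlib
from collections import Counter
from statistics import median


def prefix_stats(numbers: list[str], k: int) -> tuple[int, float]:
    """Max and median bucket size over the k-character SHA1 prefixes that occur."""
    if not numbers:
        return 0, 0.0
    counts = Counter(hashlib.sha1(n.encode("utf-8")).hexdigest()[:k] for n in numbers)
    sizes = list(counts.values())
    return max(sizes), median(sizes)


# Synthetic input with 20k numbers; real figures need the actual blocklist.
sample = [f"+4930{i:07d}" for i in range(20_000)]
for k in (5, 4, 3):
    print(k, prefix_stats(sample, k))
```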

I however understand that the activity information is more relevant for now and will shut up 😄

aj3423 · 2025-02-01T12:36:44Z

Hashes make whatever non-random distribution phone numbers have become basically uniformly distributed.

It would also have the DDoS issue. Their password solution only allows querying; people can't submit new passwords. In our case, we allow reporting new numbers, so one could report lots of numbers that share the same hash prefix. Such numbers can easily be generated with a Python script:

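```python
import hashlib

# Brute-force numbers whose SHA1 hash starts with a chosen prefix.
prefix = "abcde"
number = 1000000000
while True:
    s = str(number)
    hash_value = hashlib.sha1(s.encode()).hexdigest()
    if hash_value.startswith(prefix):
        # Print the number that actually produced the matching hash.
        print(number, hash_value)
    number += 1
```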

It generates 1 number per second with only 1 CPU core.
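That rate follows from the prefix length: a 5-character hex prefix has 16^5 = 1,048,576 possible values, so on average about a million hashes must be tried per matching number. A quick way to measure the single-core SHA1 rate this divides into (the helper name and the 1-second default duration are arbitrary choices):

```python
import hashlib
import time

ATTEMPTS_PER_MATCH = 16 ** 5  # expected SHA1 evaluations per 5-hex-char prefix hit


def sha1_rate(seconds: float = 1.0) -> float:
    """Measure single-core SHA1 evaluations per second on 10-digit numeric strings."""
    count = 0
    number = 1000000000
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        hashlib.sha1(str(number).encode()).hexdigest()
        number += 1
        count += 1
    return count / seconds


rate = sha1_rate()
print(ATTEMPTS_PER_MATCH, round(rate), rate / ATTEMPTS_PER_MATCH)  # last value: matches per second
```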

The full hash solution seems to be our best bet.

I however understand that the activity information is more relevant for now and will shut up 😄

I had also forgotten about that; I'll shut up too 😄

haumacher · 2025-02-02T17:51:44Z

The hash-lookup change is live.

For the SpamBlocker-PhoneBlock integration, I've got another suggestion, to make things safer and easier:

API keys can now be generated from the PhoneBlock settings page. These API keys can be used as Bearer tokens for API calls. This way, users don't have to enter their PhoneBlock username and password into other apps, and these credentials are no longer transmitted in HTTP basic auth requests.

For SpamBlocker, using an API key instead of the username/password combination also makes the setup process easier, since only a single piece of information must be copied from the website to the app. Please consider updating your setup helper to request an API key instead of a username/password combination.
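Sending an API key as a Bearer token only needs an `Authorization` header. A sketch with Python's standard library that builds (but does not send) such a request. The `/check` path comes from the test API mentioned in this thread; the query parameter name `sha1` and the placeholder key are assumptions, so the actual parameters should be taken from PhoneBlock's API documentation.

```python
import hashlib
import urllib.parse
import urllib.request

API_KEY = "YOUR-PHONEBLOCK-API-KEY"  # placeholder for a key from the PhoneBlock settings page
BASE = "https://phoneblock.net/pb-test/api/check"  # test endpoint mentioned in this thread


def build_check_request(number: str) -> urllib.request.Request:
    digest = hashlib.sha1(number.encode("utf-8")).hexdigest()
    query = urllib.parse.urlencode({"sha1": digest})  # parameter name is an assumption
    req = urllib.request.Request(f"{BASE}?{query}")
    req.add_header("Authorization", f"Bearer {API_KEY}")  # instead of HTTP basic auth
    return req


req = build_check_request("+491234567890")
print(req.full_url)
print(req.get_header("Authorization"))  # Bearer YOUR-PHONEBLOCK-API-KEY
# urllib.request.urlopen(req) would actually send the request.
```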

aj3423 · 2025-02-02T18:02:02Z

@haumacher Glad you've made that change, tomorrow I'll apply it to the PhoneBlock preset.

aj3423 · 2025-02-03T12:04:06Z

@haumacher Done, now it uses API Key instead of username/password. https://github.com/aj3423/SpamBlocker/actions/runs/13112930951

MagiX13 added the new feature label Jan 31, 2025

aj3423 closed this as completed Feb 9, 2025
