Allow querying for a partial number #263
Comments
Interesting, thank you for the suggestion.
There will be a problem if we only use the number prefix. Spammers usually "get" their numbers in the same number range (same prefix). For example, if you search 123456***, the server will probably return 1000 numbers, which costs performance and bandwidth (for both server and user). From your link, they use the prefix of the SHA1 hash, and I think using a hash prefix would be better than a number prefix. Since phone numbers and passwords are similar in that both are about 8~12 characters long, that algorithm should also work for us.
I just came up with another solution: unlike passwords, in our particular case there are only digits, so we can use both prefix + suffix. For example, for the number 1234567890 we query 123*890. I don't think too many numbers share the same prefix and suffix, and maybe this is easier for the server to implement. But there is a problem with short numbers, so I think the best solution is the hash prefix. To support this, I just need to add some new tags, something like:
{
  "hash1": { "votes": 10, "rating": "C_POLL", ... },
  "hash2": { "votes": 20, "rating": "A_LEGITIMATE", ... },
}
It can be parsed using JsonPath in the ParseQueryResult. |
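To make the hash-prefix idea concrete, here is a minimal Python sketch of the client side. It assumes the server answers with a JSON object keyed by full SHA1 hex digests, as in the draft above. The prefix length of 5, the function names and the sample response are made up for illustration and are not part of any existing API.

```python
import hashlib
import json

PREFIX_LEN = 5  # assumption: number of hex characters sent to the server


def sha1_hex(number: str) -> str:
    """Return the SHA1 hex digest of a phone number string."""
    return hashlib.sha1(number.encode("utf-8")).hexdigest()


def hash_prefix(number: str) -> str:
    """The only value that would leave the device."""
    return sha1_hex(number)[:PREFIX_LEN]


def pick_result(number: str, response_body: str):
    """Filter the returned candidates locally by the full hash."""
    candidates = json.loads(response_body)
    return candidates.get(sha1_hex(number))


# Fabricated sample response; a real reply would only contain hashes
# that share the queried prefix.
number = "1234567890"
fake_response = json.dumps({
    sha1_hex(number): {"votes": 10, "rating": "C_POLL"},
    sha1_hex("5550000000"): {"votes": 20, "rating": "A_LEGITIMATE"},
})
print(hash_prefix(number), pick_result(number, fake_response))
```

The server only ever sees the short prefix, and the decision about which entry belongs to the caller is made on the device.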
I also prefer hashes over prefixes; the DB load and bandwidth of a prefix search are not acceptable. But from PhoneBlock's perspective, measuring SPAM number activity is essential. Therefore, I'll only offer an API for querying with full hashes. From those, it is sufficiently hard to guess the number if it is not yet in the DB. But it is still possible to measure activity if the number is already listed (which means that there is a SPAM suspicion against that number). |
Got it, I agree.
Maybe we can improve "sufficiently hard" to "almost impossible" by hashing more than once. They hash the password only once because they send only part of the hash. If we send the entire hash, we should hash it something like 100 times: many websites and hash databases can easily reverse single-round hashes, but not 100-round hashes. |
Multiple hash rounds and algorithms requiring massive amounts of memory, e.g. Argon2, are a good choice when storing password hashes. But in my opinion this is overkill for phone numbers. PhoneBlock was designed to serve 50000 users from a Raspberry Pi. Even if it's no longer running on my desk, I'm not willing to waste resources for almost no benefit. SHA1 is a good choice: it is resource-efficient and does the job of providing some privacy, assuming no maliciousness. |
Maybe 3 rounds would suffice: the difference between 3 and 100 rounds is negligible, but the difference between 1 and 3 is significant, and it won't impact performance. Just discussing the possibilities here, I'm fine with single-round SHA1. |
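For reference, a multi-round variant could look like the sketch below. The function name and the chaining convention (feeding the hex digest of one round into the next, rather than the raw bytes) are assumptions made for this example; nothing in the thread fixes them.

```python
import hashlib


def sha1_rounds(number: str, rounds: int = 1) -> str:
    """Apply SHA1 `rounds` times, feeding each hex digest into the next round."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    value = number
    for _ in range(rounds):
        value = hashlib.sha1(value.encode("utf-8")).hexdigest()
    return value


print(sha1_rounds("1234567890", rounds=1))
print(sha1_rounds("1234567890", rounds=3))
```

Every client would have to replicate the chaining convention exactly, otherwise the hashes would not match the server's.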
I think there is no benefit with a small number of rounds, but it makes the required hash computation harder to describe and implement, since it is non-standard then. This hash computation must be implemented by all API users. I've got a test version up and running: https://phoneblock.net/pb-test/api/ The new API is |
I agree. Actually, the "+" you added before the number is awesome, as it reduces the chance of number recovery.
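A rough sketch of a single-round SHA1 over a "+"-prefixed number follows. The exact normalisation expected by the PhoneBlock API is not spelled out in this thread, so the canonical form used here ("+" followed by digits only, with a leading 00 treated as the international prefix) and both function names are assumptions. Numbers in national format are not handled.

```python
import hashlib


def normalize(number: str) -> str:
    """Assumed canonical form: '+' followed by digits only."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if number.strip().startswith("00"):
        digits = digits[2:]  # treat a leading 00 as the international prefix
    return "+" + digits


def number_hash(number: str) -> str:
    """Single-round SHA1 over the normalised number."""
    return hashlib.sha1(normalize(number).encode("utf-8")).hexdigest()


print(number_hash("+49 30 1234567"))
print(number_hash("0049 30 1234567"))  # same hash as the line above
```

Because the hashed string is not purely numeric, its digest differs from the digest of the bare digit string, which is the kind of input a generic lookup table of hashed numbers would most likely contain.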
That was quick... wasn't expecting that, I'll test it tomorrow. |
That was indeed super quick! Many thanks.
On the activity info: I somewhat understand, and I hope that at some point the collection/project gets big enough that this will no longer be needed. In the end, some users will never be able to make use of such an approach (e.g. Fritzbox/ab), but those that rely on other tools could benefit.
On the DB load: The full blocklist is not that large, is it? I downloaded it from If this is stored in memory, the query time for the
On the bandwidth: you could require a minimum prefix length to be requested. I took a quick look: with 4 (phone number) digits around 25% of all requests would give just one result, and with 5 digits this would be 40%. With hash prefixes, requiring just the first three digits should yield ~5 phone numbers on average; with 4 you would already get unique/no results on average. |
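The hash-prefix estimate can be checked with a little arithmetic: if hashes are uniformly distributed, a prefix of n hex digits splits the list into 16^n buckets. The list size of 20,000 below is an assumption picked to be consistent with the "~5 on average" figure; the thread does not state the actual size.

```python
def expected_bucket_size(list_size: int, hex_digits: int) -> float:
    """Average number of listed numbers sharing one hash prefix."""
    return list_size / 16 ** hex_digits


LIST_SIZE = 20_000  # assumption, not a figure from the thread

for digits in (3, 4, 5):
    print(digits, round(expected_bucket_size(LIST_SIZE, digits), 2))
```

With three hex digits there are 4096 buckets, giving just under 5 entries per bucket; with four there are 65536 buckets, so most queries would return zero or one entry.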
It's done in the action build: https://github.com/aj3423/SpamBlocker/actions/runs/13086130495 @haumacher The test API works fine, the action apk would work when the production API is ready.
The average result for the number prefix is meaningless: the numbers are not uniformly distributed but usually crowded under the same prefix. Not sure if it's a DDoS vulnerability: one can "report" lots of fake numbers with the same prefix (with long comments), then do a massive query. |
Hashes make whatever non-random distribution phone numbers have become basically uniformly distributed. I however understand that the activity information is more relevant for now and will shut up 😄 |
It will also have the DDoS issue. Their password solution only allows querying; people can't submit new passwords. In our case, we allow reporting new numbers, so one can report lots of numbers that share the same hash prefix. Such numbers can easily be generated with a Python script:
import hashlib

prefix = "abcde"  # target hash prefix
number = 1000000000
while True:
    s = str(number)
    hash_value = hashlib.sha1(s.encode()).hexdigest()
    # Print before incrementing so the printed number matches its hash.
    if hash_value.startswith(prefix):
        print(number, hash_value)
    number += 1
It generates 1 number per second with only 1 CPU core. The full hash solution seems to be our best bet.
I also forgot about that, I'll also shut up 😄 |
The hash-lookup change is live. For the SpamBlocker-PhoneBlock integration, I've got another suggestion to make things safer and easier: API keys can now be generated from the PhoneBlock settings page. These API keys can be used for API calls as Bearer tokens. This avoids entering the PhoneBlock user name and password into other apps and prevents transmitting these credentials in HTTP basic auth requests. For SpamBlocker, using an API key instead of the username/password combination also makes the setup process easier, since only a single piece of information must be copied from the website to the app. Please consider updating your setup helper to request an API key instead of a username/password combination. |
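As a sketch of what "API key as Bearer token" means on the wire, the snippet below builds a request with an `Authorization: Bearer …` header using Python's standard library. The URL and the key are placeholders, not the real PhoneBlock endpoint, and no request is actually sent.

```python
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Attach the API key as a Bearer token instead of user name and password."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder URL and key, for illustration only.
req = build_request("https://example.invalid/api/lookup", "my-api-key")
print(req.get_header("Authorization"))
```

The app then only has to store one opaque token, which is a separate credential from the account password.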
@haumacher Glad you've made that change, tomorrow I'll apply it to the PhoneBlock preset. |
@haumacher Done, now it uses API Key instead of username/password. https://github.com/aj3423/SpamBlocker/actions/runs/13112930951 |
The feature
Hello,
I discovered SpamBlocker today and via that also Phoneblock. I wish I had discovered both at an earlier time to save me some sanity and time.
With the Phoneblock integration, I noticed that the full phone number is sent to the service. This has privacy implications for the caller (and potentially some associated GDPR issues).
Would it be possible to allow for a partial request and then filter locally (see for example the Have I Been Pwned password check implementation)?
I could imagine that passing {domestic:X} to only pass along the first X digits of the domestic number, and then filtering for the full {domestic} in the following ParseQueryResult call, would be a reasonable approach. That way, the full number would not be passed to the respective service, preserving the privacy of the caller, while still being able to block spammers.
I have also opened an issue on the PhoneBlock side to ask for querying with a prefix: haumacher/phoneblock#139
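The proposed flow, sketched in Python under the assumption that the service returns every listed number starting with the given prefix. The function names and the sample data are hypothetical and only mirror the {domestic:X} idea described above.

```python
from typing import Optional


def number_prefix(domestic: str, x: int) -> str:
    """What {domestic:X} would send: only the first X digits."""
    return domestic[:x]


def filter_locally(domestic: str, candidates: dict) -> Optional[dict]:
    """Keep only the entry for the full number; the rest is discarded."""
    return candidates.get(domestic)


# Fabricated candidates, as if returned for the prefix "1234".
candidates = {
    "1234567890": {"votes": 10},
    "1234000000": {"votes": 3},
}
print(number_prefix("1234567890", 4), filter_locally("1234567890", candidates))
```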