-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full 128-bit collision between two files #80
Comments
Required context: https://peter.website/meow-hash-cryptanalysis Thank you for doing this. I'm impressed by the work you put into this, far beyond just producing proof of a break. Thoughts and comments in no particular order: As far as I recall I had instruction patterns without the 1-byte offset with some 60 to 70ish bits of security against patterns like these, but that wasn't 128 bits, so I had to get creative. I thought the offset was an improvement, but my brute force check couldn't run in reasonable time any more. Casey really wanted to hash 16 bytes per clock cycle, and I really wanted the key to do something meaningful, in hindsight that combination was obviously a stretch. I still feel like there is a lot of good ideas in the design. It doesn't seem like the attacks demonstrated are quite as damning as the comparison to other hash functions suggest they should be. I'm somewhat surprised that the known-key attacks took as much work as they did. |
This write-up is awesome! Just fantastic. As the resident non-cryptographer, other than just changing the README to remove the level 3 claim (and linking to the write-up?), are there recommendations for updates as we move toward v1? Before the crypto part, Meow was originally just supposed to be an asset matching tool, so that is still my primary concern, and was wondering if there were things we should be doing to make it less likely that Meow would collide by accident. Of course, if @NoHatCoder still wants to pursue level 3 security, that's fine with me too - but my priority is just to make sure that the hash requires an attacker before it fails, since the use cases I always envisioned for it have no attacker. And generally speaking, although there is a certain attitude sometimes of "well if you're just checking assets, use whatever", in my experience thus far I have found that trivially insecure hashes often do produce collisions by accident under heavy use, so, I think it's still worth it to try to have a fast hash that is "as strong as it can be". Thanks, |
I'd have to think very carefully to give more solid recommendations, but a few off the top of my head:
Overall though, I think aiming for MAC security at this speed will be tricky. It's possible I'm not aware of some advances in the field, but I think it's plausible achieving a 50 GiB/s single-threaded 128-bit MAC is in the realm of a serious research project (non-doctoral). Although I'd have to check how fast the fastest parallel Wegman-Carter style MACs run, perhaps 50 GiB/s is already common. If you do wish to attempt to fix up Meow hash for level 3 security without sacrificing any performance, perhaps you could:
|
It turns out that there aren't a whole lot of other suitable instructions, multiplication is the only real candidate, it won't allow us to do more instructions per clock, but maybe we can once in a while get a multiplication instead of an add. I assume that you are familiar with the pitfalls of multiplications. I won't rule it out completely, but I don't think it is going to save the day. For a lot of utility functions like compression, encryption and hashing it initially seems like a cool idea to build it for multi-core parallelism. In reality it turns out that for most cases the overhead of thread coordination is significant, and the program itself can utilize more cores much more efficiently by partitioning tasks in a different manner. So for hitting some marketing number, yes, for actual useful speed, not so much. Actually, 256 and 512 bit AES instructions are being introduced to X86, so utilizing those could be an option. The reasons not to do so is that wide instructions makes modern X86 processors clock down, so while a program that use a small hash here and there might be faster when hashing, everything else will run at a slower speed. Unless of course if the program is committed to using wide instructions, but it is such an awkward decision to take in a library. And of course it won't work on a lot of old processors, so you have to provide a 128 bit version, but with the large state design that will overflow the registers, so now we need a lot of extra mov instructions to keep it working. Recent benchmarks are hard to come by for most old functions, but I can't find anything suggesting that any of the resident MAC functions are anywhere close to this speed. Wikipedia says "3.1n + 780 Athlon cycles" for Poly1305 (apparently a 2000 Athlon was the best Bernstein could get his hands on in 2005), but if we check out the numbers at https://www.bearssl.org/speed.html there are some way deeper depths of inefficiency that Poly1305 can apparently go to if not carefully hand-optimized. It seems like we could ditch a lot of speed and still be handily faster. Speaking of changing the rules, I have also done some work on the question "What if we had a way better crypto instruction?", the process of first making a primitive, and then designing instructions for it doesn't seem to produce very good instructions. So https://github.com/NoHatCoder/Chaotic-Permutation-Circuit But of course this is ~10 years from mainstream use even if we convinced CPU manufacturers to adopt a new instruction today. I'm not sure if I'm the right person to do this, but nobody else is doing it (or they are doing something slightly wrong in a way that breaks critical properties), and I can't unsee the potential for getting a 2-3x speedup over AES-NI primitives. |
@cmuratori I'd estimate the combined chance that a non-adversarial change would match the attack pattern and actually succeed to produce the collision scenario is below 2^-128, but only just. I don't think it is likely that we will see any big "improvements", and given that 128 bits really is plenty for any level 1 use case I'm not particularly concerned. I think that if we downgrade to level 1 we should leave out the seed, ditch the shifted input, and concentrate a bit more on good ARM performance, if we used an XOR-AES-XOR permutation it could be implemented on both X86 and ARM without wasting the built-in XOR on either platform, toying a bit with that idea at the moment. We need a carefully worded update on the security status. Unlike a level 5 hash broken to 2^45 strength, which we would call trivially exploitable, this attack is inherently online, meaning that an attacker would typically have to send or tamper with 2^45 messages. Depending on exact settings this would be most often not practically feasible. Still I think we should recommend migration for level 3 purposes, especially as we can't exclude the possibility of an improved attack. But we don't need to cause too much panic, and I think it is important to point out that it is still a better and/or faster level 1 hash than pretty much anything else. |
That's a good point, what I was suggesting was a bit naive. Presumably you'd actually want to do something more like what SipHash does, and initialize your lanes something like:
Where you'd want to think carefully about the implications of this symmetry. Also this suggestion is relatively unimportant, because actually weak keys (like the |
Again as the resident non-cryptographer, I can only really add performance commentary:
That said, I do think "parallelizing" across 256 or 512-bit lanes might be the right way to go. AVX 256-bit instructions in particular are terrible, and they hate crossing the two 128-bit lanes. So something that was natively designed to process two independent 128-bit streams would vastly outperform something that wasn't on AVX. AVX-512, on the other hand, is very hard to say. It's extremely janky right now and basically doesn't exist on the desktop (except in laptops). So I think ti's way too early to say what would be a smart design there, although I suspect the same thing holds. So designing a "four wide" input pipe might not be a bad idea, just looking forward. I don't know how this relates to Neon, however.
- Casey |
Did a take on a disclaimer. I have worded it with a higher priority for authentication customers than hash table customers as the hash table case requires some pretty serious timing attacks that are very hard to pull off in the required quantity in the wild. Comments?
|
Seems good to me. |
For the time being, I went ahead and pasted that security warning into the README, just so there would be something up there while we work out what to do going forward. - Casey |
Maybe include a link to the analysis, seems weird not to have that. |
Done. - Casey |
The "about" section still states that its a level 3. Maybe change that too for the time being to not be confusing. |
Roger that - should be updated now? - Casey |
Looks good to me :) |
Hello,
Your web page for Meow hash says to report collisions on GitHub. I have two PNG files that
meow_search
reports as colliding:The files are here:
Thanks.
The text was updated successfully, but these errors were encountered: