Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement std::hash::Hasher trait on Digest #9

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rpmoore
Copy link

@rpmoore rpmoore commented Nov 12, 2023

This implements the std::hash::Hasher trait on Digest which allows users of the library to use crc64fast in places where std::hash::Hasher is used. While users of this can use the Newtype pattern and implement this themselves, by implementing this trait it makes it more accessible to users of the crate.

@kennytm
Copy link
Collaborator

kennytm commented Jan 18, 2024

Hi. Before we merge or close this can you provide where is that std::hash::Hasher used? AFAIK cryptographic hash algorithms don't implement std::hash::Hasher but rather digest::Digest.

The main purpose of choosing to use CRC is to be compatible with existing protocols / file format that already uses this for calculating checksum.

OTOH, the Hasher trait is mainly used for HashMap/HashSet. The main merit of a hash function used in hash table is how well it can quickly distinguish different keys within the table's lifetime, rather than producing a stable result for a given input.

I don't think CRC makes a good hash table algorithm

  • in terms of security, it is very simple to produce colliding keys
  • in terms of speed, I don't think it can compete with FNV (fxhash) or ahash etc
  • additionally, keys used for hash tables are often pretty short (so that hash calculation can still be treated as an O(1) operation), but crc64fast is designed for processing long input strings.

(Additionally our reference implementation crc v3 did not implement Hasher either. crc32fast did implement Hasher since the beginning but a rationale was not given.)

@rpmoore
Copy link
Author

rpmoore commented Jan 31, 2024

I'm using the trait to checksum files (or portions of files) to detect bit rot or mutations without using a cryptographically secure hash for speed purposes. I was following what crc32fast had done as it made being able to choose between either of them easier without having to wrap both in my own type. Reading the Hasher trait docs seem to imply it being for generic hashing of byte streams, so I didn't think it improper to implement, but if it's intended usage is only for Hash Tables then I agree CRC is not a good choice for Hash Tables. If not the Hasher trait though, is there a better trait to represent checksumming? A quick search doesn't surface anything that's an obvious choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants