Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement crc32 #14

Open
4 of 7 tasks
folkertdev opened this issue Feb 1, 2024 · 8 comments
Open
4 of 7 tasks

implement crc32 #14

folkertdev opened this issue Feb 1, 2024 · 8 comments

Comments

@folkertdev
Copy link
Collaborator

folkertdev commented Feb 1, 2024

  • a basic implementation that is obviously correct to test against
  • an efficient rust version
  • move the std::arch::x86_64::_mm_crc32_u32 intrinsic into the crc32 module, make it platform-independent (/src/deflate/hash_calc.rs).

Then of course we also need simd-accelerated versions. zlib-ng has several in the /arch folder. In rust there is https://docs.rs/crc32fast/latest/crc32fast/. I don't think we want that as a dependency but it may provide inspiration.

  • SSE
  • AVX2
  • AVX512
  • Neon
@ahomescu
Copy link
Contributor

ahomescu commented Feb 7, 2024

I started prototyping this, have an MVP for the first item on the list but have some quick comments.

First, after reading the docs for _mm_crc32_u32 it looks to me like it implements CRC32-C which is a different polynomial than CRC32 that zlib uses. crc32fast actually uses pclmulqdq which is slightly more complicated. Do we need CRC-32C?

Second, how do you feel about quickcheck for testing? It would be pretty useful here.

Third, afaict adler32_rust is a manually unrolled version of naive_adler32. Did you have that in mind for crc32, or something more advanced (e.g. a larger table that processes 2-4 bytes at a time)?

ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 7, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 7, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
@folkertdev
Copy link
Collaborator Author

Cool that you're picking this up!


The mm_crc32_u32 function is used not for gzip, but for the hash chaining in zlib-ng on systems that support it. I suspect the particulars of what polynomial is used don't particularly matter here, you really just want a fast decent hash.

My proposal would be to just use mm_crc32_u32 as an intrinsic. The fallback is

#define HASH_CALC(s, h, val) h = ((val * 2654435761U) >> HASH_SLIDE);

which does look different to the intel docs for _mm_crc32_u32. But that does not really matter so long as the same function is used consistently. I was under the impression that the fallback would have the same behavior as _mm_crc32_u32 but that seems to not be the case.


We're very mindful of dependencies for the final crate/dynamic library, but dev-dependencies should be OK.

I haven't really used quickcheck (in rust, I have in Haskell) but let's give it a go!


I think it makes sense to look at the zlib-ng codebase next to see how they implement crc32. We can be reasonably confident they have one of the fastest implementations out there.

Navigating this code is not awesome, there is lots of macro expansion that makes it hard to understand what is happening. The generic crc32 implementation is here

I think that is a good jumping-off point. You'll need some other things like the huge tables that crc32 uses, they are in that repository as well. Also it's probably a good thing to look at what architecture-specific features are implemented (in particular the type signatures).

@thedataking
Copy link

We're very mindful of dependencies for the final crate/dynamic library

I don't think we want that as a dependency but it may provide inspiration.

Very sorry if this is a dumb question. Andrei and I (Per) were taking last night and we weren't sure what your concerns with crc32fast were. It seems widely used (including in Android) and is permissively licensed so it must be something else we're not aware of.

ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 8, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
@folkertdev
Copy link
Collaborator Author

No that is a very reasonable question. Many things are coming together here

  • from sudo-rs and ntpd-rs we know that having fewer dependencies makes the distribution process easier.
  • we believe we can have zero dependencies for this project (in a release build)
  • there is the risk of supply chain attack. I'm personally not too worried about that actually causing issues in this case, but we make that structurally impossible by having no dependencies
  • fewer dependencies -> faster builds

In this particular case, I think crc32fast also does not do exactly what we need. What I've noticed so far without diving deep into the exact implementation details:

  • no fused crc32/memcpy: the adler and crc implementations can first add a simd value into the checksum calculation and then copy it to a destination. this is configured with the COPY constant https://github.com/zlib-ng/zlib-ng/blob/develop/arch/x86/crc32_fold_pclmulqdq_tpl.h#L20
  • limited support, most relevantly that there is no avx512 version. We don't consider these targets in current milestones, but zlib-ng also supports powerpc and s390)

So, I'd be totally happy to use crc32fast if we'd just be creating a simple checksum somewhere, but crc is integral to what zlib does and it's used in some custom ways, so I think be in control ourselves.

Separately I'm a big fan of upstreaming features we do add.

does that make sense?

@thedataking
Copy link

thedataking commented Feb 12, 2024

does that make sense?

yes. Thank you for explaining the reasoning. I agree that supply chain attacks is something that must be prevented. However, as we've discussed, vendoring crates or relying on crate versions that have been vetted might be worth considering as a general approach. Google has open sourced their crate audits and it includes crc32fast version 1.3.2 [0]

Of course that doesn't address your concern that the crate doesn't do everything needed so I am not arguing that for this particular case, using an existing crate is optimal. Putting that aside, how do you think about a general policy of "have as few dependencies as a general rule but reuse functionality available from mature and audited crates"?

[0] https://github.com/google/rust-crate-audits/blob/main/manual-sources/google3-audits.toml

ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 13, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 13, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
@folkertdev
Copy link
Collaborator Author

I think my personal taste is to be quite hardcore about owning the logic that is core to your project. You should know how it works, and it's likely that you can be more efficient than a generic solution when you have a specific problem.

As an example, nptd-rs uses tokio and rustls, massive dependencies and not something we'd just write ourselves obviously. But for the parsing and configuring of sockets and the clock, that is all our own lowlevel code that builds on libc and the standard library. We did use nix at some point but we knew we could do a better job by doing it ourselves.

ntpd-rs also used clap for command line argument parsing. I think clap is fine, it is in widespread use and I trust its development team. But, it's a very large dependency, that for us did more than we needed. It also releases breaking changes quite frequently, so it is disliked by the debian/fedora package maintainers. It's big, so we didn't really want to vendor/maintain it, so we just ripped it out. The code is nicer now, it builds that little bit faster, the binary is that little bit smaller, and we're more in control of the code we actually ship.

Of course you have to be pragmatic, but given how easy it is to accumulate dependencies, I think being skeptical of adding one is the right baseline.

btw for what dependencies we use we mostly go by popularity in the rust ecosystem (clap, tokio, rustls are all widely used and actively maintained), occasionally informed by packaging concerns (this can also be a reason not to upgrade a dependency). So far we've not had reasons to stick to a particular list of audited libraries, but that seems like a decent approach that saves you quite a bit of work.

ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 14, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 14, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 14, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
ahomescu added a commit to immunant/zlib-rs that referenced this issue Feb 15, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
folkertdev pushed a commit that referenced this issue Feb 16, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
folkertdev pushed a commit that referenced this issue Feb 16, 2024
Add a naive implementation of CRC-32 based
on the C code from the PNG specification.
Add an optimized word-size implementation
based on Kadatch and Jenkins.
Add optimized interleaved implementation from
section 4.11 of the same paper.
@brian-pane
Copy link

brian-pane commented Dec 17, 2024

Is anyone still working on a pure-Rust crc32 implementation? If not, I'm willing to try creating two implementations: a simple, slow version as a baseline, and then a fast version inspired by the one in zlib-ng.

And should it be CRC32 or CRC32-C? It seems like the current implementation in hash_calc.rs does CRC32-C on x86 and CRC32 on ARM, although the ARM docs list both CRC32 and CRC32-C instructions available.

@folkertdev
Copy link
Collaborator Author

we have a pure-rust implementation, that has the expected simd support on the targets we care most about. The implementation lives here. However, if you believe you can make it faster, or support more platforms, by all means have a got at that.

For the logic in hash_calc.rs, we simply follow the behavior of zlib-ng, but for arm/aarch64 we haven't done sophisticated benchmarking for which is faster. I believe zlib-ng actually fell back to using a software implementation of crc32 because using the instruction is faster, but combined with the branching logic for whether to use the fallback or the instruction, the instruction is overall slower (for them, we should absolutely verify that for ourselves).

We've been working on proper benchmarking (on dedicated hardware (apple m2 and a recent amd), with graphs etc), so we'll be in a much better shape data-wise to evaluate the various options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants