no_std, std-compatible type aliases, docs (#35)
* Added `Clone` and `Debug` traits to `GxHasher` and `GxBuildHasher`.

* Updated deps.

* Made clippy (almost) happy.

* rustfmt.

* Added no_std compatibility, removed `Gx` prefix from type aliases, `cargo rdme` support, README spelling, grammar, etc.

* `cargo rdme` options

* Added cargo features section in docs.

* Fixed tests and doc tests.

* Missing line-breaks.

---------

Co-authored-by: Olivier Giniaux <[email protected]>
virtualritz and ogxd authored Dec 25, 2023
1 parent e67b860 commit 49a85c0
Showing 14 changed files with 456 additions and 185 deletions.
4 changes: 4 additions & 0 deletions .cargo-rdme.toml
@@ -0,0 +1,4 @@
heading-base-level = 0

[intralinks]
strip-links = true
20 changes: 11 additions & 9 deletions Cargo.toml
@@ -13,6 +13,7 @@ categories = ["algorithms", "data-structures", "no-std"]
exclude = ["article/*"]

[features]
default = ["std"]
# The 256-bit state GxHash is faster for large inputs than the default 128-bit state implementation, but slower for small inputs.
# Please note, however, that the 256-bit GxHash and the 128-bit GxHash don't generate the same hashes for the same input.
# Requires AVX2 and VAES (X86).
@@ -21,23 +22,24 @@ avx2 = []
bench-csv = []
bench-md = []
bench-plot = []
std = []

[dependencies]
rand = "0.8"

[dev-dependencies]
rstest = "0.18.2"
lazy_static = { version = "1.4" }
# Benchmarks
criterion = { version = "0.5.1" }
# Other hash algorithms, for comparison.
ahash = "0.8.6"
t1ha = "0.1.0"
twox-hash = "1.6.3"
# Benchmarks
criterion = { version = "0.5.1" }
fnv = "1.0.3"
highway = "1.1.0"
seahash = "4.1.0"
lazy_static = { version = "1.4" }
metrohash = "1.0.6"
fnv = "1.0.3"
rstest = "0.18.2"
seahash = "4.1.0"
t1ha = "0.1.0"
twox-hash = "1.6.3"

[dev-dependencies.plotters]
version = "0.3.5"
@@ -58,4 +60,4 @@ harness = false

[[bench]]
name = "hashset"
harness = false
harness = false
173 changes: 123 additions & 50 deletions README.md
@@ -1,54 +1,130 @@
# GxHash

[![Build & Test](https://github.com/ogxd/gxhash/actions/workflows/build_test.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/build_test.yml)

GxHash is a [**blazingly fast**](#performance) and [**robust**](#robustness) non-cryptographic hashing algorithm.
* [Usage](#usage)
* [Cargo Features](#cargo-features)
* [Features](#features)
* [Blazingly Fast 🚀](#blazingly-fast-)
* [Highly Robust 🗿](#highly-robust-)
* [Portability](#portability)
* [Supported Architectures](#supported-architectures)
* [Stability of Hashes](#stability-of-hashes)
* [Security](#security)
* [DOS Resistance](#dos-resistance)
* [Multicollisions Resistance](#multicollisions-resistance)
* [Cryptographic Properties](#cryptographic-properties)
* [Benchmarks](#benchmarks)
* [Contributing](#contributing)
* [Publication](#publication)

<!-- cargo-rdme start -->

A [blazingly fast](#blazingly-fast-) and [robust](#highly-robust-) non-cryptographic hashing algorithm.

## Usage
```bash
cargo add gxhash
```
Used directly as a hash function:

Directly as a hash function:

```rust
use gxhash::{gxhash32, gxhash64, gxhash128};

let bytes: &[u8] = "hello world".as_bytes();
let seed = 1234;

println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed));
println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed));
println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));
```
Used in `HashMap`/`HashSet`:

GxHash provides an implementation of the [`Hasher`](core::hash::Hasher) trait.
For convenience and interop with crates which require a `std::collections::HashMap`, the type aliases `HashMap` and `HashSet` are provided:

```rust
// Type alias for HashSet::<String, GxBuildHasher>
let mut hashset = gxhash::GxHashSet::default();
hashset.insert("hello world");
use gxhash::{HashMap, HashMapExt};

let mut map: HashMap<&str, i32> = HashMap::new();
map.insert("answer", 42);
```
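
The alias-plus-extension-trait pattern used above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual source: `ToyHasher` is a hypothetical FNV-1a stand-in for `GxHasher`, and `ToyBuildHasher`, the `HashMap` alias and `HashMapExt` below are local names that only mirror the shape of the API shown in this README.

```rust
use std::collections::HashMap as StdHashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in hasher (FNV-1a), used only to keep the sketch dependency-free.
#[derive(Clone, Debug)]
pub struct ToyHasher(u64);

impl Default for ToyHasher {
    fn default() -> Self {
        ToyHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Builds a fresh `ToyHasher` for every key.
pub type ToyBuildHasher = BuildHasherDefault<ToyHasher>;

/// Same container as `std::collections::HashMap`, with the hasher parameter fixed.
pub type HashMap<K, V> = StdHashMap<K, V, ToyBuildHasher>;

/// `std` only defines `HashMap::new()` for its default `RandomState` hasher,
/// so an extension trait supplies `new()` for the alias.
pub trait HashMapExt {
    fn new() -> Self;
}

impl<K, V> HashMapExt for HashMap<K, V> {
    fn new() -> Self {
        StdHashMap::with_hasher(ToyBuildHasher::default())
    }
}

/// Usage mirrors the README snippet above.
pub fn usage_example() -> Option<i32> {
    let mut map: HashMap<&str, i32> = HashMap::new();
    map.insert("answer", 42);
    map.get("answer").copied()
}
```

Because the alias is still a `std::collections::HashMap` underneath, it can be passed to any API that is generic over the hasher parameter.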

## Cargo Features

* `avx2` -- Enables AVX2 support for the `gxhash128` and `gxhash64` functions.
* `std` -- Enables the `HashMap`/`HashSet` container convenience type aliases. This is on by default. Disable to make the crate `no_std`:

```toml
[dependencies.gxhash]
...
default-features = false
```

## Features

### Blazingly Fast 🚀
Up to this date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance is possible mostly thanks to heavy usage of SIMD intrinsics, high ILP construction and a small bytecode (easily inlined and cached).
See the [benchmarks](#benchmarks).
### Blazingly Fast 🚀

### Highly Robust 🗿
GxHash uses several rounds of hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all [SMHasher](https://github.com/rurban/smhasher) tests, which is the de facto quality benchmark for non-cryptographic hash functions, gathering most of the existing algorithms. GxHash has low collisions, uniform distribution and high avalanche properties.
As of this writing, GxHash is the fastest non-cryptographic hashing algorithm of its class for all input sizes. This performance is possible mostly thanks
to heavy use of SIMD intrinsics, a high-ILP construction and small bytecode (easily inlined and cached).

Check out the [paper](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf) for more technical details.
See the [benchmarks](https://github.com/ogxd/gxhash#benchmarks).
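
To make "high ILP construction" concrete, here is a minimal sketch in the spirit of the `benches/ilp.rs` micro-benchmark. The `mix` function and its constants are assumptions chosen for illustration (an FNV-style step), not the GxHash compression function. The serial version forms one dependency chain, while the laned version keeps four independent chains that the CPU can execute in parallel.

```rust
const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical mixing step, standing in for a real compression function.
#[inline(always)]
fn mix(h: u64, x: u64) -> u64 {
    (h ^ x).wrapping_mul(PRIME)
}

/// One dependency chain: every step has to wait for the previous result.
pub fn serial(input: &[u64]) -> u64 {
    input.iter().fold(OFFSET, |h, &x| mix(h, x))
}

/// Four independent chains: consecutive steps do not depend on each other,
/// so they can overlap in the pipeline. The lanes are merged at the end.
pub fn laned(input: &[u64]) -> u64 {
    let mut lanes = [OFFSET; 4];
    let mut chunks = input.chunks_exact(4);
    for chunk in &mut chunks {
        for (lane, &x) in lanes.iter_mut().zip(chunk) {
            *lane = mix(*lane, x);
        }
    }
    // Fold any leftover elements into the first lane.
    for &x in chunks.remainder() {
        lanes[0] = mix(lanes[0], x);
    }
    lanes.iter().skip(1).fold(lanes[0], |h, &lane| mix(h, lane))
}
```

Note that the two functions do not return the same value for the same input: changing the lane layout changes the hash.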

### Highly Robust 🗿

GxHash uses several rounds of hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all [SMHasher](https://github.com/rurban/smhasher) tests, which is the de facto quality benchmark for non-cryptographic hash
functions, gathering most of the existing algorithms. GxHash has low collisions, uniform distribution and high avalanche properties.

Check out the [paper](https://github.com/ogxd/gxhash/blob/main/article/article.pdf) for more technical details.

## Portability

### Architecture Compatibility
### Supported Architectures

GxHash is compatible with:
- X86 processors with `AES-NI` intrinsics
- ARM processors with `NEON` intrinsics
> **Warning**
> Other platforms are currently not supported (there is no fallback). The behavior on these platforms is undefined.

### Hashes Stability
All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms. An exception to this is the AVX2 version of GxHash (nightly).
* x86 processors with `AES-NI` intrinsics.
* ARM processors with `NEON` intrinsics.

> **⚠️ Warning**
>
> Other platforms are currently not supported (there is no fallback). Currently the crate does not build on these. If you add support for a new platform,
> a PR is highly welcome.
### Stability of Hashes

All generated hashes for a given version of GxHash are stable. This means that for a given input the output hash will be the same across all supported
platforms.

*An exception to this is the AVX2 version of GxHash (requires a `nightly` toolchain).*

## Security

### DOS Resistance

GxHash is a seeded hashing algorithm, meaning that depending on the seed used, it will generate completely different hashes. The default `BuildHasher`
(`GxBuildHasher::default()`) uses seed randomization, which makes any `HashMap`/`HashSet` more DOS resistant: without knowing the seed, it is much more
difficult for an attacker to predict which hashes may collide. This does not mean, however, that GxHash is completely DOS resistant.
This has yet to be analyzed further.
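
The seeding idea can be sketched with the standard `BuildHasher` trait. Everything below is an assumption made for illustration: `SeededHasher`, `SeededBuildHasher`, `with_seed` and `hash_with` are hypothetical names, and the FNV-style mixing is a toy that is not DOS resistant by itself. The sketch only shows how a per-instance seed changes every hash, and how a `Default` implementation can draw a random seed.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Toy seeded hasher (FNV-style mixing); a stand-in, not GxHash.
pub struct SeededHasher {
    state: u64,
}

impl Hasher for SeededHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hands the same seed to every hasher it builds.
#[derive(Clone, Debug)]
pub struct SeededBuildHasher {
    seed: u64,
}

impl SeededBuildHasher {
    /// Fixed seed: reproducible hashes, but predictable for an attacker.
    pub fn with_seed(seed: u64) -> Self {
        Self { seed }
    }
}

impl Default for SeededBuildHasher {
    /// Random seed: borrows the randomly keyed `RandomState` from `std` as an entropy source.
    fn default() -> Self {
        let seed = RandomState::new().build_hasher().finish();
        Self { seed }
    }
}

impl BuildHasher for SeededBuildHasher {
    type Hasher = SeededHasher;

    fn build_hasher(&self) -> SeededHasher {
        SeededHasher { state: self.seed ^ 0xcbf2_9ce4_8422_2325 }
    }
}

/// Hashes `key` with a hasher produced by `builder`.
pub fn hash_with<S: BuildHasher>(builder: &S, key: &str) -> u64 {
    let mut hasher = builder.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}
```

With a fixed seed two builders agree on every hash; with `Default` each builder instance gets its own seed, so colliding keys cannot be precomputed without knowing it.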

### Multicollisions Resistance

GxHash uses a 128-bit internal state (and even 256-bit with the `avx2` feature). This makes GxHash
[a widepipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of size
64-bit or smaller. Widepipe constructions are, among other useful properties, inherently more resistant to multicollision attacks. See
[this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
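
The shape of a widepipe construction can be sketched as follows: the running state is twice as wide as the output and is only narrowed once, at the very end. The `absorb` and `narrow` functions and the multiplier are arbitrary choices made for this illustration; they are assumptions and have nothing to do with the AES-based rounds GxHash actually uses.

```rust
/// Arbitrary odd multiplier, chosen for this sketch only.
const MULTIPLIER: u128 = 0x9e37_79b9_7f4a_7c15_f39c_c060_5ced_c835;

/// Absorbs one 64-bit block into the 128-bit internal state.
fn absorb(state: u128, block: u64) -> u128 {
    (state ^ u128::from(block))
        .wrapping_mul(MULTIPLIER)
        .rotate_left(43)
}

/// Narrows the 128-bit state to the 64-bit output. This is the only lossy step.
fn narrow(state: u128) -> u64 {
    ((state >> 64) as u64) ^ (state as u64)
}

/// Hypothetical widepipe hash: 128-bit state throughout, 64-bit result.
pub fn wide_pipe_hash(blocks: &[u64], seed: u64) -> u64 {
    let state = blocks
        .iter()
        .fold(u128::from(seed), |state, &block| absorb(state, block));
    narrow(state)
}
```

Since `absorb` is invertible on the 128-bit state, two different internal states never merge while blocks are being absorbed; information is only lost in `narrow`, which is the property the widepipe argument relies on.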

### Cryptographic Properties

GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is e.g. not a replacement for SHA).
It has not been assessed if GxHash is preimage resistant and how difficult it is to be reversed.

<!-- cargo-rdme end -->

## Benchmarks

To run the benchmarks locally use one of the following:
[![Benchmark](https://github.com/ogxd/gxhash/actions/workflows/bench.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/bench.yml)

To run the benchmarks locally do one of the following:

```bash
# Benchmark throughput
cargo bench --bench throughput
@@ -60,47 +136,44 @@ cargo bench --bench throughput --features bench-md
cargo bench --bench throughput --features bench-plot
```

GxHash is continuously benchmarked on X86 and ARM Github runners.
[![Benchmark](https://github.com/ogxd/gxhash/actions/workflows/bench.yml/badge.svg)](https://github.com/ogxd/gxhash/actions/workflows/bench.yml)
GxHash is continuously benchmarked on X86 and ARM Github runners.

**Latest Benchmark Results:**

**Latest Benchmark Results:**
![aarch64](./benches/throughput/aarch64.svg)
![x86_64](./benches/throughput/x86_64.svg)
![x86_64-avx2](./benches/throughput/x86_64-avx2.svg)

## Security

### DOS Resistance
GxHash is a seeded hashing algorithm, meaning that depending on the seed used, it will generate completely different hashes. The default `HasherBuilder` (`GxHasherBuilder::default()`) uses seed randomization, making any `HashMap`/`HashSet` more DOS resistant, as it will make it much more difficult for attackers to be able to predict which hashes may collide without knowing the seed used. This does not mean however that it is completely DOS resistant. This has to be analyzed further.
## Contributing

### Multicollisions Resistance
GxHash uses a 128-bit internal state (and even 256-bit with the `avx2` feature). This makes GxHash [a widepipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of size 64-bit or smaller, which had amongst other properties to be inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
* Feel free to submit PRs
* Repository is entirely usable via `cargo` commands
* Versioning is as follows:
  * Major for stability-breaking changes (output hashes for the same input differ after the change)
  * Minor for API changes/removal
  * Patch for new APIs, bug fixes and performance improvements

### Cryptographic Properties
GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is not a replacement for SHA). It has not been assessed if GxHash is preimage resistant and how difficult it is to be reversed.
> **🛈 Note**
>
> [`cargo-asm`](https://github.com/gnzlbg/cargo-asm) is an easy way to view the actual generated assembly code (`cargo asm gxhash::gxhash::gxhash64`).
> *Note that `#[inline]` should be removed; otherwise the respective method won't be seen by the tool.*
## Contributing
> **🛈 Note**
>
> [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives some useful insights on per-instruction time spent.
- Feel free to submit PRs
- Repository is entirely usable via `cargo` commands
- Versioning is the following
- Major for stability breaking changes (output hashes for a same input are different after changes)
- Minor for API changes/removal
- Patch for new APIs, bug fixes and performance improvements
## Publication

> ℹ️ [cargo-asm](https://github.com/gnzlbg/cargo-asm) is an easy way to view the actual generated assembly code (`cargo asm gxhash::gxhash::gxhash64`) (method `#[inline]` should be removed otherwise it won't be seen by the tool)
> ℹ️ [AMD μProf](https://www.amd.com/en/developer/uprof.html) gives some useful insights on time spent per instruction.
*Author's note:*

## Publication
> Author note:
> I'm committed to the open dissemination of scientific knowledge. In an era where access to information is more democratized than ever, I believe that science should be freely available to all – both for consumption and contribution. Traditional scientific journals often involve significant financial costs, which can introduce biases and can shift the focus from purely scientific endeavors to what is currently trendy.
> I'm committed to the open dissemination of scientific knowledge. In an era where access to information is more democratized than ever, I believe that science should be freely available to all – both for consumption and contribution. Traditional scientific journals often involve significant financial costs, which can introduce biases and can shift the focus from purely scientific endeavors to what is currently trendy.
>
> To counter this trend and to uphold the true spirit of research, I have chosen to share my work on "gxhash" directly on GitHub, ensuring that it's openly accessible to anyone interested. Additionally, the use of a free Zenodo DOI ensures that this research is citable and can be referenced in other works, just as traditional publications are.
> To counter this trend and to uphold the true spirit of research, I have chosen to share my work on "gxhash" directly on GitHub, ensuring that it's openly accessible to anyone interested. Additionally, the use of a free Zenodo DOI ensures that this research is citable and can be referenced in other works, just as traditional publications are.
>
> I strongly believe in a world where science is not behind paywalls, and I am in for a more inclusive, unbiased, and open scientific community.
Publication:
Publication:
[PDF](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf)

Cite this publication / algorithm:
Cite this publication/algorithm:
[![DOI](https://zenodo.org/badge/690754256.svg)](https://zenodo.org/badge/latestdoi/690754256)
12 changes: 6 additions & 6 deletions benches/hashset.rs
@@ -2,12 +2,11 @@ use ahash::AHashSet;
use criterion::{criterion_group, criterion_main, Criterion};
use fnv::FnvHashSet;
use gxhash::*;
use twox_hash::xxh3;
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, BuildHasher};
use std::hash::{BuildHasher, BuildHasherDefault};
use twox_hash::xxh3;

fn hashmap_insertion(c: &mut Criterion) {

// Short keys
benchmark_for_string(c, "gxhash");

@@ -29,7 +28,7 @@ fn benchmark_for_string(c: &mut Criterion, string: &str) {
iterate(b, string, &mut set);
});

let mut set: HashSet::<String, GxBuildHasher> = GxHashSet::<String>::default();
let mut set: HashSet<String, GxBuildHasher> = GxHashSet::<String>::default();
group.bench_function("GxHash", |b| {
iterate(b, string, &mut set);
});
@@ -54,7 +53,8 @@ fn benchmark_for_string(c: &mut Criterion, string: &str) {

#[inline(never)]
fn iterate<T>(b: &mut criterion::Bencher<'_>, string: &str, set: &mut HashSet<String, T>)
where T: BuildHasher
where
T: BuildHasher,
{
// If hashmap is empty, it may skip hashing the key and simply return false
// So we add a single value to prevent this optimization
@@ -67,4 +67,4 @@ fn iterate<T>(b: &mut criterion::Bencher<'_>, string: &str, set: &mut HashSet<St
}

criterion_group!(benches, hashmap_insertion);
criterion_main!(benches);
criterion_main!(benches);
10 changes: 5 additions & 5 deletions benches/ilp.rs
@@ -14,7 +14,7 @@ fn baseline(input: &[u64]) -> u64 {
while i < input.len() {
h = hash(h, input[i]);

i = i + 1;
i += 1;
}
h
}
@@ -29,7 +29,7 @@ fn unrolled(input: &[u64]) -> u64 {
h = hash(h, input[i + 3]);
h = hash(h, input[i + 4]);

i = i + 5;
i += 5;
}
h
}
@@ -46,7 +46,7 @@ fn temp(input: &[u64]) -> u64 {

h = hash(h, tmp);

i = i + 5;
i += 5;
}
h
}
@@ -56,7 +56,7 @@ fn laned(input: &[u64]) -> u64 {
let mut h2: u64 = OFFSET;
let mut h3: u64 = OFFSET;
let mut h4: u64 = OFFSET;
let mut h5: u64 = OFFSET;
let mut h5: u64 = OFFSET;
let mut i: usize = 0;
while i < input.len() {
h1 = hash(h1, input[i]);
@@ -65,7 +65,7 @@ fn laned(input: &[u64]) -> u64 {
h4 = hash(h4, input[i + 3]);
h5 = hash(h5, input[i + 4]);

i = i + 5;
i += 5;
}
hash(hash(hash(hash(h1, h2), h3), h4), h5)
}