Commit 12af150

Implement Hasher functionality #12 (#16)
* Implement Hasher functionality #12
* upgrade deps and bump version to 0.2.0
* debug gh pytest not running on arm
* disable index when installing pycrc32 package for testing
* debugging arm install
* debug log platform
* update CI workflow
* fix README
1 parent c690d22 commit 12af150

7 files changed: +214 -28 lines changed


.github/workflows/CI.yml

Lines changed: 28 additions & 10 deletions
@@ -25,8 +25,8 @@ jobs:
       matrix:
         target: [x86_64, x86, aarch64, armv7, s390x, ppc64le]
     steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v5
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
           python-version: '3.10'
       - name: Build wheels
@@ -42,21 +42,37 @@ jobs:
           name: wheels
           path: dist
       - name: pytest
+        if: ${{ startsWith(matrix.target, 'x86_64') }}
         shell: bash
         run: |
           set -e
-          pip install pycrc32 --find-links dist --force-reinstall
+          pip install pycrc32 --find-links dist --force-reinstall --no-index
           pip install pytest
           pytest
+      - name: pytest
+        if: ${{ !startsWith(matrix.target, 'x86') && matrix.target != 'ppc64' }}
+        uses: uraimo/[email protected]
+        with:
+          arch: ${{ matrix.target }}
+          distro: ubuntu22.04
+          githubToken: ${{ github.token }}
+          install: |
+            apt-get update
+            apt-get install -y --no-install-recommends python3 python3-pip
+            pip3 install -U pip pytest
+          run: |
+            set -e
+            pip3 install pycrc32 --find-links dist --force-reinstall --no-index
+            pytest
 
   windows:
     runs-on: windows-latest
     strategy:
       matrix:
         target: [x64, x86]
     steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v5
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
           python-version: '3.10'
           architecture: ${{ matrix.target }}
@@ -72,10 +88,11 @@ jobs:
           name: wheels
           path: dist
       - name: pytest
+        if: ${{ !startsWith(matrix.target, 'aarch64') }}
         shell: bash
         run: |
           set -e
-          pip install pycrc32 --find-links dist --force-reinstall
+          pip install pycrc32 --find-links dist --force-reinstall --no-index
           pip install pytest
           pytest
 
@@ -85,8 +102,8 @@ jobs:
       matrix:
         target: [x86_64, aarch64]
     steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v5
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
           python-version: '3.10'
       - name: Build wheels
@@ -101,17 +118,18 @@ jobs:
           name: wheels
           path: dist
       - name: pytest
+        if: ${{ !startsWith(matrix.target, 'aarch64') }}
         shell: bash
         run: |
           set -e
-          pip install pycrc32 --find-links dist --force-reinstall
+          pip install pycrc32 --find-links dist --force-reinstall --no-index
           pip install pytest
           pytest
 
   sdist:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v3
       - name: Build sdist
         uses: PyO3/maturin-action@v1
         with:
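
The two `if:` conditions added to the Linux job decide which matrix targets run `pytest` natively and which run it under emulation. As a rough sketch, the selection can be replayed in Python. The helper names below are hypothetical, and `str.startswith` stands in for the workflow's `startsWith` (any difference in case handling does not matter for these lowercase target names).

```python
# Linux matrix targets, copied from the workflow diff above.
LINUX_TARGETS = ["x86_64", "x86", "aarch64", "armv7", "s390x", "ppc64le"]


def runs_native_pytest(target: str) -> bool:
    """Hypothetical mirror of: startsWith(matrix.target, 'x86_64')."""
    return target.startswith("x86_64")


def runs_emulated_pytest(target: str) -> bool:
    """Hypothetical mirror of:
    !startsWith(matrix.target, 'x86') && matrix.target != 'ppc64'."""
    return not target.startswith("x86") and target != "ppc64"


native = [t for t in LINUX_TARGETS if runs_native_pytest(t)]
emulated = [t for t in LINUX_TARGETS if runs_emulated_pytest(t)]
untested = [t for t in LINUX_TARGETS if t not in native and t not in emulated]

print("native:  ", native)
print("emulated:", emulated)
print("untested:", untested)
```

Under these assumptions, `x86` matches neither condition, and `ppc64le` still takes the emulated step because the comparison is against the literal string `ppc64`.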

Cargo.lock

Lines changed: 23 additions & 15 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "pycrc32"
-version = "0.1.3"
+version = "0.2.0"
 edition = "2021"
 authors = ["cybuerg <[email protected]>"]
 description = "Python module for SIMD-accelerated CRC32 checksum computation"

README.md

Lines changed: 51 additions & 0 deletions
@@ -21,6 +21,57 @@ data = b"123456789"
 print(f"crc32 for {data!r} is {crc32(data)}")
 ```
 
+### Advanced Checksum Calculation with `Hasher`
+For scenarios that require more flexibility, such as processing large amounts of data or computing the checksum in stages, you can use the `Hasher` class:
+```python
+from pycrc32 import Hasher
+
+# Create a new Hasher instance
+hasher = Hasher()
+
+# Update the hasher with data chunks
+hasher.update(b"123456")
+hasher.update(b"789")
+
+# Finalize the computation and get the checksum
+checksum = hasher.finalize()
+print(f"Checksum: {checksum}")
+
+# Reset the hasher to compute another checksum
+hasher.reset()
+hasher.update(b"The quick brown fox jumps over the lazy dog")
+new_checksum = hasher.finalize()
+print(f"New checksum: {new_checksum}")
+```
+
+You can also initialize a `Hasher` with a specific initial CRC32 state:
+```python
+initial_crc = 12345678
+hasher = Hasher.with_initial(initial_crc)
+
+hasher.update(b"additional data")
+final_checksum = hasher.finalize()
+print(f"Final checksum with initial state: {final_checksum}")
+```
+
+To combine checksums from different data blocks without needing to concatenate the data, use the `combine` method:
+```python
+hasher1 = Hasher()
+hasher1.update(b"Data block 1")
+checksum1 = hasher1.finalize()
+
+hasher2 = Hasher()
+hasher2.update(b"Data block 2")
+checksum2 = hasher2.finalize()
+
+# Combine the state of hasher2 into hasher1
+hasher1.combine(hasher2)
+
+# The final checksum after combination
+combined_checksum = hasher1.finalize()
+print(f"Combined checksum: {combined_checksum}")
+```
+
 ## Speed
 The performance of `pycrc32` has been benchmarked on a trusty old Intel i7-8550U using 32MB of random input data. Below is a comparison of the median computation times across different libraries:
 
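
The README additions describe feeding data in stages and resuming from an initial CRC32 state. As a sanity check that does not depend on `pycrc32` being installed, the same properties can be sketched with the standard library's `zlib.crc32`, whose optional second argument is a running checksum. This assumes `Hasher.update` and `Hasher.with_initial` follow the standard CRC-32 semantics, which the expected values in this commit's tests suggest; it is not a statement about how `pycrc32` is implemented.

```python
import zlib

# Staged computation: feed chunks one at a time, carrying the running CRC.
chunks = [b"123456", b"789"]
running = 0
for chunk in chunks:
    running = zlib.crc32(chunk, running)

# One-shot computation over the concatenated data.
one_shot = zlib.crc32(b"".join(chunks))
print(running == one_shot)

# Resuming from a previously computed CRC (the with_initial scenario).
initial_crc = zlib.crc32(b"123456789")
resumed = zlib.crc32(b" continuation", initial_crc)
print(resumed == zlib.crc32(b"123456789 continuation"))
```

Both comparisons should print `True`: splitting the input into chunks, or resuming from a stored checksum, gives the same result as hashing the concatenated bytes in one call.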

pycrc32/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-from .pycrc32 import crc32
+from .pycrc32 import Hasher, crc32

pycrc32/tests/test_crc32.py

Lines changed: 64 additions & 1 deletion
@@ -1,6 +1,6 @@
 import pytest
 
-from pycrc32 import crc32
+from pycrc32 import Hasher, crc32
 
 
 @pytest.mark.parametrize(
@@ -14,3 +14,66 @@
 )
 def test_crc32(input_bytes, expected_crc):
     assert crc32(input_bytes) == expected_crc
+
+
+@pytest.mark.parametrize(
+    "input_bytes,expected_crc",
+    [
+        (b"123456789", 3421780262),
+        (b"", 0),
+        (b"a", 3904355907),
+        (b"The quick brown fox jumps over the lazy dog", 1095738169),
+    ],
+)
+def test_Hasher_simple_usage(input_bytes, expected_crc):
+    hasher = Hasher()
+    hasher.update(input_bytes)
+    assert hasher.finalize() == expected_crc
+
+
+def test_Hasher_with_initial():
+    initial_crc = 3421780262  # CRC for "123456789"
+    additional_data = b" continuation"
+    combined_crc = crc32(b"123456789 continuation")
+
+    hasher = Hasher.with_initial(initial_crc)
+    hasher.update(additional_data)
+    assert hasher.finalize() == combined_crc
+
+
+def test_Hasher_reset():
+    input_bytes = b"The quick brown fox jumps over the lazy dog"
+    hasher = Hasher()
+    hasher.update(input_bytes)
+    first_crc = hasher.finalize()
+
+    # Reset and reuse the hasher for the same input
+    hasher.reset()
+    hasher.update(input_bytes)
+    second_crc = hasher.finalize()
+
+    assert first_crc == 1095738169
+    assert second_crc == 1095738169
+
+
+@pytest.mark.parametrize(
+    "input_bytes1,input_bytes2",
+    [
+        (b"123456789", b" continuation"),
+        (b"The quick brown fox", b" jumps over the lazy dog"),
+    ],
+)
+def test_Hasher_combine(input_bytes1, input_bytes2):
+    hasher1 = Hasher()
+    hasher1.update(input_bytes1)
+    crc1 = hasher1.finalize()
+
+    hasher2 = Hasher()
+    hasher2.update(input_bytes2)
+    hasher2.finalize()
+
+    combined_crc = crc32(input_bytes1 + input_bytes2)
+    hasher_combined = Hasher.with_initial(crc1)
+    hasher_combined.update(input_bytes2)
+
+    assert hasher_combined.finalize() == combined_crc
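
`test_Hasher_combine` reaches the expected value through `Hasher.with_initial` and never calls `Hasher.combine`. For reference, here is a minimal standard-library sketch of what a length-aware CRC-32 combine computes, given only the two checksums and the length of the second block. The function name `crc32_combine_slow` is hypothetical, and this is not how `pycrc32` implements `combine`. It relies on the CRC register update being linear over GF(2) and runs in time proportional to the second block's length, whereas production implementations typically use a much faster method.

```python
import zlib

MASK = 0xFFFFFFFF


def crc32_combine_slow(crc1: int, crc2: int, len2: int) -> int:
    """Return the CRC-32 of A + B given crc32(A), crc32(B) and len(B).

    Hypothetical illustration: advance crc1 through len2 zero bytes with the
    usual initial and final XOR cancelled out, then XOR in crc2.
    """
    advanced = zlib.crc32(bytes(len2), crc1 ^ MASK) ^ MASK
    return advanced ^ crc2


# Same input pairs as the parametrized combine test above.
pairs = [
    (b"123456789", b" continuation"),
    (b"The quick brown fox", b" jumps over the lazy dog"),
]
results = []
for first, second in pairs:
    combined = crc32_combine_slow(zlib.crc32(first), zlib.crc32(second), len(second))
    results.append(combined == zlib.crc32(first + second))
print(results)
```

A test that exercises `combine` directly would compare `hasher1.finalize()` after `hasher1.combine(hasher2)` against `crc32(input_bytes1 + input_bytes2)`, assuming `combine` has the semantics the README describes.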
