__ __
/\ \ /\ \__
___ __ __ ___\ \ \/'\ ___ ___ __ __\ \ ,_\
/'___\/\ \/\ \ /'___\ \ , < / __`\ / __`\ /'_ `\ /'__`\ \ \/
/\ \__/\ \ \_\ \/\ \__/\ \ \\`\ /\ \L\ \/\ \L\ \/\ \L\ \/\ __/\ \ \_
\ \____\\ \____/\ \____\\ \_\ \_\ \____/\ \____/\ \____ \ \____\\ \__\
\/____/ \/___/ \/____/ \/_/\/_/\/___/ \/___/ \/___L\ \/____/ \/__/
/\____/
\_/__/
A very fast website mirroring script built on a cuckoo hash table, xxhash, and a DAG. It still has many problems. I feel sad about disappearing websites, and I’m thinking of ways to save them even faster.
Websites are our memories.
Let everyone rise up and preserve disappearing historical websites, leaving them for the future.
For all geeks and for those who love the internet. If you find an interesting website, please contact me.
Furthermore, with the -w option, you can set higher priorities based on the URL. I don't think other website mirroring software has this feature.
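One way to picture URL-based priorities is a priority queue in which URLs matching a weight keyword are popped first. This is only a minimal sketch, not the script's actual code: the `WeightedFrontier` class, the substring matching rule, and the example keywords are all assumptions made for illustration, since the exact meaning of the WEIGHTS values is not described here.

```python
import heapq
from itertools import count


class WeightedFrontier:
    """Hypothetical URL queue: URLs matching more weight keywords are popped first."""

    def __init__(self, weights):
        self._weights = list(weights)  # assumed to be plain substrings of the URL
        self._heap = []
        self._order = count()  # tie-breaker keeps first-in, first-out order per priority

    def _priority(self, url):
        # heapq pops the smallest tuple first, so more matches means a lower number.
        return -sum(1 for keyword in self._weights if keyword in url)

    def push(self, url):
        heapq.heappush(self._heap, (self._priority(url), next(self._order), url))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


frontier = WeightedFrontier(["archive", "blog"])
for url in [
    "https://example.com/about",
    "https://example.com/blog/post-1",
    "https://example.com/archive/blog/2001",
    "https://example.com/contact",
]:
    frontier.push(url)

# Drain the queue: weighted URLs come out before unweighted ones.
crawl_order = [frontier.pop() for _ in range(len(frontier))]
```

In this sketch the URL matching both keywords is fetched first, then the one matching a single keyword, and the unweighted URLs follow in the order they were discovered.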
Collisions are handled by the cuckoo hash table, whose hash values are generated by the ultra-fast xxhash. It uses xxh32 and xxh64 as its two different hash functions.
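The idea can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the Rust CuckooHashtables implementation: the `CuckooSet` class, its capacity, and the `max_kicks` limit are hypothetical, and when the third-party `xxhash` package is missing the sketch falls back to `hashlib.blake2b` purely as a stand-in so that it still runs.

```python
import hashlib

try:
    import xxhash  # third-party package: pip install xxhash

    def hash_a(key: bytes) -> int:
        return xxhash.xxh32(key).intdigest()

    def hash_b(key: bytes) -> int:
        return xxhash.xxh64(key).intdigest()

except ImportError:
    # Stand-in hashes so the sketch runs without xxhash installed.
    def hash_a(key: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(key, digest_size=4).digest(), "big")

    def hash_b(key: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


class CuckooSet:
    """Hypothetical two-table cuckoo hash set for byte-string keys."""

    def __init__(self, capacity=16, max_kicks=32):
        self._capacity = capacity
        self._max_kicks = max_kicks
        self._tables = [[None] * capacity, [None] * capacity]

    def _slots(self, key):
        # One candidate slot per table, each from a different hash function.
        return hash_a(key) % self._capacity, hash_b(key) % self._capacity

    def __contains__(self, key):
        a, b = self._slots(key)
        return self._tables[0][a] == key or self._tables[1][b] == key

    def add(self, key):
        if key in self:
            return
        table = 0
        for _ in range(self._max_kicks):
            slot = self._slots(key)[table]
            evicted = self._tables[table][slot]
            self._tables[table][slot] = key
            if evicted is None:
                return
            # The slot was occupied: move the evicted key to its other table.
            key = evicted
            table ^= 1
        # Too many evictions in a row: enlarge the tables and retry.
        self._grow()
        self.add(key)

    def _grow(self):
        old_keys = [k for t in self._tables for k in t if k is not None]
        self._capacity *= 2
        self._tables = [[None] * self._capacity, [None] * self._capacity]
        for k in old_keys:
            self.add(k)


seen = CuckooSet(capacity=4)  # deliberately small so the tables have to grow
urls = [f"https://example.com/page/{i}".encode() for i in range(50)]
for u in urls:
    seen.add(u)
```

A lookup only ever inspects two slots, which is what makes membership checks cheap even when the set of URLs is large.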
deps
pip install maturin
pip install -r requirements.txt
Build the CuckooHashtables module, which is implemented in Rust, with maturin and install the resulting wheel with pip. This allows you to call it from your Python code. If you prefer not to install it globally, you can install it inside a virtual environment instead.
maturin build
pip install target/wheels/your_package_name.whl
chmod +x main.py
or
pip install target/wheels/your_package_name.whl --force-reinstall
chmod +x main.py
python3 ./main.py
usage: main.py [-h] [-c CONNECTIONS] [-w WEIGHTS [WEIGHTS ...]]
[-v EXCLUDE [EXCLUDE ...]]
url output_dir
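For readers who want to see how such a command line fits together, the usage text above could be produced by an argparse parser along these lines. This is a sketch, not the actual contents of main.py: the `dest` names, the integer type and default for `-c`, and the help strings are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumed to be an integer count; the default of 8 is hypothetical.
    parser.add_argument("-c", dest="connections", type=int, default=8,
                        help="number of connections (assumed meaning)")
    # nargs="+" accepts one or more values after the flag.
    parser.add_argument("-w", dest="weights", nargs="+", default=[],
                        help="values used to prioritise URLs (assumed meaning)")
    parser.add_argument("-v", dest="exclude", nargs="+", default=[],
                        help="values used to skip URLs (assumed meaning)")
    parser.add_argument("url")
    parser.add_argument("output_dir")
    return parser


# Positionals are given first so the list options cannot swallow them.
args = build_parser().parse_args(
    ["https://example.com", "mirror", "-c", "16", "-w", "blog", "archive", "-v", "login"]
)
```

Because -w and -v each take one or more values, placing url and output_dir before them (or separating them with `--`) keeps argparse from reading the positionals as extra list values.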