Skip to content

Copy sites locally very fast using cuckoo hashtables and xxhash and DAG

License

Notifications You must be signed in to change notification settings

haturatu/cuckooget

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cuckooget

                      __                                      __      
                     /\ \                                    /\ \__   
  ___   __  __    ___\ \ \/'\     ___     ___      __      __\ \ ,_\  
 /'___\/\ \/\ \  /'___\ \ , <    / __`\  / __`\  /'_ `\  /'__`\ \ \/  
/\ \__/\ \ \_\ \/\ \__/\ \ \\`\ /\ \L\ \/\ \L\ \/\ \L\ \/\  __/\ \ \_ 
\ \____\\ \____/\ \____\\ \_\ \_\ \____/\ \____/\ \____ \ \____\\ \__\
 \/____/ \/___/  \/____/ \/_/\/_/\/___/  \/___/  \/___L\ \/____/ \/__/
                                                   /\____/            
                                                   \_/__/             

What

A very fast website copy script using a cuckoo hash table & xxhash & DAG. There are still many problems. I feel sad about disappearing websites, and I’m thinking of ways to save them even faster.

Websites are our memories.
Let everyone rise up and preserve disappearing historical websites, leaving them for the future.
For all geeks and for those who love the internet. If you find an interesting website, please contact me.

Furthermore, with the -w option, you can set higher priorities based on the URL. I don't think other website mirroring software has this feature.

Collisions are avoided by the cuckoo hash table and generated by the ultra-fast xxhash. It consists of xxh32 and xxh64 as different hash values.

Install

deps

pip install maturin
pip install -r requirements.txt

You can build the CuckooHashtables implemented in Rust and install it using pip. This will allow you to call it from your Python code. If you prefer not to install it globally, you can also install it from within a virtual environment.

maturin build
pip install target/wheels/your_package_name.whl

chmod +x main.py

or

pip install target/wheels/your_package_name.whl --force-reinstall

chmod +x main.py

Usage

python3 ./main.py

usage: main.py [-h] [-c CONNECTIONS] [-w WEIGHTS [WEIGHTS ...]]
               [-v EXCLUDE [EXCLUDE ...]]
               url output_dir

About

Copy sites locally very fast using cuckoo hashtables and xxhash and DAG

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published