Recursively scans a path for files matching given SHA-1 checksums. Useful for finding duplicate files, or for sorting files into relevant, irrelevant, and unknown.
hscan -d <PATH> -db <PATH>
-d string
Directory to scan recursively
-db string
Directory containing text files with the SHA-1 checksums to search for (one checksum per line)
Suppose you have the file dbpath/sha1.txt.
To search the directory test/ for files with those checksums:
hscan -d test -db dbpath
# result :
Loading database file "dbpath/sha1.txt"... 3 uniq checksum found in "46.975µs"
Scanning path "tmp"...
1964 files - 0 unreadable files - 492 dirs - 0 unreadable dirs - 3 matches
sha1tmp.txt : 3 matches
Total : 3 matches
Done in 292.09673ms
Matching files, unknown files, and errors are written to result.csv as the scan runs
# sha1,dbfile,filename,error
,,/home/jeff/tmp/mysqltmp/undo_001,open /home/jeff/tmp/mysqltmp/undo_001: permission denied
A SQLite3 database named result.db with the same data as the CSV is created at the end of the process.
Get the latest release, or download and install from source:
git config --global --add url."[email protected]:".insteadOf ""
go get
cd ~/go/src/
# Linux
env GOOS=linux GOARCH=amd64 go build hscan.go
# Windows
env GOOS=windows GOARCH=amd64 go build -o hscan.exe hscan.go
# Raspberry Pi
env GOARM=7 GOARCH=arm go build hscan.go
go install
go test
Tested on:
- OS: Linux
- Storage: 128 GB SSD + 2 TB HDD
- CPU: Intel(R) Xeon(R) CPU E5-1660 v3 @ 3.00GHz
- Memory: 32 GB
Loading a 1.2 GB NIST/NSRL file containing 29,459,433 unique checksums took 22.14 s. Scanning the 2 TB and 128 GB drives took 1h32m34s. Scan time depends on the data stored and the free space on the drives. Further tests will be done shortly.
$> hscan -d / -db bases_hash/
Loading database file "bases_hash/nsrl_sha1_uniq.txt"... 29459433 uniq checksum found in "22.146464941s"
Scanning path "/"...
2012574 files - 12091 unreadable files - 274715 dirs - 2510 unreadable dirs - 287870 matches
nsrl_sha1_uniq.txt : 287870 matches
Total : 287870 matches
Done in 1h32m34.505006098s