sdhash


sdhash is a tool that processes binary data and produces similarity digests using Bloom filters. Two binary files that share common parts produce similar digests. sdhash can compare two similarity digests and produce a score: a score close to 0 means that the two files are very different, while a score of 100 means that the two files are equal.
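To make the idea of Bloom-filter digests and a 0 to 100 score more concrete, here is a toy sketch in Go that uses only the standard library. It is not the sdhash algorithm: the real tool selects statistically improbable features instead of hashing every window, and it uses its own hash and scoring functions. The window length, filter size, FNV-1a hashing, and the `digest`/`score` names are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

const (
	filterBits = 2048 // toy filter size in bits (hypothetical)
	window     = 8    // length of each byte window used as a feature (hypothetical)
)

// bloom is a fixed-size Bloom filter stored as 64-bit words.
type bloom [filterBits / 64]uint64

// add sets two bit positions derived from a single FNV-1a hash of the feature.
func (b *bloom) add(feature []byte) {
	h := fnv.New64a()
	h.Write(feature) // writing to an FNV hash never returns an error
	sum := h.Sum64()
	for _, pos := range []uint64{sum % filterBits, (sum >> 32) % filterBits} {
		b[pos/64] |= 1 << (pos % 64)
	}
}

// ones counts the bits set in the filter.
func (b *bloom) ones() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

// digest inserts every overlapping window of data into a fresh filter.
func digest(data []byte) *bloom {
	b := &bloom{}
	for i := 0; i+window <= len(data); i++ {
		b.add(data[i : i+window])
	}
	return b
}

// score returns 0..100: the share of the sparser filter's set bits that are
// also set in the other filter (0 if either filter is empty).
func score(a, b *bloom) int {
	common := 0
	for i := range a {
		common += bits.OnesCount64(a[i] & b[i])
	}
	smaller := a.ones()
	if n := b.ones(); n < smaller {
		smaller = n
	}
	if smaller == 0 {
		return 0
	}
	return 100 * common / smaller
}

func main() {
	a := []byte("similarity digests summarize binary data with Bloom filters")
	b := append(append([]byte{}, a[:30]...), []byte("#### ~~~~ different ending ~~~~ ####")...)
	c := []byte("0123456789 !@#$%^&*() 9876543210 ()*&^%$#@!")
	fmt.Println(score(digest(a), digest(a))) // identical input: 100
	fmt.Println(score(digest(a), digest(b))) // b shares its first 30 bytes with a
	fmt.Println(score(digest(a), digest(c))) // no common windows, only hash collisions
}
```

Because this toy score divides by the sparser filter, a file whose content is entirely contained in another also scores 100 here. Unrelated inputs score near 0, with hash collisions adding a little noise.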

Features

  • calculate similarity digests of many files quickly
  • compare a large number of digests using precalculated indexes
  • optionally run the comparison while the digests are being computed
  • produce the same results as the original sdhash with similar performance, entirely rewritten in Go
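Comparing against precalculated indexes while a file is being digested can be illustrated in the same toy style. One large Bloom filter is filled ahead of time with the features of known files, and a new file is then checked window by window as it is scanned. This is a hypothetical sketch, not sdhash's index format or API; the `index` type, its methods, the sizes, and the FNV-1a hashing are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	indexBits = 1 << 16 // size of the toy index filter in bits (hypothetical)
	window    = 8       // feature length in bytes (hypothetical)
)

// index is one large Bloom filter holding the features of many reference files.
type index struct{ bits []uint64 }

func newIndex() *index { return &index{bits: make([]uint64, indexBits/64)} }

// positions derives two bit positions from a single FNV-1a hash of a feature.
func positions(feature []byte) [2]uint64 {
	h := fnv.New64a()
	h.Write(feature) // writing to an FNV hash never returns an error
	sum := h.Sum64()
	return [2]uint64{sum % indexBits, (sum >> 32) % indexBits}
}

// addFile inserts every overlapping window of a reference file, ahead of time.
func (ix *index) addFile(data []byte) {
	for i := 0; i+window <= len(data); i++ {
		for _, p := range positions(data[i : i+window]) {
			ix.bits[p/64] |= 1 << (p % 64)
		}
	}
}

// contains reports whether both bits of a feature are set (false positives are possible).
func (ix *index) contains(feature []byte) bool {
	for _, p := range positions(feature) {
		if ix.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// scan checks each window of a new file against the index as it is read and
// returns the percentage of its features already present in the index.
func (ix *index) scan(data []byte) int {
	total, hits := 0, 0
	for i := 0; i+window <= len(data); i++ {
		total++
		if ix.contains(data[i : i+window]) {
			hits++
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * hits / total
}

func main() {
	ix := newIndex()
	known := []byte("reference file content that was indexed ahead of time")
	ix.addFile(known)
	fmt.Println(ix.scan(known))                           // an indexed file always matches fully: 100
	fmt.Println(ix.scan([]byte("0123456789!@#$%^&*()"))) // unrelated data: close to 0
}
```

A Bloom filter has no false negatives, so a file that was indexed always reports 100; unrelated data can still produce occasional false-positive hits, which is why the index size matters.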

Getting started

The sdhash package is available as binaries and as a library.

Binaries

The binaries for all platforms are available on the Releases page.

Library

  1. Install the sdhash package with the command below:
$ go get -u github.com/eciavatta/sdhash
  2. Import it in your code and start playing around:
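package main

import (
	"fmt"
	"log"

	"github.com/eciavatta/sdhash"
)

func main() {
	// Create a digest factory for each file; this returns an error, e.g. if the file cannot be read.
	factoryA, err := sdhash.CreateSdbfFromFilename("a.bin")
	if err != nil {
		log.Fatal(err)
	}
	sdbfA := factoryA.Compute() // compute the similarity digest of a.bin

	factoryB, err := sdhash.CreateSdbfFromFilename("b.bin")
	if err != nil {
		log.Fatal(err)
	}
	sdbfB := factoryB.Compute() // compute the similarity digest of b.bin

	// Print both digests in their text form, then the similarity score.
	fmt.Println(sdbfA.String())
	fmt.Println(sdbfB.String())
	fmt.Println(sdbfA.Compare(sdbfB))
}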

Documentation

The library documentation is published at pkg.go.dev/github.com/eciavatta/sdhash. How sdhash works is described in the original sdhash paper, and a tutorial for the original version of sdhash is also available.

License

sdhash was originally created by Vassil Roussev and Candice Quates and is licensed under the Apache-2.0 License. This Go implementation was written by Emiliano Ciavatta and is also licensed under the Apache-2.0 License.