Description

fdup.py is a simple and fast program that finds duplicate files.

Why?

Because it is amazingly fast: much faster than fdupes, which is written in C, and much more readable than fslint/findup.

Python is not the limiting factor; disc speed is. A sane algorithm for finding and sorting out potential duplicate files therefore matters much more than the language used. In the end it is all about the algorithm and disc performance: fstat calls, disc I/O, and hashing are nearly as fast in Python as in C, so don't worry.
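To illustrate what a "sane algorithm" could look like, here is a minimal sketch. It is not taken from fdup.py and may differ from what the script actually does. It assumes a common two-step approach: group files by size first (a cheap stat call), then hash only the files that share a size with another file. The function name `find_duplicates`, the `chunk_size` parameter, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(paths, chunk_size=1 << 20):
    """Return lists of paths whose contents have the same SHA-256 digest."""
    # Step 1: group by size. This needs only a stat call, no file reads.
    by_size = defaultdict(list)
    for path in paths:
        try:
            by_size[os.path.getsize(path)].append(path)
        except OSError:
            continue  # unreadable or vanished file: skip it

    # Step 2: hash only files that share their size with another file.
    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    for chunk in iter(lambda: handle.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError:
                continue
            by_hash[digest.hexdigest()].append(path)
        duplicates.extend(g for g in by_hash.values() if len(g) > 1)
    return duplicates
```

Most files have a size that no other file shares, so they are never read at all. That is why disc I/O, rather than the language, tends to dominate the runtime of this kind of tool.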

Usage

$ find $PWD -type f | ./fdup.py

or to exclude the time find needs:

$ find $PWD -type f > files.txt
$ ./fdup.py < files.txt
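Both invocations feed the file list to the script on standard input, one path per line. As a rough sketch of how a script might consume such input (the helper name `read_paths` is hypothetical and not taken from fdup.py):

```python
def read_paths(stream):
    """Yield one path per input line, without the trailing newline."""
    for line in stream:
        path = line.rstrip("\n")
        if path:  # ignore blank lines
            yield path
```

A caller would typically pass `sys.stdin`, for example `for path in read_paths(sys.stdin): ...`. Note that this line-based format cannot represent file names that themselves contain a newline.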

Results

The test directory is my $HOME, which contained 62022 files. Of these, 18680 are duplicates (empty files and duplicates from svn and git repos).

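| Program | user   | system | cpu (%) | total   |
|---------|--------|--------|---------|---------|
| fdup    | 3.38s  | 6.10s  | 5       | 3:01.89 |
| fslint  | 18.04s | 9.20s  | 12      | 3:41.20 |
| fdupes  | 62.35s | 15.46s | 20      | 6:16.49 |
| duff    | 22.59s | 4.42s  | 6       | 7:18.13 |
| dupseek | 18.33s | 6.55s  | 4       | 8:30.35 |
| ftwin   | 15.94s | 7.50s  | 3       | 9:57.91 |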