dedupe

A tool for efficiently scanning large numbers of files to find and remove duplicates. macOS, Linux, and Windows binaries are available from the project's releases page.

I wrote this for a bit of practice using Go, and to help me finally get around to cleaning up that mess of photo backups.

Use Case

Here's an example use case for this tool: I've spent a while organising most of my photos, by year etc. These are stored in /photos. I have a copy in /backup from when I was part-way through the re-organisation. I have a directory /unsorted with photos downloaded from iCloud (in /unsorted/icloud), Amazon Photos (in /unsorted/amazon), Google Photos, and others. Some photos are only backed up in one place, some are duplicated; it's a big mess.

What I'd like to do is remove all duplicates from /backup and /unsorted so that I can organise the ones that aren't in /photos yet. I can do that with just one command:

dedupe --trash=/trash /photos /backup /unsorted

I created a /trash directory to hold the 'trashed' duplicates. They won't be deleted, and I can undo the move for some or all of them by manually copying them back to their original locations. Duplicates are moved to /trash/backup, /trash/unsorted/amazon, etc., based on where they started.
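The path mapping described above can be sketched in Go. This is only an illustration of the layout, not the tool's actual code; the trashPath helper is hypothetical:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// trashPath maps a duplicate's original path into the trash directory,
// preserving its path relative to the filesystem root so the move can
// be undone by copying the file back.
func trashPath(trashRoot, original string) string {
	rel := strings.TrimPrefix(filepath.ToSlash(original), "/")
	return filepath.Join(trashRoot, filepath.FromSlash(rel))
}

func main() {
	fmt.Println(trashPath("/trash", "/backup/2019/img_0042.jpg"))
	fmt.Println(trashPath("/trash", "/unsorted/amazon/a.png"))
}
```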

If I'm nervous about what could get moved, I can leave off the --trash option; the tool will just find duplicates and report what it would move.

dedupe /photos /backup /unsorted

It decides what to move based on the order of the directories: those listed first have higher priority. If some directories within /backup are even better organised than my main /photos directory, I can give those higher priority:

dedupe --trash=/trash /backup/2015move /backup/2020aprilfun /photos /backup /unsorted
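The priority rule can be sketched in Go. The keepAndMove helper below is hypothetical and assumes duplicates have already been grouped; each path is matched against the directory list in the order given on the command line:

```go
package main

import (
	"fmt"
	"strings"
)

// keepAndMove applies the priority rule: within a group of duplicate
// paths, the file under the earliest-listed (highest priority) directory
// is kept, and the rest are marked to be moved.
func keepAndMove(dirs []string, group []string) (keep string, move []string) {
	best := len(dirs) // worse than any real priority
	for _, p := range group {
		for i, d := range dirs {
			if strings.HasPrefix(p, d+"/") {
				if i < best {
					best, keep = i, p
				}
				break // first match in dirs order is the highest priority
			}
		}
	}
	for _, p := range group {
		if p != keep {
			move = append(move, p)
		}
	}
	return
}

func main() {
	dirs := []string{"/backup/2015move", "/photos", "/backup", "/unsorted"}
	keep, move := keepAndMove(dirs, []string{"/backup/a.jpg", "/photos/a.jpg", "/unsorted/icloud/a.jpg"})
	fmt.Println("keep:", keep, "move:", move)
}
```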

Downloads

If you don't want to build it yourself, you can download a precompiled binary from the project's releases page.

Options

Here's the help text that the tool outputs; if anything is unclear, please file an issue:

Usage: dedupe [OPTION] DIRECTORY...
       dedupe --trash=<trash> [OPTION]... DIRECTORY...

Searches DIRECTORY(ies)... for duplicate files and optionally moves them to <trash> without user interaction.

DIRECTORY order is used for priority, highest first. Higher priority files are left untouched and lower priority files are moved. 


Options:
        --trash             root directory for moved duplicates (default: files not moved)
        --compare-time      compare file modification time (default: false)
        --compare-name      compare file name (default: false)
        --compare-size      compare file size (default: true)
        --compare-hash      compare file hash (default: true)
        --compare-contents  compare whole file contents (default: false)
        --min-size          minimum file size, in bytes or human-readable e.g. 4M, 5G (default: 1)
        --follow-symlinks   follow symbolic links; false ignores them (default: false)
        --verbose           emit verbose information (default: false)
        --version           output version and license information and exit
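The default comparison strategy (size first, then hash) can be sketched in Go. The in-memory fileInfo type and the SHA-256 hash here are illustrative assumptions; the tool's actual hash and data structures may differ:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fileInfo stands in for a scanned file: a path plus its contents.
type fileInfo struct {
	name string
	data []byte
}

// findDuplicates mirrors the default flags (--compare-size and
// --compare-hash both true): group by size first, since that's cheap,
// then confirm candidates by hashing contents.
func findDuplicates(files []fileInfo) [][]string {
	bySize := map[int][]fileInfo{}
	for _, f := range files {
		bySize[len(f.data)] = append(bySize[len(f.data)], f)
	}
	var groups [][]string
	for _, sameSize := range bySize {
		if len(sameSize) < 2 {
			continue // a unique size can't have duplicates
		}
		byHash := map[[32]byte][]string{}
		for _, f := range sameSize {
			h := sha256.Sum256(f.data)
			byHash[h] = append(byHash[h], f.name)
		}
		for _, names := range byHash {
			if len(names) > 1 {
				groups = append(groups, names)
			}
		}
	}
	return groups
}

func main() {
	files := []fileInfo{
		{"/photos/a.jpg", []byte("cat")},
		{"/backup/a.jpg", []byte("cat")},
		{"/photos/b.jpg", []byte("dog")},
	}
	fmt.Println(findDuplicates(files))
}
```

Grouping by size first means most files are never hashed at all, which is why the size comparison is on by default.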

Advanced options:
        --scan-buffer       size of the scan buffer (default: 100)
        --scanners          number of scanner goroutines (default: 2)
        --match-buffer      size of the match buffer (default: 100)
        --matchers          number of matcher goroutines (default: 4)
        --move-buffer       size of the move buffer (default: 100)
        --movers            number of mover goroutines (default: 2)
        --max-cpus          maximum CPUs to use (default: system setting)
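These options describe a pipeline: pools of scanner, matcher, and mover goroutines connected by buffered channels. A minimal sketch of that shape, where the stage contents are placeholders and only the channel/goroutine structure is the point:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// stage fans work out across n goroutines that read from in, apply fn,
// and write to a buffered output channel, which closes once all workers
// are done. Pool size and buffer size correspond to options like
// --scanners and --scan-buffer.
func stage(in <-chan string, n, buffer int, fn func(string) string) <-chan string {
	out := make(chan string, buffer)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range in {
				out <- fn(s)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// runPipeline wires three stages together with the default pool sizes:
// 2 scanners, 4 matchers, 2 movers, each with a 100-item buffer.
func runPipeline(paths []string) []string {
	in := make(chan string, 100)
	go func() {
		for _, p := range paths {
			in <- p
		}
		close(in)
	}()
	scanned := stage(in, 2, 100, func(p string) string { return p + " scanned" })
	matched := stage(scanned, 4, 100, func(p string) string { return p + ", matched" })
	moved := stage(matched, 2, 100, func(p string) string { return p + ", moved" })

	var results []string
	for r := range moved {
		results = append(results, r)
	}
	sort.Strings(results) // goroutine scheduling makes arrival order nondeterministic
	return results
}

func main() {
	fmt.Println(runPipeline([]string{"/backup/a.jpg", "/unsorted/b.jpg"}))
}
```

In the real tool each stage does actual work (walking directories, comparing candidates, moving files); the buffers decouple the stages so that a temporarily slow stage doesn't stall the others.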

About

Copyright 2020 Alan Davies <[email protected]>
Distributed under the MIT license <https://opensource.org/licenses/MIT>.
See <https://github.com/glxxyz/dedupe> for documentation and help.