2 compare duplicated images using file hashes instead of file sizes #9

Merged
7 changes: 4 additions & 3 deletions README.md
@@ -1,6 +1,8 @@
 # Synology

-This repository contains a set of Python scripts that streamline processes not supported by Synology services (e.g. Synology Photos). Each script is responsible for its own goal and is documented in a README file located in the script directory.
+[![Pylint](https://github.com/filipliwinski/Synology/actions/workflows/pylint.yml/badge.svg)](https://github.com/filipliwinski/Synology/actions/workflows/pylint.yml)
+
+This repository contains a set of Python scripts that streamline processes not supported by Synology services (e.g. Synology Photos). Each script is responsible for its own purpose and is documented in a README file located in the script's directory.

 ## How to contribute?

@@ -28,5 +30,4 @@ Synology Photos shows photos stored in hidden folders (unix style). This script

 ### Photo Slideshow

-It is possible to display a slideshow directly in the Synology Photos web interface, but it is very limited (it can only be run for one Album/tag, and neither the slideshow speed nor the order of the photos can be adjusted). The script will allow selecting multiple tags and specifying start and end dates, and will cache the images on the device if needed.
-
+It is possible to display a slideshow directly in the Synology Photos web interface, but it is very limited (it can only be run for one Album/tag, and neither the speed of the slideshow nor the order of the photos can be adjusted). The script will allow selecting multiple tags and specifying start and end dates, and will cache the images on the device if needed.
2 changes: 1 addition & 1 deletion src/photo_dumper/README.md
@@ -6,7 +6,7 @@ This Python script allows you to import photos to your Synology NAS from a local

 [Synology Photos](https://www.synology.com/en-uk/dsm/feature/photos) is a free service available on Synology DSM 7+ that lets you store photos and videos on a local NAS. It comes with very nice web and mobile clients, which are great for viewing and sharing photos, but not so great if you want to move a lot of photos to your NAS at once while preserving the division of directories by year and month.

-This script checks for `.jpg` and `.jpeg` files in the specified source directory, reads the original date taken from the EXIF data of the photos and saves them in the target directory in folders based on the year and month they were taken. It handles duplicated files and verifies the files for corruption after they have been transferred. The script renames the transferred files to include the date and time the photo was taken and the file size, e.g. `IMG_20230216_211349_03771730.JPG`, where `20230216` is the date in format `yyyyMMdd`, `211349` is the time in format `hhmmss` and `03771730` is the size of the file in bytes.
+This script checks for `.jpg` and `.jpeg` files in the specified source directory, reads the original date taken from the EXIF data of the photos and saves them in the target directory in folders based on the year and month they were taken. It handles duplicated files and verifies the files for corruption after they have been transferred. The script renames the transferred files to include the date and time the photo was taken (the format used by the Synology Photos mobile apps when uploading files) and the file size in bytes, e.g. `IMG_20230216_211349_03771730.JPG`, where `20230216` is the date in format `yyyyMMdd`, `211349` is the time in format `hhmmss` and `03771730` is the size of the file.

### Remarks

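
The naming scheme described in the photo_dumper README can be illustrated with a short sketch. This is not the script's actual code path: `build_target_file_name` is a hypothetical helper introduced only for illustration, and the two constants simply mirror the names this PR adds to `photo_dumper.py`. It assumes the date taken has already been read from EXIF.

```python
from datetime import datetime

# Names mirror the constants introduced in photo_dumper.py by this PR.
TARGET_FILE_NAME_PREFIX = "IMG"
TARGET_FILE_FORMAT = "JPG"


def build_target_file_name(date_taken: datetime, file_size: int) -> str:
    """Hypothetical helper: PREFIX_yyyyMMdd_hhmmss_<size padded to 8 digits>.EXT"""
    return (
        f"{TARGET_FILE_NAME_PREFIX}_"
        f"{date_taken.strftime('%Y%m%d')}_"
        f"{date_taken.strftime('%H%M%S')}_"
        f"{file_size:08d}.{TARGET_FILE_FORMAT}"
    )


# The example from the README: taken 2023-02-16 21:13:49, 3,771,730 bytes.
print(build_target_file_name(datetime(2023, 2, 16, 21, 13, 49), 3771730))
# IMG_20230216_211349_03771730.JPG
```

Because the size is part of the name, two different photos taken in the same second only collide on the target name when they also have the same byte size, which is the case the uniqueness check has to resolve.
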
4 changes: 3 additions & 1 deletion src/photo_dumper/file_stats.py
@@ -1,5 +1,6 @@
 # Copyright (c) Filip Liwiński
 # Licensed under the MIT License. See the LICENSE file in the project root for license information.
+
 """ Collects statistics of file operations."""

 class FileStats:
@@ -72,4 +73,5 @@ def __str__(self):
 COPIED: {self.copied}
 DUPLICATES: {self.duplicates}
 CONFLICTS: {self.conflicts}
-UNSUPPORTED: {self.unsupported}"""
+UNSUPPORTED: {self.unsupported}
+TOTAL: {self.total}"""
41 changes: 31 additions & 10 deletions src/photo_dumper/photo_dumper.py
@@ -6,13 +6,17 @@
 import shutil
 import sys
 import logging
+import hashlib
 from datetime import datetime
 import piexif

 from tqdm import tqdm
 from version import __version__
 from file_stats import FileStats

+EXIF_DATE_TIME_ORGINAL = 0x9003  # int tag ID: piexif keys the "Exif" IFD by int, so the string "0x9003" would never match
+TARGET_FILE_NAME_PREFIX = "IMG"
+TARGET_FILE_FORMAT = "JPG"

 def _get_original_date_taken(photo_file_path):
     """
@@ -24,7 +28,7 @@ def _get_original_date_taken(photo_file_path):
     exif_data = piexif.load(photo_file_path)

     # Get the value of the DateTimeOriginal tag (0x9003) from the EXIF data
-    date_taken = exif_data["Exif"].get(0x9003)
+    date_taken = exif_data["Exif"].get(EXIF_DATE_TIME_ORGINAL)

     # If the tag is present, convert the value to a datetime object and return it
     if date_taken:
@@ -35,28 +39,42 @@ def _get_original_date_taken(photo_file_path):
     last_modified = os.path.getmtime(photo_file_path)
     return datetime.fromtimestamp(last_modified)

+def _calculate_file_hash(file_path):
+    """
+    Calculates the hash of the provided file using the SHA-256 algorithm.
+    """
+
+    sha256 = hashlib.sha256()
+
+    with open(file_path, "rb") as file:
+        file_bytes = file.read()
+        sha256.update(file_bytes)
+
+    file_hash = sha256.hexdigest()
+
+    return file_hash

 def _check_file_uniqueness(file_path, destination_file_path):
     """
     Returns True if the file is not a duplicate and a file with a given name
     does not exist in the target location. If the file is a duplicate, returns False
     and if a file with the given name already exists, returns None.

-    Uses file size to determine uniqueness.
+    Uses file hash to determine uniqueness.
     """

     if not os.path.isfile(destination_file_path):
         # The file is not a duplicate
         return True

-    desctination_file_size = os.path.getsize(destination_file_path)
-    file_size = os.path.getsize(file_path)
+    destination_file_hash = _calculate_file_hash(destination_file_path)
+    file_hash = _calculate_file_hash(file_path)

-    if desctination_file_size == file_size:
+    if destination_file_hash == file_hash:
         # The file is a duplicate
         return False

-    # The file is not a duplicate, but the file with a given name already exists
+    # The file is not a duplicate, but a file with a given name already exists
     return None
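
The hash-based check in this hunk reads each file fully into memory before hashing. As a possible variant, and only a sketch under the assumption that some photos may be large, the file can be hashed in fixed-size chunks, and the cheap size comparison can be kept as a first step, since files of different sizes cannot have identical content. The names `hash_file_in_chunks` and `is_same_content` and the 1 MiB `chunk_size` default are hypothetical and not part of the PR.

```python
import hashlib
import os
import tempfile


def hash_file_in_chunks(file_path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading chunk_size bytes at a time."""
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as file:
        # iter() with a sentinel keeps calling file.read() until it returns b"" (end of file).
        for chunk in iter(lambda: file.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def is_same_content(path_a, path_b):
    """Compare sizes first (cheap), then fall back to comparing SHA-256 digests."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return hash_file_in_chunks(path_a) == hash_file_in_chunks(path_b)


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original.bin")
    copy = os.path.join(tmp_dir, "copy.bin")
    same_size = os.path.join(tmp_dir, "same_size.bin")
    for path, payload in ((original, b"a" * 10), (copy, b"a" * 10), (same_size, b"b" * 10)):
        with open(path, "wb") as handle:
            handle.write(payload)

    print(is_same_content(original, copy))       # True
    print(is_same_content(original, same_size))  # False
```

The second call is the case that motivates the PR: two files with the same size but different content are treated as duplicates by a size-only comparison, whereas the digest comparison tells them apart.
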


@@ -78,9 +96,10 @@ def _verify_and_copy_file(file, source_file_path, target_directory, dry_run, fil

     source_file_size = os.path.getsize(source_file_path)
     target_file_name = (
-        f"IMG_{creation_date.strftime('%Y%m%d')}_"
+        f"{TARGET_FILE_NAME_PREFIX}_"
+        f"{creation_date.strftime('%Y%m%d')}_"
         f"{creation_date.strftime('%H%M%S')}_"
-        f"{source_file_size:08d}.JPG")
+        f"{source_file_size:08d}.{TARGET_FILE_FORMAT}")
     target_file_path = f"{target_folder_path}\\{target_file_name}"
     is_unique = _check_file_uniqueness(
         source_file_path, target_file_path)
@@ -89,7 +108,7 @@ def _verify_and_copy_file(file, source_file_path, target_directory, dry_run, fil
         # The file is unique, but a different one with the same name exists
         file_stats.report_conflict()
         logging.warning(
-            "%s skipped (conflict - a file with this name already exists)",
+            "%s skipped (conflict - a file with the given name already exists)",
             source_file_path)
     else:
         if is_unique:
@@ -157,7 +176,9 @@ def main():
         filename=f"photo_dumper_{current_timestamp}.log",
         filemode="w")

-    logging.info("Photo Dumper v.%s", __version__)
+    script_name = f"Photo Dumper v.{__version__}"
+    logging.info(script_name)
+    print(script_name)

     source_directory = sys.argv[1]
     target_directory = sys.argv[2]