This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Feature Request: Combine documents #426

Closed
Zocker1999NET opened this issue Jan 24, 2021 · 3 comments

@Zocker1999NET

Zocker1999NET commented Jan 24, 2021

My scanner does not know which scanned PNGs/PDFs belong together, and because I want all my documents to be OCRed and searchable in paperless even before I have been able to sort and combine them manually, it would be great if combining documents could be integrated into paperless itself.

How this could be implemented in the UI:

  1. Select the documents you want to combine (screenshot: Screenshot_20210124_134659)
  2. Click on a "Combine" button

What happens in the background:

  1. Combine the original documents (not the archived versions!), for example using ImageMagick: convert "$@" pdf:-
  2. Delete all old entries of the selected documents
  3. Reprocess the new document as if it had simply been placed into the consume directory
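The background steps above could be sketched in Python roughly as follows. This is a minimal sketch under assumptions: ImageMagick's `convert` is on `PATH`, `build_convert_command` and `combine_into_consume_dir` are hypothetical helper names, and step 2 (deleting the old entries) is only noted in a comment because it depends on paperless internals that this issue does not describe.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def build_convert_command(originals: Sequence[str]) -> list[str]:
    """Mirror the `convert "$@" pdf:-` one-liner: all originals in, one PDF on stdout."""
    if not originals:
        raise ValueError("need at least one original document")
    return ["convert", *originals, "pdf:-"]


def combine_into_consume_dir(originals: Sequence[str], consume_dir: str, name: str) -> Path:
    """Steps 1 and 3: combine the originals and hand the result to the consumer.

    Assumes ImageMagick's `convert` is on PATH. Step 2 (deleting the old
    document entries) is not implemented here because it depends on
    paperless internals.
    """
    # Step 1: combine the originals (not the archived versions) into one PDF.
    result = subprocess.run(
        build_convert_command(originals), capture_output=True, check=True
    )
    # Step 3: placing the file in the consume directory makes paperless
    # process it like any newly scanned document.
    target = Path(consume_dir) / f"{name}.pdf"
    target.write_bytes(result.stdout)
    return target
```

Writing the PDF straight into the consume directory keeps paperless itself unchanged; a real implementation would probably write to a temporary name first and then rename it, so the consumer never picks up a half-written file.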

Known issues with this implementation:

  • Paperless may currently have no way to keep several original source files for one document, so the originals could be lost. Possible workaround: before combining the originals into a PDF document, pack them together into a zip/tar archive, store that as the "original document", and, if possible, enable paperless to work with zip/tar archives.
  • This will most likely not support formats that ImageMagick cannot read, such as Office documents, but it should be able to combine JPEGs/PNGs/PDFs/TIFFs. Possible workaround: before combining with ImageMagick, convert each file that ImageMagick does not support into a PDF, reusing the existing conversion strategies.
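Both workarounds could be sketched with the Python standard library. Assumptions: the helper names `archive_originals` and `split_by_support` are hypothetical, and `IMAGEMAGICK_EXTENSIONS` is only an illustration of "formats ImageMagick can combine" (ImageMagick reads PDFs through Ghostscript, so that has to be installed as well). Paperless's actual conversion strategies are not modelled here.

```python
import zipfile
from pathlib import Path
from typing import Iterable

# Hypothetical whitelist of extensions handed to ImageMagick directly.
# PDF input additionally requires Ghostscript.
IMAGEMAGICK_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff"}


def archive_originals(originals: Iterable[str], archive_path: str) -> Path:
    """Pack the untouched originals into one zip so none of them is lost."""
    archive = Path(archive_path)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for original in originals:
            path = Path(original)
            # Stored by file name only; two originals with the same name would collide.
            zf.write(path, arcname=path.name)
    return archive


def split_by_support(originals: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (files ImageMagick can combine, files needing a PDF conversion first)."""
    direct: list[str] = []
    needs_pdf: list[str] = []
    for original in originals:
        suffix = Path(original).suffix.lower()
        (direct if suffix in IMAGEMAGICK_EXTENSIONS else needs_pdf).append(original)
    return direct, needs_pdf
```

Files in the second list (for example `.docx` or `.odt`) would be converted to PDF first and then combined together with the first list.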
@jonaswinkler
Owner

See #335

@Zocker1999NET
Author

Okay, closing this as a duplicate.

@henfri

henfri commented Feb 6, 2022

My scanner creates a file name with a suffix for each set of documents I feed to it.
For example:
set 1:
Receipt_004942.jpg
Receipt_004942_2.jpg
Receipt_004942_3.jpg
set 2:
Receipt_004946.jpg
Receipt_004946_2.jpg
Receipt_004946_3.jpg
Receipt_004946_4.jpg
Receipt_004946_5.jpg
Receipt_004946_6.jpg

That is, every batch of documents I send in one go gets a new number, and the following pages get the suffixes _2, _3, ...
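Grouping files that follow this naming scheme could look roughly like the sketch below. It is not the script linked from #457; `group_scans` and `SCAN_NAME` are hypothetical names, and the pattern assumes the batch number is the last `_<digits>` part before an optional `_<page>` suffix, as in the example names above. The first page has no suffix and is treated as page 1.

```python
import re
from collections import defaultdict
from typing import Iterable

# "Receipt_004942.jpg"   -> base "Receipt_004942", page 1
# "Receipt_004942_2.jpg" -> base "Receipt_004942", page 2
SCAN_NAME = re.compile(r"^(?P<base>.+?_\d+)(?:_(?P<page>\d+))?\.(?P<ext>[^.]+)$")


def group_scans(filenames: Iterable[str]) -> dict[str, list[str]]:
    """Map each batch base name to its files, ordered by page number."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        match = SCAN_NAME.match(name)
        if match is None:
            continue  # does not follow the scanner's scheme; leave it alone
        page = int(match.group("page") or 1)  # no suffix means first page
        groups[match.group("base")].append((page, name))
    # Sort numerically so that _10 comes after _9.
    return {base: [name for _, name in sorted(pages)] for base, pages in groups.items()}
```

Each resulting list could then be combined into one PDF, for example with the `convert` command from the original request.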

Are you sure that your scanner cannot do something similar?
For that case, I have created a script that could help:
#457 (comment)
