A simple, easy-to-script tool for merging multiple PDF files into one document using a YAML configuration file.
The program is written in Python, using the PyPDF4 library.
There are many good utilities for splitting and merging PDF files. For instance, if you prefer one with a GUI, PDF Arranger is a good choice. However, I had slightly different requirements:
- Composing several documents from a similar set of files, but each time with slight modifications.
- Creating a structure of bookmarks, so that it is easier to navigate the larger document.
- Original PDF features such as hyperlinks and the page orientation should remain intact.
- Simple YAML configuration file structure
- Adding PDF metadata
- Creating bookmarks, also with nested structures
- Checking the version of merged PDF files and setting the minimum in the output PDF
This program requires Python 3.6+.
Download and install from https://www.python.org/downloads/.
The best is to use Homebrew and install using
brew install python
Install Python using your distribution's package manager.
Usually the package is named python3
.
It is not strictly required but strongly advised to create a virtual environment for installing Python packages for a specific purpose.
Create one in any preferred location using
virtualenv -p python3 pdfconfig
The last argument can also be changed if preferred. Then activate the new environment:
source pdfconfig/bin/activate
pip install pdf-config
A configuration file is set up in YAML syntax with the following components:
metadata: # Optional
title: My document title
author: ME
# Additionally supported:
# creator, keywords, producer, subject
# Hard-coded version to set in the header. Set to 'auto' or leave out
# entirely for using the maximum version of all input documents.
version: '1.6'
paths: # Optional
# List of paths to look up any files that do not contain a path
# specification. The current directory is checked first, then the following
# directories are checked in that order.
- ~/my-pdfs # User home directory can be referred to.
- $ADDITIONAL_PDF_PATH # Environment variables are also supported.
contents: # The only required element
# Each list entry can contain any of the following:
# bookmark: The bookmark title
# document: The name (and path) of the input file.
# contents: An additional list of contents. Any bookmarks in this sub-structure
# are placed under this bookmark, if present.
- bookmark: First
document: first.pdf
- bookmark: Second
contents:
- bookmark: Second doc 1
document: sd1.pdf
- bookmark: Second doc 2
document: sd2.pdf
- bookmark: Third
document: ~/pdfpath/third.pdf # Relative and absolute paths are supported.
contents:
- bookmark: Third doc 1
document: $PDF_T1 # Environment variables are also expanded.
- bookmark: Third doc 2
document: $PDF_T2
The order of metadata
, paths
, and contents
above is not relevant.
The resulting PDF bookmark structure will be
|-First
|
|-Second
| |-Second doc 1
| |-Second doc 2
|
|-Third
|-Third doc 1
|-Third doc 2
Second
points to the same page as Second doc 1
, whereas Third
and Third doc 1
point to different pages, since Third
inserts pages on
its own.
With the configuration stored in sample.yaml
and the PDF files in place,
start the merging process by running
pdfconfig sample.yaml
This will merge the listed PDF files into sample.pdf
. For changing the
output name, simply append it to the end of the line; e.g. run
pdfconfig sample.yaml path/to/output.pdf
In Windows, use pdfconfig.exe
.
For more explanation, run
pdfconfig -- --help