Skip to content

PavlosMelissinos/pdfboxing

This branch is 9 commits ahead of, 70 commits behind dotemacs/pdfboxing:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
Pavlos Melissinos
Oct 11, 2021
5d76933 · Oct 11, 2021
Jul 13, 2021
Dec 12, 2013
Apr 4, 2015
Oct 11, 2021
Oct 9, 2021
Jan 2, 2021
Jul 8, 2019
Oct 9, 2021
Jan 15, 2021
Jan 2, 2021
Sep 4, 2020

Repository files navigation

pdfboxing

Clojure PDF manipulation library & wrapper for PDFBox.

  • "Clojure CLI"
  • "Leiningen version"
  • "Continuous Integration status"
  • License
  • Dependencies Status
  • Downloads

Usage

Extract text

(require '[pdfboxing.text :as text])
(text/extract "test/pdfs/hello.pdf")

Extract text from specific regions

(require '[pdfboxing.text :as text])
(let [areas [{:x           0
              :y           100
              :w           350
              :h           50
              :page-number 0}
             {:x           0
              :y           580
              :w           540
              :h           100
              :page-number 0}]]
  (text/extract-by-areas "test/pdfs/clojure-1.pdf" areas))

results in

=> ("Clojure is a dynamic programming language\n" "Rationale\nFeatures\nDownload\nGetting Started\nDocumentation\nClojureScript\nClojureCLR\n")

Then you can easily turn the result into a map using zipmap to get the following:

;; Result of (zipmap [:description :links] text-extract)

{:description "Clojure is a dynamic programming language\n", :links "Rationale\nFeatures\nDownload\nGetting Started\nDocumentation\nClojureScript\nClojureCLR\n"}

Merge multiple PDFs

(require '[pdfboxing.merge :as pdf])
(pdf/merge-pdfs :input ["test/pdfs/clojure-1.pdf" "test/pdfs/clojure-2.pdf"] :output "foo.pdf")

Merge multiple images into single PDF

You can use either merge-images-from-path for providing images in form of vector of string paths or merge-images-from-byte-array to provide them as a vector of byte arrays. Each image will be inserted into its own page.

(require '[pdfboxing.merge :as pdf])
(pdf/merge-images-from-path ["image1.png" "image2.png"] "output.pdf")

Split a PDF into mutliple PDDocuments

 (require '[pdfboxing.split :as pdf])

List of PDDocument pages 1 through 8

 (pdf/split-pdf :input "test/pdfs/multi-page.pdf" :start 1 :end 8)

Splits the PDF into single pages as a list of PDDocument

 (pdf/split-pdf :input "test/pdfs/multi-page.pdf")

Splits the PDF in half and writes them to disk as multi-page-1.pdf and multi-page-2.pdf

 (pdf/split-pdf-at :input "test/pdfs/multi-page.pdf")

Splits into two PDFs, the first having 5 pages and second has rest

 (pdf/split-pdf-at :input "test/pdfs/multi-page.pdf" :split 5)

List form fields of a PDF

To list fields and values:

(require '[pdfboxing.form :as form])
(form/get-fields "test/pdfs/interactiveform.pdf")
{"Emergency_Phone" "", "ZIP" "", "COLLEGE NO DEGREE" "", ...}

Fill in PDF forms

To fill in form's field supply a hash map with field names and desired values. It will create a copy of fillable.pdf as new.pdf with the fields filled in:

(require '[pdfboxing.form :as form])
(form/set-fields "test/pdfs/fillable.pdf" "test/pdfs/new.pdf" {"Text10" "My first name"})

Rename form fields of a PDF

To rename PDF form fields, supply a hash map where the keys are the current names and the values new names:

(require '[pdfboxing.form :as form])
(form/rename-fields "test/pdfs/interactiveform.pdf" "test/pdfs/addr1.pdf" {"Address_1" "NewAddr"})

Get page count of a PDF document

(require '[pdfboxing.info :as info])
(info/page-number "test/pdfs/interactiveform.pdf")

Get info about a PDF document

Such as title, author, subject, keywords, creator & producer

(require '[pdfboxing.info :as info])
(info/about-doc "test/pdfs/interactiveform.pdf")

Draw lines on a PDF document

Supply a PDF document, a name for the output PDF document, the coordinates where the line should be drawn along with the page number on which the line should be drawn

(require '[pdfboxing.draw :as draw])
(draw/draw-line :input-pdf "test/pdfs/clojure-1.pdf"
                :output-pdf "ninja.pdf"
                :coordinates {:page-number 0
                              :x 0
                              :y 160
                              :x1 650
                              :y1 160})

Compatibility with PDFBox's PDDocuments

The following functions referenced above have direct compatibility with PDFBox's internal PDDocument type:

  • text/extract
  • pdf/split-pdf
  • form/get-fields
  • form/set-fields
  • form/rename-fields
  • info/page-number
  • draw/draw-line

This allows you to substitute each filepath (of each function's input) referenced above with a PDDocument type. This is helpful for example in the case that you were to want to split a PDF up by pages and then extract the text from only the 3rd page:

(require '[pdfboxing.text :as text])
(require '[pdfboxing.split :as split])
(-> (split/split-pdf :input "test/pdfs/multi-page.pdf")
    (nth 2)
    text/extract)

About

Nice wrapper of PDFBox in Clojure

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Clojure 100.0%