Skip to content

Commit

Permalink
README: history and motivation
Browse files Browse the repository at this point in the history
  • Loading branch information
matteosecli authored Oct 30, 2017
1 parent 00c2d10 commit ffefca9
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,3 +54,28 @@ Convert 'input.pdf' in PDF/A-1B format and validate the result:
```
./pdf2archive --validate input.pdf
```

## History & Motivation
This script was born as a necessity, when I had to convert the LaTeX-produced PDF of my MSc Thesis into a PDF/A-1B.

Once upon a time, the delivery of the Thesis had to be done manually, by burning a CD-ROM with the Thesis PDF on it. I don't need to say that it was extremely old-fasioned and inefficient, as you had to deliver the CD-ROM to the secretariat in person. Finally, in 2015, my university decided to activate the online submission of the PDF: you just had to upload your PDF and you were done, completely hassle-free.

Then one year ago, some _enlightened mind_ in whoever knows what administrative office, decided that a regular PDF was not easy enough; so, the university began to require the much more _satanic_ PDF/A-1B. Of course, they had to provide a set of instructions for us mere mortal, so that we could produce valid PDF/A-1B files; and indeed they did, by uploading a [_fantastic document_](http://www.biblioteca.unitn.it/282/tesi-di-laurea). If you took the (click)bait and read the PDF (not PDF/A-1B, eh!) instructions at the previous linked page, you might have noticed the _absolute completeness_ of the information contained in it: there are instructions to transform a PDF into a PDF/A-1B by either using a Windows-only free program (yeah, I know) or an obsolete OpenOffice plugin that doesn't work anymore or _paid_, commercial programs that work at most only on Windows and MacOS. No free, cross-platform alternative because hey, _everyone_ loves Windows! Naturally, you can directly produce a PDF/A-1B version of your Thesis. The document lists some easy instructions to perform a direct export into a PDF/A-1B from either Microsoft Word (or Excel, because there are people who of course write their thesis in Excel) or OpenOffice. Because _everyone_ on Earth, especially people who do Physics or Maths, write their thesis in Microsoft Word... they look _sooo beautiful_, in particular when you have to put footnotes, citations, table of contents, when Word spreads the text in a page in a zebra-style, and when you write those amazing equations in Comic Sans that get rendered as 10 DPI jpeg's. "And people who use LaTeX"? "Latex? What latex? I don't do that kind of dirty sex stuff"! - would say the guy who wrote that document.

So you could imagine me and my friends, on the last available day for the Thesis delivery, still struggling trying to figure out how to convert. There is a [nice site](https://docupub.com/pdfconvert/) that converts PDF's into PDF/A-1B files, but there are some points:
+ your Thesis gets filled with metadata from that site, which is not nice for an official document
+ the file size limit is 10 Mb, so if you do a more experimental Thesis which is full of images you're out
+ this solution depends on someone else resources; if the site goes down tomorrow, you're in deep s***
+ it only works online, no offline alternative if you're on the move
+ you have to send personal data to an unknown site
+ you don't know what operations are being performed on your file and your data on the other side of the line

By digging around on Google, you can find people saying that you can perform the conversion via Ghostscript by just turning on a couple of switches; unfortunately, this doesn't work (the online system, Esse3, keeps saying that the file is not valid) and the matter is slightly more complicated and poorly documented. The failure in producing a valid PDF/A-1B is connected to the complex set of requirements needed, especially font embedding, metadata and color space. This script is just a collection of all the things one should to in order to obtain (in most of the cases) a valid PDF/A-1B document from a regular PDF file, in the hope that it simplifies all the process. It also contain a [free, open source validator](http://verapdf.org) that can validate the resulting file (this validator was not included in the official instructions I've linked before, which instead point to paid, commercial products).

There are still cases in which this script produces valid PDF/A-1B files which are rejected by the system because they are "too complex" and the validator used by Esse3 (our 80's-style online system programmed by [Topo Gigio](https://en.wikipedia.org/wiki/Topo_Gigio)) goes in timeout; unfortunately there is no solution that we can implement, as it's a server problem. The suspect is that they're using the commercial version of an [online validator](https://www.pdf-online.com/osa/validate.aspx) (which seems to be the only free one still working), which has the same timeout problem when validating "too complex" PDF files. This small script, instead, shows that
+ you just need a couple of lines to implement a conversion script
+ the university could just require a regular PDF file and perform the conversion with _two_ Ghostscript commands, hassle-free for the students
+ this process would be _free_, as Ghostscript is open-source
+ the inclusion of a validator is trivial, and the _free_, open source validator included here (in contrast to the validator they're using, which is _probably_ a commercial solution) can easily handle "too complex" PDF files without going in timeout

So, now the question is: what is the university still waiting for?

0 comments on commit ffefca9

Please sign in to comment.