Allow upload of PDFs to SIPI #1581
Comments
According to the CINES website cited by @mrivoal, this PDF is considered archivable (see the Validation tab). I used their tools to try to fix potential errors in the PDF (Correction PDF tab):
Both fixed files are now accepted by Sipi.
We also used Acrobat to convert the file to PDF/A, but Sipi does not accept it.
@lrosenth Any news about this?
1/ Is there a test in Sipi that rejects particular PDFs? At a minimum, leaving aside any PDF quality considerations, I cannot import all the PDFs by using the route.
2/ Is there a well-known way to convert about 1000 PDFs? In this post I used PDFtk (not so easy to install on Linux/macOS systems, but I think it is possible), and Ghostscript does not do the job correctly for all our PDFs. The Ghostscript solution seems more flexible to me. I used this command line (found somewhere...):
Maybe you have some advice about these parameters? (I'm going to prepare some scripts to test all that and provide you with all our PDFs.)
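The exact command line is not preserved in this thread, so the following is only a rough sketch of how a batch rewrite of about 1000 PDFs could be scripted. It assumes Ghostscript is installed and callable as `gs`; the helper names `build_gs_command` and `convert_all` are hypothetical, and the parameter values (for example the compatibility level) are illustrative rather than the ones used above.

```python
import subprocess
from pathlib import Path


def build_gs_command(src: Path, dst: Path) -> list[str]:
    """Build a Ghostscript call that rewrites src into dst via the pdfwrite device."""
    return [
        "gs",                        # assumption: Ghostscript binary is on PATH as "gs"
        "-dBATCH",                   # exit when done instead of waiting at the prompt
        "-dNOPAUSE",                 # do not pause after each page
        "-dQUIET",                   # suppress routine messages
        "-sDEVICE=pdfwrite",         # re-emit the document as a new PDF
        "-dCompatibilityLevel=1.4",  # illustrative value, not taken from the thread
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_all(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Rewrite every *.pdf in src_dir into dst_dir; return the inputs that failed."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.glob("*.pdf")):
        result = subprocess.run(build_gs_command(src, dst_dir / src.name))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Collecting the failures in a list makes it easy to report which files Ghostscript could not handle, which seems useful here since it reportedly does not work for every PDF.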
I think this is the wrong repository for this issue. The Docker image of Sipi that you use contains a completely different set of scripts that are custom to Knora. The upload script that you are using is this one: https://github.com/dasch-swiss/knora-api/blob/develop/sipi/scripts/upload.lua.
I've moved the issue to the
@subotic Why? I thought this issue was about getting Sipi to validate PDF files.
Isn't it about a problem with uploading PDFs? Since this logic is implemented in a
Which logic? The Lua script just calls Sipi’s C++ functions to process the file. If Sipi can’t parse a particular PDF file, there’s nothing the Lua script can do about it.
OK, I had a look at it. It requires at least some major changes in
Currently all uploaded files are converted to a Sipi image object and then converted to a JPEG2000 file. In order to cope with PDFs (and also .ttl, .xml, etc.) we have to build in a switch which checks the MIME type and performs
As a reminder, PDFs that are in the images folder can be served
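To illustrate the kind of switch described above, here is a minimal sketch in Python. The real upload script is written in Lua, so this is not its implementation; the function name `choose_action`, the action labels, and the rule that non-image files are stored unchanged are all assumptions made for the example.

```python
def choose_action(mime_type: str) -> str:
    """Pick a processing step from the MIME type of an uploaded file.

    Hypothetical rule set: images go through the existing JPEG2000
    conversion, every other type is kept as uploaded.
    """
    if not mime_type:
        # No MIME type detected: refuse rather than guess.
        return "reject"
    if mime_type.startswith("image/"):
        return "convert_to_jpeg2000"
    # Assumption: PDFs, Turtle, XML and similar files need no conversion.
    return "store_unchanged"


for mime in ("image/tiff", "application/pdf", "text/turtle", ""):
    print(f"{mime or '(none)'} -> {choose_action(mime)}")
```

Keeping the decision in one small function means that a new file type only needs a new branch, and the image path stays as it is.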
No, https://github.com/dasch-swiss/knora-api/blob/develop/sipi/scripts/upload.lua#L116
Support for uploading PDF files was added in #1206.
These changes were also announced on Discuss DaSCH: https://discuss.dasch.swiss/t/support-non-image-files-in-knora-api-v2/33/2
@benjamingeer did you try to reproduce the bug?
@gfoo I haven't tried to reproduce it, but if it works with one PDF file and not with another PDF file, I think the problem has to be in the C++ code in Sipi.
@lrosenth (or to whom it may concern) any news regarding this issue? We still have a lot of things to fix to be able to release Lumieres.Lausanne but if we can't import PDFs, then we can't release anything.
I've reproduced this and can confirm that it depends on the content of the PDF file. Moved to dasch-swiss/sipi#319.
@benjamingeer thanks for the test
Are there any rules that determine whether the Sipi upload route accepts a PDF?
For example, I get an error with this file: Programme_Colloque_Bertrand.pdf
Using the binary, the PDF seems OK:
But using the upload route, the call gets stuck: