Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knora integration #267

Merged
merged 25 commits into from
Nov 9, 2018
Merged

Knora integration #267

merged 25 commits into from
Nov 9, 2018

Conversation

lrosenth
Copy link
Collaborator

@lrosenth lrosenth commented Oct 19, 2018

I added the url http://iiiif.server/directory/imgid.jp2/knora.json which returns a json containing
the required image information:

{
"width": 512,
"height": 512,
"mimetype": "image/tiff",
"origname": "lena512.tif"
}

I added unittests for this feature and for getting info.json

Required for dasch-swiss/dsp-api#1011.

Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tested this by uploading a file using the upload.lua route we wrote:

https://github.com/dhlab-basel/Knora/blob/555c63142e36eb26fa2ab6310f47f515356bee4b/sipi/scripts/upload.lua

I used this command to upload a TIFF file:

curl -v -F [email protected] 'http://localhost:1024/upload?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.Onul6wAxhPDrV9KJh9qEpKBE4K9a3RXvtw_pgYODWic'

The upload route responded with:

{
   "B7bfRzyuFIm-D7EfW8LQnWn.jp2": "http://localhost:1024/tmp/B7bfRzyuFIm-D7EfW8LQnWn.jp2"
}

I then tried to use that URL with the knora.json route:

http://localhost:1024/tmp/B7bfRzyuFIm-D7EfW8LQnWn.jp2/knora.json

But it only returned width and height, not mimetype or origname:

{
"width": 771,
"height": 720
}

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 29, 2018 via email

Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we found out: this actually works, but when you convert the file to JPEG 2000 from Lua, the metadata is not added to the .jp2 file.

I think that when Sipi converts the file, it should also use SipiImage::checkMimeTypeConsistency to make sure that the submitted file extension matches the file’s real MIME type.

Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the complex form of SipiImage.new, it works, but if we use the simple form, the original file metadata is not included in the converted file:

https://dhlab-basel.github.io/Sipi/documentation/lua.html#sipiimage-new-filename

Shouldn't the metadata always be included, even if we use the simple form of SipiImage.new?

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

we do not have the original name directly at disposition

But what about the original MIME type?

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

When reading (SipiImage.new()) the file, the mimetype is determined (using the magic library) and inserted

I don't think so. Using the complex form of SipiImage.new, the resulting jp2 file contains SIPI:density.tiff|image/tiff|sha256|6b188905b8def1f1158e8ef0263475c98dea512af4dc14dfca6e8f6cc1e53cb1|IGNORE_ICC|NULL.

Using the simple form of SipiImage.new, the resulting jp2 file doesn't contain SIPI: or image/tiff.

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@lrosenth
Copy link
Collaborator Author

After going through the code and thinking it over I am a bit reluctant to include an automatic generation of mimetype & checksum when reading a file that does not include this information in the header. The reason is as follows:

The main application of SIPI is to serve images using the IIIF standard. The master image the server accesses for this purpose should be J2K, but can also be JPG's, TIFF's or even PNG's.
The process of reading and possibly transforming the image must be as efficient as possible in order to reduce latency in loading the image(s) (which is especially important for serving to openseadragon). Thus, given a J2K as base image, SIPI reads only the necessary fragments of the (possibly large) image, transforms them to JPEG and serves them.

If such a master image has not been produced by SIPI (and, we as well as others will have such images) it will not contain the additional – non-standard – information. Thus, while reading each fragment SIPI would have to determine the mimetype (which means an additional full file access [open, read, analyze, close] as well as determining the checksum which can be very time consuming for large files) and turn down efficiency while serving IIIF conform images. On the other hand, requiring that each master file has to have this additional metadata would certainly reduce the versatility and usefulness of SIPI.

Therefore I believe the best way is that we enforce the addition of the additional metadata if SIPI is used to convert an image (which is already the case on the command line and can be done for uploads in the Lua-script as it is dome now). But SIPI still can deal with images that do not have this information.

In the special case of the integration to Knora, we have to oblige the users to use SIPI for image conversion (even if I'm a bit hesitant – but I think this is the best solution) because Knora/SIPI must have this information present. By the way this implies that we have to reconvert all old J2K from the old salsah because at this time the J2L-images did not include this information. It's in the MySQL database. But I can live with this ;-)

So I would suggest that we leave SIPI in this respect how it is at the moment: Command-line transformation and uplaods with the knora-upload route add this information to the header, but SIPI (standalone) can deal with images without this information. For Knora we will require that the images are being produces using the upload or sipi command line. J2K produced with Kakadu and other J2K-libraries will have to use SIPI to add this meatdata in order to be integrated into SIPI..

What do You think?

@benjamingeer
Copy link
Contributor

Therefore I believe the best way is that we enforce the addition of the additional metadata if SIPI is used to convert an image (which is already the case on the command line and can be done for uploads in the Lua-script as it is dome now).

If I understand correctly, when you convert an image from Lua, you must use the Lua function SipiImage.new. The problem is that currently, Sipi will add the metadata only if you use the complex form of that function. If someone uses Lua to convert an image, and they use the simple form of the Lua function SipiImage.new, the metadata is not added.

I agree that we should enforce the addition of this metadata, and I think the best way to do this would be to have the simple form of the Lua function SipiImage.new add this metadata, just like the complex form does.

Or maybe I'm not understanding what you're saying.

@benjamingeer
Copy link
Contributor

benjamingeer commented Oct 31, 2018

Wait, do you mean that every time Sipi serves a IIIF image, the Lua function SipiImage.new is called?

If so, then perhaps we should separate the "simple" and "complex" forms of that function into two different functions, which could be called newForServing and newForConversion or something like that.

To me, "enforcing" means that it's not possible for the programmer to do it wrong. So I think that it should not be possible, in Lua, to convert an image without including as much metadata as possible. My understanding has been that one of Sipi's goals is to always preserve image metadata when converting images.

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

But we should give the option to NOT include it for performance reasons.

Can you clarify when that would happen?

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

Usually Sipi serves the images without having to use the Lua function. But in some special cases this could be necessary. In these cases the lua function SipiImage.new() must be used in the preflight script. Therefore I would add the additional metadata as an option with default ‘true’ (=add it). Si in these very special special cases it is possible to skip this step

OK, that makes sense to me, as long as this is clear in the documentation.

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

For dasch-swiss/dsp-api#1011, I need a Lua function like server.fs.mv, which just moves a file from the temporary directory to a permanent storage directory. Could you please add this, too?

@lrosenth
Copy link
Collaborator Author

lrosenth commented Oct 31, 2018 via email

@benjamingeer
Copy link
Contributor

Sorry, I have another request. Sipi needs to be able to delete old temporary files. So I'll need two more Lua functions:

  • A function to get the last modification date of a file (in seconds since the epoch)
  • A function to get the current time (in seconds since the epoch)

@lrosenth
Copy link
Collaborator Author

lrosenth commented Nov 1, 2018 via email

@lrosenth
Copy link
Collaborator Author

lrosenth commented Nov 5, 2018

Done!

Benjamin Geer added 3 commits November 6, 2018 13:47
- Add server.fs.readdir C++ function for Lua.
- Add config setting for maximum temp file age.
- Remove unimplemented log levels TRACE and OFF from docs and configs.
- Fix incorrect log level names in Lua scripts.
- Fix inconsistencies in log level names (use syslog's names everywhere).
- Add e2e test of clean_tempdir().
@benjamingeer
Copy link
Contributor

benjamingeer commented Nov 7, 2018

Two questions:

  1. In our upload.lua script for Knora, we first save the uploaded file using server.copyTmpfile. Then we load it into a SipiImage using SipiImage.new, and finally we convert it to JPEG 2000, saving it in another file. Do we really need to save the upload to a file first? If the upload is already in memory, would it be possible to construct a SipiImage directly from the uploaded data in memory? For example, could SipiImage.new take an element of server.uploads as a parameter?
  2. I would like to have upload.lua check the uploaded file's MIME type before converting it. If the file is not an image, we shouldn't try to convert it to JPEG 2000. And I'd like to reject, say, Windows .EXE files and not even save them in the temporary directory. Could we make a C++ function that we could call from Lua, which would use libmagic and return the file's real MIME type? If we do what I'm suggesting in (1), this would have to use magic_buffer (to look at the uploaded data in memory) instead of magic_file.

@lrosenth
Copy link
Collaborator Author

lrosenth commented Nov 7, 2018 via email

Benjamin Geer and others added 10 commits November 8, 2018 11:14
- Log all internal server errors in send_error.
- Fix documentation syntax.
…knora-integration

# Conflicts:
#	scripts/upload.lua
#	test/_test_data/scripts/upload.lua
See https://tools.ietf.org/html/rfc7231#section-4.3.5:

"A payload within a DELETE request message has no defined semantics;
sending a payload body on a DELETE request might cause some existing
implementations to reject the request."
Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to work now, thank you! I think we can merge this if it looks OK to you.

Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, actually just one more thing:

Usually Sipi serves the images without having to use the Lua function. But in some special cases this could be necessary. In these cases the lua function SipiImage.new() must be used in the preflight script. Therefore I would add the additional metadata as an option with default ‘true’ (=add it). Si in these very special special cases it is possible to skip this step

Did you change your mind about this? I can't find it in the code. From the documentation, it looks as if the simple form always includes the additional metadata if the image comes from an upload, but not if it comes from a file. Is that right? (So I don't have to use the complex form if I pass an upload index to SipiImage.new?)

@lrosenth
Copy link
Collaborator Author

lrosenth commented Nov 9, 2018 via email

Copy link
Contributor

@benjamingeer benjamingeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, great, I just clarified the docs about that a little. Now I think this is really OK to merge, if it seems OK to you.

@lrosenth lrosenth merged commit 712d81f into develop Nov 9, 2018
@lrosenth lrosenth deleted the knora-integration branch November 9, 2018 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants