Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api-v2): Add support for PDF files #1206

Merged
merged 101 commits into from
Nov 28, 2019
Merged

Conversation

benjamingeer
Copy link

@benjamingeer benjamingeer commented Feb 6, 2019

  • Update knora-base:
    • Add pageCount, dimX, and dimY to DocumentFileValue.
    • Make originalFilename and originalMimeType optional,
      because we only get them for images.
    • Increment knora-base version.
  • Update knora-api complex ontology:
    • Add documentFileValueHasPageCount, documentFileValueHasDimX,
      and documentFileValueHasDimY.
  • Remove internalSipiFileServerGetUrl and externalSipiFileServerGetUrl
    from Settings, because all file types can now be stored together in Sipi.
  • Change StandoffResponderV2 to look for XSL transformations at
    internalSipiBaseUrl (Sipi's config.imgroot).
  • Update Lua scripts for Sipi:
  • Add tests to KnoraSipiIntegrationV2ITSpec:
    • Create a resource with a PDF file.
    • Change a PDF file value.
  • Update docs.

Resolves #1202.
Fixes #1430.

Needs dasch-swiss/sipi#283.
Needs dasch-swiss/sipi#309.

@tobiasschweizer
Copy link
Contributor

I am in the office next week. So let me know if I can help.

@benjamingeer benjamingeer self-assigned this Mar 20, 2019
@subotic subotic changed the title Support non-image files in API v2 feat(api-v2): Add support for non-image files Mar 24, 2019
@subotic
Copy link
Collaborator

subotic commented May 23, 2019

This pull request has been mentioned on Discuss DaSCH. There might be relevant details there:

https://discuss.dasch.swiss/t/support-non-image-files-in-knora-api-v2/33/1

@benjamingeer
Copy link
Author

@subotic There's a test failure here because CacheServiceManager timed out waiting for Redis:

Screenshot 2019-11-27 at 09 27 26

So I increased the Redis connection timeout:

cf17688#diff-854c0ddadaf9cb6bd2c1c30a504ddb97R59

@subotic subotic modified the milestones: 2019-11, 2019-12 Nov 27, 2019
@benjamingeer
Copy link
Author

@tobiasschweizer Can you review this?

@tobiasschweizer
Copy link
Contributor

Ok, will do it now!

by adding IIIF parameters to `temporaryBaseIIIFUrl`. For example, to get
In the case of an image file, the client may now wish to get a thumbnail of each
uploaded image, to allow the user to confirm that the correct files have been uploaded.
This can be done by adding IIIF parameters to `temporaryBaseIIIFUrl`. For example, to get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the mapping between Sipi's response and SipiUploadResponse happen? Is this a spray.json config?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SipiUploadResponse is defined in KnoraSipiIntegrationV2ITSpec and is only used there. The client receives this response, and Knora never sees it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thx.

override def toOntologySchema(targetSchema: OntologySchema): DocumentFileValueContentV2 = copy(ontologySchema = targetSchema)

override def toJsonLDValue(targetSchema: ApiV2Schema, projectADM: ProjectADM, settings: SettingsImpl, schemaOptions: Set[SchemaOption]): JsonLDValue = {
val fileUrl: String = s"${settings.externalSipiBaseUrl}/${projectADM.shortcode}/${fileValue.internalFilename}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could Knora also create the fragment identifier? Is it possible to refer to a part of a PDF in an annotation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not currently possible, but something could be added to knora-base for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make an issue for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1298,31 +1298,31 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
resource: ReadResourceV2 = resources.toResource(gravsearchTemplateIri)

_ = if (resource.resourceClassIri.toString != OntologyConstants.KnoraBase.TextRepresentation) {
throw BadRequestException(s"Resource $gravsearchTemplateIri is not a ${OntologyConstants.KnoraBase.XSLTransformation}")
throw BadRequestException(s"Resource $gravsearchTemplateIri is not a text file")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, a Gravsearch template is expected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified the error message in b4acacf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thx.

success, newFilePath = helper.filename_hash(fullImgName);
if not success then
server.sendStatus(500)
server.log(gaga, server.loglevel.LOG_ERR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was gaga even defined? ;-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course not. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else wold have disappointed me ;-)

return
end

--success, check = fullImg:mimetype_consistency(originalMimeType, originalFilename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does Sipi check the actual mime type of the file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't in this Lua script; this code was already commented out, but I don't know why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be commented in again. Otherwise, the check is not performed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the integration test won't work anymore, I think. I commented it out. You are of course welcome to find a solution for the test that allows for the mime-type check to be added :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will also be situations where a problem cannot be fixed at once. But we should keep track of such problems by creating issues.


server.log("upload.lua: wrote image file to " .. tmp_storage_file_path, server.loglevel.LOG_DEBUG)
else
-- It's not an image file. Just move it to its temporary storage location.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can Sipi check that the mime type is correct?

Copy link
Contributor

@tobiasschweizer tobiasschweizer Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only implemented for SipiImage in Sipi. Is there a more generic implementation needed for any file type? The current implementation relies on libmagic.

Copy link
Author

@benjamingeer benjamingeer Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have a more generic implementation. We could open an issue for this in Sipi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please do so and assign it to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobiasschweizer
Copy link
Contributor

tobiasschweizer commented Nov 28, 2019

In the logs (https://github.com/dasch-swiss/knora-api/pull/1206/checks?check_run_id=324265543) I see two 500 errors. Does this mean that fullImg:mimetype_consistency failed (success is not true)?

@benjamingeer
Copy link
Author

Does this mean that fullImg:mimetype_consistency failed?

Yes. Sipi says:

fullImg:mimetype_consistency() failed: SipiImage.mimetype_consistency(): Sipi image error at [/sipi/src/SipiImage.cpp: 255]: Unsupported file type: "tmp_9188444826479066987.bin

I guess that's why that code was commented out.

@benjamingeer
Copy link
Author

The file with the .bin suffix is created by FileUtil.createTempFile, which is called by the v1 resources and values routes.

@@ -124,6 +124,20 @@ if media_type == IMAGE then
return
end

local check
success, check = fullImg:mimetype_consistency(originalMimeType, originalFilename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you need to add

ocal submitted_mimetype
success, submitted_mimetype = server.parse_mimetype(originalMimeType)

local check
success, check = fullImg:mimetype_consistency(submitted_mimetype.mimetype, originalFilename)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that in a1a37cc, but Sipi returns the same HTTP 500 error. I actually think that since this is an API v1 or Sipi bug, and this PR only adds functionality to API v2, there's no reason to hold back this PR because of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok , and you are going to change the v1 behaviour anyway. If your fix doesn't solve the problem, could you open an issue in Sipi?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, this problem was there before, and this PR doesn't affect it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but the code shouldn't have been just commented out in https://github.com/dasch-swiss/knora-api/pull/1459/files#diff-4a1a2aedd2d8b7dee2f824e33b243ca1R123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok for me to make separate issues (Knora and/or Sipi) and move on with this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but that's a different PR. You can't expect every PR to fix every bug that already existed before the PR. You're reviewing this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but that's a different PR. You can't expect every PR to fix every bug that already existed before the PR. You're reviewing this PR.

I agree. But this is kind of a critical issue (Sipi is expected to recreate the image file in its original format), this is why I kept insisting. If it had been a detail, it would not have mattered that much.

@benjamingeer
Copy link
Author

@tobiasschweizer
Copy link
Contributor

The file with the .bin suffix is created by FileUtil.createTempFile, which is called by the v1 resources and values routes.

I think this is this the file that Knora temporarily stores and informs Sipi about so it can read the file. So this doesn't work any more as of Sipi 2.0.1. I do not understand why.

@benjamingeer
Copy link
Author

Thanks for checking this very boring PR. :)

@tobiasschweizer
Copy link
Contributor

.> Thanks for checking this very boring PR. :)

I made me feel nostalgic ...good old api v1 times ...

@benjamingeer benjamingeer merged commit 4204ff7 into develop Nov 28, 2019
@benjamingeer benjamingeer deleted the wip/1202-upload-non-images branch November 28, 2019 11:09
@daschbot
Copy link
Collaborator

This pull request has been mentioned on Discuss DaSCH. There might be relevant details there:

https://discuss.dasch.swiss/t/support-non-image-files-in-knora-api-v2/33/2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants