ocrd workspace malfunctioning #1148
Comments
The The reason for the problem here is that there is a I think I'll work around this by just ignoring files unrelated to pages.
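A minimal sketch of what "ignoring files unrelated to pages" could look like, assuming file entries are available as (file ID, page ID) pairs. The `FileEntry` type, the `page_ids_sorted` helper and the sample IDs are hypothetical illustrations, not part of the OCR-D API or the actual fix:

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for (file ID, page ID) pairs read from a METS file.
# A page ID of None marks a file that is not tied to any page.
FileEntry = Tuple[str, Optional[str]]


def page_ids_sorted(entries: Iterable[FileEntry]) -> List[str]:
    """Return the sorted, de-duplicated page IDs, skipping files without a page."""
    return sorted({page_id for _, page_id in entries if page_id is not None})


# Made-up sample data: two page images and one document-wide file.
entries = [
    ("MAX_0002", "PHYS_0002"),
    ("MAX_0001", "PHYS_0001"),
    ("FULLTEXT_PDF", None),
]
print(page_ids_sorted(entries))
```

Filtering before sorting keeps document-wide files in the workspace while preventing them from reaching the comparison step.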
I see and it makes sense. Why are such files not problematic when running an OCR-D processor over the workspace? How do processors handle such files?
Such files are not problematic, I just didn't account for them in the We don't have any processors that take document-global files as input but there is no fundamental reason why non-page-specific files would be a problem. For example, if you add a bunch of files to a grp We do have ocrd_pagetopdf which produces such files, though.
We do have various processors which handle document-wide files:
There's even an old spec issue about that.
I just wanted to emphasize that document-wide files are not a problem per se, we have processors that produce them and you can process arbitrary files, which might be document-wide, as long as they are in a file group. You just cannot use the
We do disallow multi-page TIFF files but that was mostly to avoid making the I do see the benefit of supporting PDF as input, that is a very common use case. I see no reason why we could not have a
Indeed, sorry this has been open for so long. I'll answer over there.
Note that
To reproduce the issue, use the following METS file and download the MAX file group.
Output:
(venv38-operandi) mm@MM-Notebook:~/repos/ocrd_benchmarking/VD17$ ./build_workspace_zip.sh
None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None
The images of the MAX file group are downloaded successfully, regardless of the `None`. Then:
For some reason, the `None` values are returned to the `sorted()` method which leads to that error. The issue seems specific to workspaces that have hashes for `page_id`.
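The failure mode described here can be illustrated in isolation. The sketch below is not the OCR-D code path; it only shows that Python's built-in `sorted()` raises a `TypeError` when a list mixes `None` with strings. The `page_ids` values are made up for illustration:

```python
# Made-up page IDs: two hash-like strings and one None, standing in for a
# file that has no page assigned.
page_ids = ["a3f9", None, "07bc"]

error_message = None
try:
    sorted(page_ids)
except TypeError as err:
    # None cannot be ordered against str, so the comparison fails.
    error_message = str(err)

print(error_message)
```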