override instead of wrap TesserocrRecognize in other processors #191
- instead of wrapping a foreign instance and delegating the `process` call, inherit `process` but reparse parameters after rewriting them in the constructor
- only overwrite the `process` docstring

(This is necessary because the inner instance had its own `workspace` instance. Also, we don't want to rewrite `moduledir` and the version parser everywhere.)
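The difference between wrapping and inheriting can be illustrated with a toy model. This is only a sketch: `Workspace`, `Recognize`, `WrappingSegment`, `InheritingSegment`, the `segmentation_only` parameter and `run_on_new_workspace` are hypothetical stand-ins, not the actual ocrd or ocrd_tesserocr classes. It assumes that a cached processor instance later gets a different workspace assigned, as instance caching would do.

```python
class Workspace:
    """Hypothetical stand-in for a workspace; it only records process() calls."""
    def __init__(self, name):
        self.name = name
        self.log = []


class Recognize:
    """Hypothetical stand-in for the full-featured processor."""
    def __init__(self, workspace, parameter=None):
        self.workspace = workspace
        self.parameter = dict(parameter or {})

    def process(self):
        # write the "result" into whatever workspace this instance holds
        self.workspace.log.append(sorted(self.parameter.items()))


class WrappingSegment:
    """Old pattern: build a foreign instance and delegate process() to it."""
    def __init__(self, workspace, parameter=None):
        self.workspace = workspace
        params = dict(parameter or {}, segmentation_only=True)
        # the inner instance keeps the workspace that exists at construction time
        self.inner = Recognize(workspace, params)

    def process(self):
        self.inner.process()


class InheritingSegment(Recognize):
    """New pattern: rewrite parameters in the constructor, inherit process()."""
    def __init__(self, workspace, parameter=None):
        params = dict(parameter or {}, segmentation_only=True)
        super().__init__(workspace, params)


def run_on_new_workspace(processor, workspace):
    """Mimic a cached instance being reused with a different workspace."""
    processor.workspace = workspace
    processor.process()


first = Workspace("first")
second_a = Workspace("second-a")
second_b = Workspace("second-b")

wrapped = WrappingSegment(first)
inherited = InheritingSegment(first)

run_on_new_workspace(wrapped, second_a)    # result ends up in `first`
run_on_new_workspace(inherited, second_b)  # result ends up in `second_b`
```

In this sketch the wrapped instance keeps writing to the workspace it was constructed with, while the inheriting one follows the reassigned `workspace` attribute.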
Ah, it actually did! (We just had no event on this repo trigger the CI since the last get_processor / run_processor changes in core.)
Turns out I had to do a lot more to fix the CI:
Ok, I ran into OCR-D/core#998. Now, I could wait for the fix to be merged, or avoid the
Codecov Report
@@ Coverage Diff @@
## master #191 +/- ##
==========================================
- Coverage 26.97% 0.00% -26.98%
==========================================
Files 11 12 +1
Lines 1416 1377 -39
Branches 333 346 +13
==========================================
- Hits 382 0 -382
- Misses 981 1371 +390
+ Partials 53 6 -47
... and 1 file with indirect coverage changes
I have tested this code together with the processing-server PR, and everything works now. So this PR seems to solve my problems with the tesser-ocrd-processors.
Additionally, I have briefly gone through the Python code changes (I ignored all CI/CD changes). I didn't find anything worth remarking on here (except my question).
@kba if you make a release out of OCR-D/core#999 today, I'll revert b357217 here and require that new core version.
> @kba if you make a release out of OCR-D/core#999 today, I'll revert b357217 here and require that new core version.

v2.47.0 has been released now.
Too early, because we need OCR-D/core#1004.
Thanks! I'll adapt...
This is not about Docker, but the Python API.
This reverts commit b357217.
Hotfix incoming.
Odd. CI failure did come back:
That's despite
Aha! It turns out that my OCR-D/core#999 was premature: we also pass an unnecessary
Now, the big story here is: none of these places is needed (or used) – neither currently, nor for OCR-D/core#974 nor OCR-D/core#884!
Other questions (hopefully it is ok to ask it here):
The reason is that here in ocrd_tesserocr many processors are basically just re-parameterizations (simplifications) of the "goliath" ocrd-tesserocr-recognize. This helps the user cope with complexity (and avoids code duplication while sustaining the older single-step processors); see the README. To implement this, you have to know that ocrd.Processor does parameter parsing/instantiation/validation in the constructor: So you need to
Now, in the old pattern, where every ocrd.Processor's subclass constructor just overwrites the kwarg for
Yes, it must. Otherwise, the parameter validator in the superclass won't see the actual cmdline values. Also, the non-processing contexts (help, dump json, show resources) in the superclass constructor need to see the actual tool json.
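Why the rewriting has to happen in the constructor can be sketched with a toy base class that validates in `__init__`. Everything here is an assumption for illustration: `BaseProcessor`, the `TOOL` dictionaries and the parameter names `dpi`, `segmentation` and `model` are hypothetical and do not reproduce the ocrd.Processor API or the real tool JSON.

```python
class BaseProcessor:
    """Hypothetical base class that validates parameters in its constructor."""
    TOOL = {}  # toy "tool json": parameter name -> default value

    def __init__(self, parameter=None):
        parameter = dict(parameter or {})
        # validation runs against the tool description of the concrete class
        unknown = set(parameter) - set(self.TOOL)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        self.parameter = {**self.TOOL, **parameter}


class Recognize(BaseProcessor):
    """Stand-in for the full-featured processor with the wide interface."""
    TOOL = {"dpi": 0, "segmentation": "none", "model": ""}

    def process(self):
        p = self.parameter
        return (p["segmentation"], p["model"], p["dpi"])


class SegmentLine(Recognize):
    """Re-parameterization: narrow user interface, inherited process()."""
    TOOL = {"dpi": 0}

    def __init__(self, parameter=None):
        # 1. let the superclass validate the actual user-supplied values
        #    against this processor's own (narrow) tool description
        super().__init__(parameter)
        # 2. then rewrite them into the full parameter set the inherited
        #    process() expects
        self.parameter = {**Recognize.TOOL, **self.parameter,
                          "segmentation": "line"}


segmenter = SegmentLine({"dpi": 300})
result = segmenter.process()

try:
    SegmentLine({"model": "Fraktur"})  # not part of the narrow interface
    rejected = False
except ValueError:
    rejected = True
```

The point of the sketch is the ordering: validation sees what the user actually passed, and only afterwards are the values translated for the inherited `process`.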
We could of course change the API in core to separate the parameter instantiation, but that would entail changing all processors (lots of diverse codebases with various maintainers). We have to do that for the new
Yes. We could define a method (say) For the processor delegation pattern then, since in the new API, we will also have a
With this PR (and ocrd 2.48.0) I seem to have problems relating to $TESSDATA_PREFIX:
You need
Ah. Never had it working on this system, and I wrongly assumed it was a new bug. I copied everything from /usr/share/.../tessdata for now, that fixed
- improve subclassing
- isolate workspaces from each other
- use pytest-xdist for parallel tests
- workaround for os.getenv failure after tmpdir removal, caused by Processor.__init__'s os.chdir
- also test additional processors and parameters
- also validate results (to some degree)
- rely on (and ensure) Fraktur model being available
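The workaround item above concerns a constructor that changes into a temporary directory which is removed afterwards. One generic guard against that class of problem is to restore the previous working directory before the temporary one disappears. The following is a minimal sketch under that assumption, not necessarily the workaround used in this PR; `preserved_cwd` and `run_in_tmpdir` are hypothetical helper names.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def preserved_cwd():
    """Restore the current working directory when the block exits."""
    previous = os.getcwd()
    try:
        yield previous
    finally:
        os.chdir(previous)


def run_in_tmpdir():
    """Work inside a temporary directory without leaving the process there."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with preserved_cwd():
            # stands in for the os.chdir performed during construction
            os.chdir(tmpdir)
            inside = os.path.realpath(os.getcwd())
        # the working directory is restored here, before tmpdir is removed
    return inside, os.getcwd()


start = os.getcwd()
inside, after = run_in_tmpdir()
```

Nesting the guard inside the temporary-directory block ensures the restore happens first, so later tests never start from a directory that no longer exists.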
@kba this time, codecov broke – IIUC it says it now has 0% coverage, because I have added more processors to the test 🙄 – do you have any idea what's going on?
Really confusing. I think the problem is that the last time code coverage was calculated on master, only four of the five Python versions uploaded their results to codecov successfully. Now it cannot match the coverage reports to the four it last had on master. I cannot verify this because CircleCI does not retain logs of the last run on master. In other words, I think this will clear up once we merge to master. The code coverage calculation looks right:
(for python3.9)
Understood. Thanks! I'll merge and release then.
fixes #190
Perhaps we should also add a test case for the workspace mechanics. I have a feeling this should have failed long ago even without the Processing Server's instance caching.