-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calamari 2.2 #61
Comments
@maxnth @andbue @ChWick can answer this more accurately but there were quite a few refactorings and performance enhancements since the last 1.x release v1.0.5. It is not time-critical to upgrade as soon as possible but I think it would be good to get started assessing what the benefits are, what has changed and how to proceed with adapting. |
#54 is possibly related |
Hi, I don't think it should be too complicated to update ocr_calamari to 2.0. The whole preprocessing stuff is loaded according to the definition in the model, so I was wrong to assume you're somehow circumventing it – sorry about that! If it helps, you could have a look at my client that caches preprocessed lines in a hdf5 file. I wrote a reader and included it in the DataReaderFactory here. Prediction happens here. Other than setting up the dataset and the removal of the preprocessing (which you should avoid), this is in most parts taken from predict.py anyway. |
Just wanted to note that Calamari 2 depends on tfaip which requires Python >= 3.7 – which would remove support for the default Python version on OCR-D's (still) default target Ubuntu 18. |
I have opened Calamari-OCR/calamari#304 because Calamari 2.1.x depended on TF 2.4.x (PyPI-incompatible with Python 3.9...), but @andbue already updated Calamari 2.2(!) to remove this restriction. 👍 With this TF version hell I think I'll first update the test rigging to test on all Python versions 3.7-3.9 (maybe even 3.10). |
Heads up: I'm working on this |
@mikegerber, do already have something you could share (as a feature branch)? I guess there are multiple API changes to cope with, plus perhaps a need to deal with model migration? |
Hi everybody, |
Sorry for not keeping anyone up to date: I plan to work on this further in the coming week! |
Do you have any news for us @mikegerber? Have you looked at the native PAGE-XML output of Calamari 2 – is it re/usable? |
Sorry... I have been neglecting this. I try to finish this soon after my vacation. PRs welcome though, if they come in the meantime |
Combination of bad time management and serious illness (for months!) and the following back log lead to more delay... |
Blocked by #84 (the GPL issues), as it seems. 🙄 |
@bertsky in #87 (#87 (comment)):
This is all interesting, but does Calamari 2.x now have a valid license? Otherwise this is still blocked and I will not work on this. |
I don't get why the switch to GPL would be such a blocker. Is it for you personally or by requirement? Anyway, if you give me a definite answer then I could decide if I want to take over from here. |
There are multiple issues:
IF Calamari 2.x
THEN we could relicense to GPL. (Calamari 1.x is fine as it does not use tfaip.) |
I'll discuss abandoning maintainership of ocrd_calamari with @cneud, but this is going to wait until at least October (I have major surgery in September). The situation with Calamari 2.x is such that I won't use it (and a future ocrd_calamari for 2.x) due to the unresolved licensing problems; I can't legally use it. |
Just to be as precise as possible here: you cannot use it as long as there are legal inconsistencies, or as soon as GPL kicks in? |
I'm not a lawyer, and perhaps we should discuss this (after my health stuff :)) in a video call soon, this is how I see the situation. a. Calamari 2.x's license is invalid. It simply can't have an Apache license while using the GPL library tfaip.* In the hypothetical situation of Calamari going GPL**, I personally do not have a problem with a GPL'ed ocrd_calamari. There's some potentially blocking red tape involved (my employer and all contributors must agree), but I - as one of the main contributors - would do it. * This is clear IMHO, and it doesn't matter that it's Python and not using |
(Deliberately avoiding terms like "enforcing", I think that was used in the wrong way in discussions I'd also like stress I do not care that much about licenses, it just seems to be a serious and show-stopping "bug" that there's this problem. As long as it's open source it doesn't matter to me, except that I would avoid GPL for library code, because it would cause exactly this kind of legal situation.) |
Another way around it: Isolate GPLy code. If you just os.system('gpl-binary') then you also have no problem, when gpl-binary is GPL. Maybe worth checking, but it will be a pain to maintain properly. |
Short answer: Because the license is invalid. If it were GPL there would be the possibility of us (the developers of ocrd_calamari) to update IF we move to GPL too. |
Thanks for being so precise and sharing your concerns! I suggest we try to convince @andbue and @chreul to go GPL with Calamari-OCR, and then proceed with the OCR-D wrapper for 2.x here. Native PAGE-XML support (via dataset type for input and output) does help, but I'm not sure how we can ensure that OCR-D's incremental annotation principle can be guaranteed – we must not throw away information, even if it's irrelevant to the OCR. Also complicating the matter is the fact that OCR-D requires using to AlternativeImage on all hierarchy levels and adhering to Can you please elaborate on the state of your migration (esp. around these issues) so far (or back when you were working on it)? |
There seems to be another issue with that: Calamari-OCR/calamari#3 |
This 100% blocked by these licensing issues, I will not work on it further until these are resolved. |
Like I said on that thread:
So it's not another issue AFAICS.
I understood that much, but I would really like to know how far you got so far. (It would help in gauging what's the best way to proceed currently – bringing TF SavedModel format to old Calamari versions vs. bringing OCR-D to 2.x next.) |
Calamari 2.0 is out.
I don't see benefits from updating the dependency, other than staying uptodate/compatible.
The text was updated successfully, but these errors were encountered: