-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
follow-up to dev mailing list discuss + issue #3998: RFC: Consistent tprintf() use, ERROR: & WARNING: prefixes. #4020
base: main
Are you sure you want to change the base?
Conversation
…stdout from training utils (Issue tesseract-ocr#3998) Here's the extract/backport from my master branch: - use tprintf() for all error/warning/info/debug-diagnostics output --> fprintf(stderr, msg) --> tprintf(msg). This also anticipates migrating tesseract to using the fmt library (C++20) as done by @stweil in a dev branch. - all error messages are prefixed with 'ERROR: '. This enables users, f.e., to grep/filter the stderr/log using a regex like '/^EERROR: /' to filter out the error messages. Application code which overrides the tprintf() API is another example: there error messages can be easily picked up and sent to different monitor channels / GUI windows / etc. in environments where there's no stdout/stderr (MSWindows GUI) or where end users are not accustomed to watching stdout/stderr. - ditto for 'WARNING: ' level messages. CAVEAT: this is a manual transfer from my master branch + code review whilee I was at it. I hope I got them all... :-)
Notes:
|
… course, one only sees it when scrolling through the pullreq after the fact.
This is not right. |
Ok. Do note however, that a few of those prefixes aren't really prefixes as they are preceded by a "\n" that's part of the same message string, so a function will not cover 100% of cases. 😉 Given that, I suggest I do a macro as then the result remains the same at compile time, e.g. BTW: definitely not an ERROR() or error() macro as that has a huge chance of clashing with foreign code. ERROR, for example, is a 100% collision with a system-level ERROR define in Windows (been there, seen that, t-shirt, elsewhere) Therefore I propose |
BTW: there are two important levels for the tprintf() messages:
and the rest is not prefixed (ok, a rare 'INFO:' may be observed; I'm only consistent 95% of the time 😄 ) so all info/debug messages come without a prefix, like before. Meanwhile the important stuff (errors and warnings) can be easily filtered/routed in applications, where those are relevant to report/log without being obscured by the diag output noise. If you are of a different opinion re the above levels & prefixes 'rule', I'm all ears and I'll see how this patch can be tweaked. |
If current |
@egorpugin : before I tackle this, there's also I'm myself of two minds about this: I could do it as a If we can get some kind of a management decision on this refactor action, that would be great. 😄 Then I can do it in my own fork and backport it, like the stuff done so far, but I'd rather do the refactor once as it's... some work. And finding out I did it in a direction that's not liked, well... that'll cost me later for a feature I don't mind otherwise while I keep tracking your mainline repo: increased expected cost of git merge collisions. 😭 Anyway, so's you know why I ask this. Thanks! 👍 Side note: OK or do you want it done differently? Footnotes
|
Just leave it as is with In the second part you are talking about more like full featured logger rather than some one-function-printer. |
Ok, if I understood that correctly, you want to keep this mostly as-is. I'll do the tprintf() et al -> log_error() et al migration as a separate pullreq later. BTW: the key argument there is to NOT have spdlog/glog/whatever in tesseract itself, but to keep tesseract mostly as-is, BUT provide a somewhat easy way for OTHERS to modify or hook tesseract up to such systems if THEIR use of the tesseract library requires such measures. In other words:
Hope I made myself clear on the goal there -- which is ultimately what I need / use. Apologies for any confusion I might've caused. |
…s expected. Cause: omission in backporting this from tprintf+fmt code. Caught by CI.
How do you imagine hooking? |
Something simple. After all, tesseract (the application) doesn't need the fancy footwork, nor do many uses as a library. Currently what I've got is a blunt, hardcoded override in tprintf.cpp, but that's just for now as I'm still in the testing phase and mostly on the command line myself. In a while tesseract will be incorporated in a server, where you can fire requests at it. Which is where it becomes important to have tprintf/tlog diagnostics available per tesseract session1. Haven't coded it yet, but I reckon I might need access to the tesseract instance as an aide to identify which session we're accessing the logging/diagnostics of, plus some application-specifics (to be determined). Classically I'd do it as one or more C-style function pointers (callbacks), to be 'registered' via a new Tesseract API. But that's a little convoluted/messy re cleanup, and having had a look at current tprintf.cpp it looks like proper file closing ( Right now I'm considering doing it with a C++ class instance:
That way, little changes for tesseract itself; maybe I could clean up the current tprintf/tlog a little, but not too much. ( Thus I get myself a cleaner interface that's easier to use and maintain once I create the server: that one can do it the way I like then. (The idea there is to use SQLite for a lot of stuff (fast & cheap storage cross-platform and also suitable for non-server / desktop machines), so current thought is to dump the log/diagnostics in SQLite as well, so callers can inspect after the fact by firing queries at the server for whatever they want. Get the image. Get the OCR output in format X. Get the thresholded image or other intermediate. Get the diagnostics/log for it. That sort of thing. Anyway, all that's is out of scope for tesseract and no running code yet.) I could do a 'it might look like this' RFC/pullreq in the next bunch of weeks: I need to field-test this anyway and then you & others can have a look and comment/steer it towards possible integration (for the virtual base class + default class def that equals current behaviour). Any which way, I agree it shouldn't be made part of this pullreq. Better for me to file a new / follow-up PR for this. (Which was why I did the "ERROR: ..." stuff like I did; some of the interfacing is complicated enough to not do it all in one go. 😉 ) Footnotes
|
follow-up to dev mailing list discuss + issue #3998: RFC: Info messages to stdout from training utils (Issue #3998)
Here's the extract/backport from my master branch:
use tprintf() for all error/warning/info/debug-diagnostics output --> fprintf(stderr, msg) --> tprintf(msg). This also anticipates migrating tesseract to using the fmt library (C++20) as done by @stweil in a dev branch.
all error messages are prefixed with 'ERROR: '. This enables users, f.e., to grep/filter the stderr/log using a regex like
/^ERROR: /
to filter out the error messages.
Application code which overrides the tprintf() API is another example: there error messages can be easily picked up and sent to different monitor channels / GUI windows / etc. in environments where there's no stdout/stderr (MSWindows GUI) or where end users are not accustomed to watching stdout/stderr.
ditto for 'WARNING: ' level messages.
CAVEAT: this is a manual transfer from my master branch + code review while I was at it. I hope I got them all... :-)