-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suppress known git progress logs #259
base: master
Are you sure you want to change the base?
Suppress known git progress logs #259
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #259 +/- ##
==========================================
+ Coverage 98.73% 98.93% +0.19%
==========================================
Files 47 48 +1
Lines 2215 2247 +32
==========================================
+ Hits 2187 2223 +36
+ Misses 28 24 -4
☔ View full report in Codecov by Sentry. |
With a more precise logic for matching git progress report
this sounds a little too complicated... could you please point me to one of those logs? I wonder if we know for sure that it is going through that DataLad's GitProgress handler -- i.e. does it have all the normal prefixes for Datalad's logger? |
Here are a few examples of the logs.
You can find more examples of these logs by checking out The logic of this filter is based on the logic implemented by Datalad to recognized a git progress report. You can find this logic at https://github.com/datalad/datalad/blob/adcfcb6ae6e423a1860b7dde203c8caad32d6fb3/datalad/support/gitrepo.py#L496-L629. |
Co-authored-by: Yaroslav Halchenko <[email protected]>
ok, we troubleshoot it until its death -- PR against datalad: datalad/datalad#7521 . So, looking forward do following # in future - this should be sufficient
import datalad
datalad.enable_librarymode()
# for now:
import logging
datalad_lgr = logging.getLogger("datalad")
for h in datalad_lgr.handlers:
datalad_lgr.removeHandler(h) |
This reverts commit d0d6270.
dl_gitrepo_lgr = logging.getLogger("datalad.gitrepo") | ||
|
||
# Add a filter to the "datalad.gitrepo" logger to suppress known git progress reports | ||
dl_gitrepo_lgr.addFilter(SuppressKnownGitProgressReport()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't need any longer right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, it is still needed. I added the solution in your previous comment. However, it only removed the duplications of the "start messages" such as "Start counting objects".
I also discovered that the log level setting,
datalad-registry/docker-compose.dev.yml
Line 41 in 523963d
"celery -A datalad_registry.make_celery:celery_app worker --loglevel INFO --pool prefork" |
This PR requires further digging. I will come back to this PR after #252 which is more important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so here, I guess, now we are having log_progress
still causing logging of the progress... complimentary issue I guess. I guess we need to dig deeper, most likely adding a way to completely disable log_progress. It might be that some workaround like
def no_log_progress(*args, **kwargs):
pass
from datalad.support import gitrepo
gitrepo.log_progress = no_log_progress
would do the drill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution would take away the logs we want to take away and more. I think I still have to "start" and "finishing" logs, for example, to remain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, I still want an explanation for why changing the log level of the Celery worker command doesn't change the behavior of the worker's logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is again datalad to blame: I added a note (see "edit 1") to datalad/datalad#7521
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael also pointed to the fact that we might want to use .propagate
to avoid propagation of messages from our datalad logger: https://docs.python.org/3/library/logging.html#logging.Logger
Findings in linkml/linkml#2323 (comment) can be useful in solving this problem. Particularly, setting the |
This PR suppresses known git progress logs in the worker.
The logging of the cloning operation in Datalad is hardcorded, https://github.com/datalad/datalad/blob/adcfcb6ae6e423a1860b7dde203c8caad32d6fb3/datalad/support/gitrepo.py#L1075, so I think by filtering the logs is a good way to removing git progress logs from the Celery worker's logs.
Let me know if there is a better way to remove git progress logs from the Celery worker's logs.