Skip to content
This repository has been archived by the owner on Dec 5, 2023. It is now read-only.

initialize logging #48

Open
ircwaves opened this issue May 11, 2022 · 9 comments
Open

initialize logging #48

ircwaves opened this issue May 11, 2022 · 9 comments

Comments

@ircwaves
Copy link
Member

ircwaves commented May 11, 2022

in cirruslib to cirrus.lib refactor the logging initialization got lost.

cirruslib.init used to say:

from .logging import get_task_logger

which caused cirrus.lib.logging to get loaded, and in the process inject the cirrus logging configs into the python logging system.

@jkeifer
Copy link
Member

jkeifer commented May 11, 2022

Seems like if you want the cirrus logging you can explicitly import cirrus.lib.logging, no? If we could resolve this by adding a note about this to the list of breaking changes in the changelog, I'm all for doing so.

Spoiler: I'm not a bit fan of having code in init files (except when wanting to pretend a subpackage is a module, which can be handy). In this case, as a namespaced package, cirrus.lib is the top-level package, and I would strongly argue against having any code in that init file. Particularly when that would implicitly reconfigure logging in a way that may not be desired by someone importing something other than logging.

Thoughts?

@matthewhanson
Copy link
Member

Yes, you can be explicit. I guess I mostly agree with you. I find it helpful in some cases to put imports into the top-level (esp with classes where each file is a Class, and then I like to put the imports in the init so you can import them from the module rather than the submodule, ie,

import mypackage.MyClass
rather than
import mypackage.myclass.MyClass

But in this case, you make a good point, this module isn't just importing some function or class definitions, it's setting the global logger config so it's probably something we want to avoid automatically doing behind the scenes.

@jkeifer
Copy link
Member

jkeifer commented May 12, 2022

So I don't totally disagree with making imports cleaner in the above example. I even do that in a number of places in cirrus-geo.

The biggest issue for me is doing that at the top-level of package. To import anything in mypackage, you will execute every module imported in mypackage.__init__. If you truly have only a set of related classes that you want to expose publicly and their modules are guaranteed to have no side effects on execution, then yeah it works. At least until someone changes one of those modules in an unexpected way and now importing anything from your package has side effects. Or if you add a new class/module that is not entirely related to the others, and is likely to be used independently: now imports from your package are slower than necessary because you are executing every module in your package just to get one thing from it. Python already struggles with slow startup times, I don't want to make it worse for no good reason.

This also means you can't import a package to interrogate it for metadata/path/whatever else you might need to get from it without executing code. In fact this was a problem with cirrus-lib, if I remember correctly: the way we are discovering the on-disk path to inject it into lambdas at build time was executing code when all I wanted was the installed location of the package. Or maybe it was discovering the package dependencies. Either way, it caused a problem.

Stepping slightly outside the normal use case, and code in your __init__ can cause problems, top-level or not. Importing in an __init__ just means that everything imported is, in effect, code in your __init__.

But it sounds like we are on the same page, so I'll stop preaching to the choir. :)

@ircwaves
Copy link
Member Author

Good points. I concur with the explicit-better-than-implicit behavior. Seems like a good motion to wire into the docs that cirrus.lib.logging must be done. And perhaps, as things move toward using the base classes here, that import can be tucked into there, such that it is explicit, and part of the boiler plate?

@ircwaves
Copy link
Member Author

that import can be tucked into there, such that it is explicit, and part of the boiler plate?

Looks like that is (partially) already the case in the feature/task-class branch:

from cirrus.lib.logging import get_task_logger

@ircwaves
Copy link
Member Author

ircwaves commented May 13, 2022

Looks like the other spots that need logging explicitly loaded are in cirrus-geo, like here:

# logging
logger = logging.getLogger("feeder.rerun")

Should we close this discussion with an issue for cirrus-geo and cirrus-docs to get explicit with logging?

@matthewhanson
Copy link
Member

I'm also wondering if it's bad practice to set the global logger on import:

https://github.com/cirrus-geo/cirrus-lib/blob/main/src/cirrus/lib/logging.py#L49

and instead that may be better as a function to configure the logger (or require user to call the set the config with the default dictionary).

I'm also open to rethinking how logging works in general if anyone has any strong opinions here.

@jkeifer
Copy link
Member

jkeifer commented May 13, 2022

@ircwaves I think the direction to go is to create classes for all components, which by default handle the logging setup (as you see is true for the Task class). Then we should convert all the components in cirrus-geo to use those new classes.

@matthewhanson I think a function to configure logging makes some sense (I was wondering if that would be a good solution myself).

That said, I think it would be worth spending some time to really dig into the logging, rethinking it, as you say (my experience with logging is that it can always use some love). However, I don't have any strong opinions about what changes should/would come out of that, so I'd suggest we save that work for a future project. Updates to the logging behavior would be much easier if everything is already using the component classes and getting logging setup by default.

tl;dr is that I'm of the opinion we make no changes at the moment, and instead focus any development time on finalizing the component classes.

@jkeifer
Copy link
Member

jkeifer commented May 13, 2022

I'll also note that the python templates for creating new feeders/tasks/functions via the cirrus cli include the logging boilerplate. So while of course I wouldn't say no to someone with a docs PR, I don't think we need to go out of the way to add that info. Particularly given the momentum behind the component classes and the fact that when we finish those we'll need to revise any existing documentation.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants