This document aims at providing a general developer guideline on different aspects of logging practices and conventions used in the Gardener codebase. It contains mostly Gardener-specific points and references other existing and commonly accepted logging guidelines for general advice. Developers and reviewers should consult this guide when writing, refactoring and reviewing Gardener code. If parts are unclear or new learnings arise, this guide should be adapted accordingly.
Historically, Gardener components have been using logrus.
There is a global logrus logger (logger.Logger
) that is initialized by components on startup and used across the codebase.
In most places, it is used as a printf
-style logger and only in some instances we make use of logrus' structured logging functionality.
In the process of migrating our components to native controller-runtime components (see gardener/gardener#4251), we also want to make use of controller-runtime's built-in mechanisms for streamlined logging. controller-runtime uses logr, a simple structured logging interface, for library-internal logging and logging in controllers.
logr itself is only an interface and doesn't provide an implementation out of the box. Instead, it needs to be backed by a logging implementation like zapr. Code that uses the logr interface is thereby not tied to a specific logging implementation and makes the implementation easily exchangeable. controller-runtime already provides a set of helpers for constructing zapr loggers, i.e., logr loggers backed by zap, which is a popular logging library in the go community. Hence, we are migrating our component logging from logrus to logr (backed by zap) as part of gardener/gardener#4251.
⚠️ logger.Logger
(logrus logger) is deprecated in Gardener and shall not be used in new code – use logr loggers when writing new code! (also see Migration from logrus to logr)ℹ️ Don't use zap loggers directly, always use the logr interface in order to avoid tight coupling to a specific logging implementation.
gardener-apiserver differs from the other components as it is based on the apiserver library and therefore uses klog – just like kube-apiserver. As gardener-apiserver writes (almost) no logs in our coding (outside the apiserver library), there is currently no plan for switching the logging implementation. Hence, the following sections focus on logging in the controller and admission components only.
To ensure a smooth migration to logr and make logging in Gardener components more consistent, the logcheck
tool was added.
It enforces (parts of) this guideline and detects programmer-level errors early on in order to prevent bugs.
Please check out the tool's documentation for a detailed description.
Similar to efforts in the Kubernetes project, we want to migrate our component logs to structured logging. As motivated above, we will use the logr interface instead of klog though.
You can read more about the motivation behind structured logging in logr's background and FAQ (also see this blog post by Dave Cheney). Also, make sure to check out controller-runtime's logging guideline with specifics for projects using the library. The following sections will focus on the most important takeaways from those guidelines and give general instructions on how to apply them to Gardener and its controller-runtime components. Note: some parts in this guideline differ slightly from controller-runtime's document.
❌ stop using printf
-style logging:
var logger *logrus.Logger
logger.Infof("Scaling deployment %s/%s to %d replicas", deployment.Namespace, deployment.Name, replicaCount)
✅ instead, write static log messages and enrich them with additional structured information in form of key-value pairs:
var logger logr.Logger
logger.Info("Scaling deployment", "deployment", client.ObjectKeyFromObject(deployment), "replicas", replicaCount)
Gardener components can be configured to either log in json
(default) or text
format:
json
format is supposed to be used in production, while text
format might be nicer for development.
# json
{"level":"info","ts":"2021-12-16T08:32:21.059+0100","msg":"Hello botanist","garden":"eden"}
# text
2021-12-16T08:32:21.059+0100 INFO Hello botanist {"garden": "eden"}
Components can be set to one of the following log levels (with increasing verbosity): error
, info
(default), debug
.
ℹ️ Note: some Gardener components don't feature a configurable log level and format yet. In this case, they log at
info
injson
format. We might add configuration options via command line flags that can be used in all components in the future though (see gardener/gardener#5191).
logr uses V-levels (numbered log levels), higher V-level means higher verbosity.
V-levels are relative (in contrast to klog
's absolute V-levels), i.e., V(1)
creates a logger, that is one level more verbose than its parent logger.
In Gardener components, the mentioned log levels in the component config (error
, info
, debug
) map to the zap levels with the same names (see here).
Hence, our loggers follow the same mapping from numerical logr levels to named zap levels like described in zapr, i.e.:
- component config specifies
debug
➡️ bothV(0)
andV(1)
are enabled - component config specifies
info
➡️V(0)
is enabled,V(1)
will not be shown - component config specifies
error
➡️ neitherV(0)
norV(1)
will be shown Error()
logs will always be shown
This mapping applies to the components' root loggers (the ones that are not "derived" from any other logger; constructed on component startup).
If you derive a new logger with e.g. V(1)
, the mapping will shift by one. For example, V(0)
will then log at zap's debug
level.
There is no warning
level (see Dave Cheney's post).
If there is an error condition (e.g., unexpected error received from a called function), the error should either be handled or logged at error
if it is neither handled nor returned.
If you have an error
value at hand that doesn't represent an actual error condition, but you still want to log it as an informational message, log it at info
level with key err
.
We might consider to make use of a broader range of log levels in the future when introducing more logs and common command line flags for our components (comparable to --v
of Kubernetes components).
For now, we stick to the mentioned two log levels like controller-runtime: info (V(0)
) and debug (V(1)
).
Controllers should use named loggers that include their name, e.g.:
controllerLogger := rootLogger.WithName("controller").WithName("shoot")
controllerLogger.Info("Deploying kube-apiserver")
results in
2021-12-16T09:27:56.550+0100 INFO controller.shoot Deploying kube-apiserver
Logger names are hierarchical. You can make use of it, where controllers are composed of multiple "subcontrollers", e.g., controller.shoot.hibernation
or controller.shoot.maintenance
.
Using the global logger logf.Log
directly is discouraged and should be rather exceptional because it makes correlating logs with code harder.
Preferably, all parts of the code should use some named logger.
In your Reconcile
function, retrieve a logger from the given context.Context
.
It inherits from the controller's logger (i.e., is already named) and is preconfigured with name
and namespace
values for the reconciliation request:
func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
log := logf.FromContext(ctx)
log.Info("Reconciling Shoot")
// ...
return reconcile.Result{}, nil
}
results in
2021-12-16T09:35:59.099+0100 INFO controller.shoot Reconciling Shoot {"name": "sunflower", "namespace": "garden-greenhouse"}
The logger is injected by controller-runtime's Controller
implementation and our controllerutils.CreateWorker
alike (if a logger is passed using controllerutils.WithLogger
). The logger returned by logf.FromContext
is never nil
. If the context doesn't carry a logger, it falls back to the global logger (logf.Log
), which might discard logs if not configured, but is also never nil
.
The controller implementation (controller-runtime / CreateWorker
) itself takes care of logging the error returned by reconcilers.
Hence, don't log an error that you are returning.
Generally, functions should not return an error, if they already logged it, because that means the error is already handled and not an error anymore.
See Dave Cheney's post for more on this.
- Log messages should be static. Don't put variable content in there, i.e., no
fmt.Sprintf
or string concatenation (+
). Use key-value pairs instead. - Log messages should be capitalized. Note: this contrasts with error messages, that should not be capitalized. However, both should not end with a punctuation mark.
-
Use
WithValues
instead of repeatedly adding key-value pairs for multiple log statements.WithValues
creates a new logger from the parent, that carries the given key-value pairs. E.g., use it when acting on one object in multiple steps and logging something for each step:log := parentLog.WithValues("infrastructure", client.ObjectKeyFromObject(infrastrucutre)) // ... log.Info("Creating Infrastructure") // ... log.Info("Waiting for Infrastructure to be reconciled") // ...
Note:
WithValues
bypasses controller-runtime's special zap encoder that nicely encodesObjectKey
/NamespacedName
andruntime.Object
values, see kubernetes-sigs/controller-runtime#1290. Thus, the end result might look different depending on the value and itsStringer
implementation. -
Use lowerCamelCase for keys. Don't put spaces in keys, as it will make log processing with simple tools like
jq
harder. -
Keys should be constant, human-readable, consistent across the codebase and naturally match parts of the log message, see logr guideline.
-
When logging object keys (name and namespace), use the object's type as the log key and a
client.ObjectKey
/types.NamespacedName
value as value, e.g.:var deployment *appsv1.Deployment log.Info("Creating Deployment", "deployment", client.ObjectKeyFromObject(deployment))
which results in
{"level":"info","ts":"2021-12-16T08:32:21.059+0100","msg":"Creating Deployment","deployment":{"name": "bar", "namespace": "foo"}}
Earlier, we often used
kutil.ObjectName()
for logging object keys, which encodes them into a flat string likefoo/bar
. However, this flat string cannot be processed so easily by logging stacks (orjq
) like a structured log. Hence, the use ofkutil.ObjectName()
for logging object keys is discouraged. Existing usages should be refactored to useclient.ObjectKeyFromObject()
instead. -
There are cases where you don't have the full object key or the object itself at hand, e.g., if an object references another object (in the same namespace) by name (think
secretRef
or similar). In such a cases, either construct the full object key including the implied namespace or log the object name under a key ending inName
, e.g.:var ( // object to reconcile shoot *gardencorev1beta1.Shoot // retrieved via logf.FromContext, preconfigured by controller with namespace and name of reconciliation request log logr.Logger ) // option a: full object key, manually constructed log.Info("Shoot uses SecretBinding", "secretBinding", client.ObjectKey{Namespace: shoot.Namespace, Name: shoot.Spec.SecretBindingName}) // option b: only name under respective *Name log key log.Info("Shoot uses SecretBinding", "secretBindingName", shoot.Spec.SecretBindingName)
Both options result in well-structured logs, that are easy to interpret and process:
{"level":"info","ts":"2022-01-18T18:00:56.672+0100","msg":"Shoot uses SecretBinding","name":"my-shoot","namespace":"garden-project","secretBinding":{"namespace":"garden-project","name":"aws"}} {"level":"info","ts":"2022-01-18T18:00:56.673+0100","msg":"Shoot uses SecretBinding","name":"my-shoot","namespace":"garden-project","secretBindingName":"aws"}
-
When handling generic
client.Object
values (e.g. in helper funcs), useobject
as key. -
When adding timestamps to key-value pairs, use
time.Time
values. By this, they will be encoded in the same format as the log entry's timestamp.
Don't usemetav1.Time
values, as they will be encoded in a different format by theirStringer
implementation. Pass<someTimestamp>.Time
to loggers in case you have ametav1.Time
value at hand. -
Same applies to durations. Use
time.Duration
values instead of*metav1.Duration
. Durations can be handled specially by zap just like timestamps. -
Event recorders not only create
Event
objects but also log them. However, both Gardener's manually instantiated event recorders and the ones that controller-runtime provides log todebug
level and use generic formats, that are not very easy to interpret or process (no structured logs). Hence, don't use event recorders as replacements for well-structured logs. If a controller records an event for a completed action or important information, it should probably log it as well, e.g.:log.Info("Creating ManagedSeed", "replica", r.GetObjectKey()) a.recorder.Eventf(managedSeedSet, corev1.EventTypeNormal, EventCreatingManagedSeed, "Creating ManagedSeed %s", r.GetFullName())
-
If the tested production code requires a logger, you can pass
logr.Discard()
orlogf.NullLogger{}
in your test, which simply discards all logs. -
logf.Log
is safe to use in tests and will not cause a nil pointer deref, even if it's not initialized vialogf.SetLogger
. It is initially set to aNullLogger
by default, which means all logs are discarded, unlesslogf.SetLogger
is called in the first 30 seconds of execution. -
Pass
zap.WriteTo(GinkgoWriter)
in tests where you want to see the logs on test failure but not on success, for example:logf.SetLogger(logger.MustNewZapLogger(logger.InfoLevel, logger.FormatJSON, zap.WriteTo(GinkgoWriter))) log := logf.Log.WithName("test")
These points might be helpful when refactoring existing code during the migration period:
- For migrating an existing controller to logr:
- Create a named logger (example).
- Pass
controllerutils.WithLogger
toCreateWorker
(example). This allowslogf.FromContext
to be used in reconcilers. - Use
logf.FromContext
inReconcile
to retrieve the logr logger and use it from there on (example). - Make sure to follow the other guidelines mentioned above as well (see Logging in Controllers).
- Libraries might expect a different logging implementation than the component which uses it. E.g., a controller that already uses logr might want to use the
flow
package which still uses logrus. In such cases:- You can consider refactoring the library along with the component itself, if feasible.
- It is acceptable for the migration period to use a logger derived from the respective global logger (
logger.Logger
orlogf.Log
) and pass it to the library. However, please add aTODO
for cleaning it up later on, once the migration is completed. E.g.:// TODO: switch to logr once flow package is migrated err := shootFlow.Run(flow.Opts{ Logger: logger.Logger.WithFields(logrus.Fields{"logger": "controller." + ControllerName, "name": shoot.Name, "namespace": shoot.Namespace}) })