From 030e6dc743a2fc18977dae644f42ef19ee8e093c Mon Sep 17 00:00:00 2001
From: Przemek Maciolek
Date: Sat, 21 Mar 2020 13:18:00 +0100
Subject: [PATCH] Move the documentation to README.md

---
 processor/k8sprocessor/README.md | 284 ++++++++++++++++++++++++++++++-
 processor/k8sprocessor/config.go |  64 +++----
 processor/k8sprocessor/doc.go    |  97 -----------
 3 files changed, 315 insertions(+), 130 deletions(-)

diff --git a/processor/k8sprocessor/README.md b/processor/k8sprocessor/README.md
index 78d0cc8763a4..26b7d670a5af 100644
--- a/processor/k8sprocessor/README.md
+++ b/processor/k8sprocessor/README.md
@@ -1 +1,283 @@
-Documentation is published to [pkg.go.dev](https://pkg.go.dev/github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor?tab=doc)
+## Kubernetes Processor
+
+The `k8sprocessor` allows automatic tagging of spans with k8s metadata.
+
+It automatically discovers k8s resources (pods), extracts metadata from them and adds the extracted
+metadata to the relevant spans. The processor uses the Kubernetes API to discover all pods running
+in a cluster, and keeps a record of their IP addresses and interesting metadata. Upon receiving spans,
+the processor tries to identify the source IP address of the service that sent the spans and matches
+it with the in-memory data. If a match is found, the cached metadata is added to the spans as attributes.
+
+### Config
+
+There are several top-level sections of the processor config:
+
+- `passthrough` (default = false): when set to true, only annotates resources with the pod IP and
+does not try to extract any other metadata. It does not need access to the k8s cluster API.
+The agent/collector must receive spans directly from services to be able to correctly detect the pod IPs.
+- `pod_ip_debugging` (default = false): when set to true, enables verbose logging that helps
+verify how the pod IP is assigned during metadata tagging
+- `extract`: this section (see [below](#extract-section)) allows specifying extraction rules
+- `filter`: this section (see [below](#filter-section)) allows specifying filters for matching pods
+
+#### Extract section
+
+Allows specifying extraction rules to extract data from k8s pod specs.
+
+- `metadata` (default = empty): specifies a list of strings that denote the fields to extract. See the
+[example config](#example-config) for the list of fields.
+*Note: `owners` is a special field which enables traversing the ownership tree to pull data such
+as `deploymentSetName`, `serviceName`, `daemonSetName`, `statefulSetName`, etc.*
+- `tags` (default = empty): specifies an optional map of custom tag names. When provided, the
+specified fields use the provided names when being tagged, e.g.:
+  ```yaml
+  tags:
+    containerId: my-custom-tag-for-container
+    node: kubernetes.node
+  ```
+- `annotations` (default = empty): a list of rules for extracting and recording annotation data.
+See the [field extract config](#field-extract-config) for an example of how to use it.
+- `labels` (default = empty): a list of rules for extracting and recording label data.
+See the [field extract config](#field-extract-config) for an example of how to use it.
+
+#### Field Extract Config
+
+Allows specifying an extraction rule to extract a value from exactly one field.
+
+The field accepts a list of maps accepting three keys: `tag_name`, `key` and `regex`.
+
+- `tag_name`: represents the name of the tag that will be added to the span. When not specified,
+a default tag name of the format `k8s.<annotation type>.<key>` is used. For example, if
+`tag_name` is not specified and the key is `git_sha`, then the tag name will be `k8s.annotation.deployment.git_sha`
+
+- `key`: represents the annotation name.
+This must exactly match an annotation name. To capture
+all keys, `*` can be used
+
+- `regex`: an optional field used to extract a sub-string from a complex field value.
+The supplied regular expression must contain exactly one named capture group with the name "value".
+For example, if your pod spec contains the following annotation,
+`kubernetes.io/change-cause: 2019-08-28T18:34:33Z APP_NAME=my-app GIT_SHA=58a1e39 CI_BUILD=4120`
+and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must specify
+the following two extraction rules:
+
+  ```yaml
+  processors:
+    k8s-tagger:
+      annotations:
+        - tag_name: git.sha
+          key: kubernetes.io/change-cause
+          regex: GIT_SHA=(?P<value>\w+)
+        - tag_name: ci.build
+          key: kubernetes.io/change-cause
+          regex: CI_BUILD=(?P<value>[\w]+)
+  ```
+
+  This will add the `git.sha` and `ci.build` tags to the spans. It is also possible to generically fetch
+  all keys and fill them into a template. To substitute the original name, use `%s`. For example:
+
+  ```yaml
+  processors:
+    k8s-tagger:
+      annotations:
+        - tag_name: k8s.annotation/%s
+          key: "*"
+  ```
+
+#### Filter section
+
+The filter section allows specifying filters to filter pods by labels, fields, namespaces, nodes, etc.
+
+- `node` (default = ""): represents a k8s node or host. If specified, any pods not running on the specified
+node will be ignored by the tagger.
+- `node_from_env_var` (default = ""): can be used to extract the node name from an environment variable.
+The value must be the name of the environment variable. This is useful when the node an OTel agent will
+run on cannot be predicted. In such cases, the Kubernetes downward API can be used to add the node name
+to each pod as an environment variable. The k8s tagger can then read this value and filter pods by it.
+For example, the node name can be passed to each agent with the downward API as follows:
+
+  ```yaml
+  env:
+    - name: K8S_NODE_NAME
+      valueFrom:
+        fieldRef:
+          fieldPath: spec.nodeName
+  ```
+
+  Then the `node_from_env_var` field can be set to `K8S_NODE_NAME` to filter all pods by the node that the agent
+  is running on. More on the downward API here:
+  https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
+- `namespace` (default = ""): filters all pods by the provided namespace. All other pods are ignored.
+- `fields` (default = empty): a list of maps accepting three keys: `key`, `value`, `op`. Allows filtering
+pods by generic k8s fields. Only the following operations (`op`) are supported: `equals`, `not-equals`.
+For example, to match pods having the `key1=value1` and `key2<>value2` conditions met for fields, one can specify:
+
+  ```yaml
+  fields:
+    - key: key1 # `op` defaults to "equals" when not specified
+      value: value1
+    - key: key2
+      value: value2
+      op: not-equals
+  ```
+
+- `labels` (default = empty): a list of maps accepting three keys: `key`, `value`, `op`. Allows filtering
+pods by generic k8s pod labels. Only the following operations (`op`) are supported: `equals`, `not-equals`,
+`exists`, `not-exists`.
+For example, to match pods where `label1` exists, one can specify:
+
+  ```yaml
+  labels:
+    - key: label1
+      op: exists
+  ```
+
+#### Example config:
+
+```yaml
+processors:
+  k8s_tagger:
+    passthrough: false
+    extract:
+      metadata:
+        # extract the following well-known metadata fields
+        - containerId
+        - containerName
+        - containerImage
+        - cluster
+        - daemonSetName
+        - deployment
+        - hostName
+        - namespace
+        - namespaceId
+        - node
+        - owners
+        - podId
+        - podName
+        - replicaSetName
+        - serviceName
+        - startTime
+        - statefulSetName
+      tags:
+        # It is possible to provide custom tag names for each of the extracted metadata fields:
+        containerId: k8s.pod.containerId
+
+      annotations:
+        # Extract all annotations using a template
+        - tag_name: k8s.annotation.%s
+          key: "*"
+      labels:
+        # Extract all labels using a template
+        - tag_name: k8s.label.%s
+          key: "*"
+
+    filter:
+      # Pods can be filtered; just uncomment the relevant section and
+      # fill it with an actual value, e.g.:
+      #
+      # namespace: ns2 # only look for pods running in the ns2 namespace
+      # node: ip-111.us-west-2.compute.internal # only look for pods running on this node/host
+      # node_from_env_var: K8S_NODE # only look for pods running on the node/host specified by the K8S_NODE environment variable
+      # labels: # only consider pods that match the following labels
+      #   - key: key1 # match pods that have a label `key1=value1`. `op` defaults to "equals" when not specified
+      #     value: value1
+      #   - key: key2 # ignore pods that have a label `key2=value2`.
+      #     value: value2
+      #     op: not-equals
+      # fields: # works the same way as labels, but for fields instead (like annotations)
+      #   - key: key1
+      #     value: value1
+      #   - key: key2
+      #     value: value2
+      #     op: not-equals
+```
+
+### RBAC
+
+TODO: mention the required RBAC rules.
+
+### Deployment scenarios
+
+The processor supports running both in agent and in collector mode.
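In either mode, the processor only takes effect once it is listed in a pipeline. A minimal sketch (not part of the original docs; the receiver and exporter names are placeholders for whatever your deployment actually uses):

```yaml
processors:
  k8s_tagger:
    passthrough: false

service:
  pipelines:
    traces:
      receivers: [opencensus]   # placeholder receiver
      processors: [k8s_tagger]
      exporters: [logging]      # placeholder exporter
```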
+
+#### As an agent
+
+When running as an agent, the processor detects the IP addresses of pods sending spans to the agent and uses this
+information to extract metadata from pods and add it to spans. When running as an agent, it is important to apply
+a discovery filter so that the processor only discovers pods from the same host that it is running on. Not using
+such a filter can result in unnecessary resource usage, especially on very large clusters. Once the filter is applied,
+each processor will only query the k8s API for pods running on its own node.
+
+A node filter can be applied by setting the `filter.node` config option to the name of a k8s node. While this works
+as expected, it cannot be used to automatically filter pods by the same node that the processor is running on in
+most cases, as it is not known beforehand which node a pod will be scheduled on. Luckily, Kubernetes has a solution
+for this called the downward API. To automatically filter pods by the node the processor is running on, you'll need
+to complete the following steps:
+
+1. Use the downward API to inject the node name as an environment variable.
+Add the following snippet under the pod env section of the OpenTelemetry container.
+
+    ```yaml
+    env:
+      - name: KUBE_NODE_NAME
+        valueFrom:
+          fieldRef:
+            apiVersion: v1
+            fieldPath: spec.nodeName
+    ```
+
+    This will inject a new environment variable into the OpenTelemetry container whose value is the
+    name of the node the pod was scheduled to run on.
+
+2. Set `filter.node_from_env_var` to the name of the environment variable holding the node name.
+
+    ```yaml
+    k8s_tagger:
+      filter:
+        node_from_env_var: KUBE_NODE_NAME # this should be the same as the var name used in the previous step
+    ```
+
+    This will restrict each OpenTelemetry agent to querying only the pods running on its own node, dramatically
+    reducing resource requirements for very large clusters.
+
+#### As a collector
+
+The processor can be deployed either as an agent or as a collector.
+
+When running as a collector, the processor cannot correctly detect the IP address of the pods generating
+the spans when it receives them from an agent instead of receiving them directly from the pods. To
+work around this issue, agents deployed with the k8s_tagger processor can be configured to detect
+the IP addresses and forward them along with the span resources. The collector can then match this IP address
+with k8s pods and enrich the spans with the metadata. In order to set this up, you'll need to complete the
+following steps:
+
+1. Set up agents in passthrough mode.
+Configure the agents' k8s_tagger processors to run in passthrough mode.
+
+    ```yaml
+    # k8s_tagger config for agent
+    k8s_tagger:
+      passthrough: true
+    ```
+
+    This will ensure that the agents detect the IP address and add it as an attribute to all span resources.
+    Agents will not make any k8s API calls, do any discovery of pods or extract any metadata.
+
+2. Configure the collector as usual.
+No special configuration changes need to be made on the collector. It'll automatically detect
+the IP address of spans sent by the agents as well as directly by other services/pods.
+
+### Caveats
+
+There are some edge cases and scenarios where the k8s_tagger will not work properly.
+
+#### Host networking mode
+
+The processor cannot correctly identify pods running in host networking mode, and
+enriching spans generated by such pods is not supported at the moment.
+
+#### As a sidecar
+
+The processor does not support detecting containers from the same pods when running
+as a sidecar. While this can be done, we think it is simpler to just use the Kubernetes
+downward API to inject environment variables into the pods and directly use their values
+as tags.
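As a hedged sketch of the downward-API alternative mentioned in the sidecar caveat above (the env var names here are arbitrary; the `fieldRef` paths are standard Kubernetes pod fields):

```yaml
# Inject pod metadata directly, instead of relying on sidecar detection
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```

The application (or an agent reading these variables) can then attach them as tags itself.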
diff --git a/processor/k8sprocessor/config.go b/processor/k8sprocessor/config.go index f94772838118..39fec485b11f 100644 --- a/processor/k8sprocessor/config.go +++ b/processor/k8sprocessor/config.go @@ -62,56 +62,56 @@ type ExtractConfig struct { // documentation for more details. Annotations []FieldExtractConfig `mapstructure:"annotations"` - // Annotations allows extracting data from pod labels and record it + // Labels allows extracting data from pod labels and record it // as resource attributes. // It is a list of FieldExtractConfig type. See FieldExtractConfig // documentation for more details. Labels []FieldExtractConfig `mapstructure:"labels"` } -// FieldExtractConfig allows specifying an extraction rule to extract a value from exactly one field. +//FieldExtractConfig allows specifying an extraction rule to extract a value from exactly one field. // -// The field accepts a list FilterExtractConfig map. The map accepts three keys -// tag-name, key and regex +//The field accepts a list FilterExtractConfig map. The map accepts three keys +// tag-name, key and regex // -// - tag-name represents the name of the tag that will be added to the span. -// When not specified a default tag name will be used of the format: -// k8s.. -// For example, if tag-name is not specified and the key is git_sha, -// then the span name will be `k8s.annotation.deployment.git_sha`. +//- tag-name represents the name of the tag that will be added to the span. +// When not specified a default tag name will be used of the format: +// k8s.. +// For example, if tag-name is not specified and the key is git_sha, +// then the span name will be `k8s.annotation.deployment.git_sha`. // -// - key represents the annotation name. This must exactly match an annotation name. -// To capture all keys, `*` can be used +//- key represents the annotation name. This must exactly match an annotation name. 
+// To capture all keys, `*` can be used // -// - regex is an optional field used to extract a sub-string from a complex field value. -// The supplied regular expression must contain one named parameter with the string "value" -// as the name. For example, if your pod spec contains the following annotation, +//- regex is an optional field used to extract a sub-string from a complex field value. +// The supplied regular expression must contain one named parameter with the string "value" +// as the name. For example, if your pod spec contains the following annotation, // // kubernetes.io/change-cause: 2019-08-28T18:34:33Z APP_NAME=my-app GIT_SHA=58a1e39 CI_BUILD=4120 // -// and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must -// specify the following two extraction rules: +// and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must +// specify the following two extraction rules: // -// procesors: -// k8s-tagger: -// annotations: -// - tag_name: git.sha -// key: kubernetes.io/change-cause -// regex: GIT_SHA=(?P\w+) -// - tag_name: ci.build +// procesors: +// k8s-tagger: +// annotations: +// - tag_name: git.sha +// key: kubernetes.io/change-cause +// regex: GIT_SHA=(?P\w+) +// - tag_name: ci.build // key: kubernetes.io/change-cause -// regex: JENKINS=(?P[\w]+) +// regex: JENKINS=(?P[\w]+) // -// this will add the `git.sha` and `ci.build` tags to the spans. +// this will add the `git.sha` and `ci.build` tags to the spans. // -// It is also possible to generically fetch all keys and fill them into a template. -// To substitute the original name, use `%s`. For example: +// It is also possible to generically fetch all keys and fill them into a template. +// To substitute the original name, use `%s`. 
For example: // -// procesors: -// k8s-tagger: -// annotations: -// - tag_name: k8s.annotation/%s -// key: * +// procesors: +// k8s-tagger: +// annotations: +// - tag_name: k8s.annotation/%s +// key: * type FieldExtractConfig struct { TagName string `mapstructure:"tag_name"` diff --git a/processor/k8sprocessor/doc.go b/processor/k8sprocessor/doc.go index bce408e015e1..9f2556550089 100644 --- a/processor/k8sprocessor/doc.go +++ b/processor/k8sprocessor/doc.go @@ -13,101 +13,4 @@ // limitations under the License. // Package k8sprocessor allow automatic tagging of spans with k8s metadata. -// -// The processor automatically discovers k8s resources (pods), extracts metadata from them and adds the -// extracted metadata to the relevant spans. The processor use the kubernetes API to discover all pods -// running in a cluster, keeps a record of their IP addresses and interesting metadata. Upon receiving spans, -// the processor tries to identify the source IP address of the service that sent the spans and matches -// it with the in memory data. If a match is found, the cached metadata is added to the spans as attributes. -// -// RBAC -// -// TODO: mention the required RBAC rules. -// -// Config -// -// TODO: example config. -// -// Deployment scenarios -// -// The processor supports running both in agent and collector mode. -// -// As an agent -// -// When running as an agent, the processor detects IP addresses of pods sending spans to the agent and uses this -// information to extract metadata from pods and add to spans. When running as an agent, it is important to apply -// a discovery filter so that the processor only discovers pods from the same host that it is running on. Not using -// such a filter can result in unnecessary resource usage especially on very large clusters. Once the fitler is applied, -// each processor will only query the k8s API for pods running on it's own node. 
-// -// Node filter can be applied by setting the `filter.node` config option to the name of a k8s node. While this works -// as expected, it cannot be used to automatically filter pods by the same node that the processor is running on in -// most cases as it is not know before hand which node a pod will be scheduled on. Luckily, kubernetes has a solution -// for this called the downward API. To automatically filter pods by the node the processor is running on, you'll need -// to complete the following steps: -// -// 1. Use the downward API to inject the node name as an environment variable. -// Add the following snippet under the pod env section of the OpenTelemetry container. -// -// env: -// - name: KUBE_NODE_NAME -// valueFrom: -// fieldRef: -// apiVersion: v1 -// fieldPath: spec.nodeName -// -// This will inject a new environment variable to the OpenTelemetry container with the value as the -// name of the node the pod was scheduled to run on. -// -// 2. Set "filter.node_from_env_var" to the name of the environment variable holding the node name. -// -// k8s_tagger: -// filter: -// node_from_env_var: KUBE_NODE_NAME # this should be same as the var name used in previous step -// -// This will restrict each OpenTelemetry agent to query pods running on the same node only dramatically reducing -// resource requirements for very large clusters. -// -// As a collector -// -// The processor can be deployed both as an agent or as a collector. -// -// When running as a collector, the processor cannot correctly detect the IP address of the pods generating -// the spans when it receives the spans from an agent instead of receiving them directly from the pods. To -// workaround this issue, agents deployed with the k8s_tagger processor can be configured to detect -// the IP addresses and forward them along with the span resources. Collector can then match this IP address -// with k8s pods and enrich the spans with the metadata. 
In order to set this up, you'll need to complete the -// following steps: -// -// 1. Setup agents in passthrough mode -// Configure the agents' k8s_tagger processors to run in passthrough mode. -// -// # k8s_tagger config for agent -// k8s_tagger: -// passthrough: true -// -// This will ensure that the agents detect the IP address as add it as an attribute to all span resources. -// Agents will not make any k8s API calls, do any discovery of pods or extract any metadata. -// -// 2. Configure the collector as usual -// No special configuration changes are needed to be made on the collector. It'll automatically detect -// the IP address of spans sent by the agents as well as directly by other services/pods. -// -// -// Caveats -// -// There are some edge-cases and scenarios where k8s_tagger will not work properly. -// -// -// Host networking mode -// -// The processor cannot correct identify pods running in the host network mode and -// enriching spans generated by such pods is not supported at the moment. -// -// As a sidecar -// -// The processor does not support detecting containers from the same pods when running -// as a sidecar. While this can be done, we think it is simpler to just use the kubernetes -// downward API to inject environment variables into the pods and directly use their values -// as tags. package k8sprocessor