Move the documentation to README.md

SumoLogic · Mar 25, 2020 · 030e6dc · 030e6dc
1 parent 706b609
commit 030e6dc
Showing 3 changed files with 315 additions and 130 deletions.
diff --git a/processor/k8sprocessor/README.md b/processor/k8sprocessor/README.md
@@ -1 +1,283 @@
-Documentation is published to [pkg.go.dev](https://pkg.go.dev/github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor?tab=doc)
+## <a name="k8sprocessor"></a>Kubernetes Processor
+
+The `k8sprocessor` allows automatic tagging of spans with k8s metadata.
+
+It automatically discovers k8s resources (pods), extracts metadata from them, and adds the extracted
+metadata to the relevant spans. The processor uses the Kubernetes API to discover all pods running
+in a cluster and keeps a record of their IP addresses and interesting metadata. Upon receiving spans,
+the processor tries to identify the source IP address of the service that sent the spans and matches
+it with the in-memory data. If a match is found, the cached metadata is added to the spans as attributes.
+
+### Config
+
+There are several top level sections of the processor config:
+
+- `passthrough` (default = false): when set to true, only annotates resources with the pod IP and
+does not try to extract any other metadata. It does not need access to the k8s cluster API.
+The agent/collector must receive spans directly from services to be able to correctly detect the pod IPs.
+- `pod_ip_debugging` (default = false): when set to true, enables verbose logs that help
+verify how the pod IP is assigned during metadata tagging.
+- `extract`: this section (see [below](#k8sprocessor-extract)) specifies extraction rules.
+- `filter`: this section (see [below](#k8sprocessor-filter)) specifies filters applied when matching pods.
+
+#### <a name="k8sprocessor-extract"></a>Extract section
+
+Allows specifying extraction rules to extract data from k8s pod specs.
+
+- `metadata` (default = empty): specifies a list of strings that denote extracted fields. See
+[example config](#k8sprocessor-example) for the list of fields.
+*Note: `owners` is a special field which enables traversing the ownership tree to pull data such
+as `deploymentSetName`, `serviceName`, `daemonSetName`, `statefulSetName`, etc.*
+- `tags` (default = empty): specifies an optional map of custom tag names. When provided,
+the specified fields are tagged using the provided names, e.g.:
+    ```yaml
+    tags:
+      containerId: my-custom-tag-for-container
+      node: kubernetes.node
+    ```
+- `annotations` (default = empty): a list of rules for extracting and recording annotation data.
+See [field extract config](#k8sprocessor-field-extract) for an example of how to use it.
+- `labels` (default = empty): a list of rules for extracting and recording label data.
+See [field extract config](#k8sprocessor-field-extract) for an example of how to use it.
+
+#### <a name="k8sprocessor-field-extract"></a> Field Extract Config
+
+Allows specifying an extraction rule to extract a value from exactly one field.
+
+The field accepts a list of maps, each accepting three keys: `tag_name`, `key` and `regex`.
+
+- `tag_name`: represents the name of the tag that will be added to the span. When not specified,
+a default tag name of the format `k8s.<annotation>.<annotation key>` will be used. For example, if
+`tag_name` is not specified and the key is `git_sha`, then the tag name will be `k8s.annotation.deployment.git_sha`.
+
+- `key`: represents the annotation name. This must exactly match an annotation name. To capture
+all keys, `*` can be used.
+
+- `regex`: an optional field used to extract a substring from a complex field value.
+The supplied regular expression must contain one named capture group with the name `value`.
+For example, if your pod spec contains the following annotation,
+`kubernetes.io/change-cause: 2019-08-28T18:34:33Z APP_NAME=my-app GIT_SHA=58a1e39 CI_BUILD=4120`
+and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must specify
+the following two extraction rules:
+
+  ```yaml
+  processors:
+    k8s_tagger:
+      extract:
+        annotations:
+          - tag_name: git.sha
+            key: kubernetes.io/change-cause
+            regex: GIT_SHA=(?P<value>\w+)
+          - tag_name: ci.build
+            key: kubernetes.io/change-cause
+            regex: CI_BUILD=(?P<value>[\w]+)
+  ```
+
+  This will add the `git.sha` and `ci.build` tags to the spans. It is also possible to generically fetch
+  all keys and fill them into a template. To substitute the original name, use `%s`. For example:
+
+  ```yaml
+  processors:
+    k8s_tagger:
+      extract:
+        annotations:
+          - tag_name: k8s.annotation/%s
+            key: "*"
+  ```
+
+#### <a name="k8sprocessor-filter"></a>Filter section
+
+The `filter` section allows specifying filters to filter pods by labels, fields, namespaces, nodes, etc.
+
+- `node` (default = ""): represents a k8s node or host. If specified, any pods not running on the specified
+node will be ignored by the tagger.
+- `node_from_env_var` (default = ""): can be used to extract the node name from an environment variable.
+The value must be the name of the environment variable. This is useful when the node an OTel agent will
+run on cannot be predicted. In such cases, the Kubernetes downward API can be used to add the node name
+to each pod as an environment variable. The k8s tagger can then read this value and filter pods by it.
+For example, the node name can be passed to each agent with the downward API as follows:
+
+    ```yaml
+    env:
+      - name: K8S_NODE_NAME
+        valueFrom:
+          fieldRef:
+            fieldPath: spec.nodeName
+    ```
+
+  Then the `node_from_env_var` option can be set to `K8S_NODE_NAME` to filter all pods by the node that the agent
+  is running on. More on the downward API here:
+  https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
+- `namespace` (default = ""): filters all pods by the provided namespace. All other pods are ignored.
+- `fields` (default = empty): a list of maps accepting three keys: `key`, `value`, `op`. Allows filtering
+pods by generic k8s fields. Only the following operations (`op`) are supported: `equals`, `not-equals`.
+For example, to match pods where the `key1=value1` and `key2<>value2` conditions are met for fields, one can specify:
+
+    ```yaml
+      fields: 
+       - key: key1 # `op` defaults to "equals" when not specified
+         value: value1
+       - key: key2 
+         value: value2
+         op: not-equals
+    ```
+
+- `labels` (default = empty): a list of maps accepting three keys: `key`, `value`, `op`. Allows filtering
+pods by generic k8s pod labels. Only the following operations (`op`) are supported: `equals`, `not-equals`,
+`exists`, `not-exists`. For example, to match pods where `label1` exists, one can specify:
+
+    ```yaml
+      labels:
+       - key: label1
+         op: exists
+    ```
+
+#### <a name="k8sprocessor-example"></a>Example config:
+
+```yaml
+processors:
+  k8s_tagger:
+    passthrough: false
+    extract:
+      metadata:
+        # extract the following well-known metadata fields
+        - containerId
+        - containerName
+        - containerImage
+        - cluster
+        - daemonSetName
+        - deployment
+        - hostName
+        - namespace
+        - namespaceId
+        - node
+        - owners
+        - podId
+        - podName
+        - replicaSetName
+        - serviceName
+        - startTime
+        - statefulSetName
+      tags:
+        # It is possible to provide your custom key names for each of the extracted metadata:
+        containerId: k8s.pod.containerId
+
+      annotations:
+        # Extract all annotations using a template
+        - tag_name: k8s.annotation.%s
+          key: "*"
+      labels:
+        # Extract all labels using a template
+        - tag_name: k8s.label.%s
+          key: "*"
+
+    filter:
+      # The pods might be filtered, just uncomment the relevant section and 
+      # fill it with actual value, e.g.:
+      #
+      # namespace: ns2 # only look for pods running in ns2 namespace
+      # node: ip-111.us-west-2.compute.internal # only look for pods running on this node/host
+      # node_from_env_var: K8S_NODE # only look for pods running on the node/host specified by the K8S_NODE environment variable
+      # labels: # only consider pods that match the following labels
+      #  - key: key1 # match pods that have a label `key1=value1`. `op` defaults to "equals" when not specified
+      #    value: value1
+      #  - key: key2 # ignore pods that have a label `key2=value2`.
+      #    value: value2
+      #    op: not-equals
+      # fields: # works the same way as labels but for fields instead (like annotations)
+      #  - key: key1
+      #    value: value1
+      #  - key: key2
+      #    value: value2
+      #    op: not-equals
+```
+
+### RBAC
+
+TODO: mention the required RBAC rules.
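+
+Until the exact rules are documented, the following is only a minimal sketch. It assumes the processor
+needs nothing more than read access to pods across the cluster, since it discovers pods through the
+Kubernetes API as described above. The names `otel-k8s-tagger` and `otel-collector` and the `default`
+namespace are hypothetical, and extracting `owners` metadata may require read access to further resources.
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: otel-k8s-tagger # hypothetical name
+rules:
+  - apiGroups: [""]
+    resources: ["pods"]
+    verbs: ["get", "list", "watch"] # assumed minimum for pod discovery
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: otel-k8s-tagger # hypothetical name
+subjects:
+  - kind: ServiceAccount
+    name: otel-collector # hypothetical service account used by the collector pods
+    namespace: default
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: otel-k8s-tagger
+```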
+
+### Deployment scenarios
+
+The processor supports running in both agent and collector mode.
+
+#### As an agent
+
+When running as an agent, the processor detects the IP addresses of pods sending spans to the agent and uses this
+information to extract metadata from pods and add it to spans. When running as an agent, it is important to apply
+a discovery filter so that the processor only discovers pods from the same host that it is running on. Not using
+such a filter can result in unnecessary resource usage, especially on very large clusters. Once the filter is applied,
+each processor will only query the k8s API for pods running on its own node.
+
+A node filter can be applied by setting the `filter.node` config option to the name of a k8s node. While this works
+as expected, in most cases it cannot be used to automatically filter pods by the node that the processor is running
+on, as it is not known beforehand which node a pod will be scheduled on. Luckily, Kubernetes has a solution
+for this called the downward API. To automatically filter pods by the node the processor is running on, you'll need
+to complete the following steps:
+
+1. Use the downward API to inject the node name as an environment variable.
+Add the following snippet under the pod env section of the OpenTelemetry container.
+
+    ```yaml
+    env:
+      - name: KUBE_NODE_NAME
+        valueFrom:
+          fieldRef:
+            apiVersion: v1
+            fieldPath: spec.nodeName
+    ```
+
+    This will inject a new environment variable to the OpenTelemetry container with the value as the
+    name of the node the pod was scheduled to run on.
+
+2. Set `filter.node_from_env_var` to the name of the environment variable holding the node name.
+
+    ```yaml
+       k8s_tagger:
+         filter:
+           node_from_env_var: KUBE_NODE_NAME # this should be same as the var name used in previous step
+    ```
+
+    This will restrict each OpenTelemetry agent to querying only pods running on the same node, dramatically reducing
+    resource requirements for very large clusters.
+
+#### As a collector
+
+The processor can be deployed either as an agent or as a collector.
+
+When running as a collector, the processor cannot correctly detect the IP address of the pods generating
+the spans when it receives the spans from an agent instead of receiving them directly from the pods. To
+work around this issue, agents deployed with the k8s_tagger processor can be configured to detect
+the IP addresses and forward them along with the span resources. The collector can then match this IP address
+with k8s pods and enrich the spans with the metadata. To set this up, you'll need to complete the
+following steps:
+
+1. Set up agents in passthrough mode.
+Configure the agents' k8s_tagger processors to run in passthrough mode.
+
+    ```yaml
+       # k8s_tagger config for agent
+       k8s_tagger:
+         passthrough: true
+    ```
+    This will ensure that the agents detect the IP address and add it as an attribute to all span resources.
+    Agents will not make any k8s API calls, do any discovery of pods or extract any metadata.
+
+2. Configure the collector as usual.
+No special configuration changes are needed on the collector. It'll automatically detect
+the IP address of spans sent by the agents as well as directly by other services/pods.
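+
+    As an illustration only, a collector-side config might look like the sketch below. It uses only keys
+    that appear in the example config above; the selection of metadata fields is an arbitrary assumption,
+    not a requirement.
+
+    ```yaml
+       # k8s_tagger config for collector (illustrative sketch)
+       k8s_tagger:
+         passthrough: false # the default; the collector itself queries the k8s API
+         extract:
+           metadata:
+             - podName
+             - namespace
+             - node
+    ```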
+
+
+### Caveats
+
+There are some edge cases and scenarios where k8s_tagger will not work properly.
+
+
+#### Host networking mode
+
+The processor cannot correctly identify pods running in host network mode, and
+enriching spans generated by such pods is not supported at the moment.
+
+#### As a sidecar
+
+The processor does not support detecting containers from the same pod when running
+as a sidecar. While this can be done, we think it is simpler to just use the Kubernetes
+downward API to inject environment variables into the pods and directly use their values
+as tags.
diff --git a/processor/k8sprocessor/config.go b/processor/k8sprocessor/config.go
@@ -62,56 +62,56 @@ type ExtractConfig struct {
 	// documentation for more details.
 	Annotations []FieldExtractConfig `mapstructure:"annotations"`
 
-	// Annotations allows extracting data from pod labels and record it
+	// Labels allows extracting data from pod labels and record it
 	// as resource attributes.
 	// It is a list of FieldExtractConfig type. See FieldExtractConfig
 	// documentation for more details.
 	Labels []FieldExtractConfig `mapstructure:"labels"`
 }
 
-// FieldExtractConfig allows specifying an extraction rule to extract a value from exactly one field.
+//FieldExtractConfig allows specifying an extraction rule to extract a value from exactly one field.
 //
-// The field accepts a list FilterExtractConfig map. The map accepts three keys
-//     tag-name, key and regex
+//The field accepts a list of FieldExtractConfig maps. Each map accepts three keys:
+//    tag_name, key and regex
 //
-// - tag-name represents the name of the tag that will be added to the span.
-//   When not specified a default tag name will be used of the format:
-//       k8s.<annotation>.<annotation key>
-//   For example, if tag-name is not specified and the key is git_sha,
-//   then the span name will be `k8s.annotation.deployment.git_sha`.
+//- tag_name represents the name of the tag that will be added to the span.
+//  When not specified a default tag name will be used of the format:
+//      k8s.<annotation>.<annotation key>
+//  For example, if tag_name is not specified and the key is git_sha,
+//  then the tag name will be `k8s.annotation.deployment.git_sha`.
 //
-// - key represents the annotation name. This must exactly match an annotation name.
-//   To capture all keys, `*` can be used
+//- key represents the annotation name. This must exactly match an annotation name.
+//  To capture all keys, `*` can be used
 //
-// - regex is an optional field used to extract a sub-string from a complex field value.
-//   The supplied regular expression must contain one named parameter with the string "value"
-//   as the name. For example, if your pod spec contains the following annotation,
+//- regex is an optional field used to extract a sub-string from a complex field value.
+//  The supplied regular expression must contain one named parameter with the string "value"
+//  as the name. For example, if your pod spec contains the following annotation,
 //
 //		kubernetes.io/change-cause: 2019-08-28T18:34:33Z APP_NAME=my-app GIT_SHA=58a1e39 CI_BUILD=4120
 //
-//   and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must
-//   specify the following two extraction rules:
+//  and you'd like to extract the GIT_SHA and the CI_BUILD values as tags, then you must
+//  specify the following two extraction rules:
 //
-//   procesors:
-//     k8s-tagger:
-//       annotations:
-//         - tag_name: git.sha
-//           key: kubernetes.io/change-cause
-//           regex: GIT_SHA=(?P<value>\w+)
-//         - tag_name: ci.build
+//  processors:
+//    k8s_tagger:
+//      annotations:
+//        - tag_name: git.sha
+//          key: kubernetes.io/change-cause
+//          regex: GIT_SHA=(?P<value>\w+)
+//        - tag_name: ci.build
 //	         key: kubernetes.io/change-cause
-//           regex: JENKINS=(?P<value>[\w]+)
+//          regex: CI_BUILD=(?P<value>[\w]+)
 //
-//   this will add the `git.sha` and `ci.build` tags to the spans.
+//  this will add the `git.sha` and `ci.build` tags to the spans.
 //
-//   It is also possible to generically fetch all keys and fill them into a template.
-//   To substitute the original name, use `%s`. For example:
+//  It is also possible to generically fetch all keys and fill them into a template.
+//  To substitute the original name, use `%s`. For example:
 //
-//   procesors:
-//     k8s-tagger:
-//       annotations:
-//         - tag_name: k8s.annotation/%s
-//           key: *
+//  processors:
+//    k8s_tagger:
+//      annotations:
+//        - tag_name: k8s.annotation/%s
+//          key: "*"
 
 type FieldExtractConfig struct {
 	TagName string `mapstructure:"tag_name"`