
[RFC] Dogstatsd in containerized environments #195

Open
jeremy-lq opened this issue Apr 28, 2017 · 2 comments


jeremy-lq commented Apr 28, 2017

Dogstatsd in containerized environments

  • Authors: Xavier Vello (@xvello)
  • Status: draft

Overview

This RFC tries to address the use case of emitting custom metrics from containerized applications via dogstatsd. Our goal is to support a broad base of orchestrators and environments.

Problem

Clients use one of several libraries to send UDP packets to dogstatsd (running alongside an agent or standalone) so that their custom metrics are forwarded.

For metric transmission to work, we need:

  • The dogstatsd port to be listening and reachable by other containers on the same host
  • A way for client libraries to know which IP and port to send to

The recommended deployment today is to bind dogstatsd to port 8125 on the host and send the UDP packets there. But other deployment scenarios are requested (see section Dogstatsd deployment scenarios).
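
For illustration, this is essentially what the client libraries do under the hood: serialize the metric in the statsd/dogstatsd line format and send it as a single UDP datagram to the host's dogstatsd (a minimal sketch; the host IP below is a placeholder):

```python
import socket

# One metric per datagram, in the basic statsd line format "name:value|type".
# "|c" marks a counter; 10.0.0.1 is a placeholder for the node's IP.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"myapp.page.views:1|c", ("10.0.0.1", 8125))
```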

As the host tag (if not already present) is added to the metrics by dogstatsd, we need to talk to the host's own dogstatsd and not a random one on the cluster. This is why load-balanced IPs or a single dogstatsd per cluster are not currently supported.

Clients using the trace agent will want consistent behavior for both dogstatsd and APM, so the solution should work for both. Currently, the trace libraries all have specific host & port options and recommend host port binding.

Officially supported libraries:

Clients may also use other community-developed statsd libraries; maintaining compatibility with them is preferable.

Constraints

  1. Allow consistent behavior across all supported client libraries
  2. Support most cases out of the box, with limited user configuration requirements
  3. Be future-proof (container network interface systems, new orchestrator tools) by allowing configurability/extensibility
  4. Preserve metric origin, so that container/pod/orchestrator tags can be maintained or added to metrics
  5. Maintain backwards compatibility with the statsd and dogstatsd protocols as they exist today

Dogstatsd deployment scenarios

Case A: One dogstatsd per host, local traffic only

Our recommended deployment so far is to run dogstatsd (standalone or with an agent), either in a container or directly on the host system. In both cases, we bind to port 8125 on the host IP. All major orchestrators support this, and we should enable it in our official installation methods:

  • Kubernetes has NodePort Services that have both a load-balanced IP and a binding on every node’s port. The latter would be used. Future node-local services would help this use case.
  • Nomad: Static Ports can be specified for a job
  • ECS: PortMapping allows containers to bind to a host's port
  • Rancher: the Docker port binding mechanism is supported, but the default managed network mode will probably interfere with gateway detection (to be tested)
  • Mesos: Marathon allows you to set "requirePorts" in host mode and "portMappings" in bridge mode

Pros:

  • Metrics are automatically tagged with the correct host
  • Metrics are flushed together, keeping them in sync

Cons:

  • We need to route the metrics to the same host, which can be tricky with software network solutions
  • Some customers are reaching dogstatsd’s processing capacity

Case B: A load-balanced dogstatsd pool for the cluster

In this scenario, several dogstatsd instances are load-balanced behind a common IP/DNS name. They can be running with agents, or on dedicated containers.

Pros:

  • Higher scalability for custom metrics
  • Easier configuration (just assign it a fixed DNS name)

Cons:

  • The host tag must be added beforehand, or dogstatsd will add its own, breaking the metric's origin

There are three ways to address this:

  • Don’t host tag the metrics (need to test whether our pipeline deals nicely with this)
  • Use the originating IP’s reverse DNS to tag the metric (if it is correct)
  • Pre-tag the metrics in the client library before sending them to dogstatsd (see the sketch below)
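
As an illustration of the third option, the client library could append the host tag itself using the dogstatsd tag extension before the packet leaves the container (a sketch, not an existing library API; the pool's DNS name is a placeholder):

```python
import socket

def send_counter(name, value, host_tag, address=("dogstatsd.internal", 8125)):
    # "dogstatsd.internal" stands in for the load-balanced pool's DNS name.
    # "|#host:<value>" pre-tags the metric so the pool keeps the real origin.
    payload = f"{name}:{value}|c|#host:{host_tag}".encode()
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload, address)
```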

Case C: Dogstatsd as a sidecar container in every pod

Kubernetes pods & Rancher services allow you to run a container alongside your application, and many k8s users want this solution. This could allow container-specific tags to be added, but we could also implement that on the common dogstatsd by matching the originating IP with the orchestrators’ information.

Possible solutions

Binding dogstatsd to the host IP and using the default network gateway

On vanilla Docker, containers use bridge networking: every container uses the host as its default gateway to reach the external network. This is why the datadogpy library parses /proc/net/route to determine the host's IP address.
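
A minimal sketch of the same approach (illustrative, not datadogpy's exact code): the default route is the entry whose destination is 00000000, and its gateway is stored as a little-endian hexadecimal IPv4 address.

```python
import socket
import struct

def default_gateway_ip():
    with open("/proc/net/route") as route_table:
        next(route_table)  # skip the header line
        for line in route_table:
            fields = line.split()
            if fields[1] == "00000000":  # destination 0.0.0.0 = default route
                return socket.inet_ntoa(struct.pack("<L", int(fields[2], 16)))
    return None
```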

Pros:

  • Works out of the box in every setup that uses bridge networking, no orchestrator-specific code or configuration needed
  • Minimal code, easy to port to other languages

Cons:

  • OS-specific code (datadogpy only supports Linux)
  • The default-gateway assumption doesn't hold if a container network interface is used (Rancher's default managed mode, Weave/Calico/…). This cannot be the only solution.

Pass host & IP via environment variables and modify libraries

Adoption of Container Network Interfaces (CNIs) is rising rapidly, and assuming the host is the default network gateway does not work on these systems. While we can design CNI/orchestrator-specific fixes, implementing and maintaining them in every client library would not be ideal.

The most maintainable solution is to separate the detection logic from the client libraries and pass the IP and port via two environment variables in the application's container. If they are not present, localhost is used by default.

  • Most orchestrators allow you to inject environment variables into containers via templating in the specs. If not, we should work with their communities to enable this. This would cover 98% of the cases when host binding is used.
  • If the orchestrator does not support it, or the client has a specific setup (e.g. wants to interface with a service discovery system), they could set the environment variables in their container's entrypoint.

Pros:

  • Minimal patching to the user libraries

Cons:

  • Customers need to change their application’s deployment to include these variables
  • Custom client libraries need to implement this logic

Recommended client implementation logic

  1. If the library has existing configuration logic, use it
  2. If $DOGSTATSD_HOSTNAME is set, send to it, on port $DOGSTATSD_PORT if set, or 8125 otherwise
  3. Otherwise, send to localhost:8125

Unfortunately, UDP packet drops can't be detected, so a try-and-failover scheme can't be implemented.
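
In a client library, that resolution logic would look roughly like this (the helper name and signature are illustrative, not an existing API):

```python
import os

def resolve_dogstatsd_endpoint(configured_host=None, configured_port=None):
    # 1. Existing library configuration wins
    if configured_host:
        return configured_host, int(configured_port or 8125)
    # 2. Environment variables injected by the orchestrator or the entrypoint
    host = os.environ.get("DOGSTATSD_HOSTNAME")
    if host:
        return host, int(os.environ.get("DOGSTATSD_PORT", 8125))
    # 3. Default: the local dogstatsd
    return "localhost", 8125
```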

Provide a dogstatsd-proxy system

We could provide a proxy listening on localhost:8125 and forwarding metrics to a dogstatsd server, either specified via the environment variables (see above), or through custom specific logic.

The initial implementation could use socat, and allow several use cases:

  • Transmit UDP packets to $DOGSTATSD_HOSTNAME:$DOGSTATSD_PORT, using environment variables set in the container’s template (see above)
  • Tunnel UDP packets through a UNIX socket, to reach the node’s dogstatsd server without any network issues
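
In Python terms, the socat-based forwarding amounts to the following sketch (illustrative only; it assumes the environment variables from the previous section are set):

```python
import os
import socket

def run_proxy():
    # Forward every UDP datagram received on localhost:8125, unchanged,
    # to the dogstatsd designated by the environment variables.
    target = (os.environ["DOGSTATSD_HOSTNAME"],
              int(os.environ.get("DOGSTATSD_PORT", 8125)))
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 8125))
    upstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        packet, _ = listener.recvfrom(65535)
        upstream.sendto(packet, target)

if __name__ == "__main__":
    run_proxy()
```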

Later on, we could implement a proxy mode in dogstatsd itself (once version 6 in Go is production-ready), which could allow:

  • Tagging metrics (hostname / container tags) before retransmission
  • Autodiscovery logic

This proxy would be distributed as:

  • A docker image to run in the same pod as the user application
  • A binary package to install inside the user's container if their orchestrator doesn't support sidecar containers

Pros:

  • No need to modify libraries; they will continue to send packets to localhost
  • Seamless switch from socat to next-gen dogstatsd when we want to implement more complex use cases
  • Allows new use cases (networkless, pre-tagging)
  • This can be the sidecar we put forward for users who want everything in the pod

Cons:

  • One more docker image to distribute and maintain
  • Proxying overhead (probably small enough for UDP, but might be higher for trace-agent)
  • Custom logic to develop later on

eBPF

TBD


jonmoter commented May 3, 2017

We're running several Kubernetes clusters, and running Datadog as a DaemonSet. We have it configured to listen on hostPort: 8125 (the port on the underlying host).

Then we have each host configured with a link-local address, say 169.254.10.10. Any application that runs in a Kubernetes pod is configured via an environment variable to send dogstatsd metrics to 169.254.10.10:8125, rather than localhost:8125.

We do the same thing with Consul, which we have an instance running on each node.

This takes a bit of extra configuration of the nodes, and a bit of reconfiguring of each app running, but it's working pretty well for us.


lattwood commented Aug 11, 2017

We're looking at sidecar-ing https://github.com/DataDog/docker-dd-agent/tree/master/dogstatsd because we have some very high-volume dogstatsd submitters, and that will allow us to account for any dogstatsd overhead when scheduling a container.

edit: and we do see overhead; one of our apps submits such a volume of metrics to dogstatsd that we routinely see CPU utilization north of 60%.
