add socket_path option to enable unix socket traffic to dogstatsd6 #199
a small comment otherwise 👍 for me
datadog/dogstatsd/base.py
Outdated
@@ -24,7 +24,8 @@ class DogStatsd(object):
     OK, WARNING, CRITICAL, UNKNOWN = (0, 1, 2, 3)

     def __init__(self, host='localhost', port=8125, max_buffer_size=50, namespace=None,
-                 constant_tags=None, use_ms=False, use_default_route=False):
+                 constant_tags=None, use_ms=False, use_default_route=False,
+                 use_unix_socket=None):
I'd call this param socket_path
this mimics UDP behaviour for unix socket implementation
Added exception handling for socket.timeout (the packets are silently dropped), but I need some Python-fu on how to handle dogstatsd crashing (generic socket.error). Currently, the exception is handled gracefully and traffic resumes as soon as dogstatsd restarts, but the CPU usage is significant (a new socket is tried for every packet). Should we detect repeated failures and throttle?
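For illustration, the "retry on a new socket for the next packet" behaviour described above could be sketched as follows. This is a minimal sketch, not the code in this PR: `ReconnectingSender` and its methods are hypothetical names, and the choice to catch `OSError` (which `socket.error` aliases on Python 3) is an assumption.

```python
import socket


class ReconnectingSender(object):
    """Hypothetical helper (not part of datadogpy): lazily connects to a
    datagram unix socket and forgets it after a failure."""

    def __init__(self, socket_path):
        self.socket_path = socket_path
        self.socket = None

    def get_socket(self):
        # Create and connect the socket only when it is first needed.
        if self.socket is None:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            try:
                sock.setblocking(False)
                sock.connect(self.socket_path)
            except OSError:
                sock.close()
                raise
            self.socket = sock
        return self.socket

    def send(self, packet):
        """Return True if the packet was handed to the kernel, False if dropped."""
        try:
            self.get_socket().send(packet.encode("utf-8"))
            return True
        except (BlockingIOError, socket.timeout):
            # Queue full: drop this packet but keep the socket.
            return False
        except OSError:
            # Server gone or unreachable: drop the packet and forget the
            # socket so that the next packet triggers a fresh connect.
            self.close()
            return False

    def close(self):
        if self.socket is not None:
            self.socket.close()
            self.socket = None
```

With this shape, every packet sent while the server is down costs one failed connect, which is the CPU overhead mentioned above; no throttling is attempted here.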
@xvello is it still a WIP?
@yannmh removed the WIP flag as stress test went well
@@ -265,6 +282,9 @@ def _send_to_server(self, packet):
         try:
             # If set, use socket directly
             (self.socket or self.get_socket()).send(packet.encode(self.encoding))
+        except socket.timeout:
+            # dogstatsd is overflowing, drop the packets (mimicks the UDP behaviour)
+            return
Why do we need this? From what I understand, we should never get a timeout on non-blocking mode.
The Python socket implementation treats a non-blocking send as a zero-timeout send and raises a socket.timeout exception if the write does not return immediately (because the queue, managed by the kernel, is full). I chose to silently drop the packet, mirroring what UDP would do. With dogstatsd6's goroutine-based intake, this should only happen if the CPU is saturated.
If needed, the socket queue length is configurable via sysctl net.unix.max_dgram_qlen (10 on my test machines).
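As an illustration of the drop-on-full-queue behaviour, here is a minimal sketch. `send_or_drop` is a hypothetical helper, not a datadogpy function. One assumption worth flagging: on Python 3, a full queue on a non-blocking socket generally surfaces as `BlockingIOError` rather than `socket.timeout`, so the sketch catches both.

```python
import socket


def send_or_drop(sock, packet, encoding="utf-8"):
    """Send one datagram on a non-blocking socket.

    Returns True if the kernel accepted the packet, False if it was
    dropped because the send could not complete immediately.
    """
    try:
        sock.send(packet.encode(encoding))
        return True
    except (BlockingIOError, socket.timeout):
        # The kernel-managed queue is full: drop the packet silently,
        # mirroring what UDP would do.
        return False
```

Returning a boolean instead of logging leaves it to the caller to count drops without spamming the logs.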
> I chose to silently drop the packet, mirroring what UDP would do
@xvello I wonder if we should still add some logging here. What do you think?
That could turn into log spamming really fast. But I'll open another PR to add a fail_on_error option
> That could turn into log spamming really fast
That's a good point!
I am in favor of keeping the code very simple, i.e. avoid adding any back-off mechanism. Do you know what errors to expect? (See datadogpy/datadog/dogstatsd/base.py, line 294 in b06997a.)
It may also not be necessary to re-create the socket.
Here are the errors I could produce: socket.connect
In these cases, we should retry for the next packet socket.send
Tested one last time communicating with dogstatsd6: the packet drop logic works, and so does recovery after a dsd restart. Let's merge that!
What does this PR do?
dogstatsd6 will allow custom metrics to be sent via a datagram unix socket instead of UDP (see DataDog/datadog-agent#252). This PR adds a use_unix_socket option to the DogStatsd object in order to allow using this communication channel.
Motivation
Containers make it harder to reach the dogstatsd server from client applications, and we don't have a future-proof network solution that could work across all orchestrators. This allows clients to bypass the network stack altogether and use a host-local protocol. See DataDog/docker-dd-agent#195 for more context.
As unix datagram and UDP share the same semantics, the patch is pretty small and unobtrusive.
The socket connection is set to non-blocking: if the server is unresponsive, the packets are dropped, mirroring what would happen with UDP.
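To make the idea concrete, here is a minimal sketch of how a client might pick between the two transports. `open_dogstatsd_socket` is a hypothetical function, not the code in this PR; the `socket_path` parameter name follows the PR title, and the defaults mirror the existing `host` and `port` arguments.

```python
import socket


def open_dogstatsd_socket(socket_path=None, host="localhost", port=8125):
    """Return a connected, non-blocking datagram socket.

    Uses a unix datagram socket when socket_path is given, UDP otherwise.
    """
    if socket_path is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        address = socket_path
    else:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        address = (host, port)
    try:
        # Non-blocking: a send that cannot complete immediately raises
        # instead of stalling the application.
        sock.setblocking(False)
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock
```

Because both branches return a connected datagram socket, the sending code can stay the same for either transport.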
Additional Notes
Still WIP as we need to: