Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket.gaierror is raised if the host is not known #130

Open
southworthy opened this issue Nov 14, 2019 · 2 comments
Open

socket.gaierror is raised if the host is not known #130

southworthy opened this issue Nov 14, 2019 · 2 comments
Labels
help-wanted open to a pull request to address this issue

Comments

@southworthy
Copy link

Curious to know if this is intended behaviour or not.

Example:

>>> import statsd
>>> # use the default, "localhost"
... # there is _not_ a statsd server running on localhost:8125
... # initialises ok,
... sc = statsd.StatsClient()
>>> # this time, specify a random hostname
... # again, no statsd server running at this host
... # initialisation causes error
... sc = statsd.StatsClient(host='foobar')
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/local/lib/python3.7/site-packages/statsd/client/udp.py", line 35, in __init__
    host, port, fam, socket.SOCK_DGRAM)[0]
  File "/usr/local/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

I'm wondering if it would be better to catch-pass this error? On the one hand, it's more in-line with the philosophy that errors with stats should not bring down the application, but on the other, I could imagine wanting to be sure that my application can log anything at all.

Perhaps a middle ground would be some kind of strict=True/False argument, which toggles between the behaviours?

@jsocol
Copy link
Owner

jsocol commented Nov 14, 2019

Wow this is some... old code. This was originally added in a completely different place in #6, and has moved along for the last 8 years.

I'd go so far as to say this is roughly deliberate but certainly open to re-evaluation, especially if there's a way to make it backwards compatible. It's also vaguely related to #125, since that would move the logic for when DNS resolution happens.

I don't have a super strong opinion here, other than "we don't want to do DNS lookups on every metric".

@southworthy
Copy link
Author

Ok; I had a few ideas for how this could be done, hopefully at some point I'll be able to open a PR. I'll note down a few ideas here in case anyone else wants to try:

  • add keyword arg to switch between "error if I can't resolve the host" and "fail silently if I can't resolve the host"
  • add callback to invoke if host can't be resolved
  • don't try to resolve hostname until the first time you try to send a stat (and then have normal behaviour of catch-pass socket.error).
  • periodically re-resolve host, e.g sc = StatsClient(ttl=timedelta(minutes=5)) would automatically attempt to re-resolve the host on the next stat update after 5 minutes. (This might also help with Add a method to re-run DNS lookup #125)

csestelo pushed a commit to Destygo/pystatsd that referenced this issue Jul 18, 2022
* Allow the start node to not be an answer node

* Upgrade requirements
@jsocol jsocol added the help-wanted open to a pull request to address this issue label Nov 6, 2022
JeanFred added a commit to JeanFred/inteGraality that referenced this issue Dec 26, 2022
Since f45589b, the application is instrumented and sends
metrics to the Wikimedia Cloud Services statsd endpoint.

It was initially sent to cloudmetrics1002,
then switched to cloudmetrics1001 (6de2308).

Since a few weeks, logs report issues reaching that endpoint [1].

This started on 2022-11-20 according to the otherwise fine Shell logs.
(Not sure exactly when it started for the Python webservice,
as the uwsgi logfile ballooned to 31 million lines,
making precise investigation a tad difficult).

It turns out cloudmetrics1002 (as well as cloudmetrics1001)
were decommissioned on November 16th (see T297444).

It also turns out the pystatsd library plain crashes
if the statsd host is unreachable [2].

This replaces the host with the new cloudmetrics1003.

Bug: T325936

[1] Python traceback:
```
Traceback (most recent call last):
  File "/data/project/integraality/www/python/src/app.py", line 10, in <module>
    from pages_processor import PagesProcessor, ProcessingException
  File "./pages_processor.py", line 16, in <module>
    from property_statistics import PropertyStatistics, QueryException
  File "./property_statistics.py", line 16, in <module>
    from statsd.defaults.env import statsd
  File "/data/project/integraality/www/python/venv/lib/python3.7/site-packages/statsd/defaults/env.py", line 17, in <module>
    maxudpsize=maxudpsize, ipv6=ipv6)
  File "/data/project/integraality/www/python/venv/lib/python3.7/site-packages/statsd/client/udp.py", line 35, in __init__
    host, port, fam, socket.SOCK_DGRAM)[0]
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
```

[2] jsocol/pystatsd#130
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help-wanted open to a pull request to address this issue
Projects
None yet
Development

No branches or pull requests

2 participants