PrometheusEndpointServer throws an exception, after which the endpoint is unavailable and does not restart #415

Open
@NitroLine

I have a Django app running on uvicorn. I set PROMETHEUS_METRICS_EXPORT_PORT_RANGE=range(8001, 8011) so that each uvicorn worker exposes its metrics on its own port. This works fine.
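For reference, a minimal sketch of the relevant line as it would appear in the Django settings module (the file name settings.py is an assumption about the project layout). Since range() excludes its upper bound, this gives ten candidate ports, 8001 through 8010:

```python
# settings.py (sketch; only the setting mentioned above is shown)

# Candidate ports for the per-worker metrics endpoint.
# range() excludes the stop value, so this covers 8001..8010 inclusive.
PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8011)
```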

But after a network error on the server, some workers print one of the following exceptions:

Exception occurred during processing of request from ('106.75.72.22', 52046)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe

Or

Exception occurred during processing of request from ('162.142.125.223', 51380)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/site-packages/prometheus_client/exposition.py", line 276, in do_GET
    self.wfile.write(output)
  File "/usr/local/lib/python3.10/socketserver.py", line 826, in write
    self._sock.sendall(b)
ConnectionResetError: [Errno 104] Connection reset by peer

After that, some targets in Prometheus start showing the error "context deadline exceeded".
(I saw this kind of traceback four times in the logs, and four targets are now down.)

So I think the PrometheusEndpointServer process has crashed and is not restarting, and I'm losing some metrics because of that.
It would be great if the exporter server restarted automatically whenever it becomes unavailable.
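To illustrate the kind of behavior I mean, here is a minimal sketch that uses only the Python standard library. It is not django-prometheus code: MetricsHandler, start_server, is_healthy and ensure_running are hypothetical names, and the payload is a placeholder. It assumes three things would help: ignoring client disconnects, putting a socket timeout on each connection, and recreating the server when a local probe fails.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    # Per-connection socket timeout in seconds, so a stalled client
    # cannot hold a connection open indefinitely.
    timeout = 5

    def do_GET(self):
        body = b"# placeholder metrics payload\n"  # hypothetical content
        try:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except (BrokenPipeError, ConnectionResetError):
            # The client disconnected before the response was sent.
            pass

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port):
    """Start the endpoint on a daemon thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_healthy(port, timeout=2.0):
    """Return True if the local endpoint answers a GET with status 200."""
    url = f"http://127.0.0.1:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def ensure_running(server, port):
    """Recreate the server on the same port if the probe fails."""
    if is_healthy(port):
        return server
    server.shutdown()      # stop the serve_forever loop
    server.server_close()  # release the listening socket
    return start_server(port)
```

A worker could call ensure_running periodically (for example from a timer thread) and keep the returned server object for the next check.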
