Monitoring cluster via external tools #959

jnehlmeier · 2022-10-26T11:18:45Z

Are there any plans to add monitoring capabilities to pg_auto_failover? While the monitor knows the cluster state, it still feels like a black box, and you are not notified via mail/messages/whatever when a failover occurs.

I know that a lot of commands have an option to output JSON, and I am fairly sure that option exists so that monitoring tools can be built around it. But given that pg_autoctl already manages multiple processes, wouldn't it be great to also provide a simple HTTP endpoint that publishes the state the monitor sees? Or a script hook that is called whenever the state changes, so that a script can publish the new state somewhere else?

Otherwise the only option is a polling solution, which requires access to the pg_autoctl executable (or direct access to the monitor database).
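A polling solution along these lines could look like the following minimal sketch. It assumes that `pg_autoctl show state --json` prints a JSON array of node objects and that pg_autoctl can find the monitor from the environment (e.g. `PGDATA` is set). The field names `nodename` and `current_group_state` are assumptions and should be checked against the output of your pg_auto_failover version.

```python
import json
import subprocess
import time
from typing import Callable, Dict, Optional, Tuple

# Assumption: this command prints a JSON array of node objects.
STATE_CMD = ["pg_autoctl", "show", "state", "--json"]


def parse_states(raw: str) -> Dict[str, str]:
    """Map each node name to the state the monitor reports for it.

    The field names "nodename" and "current_group_state" are assumptions.
    """
    return {node["nodename"]: node["current_group_state"] for node in json.loads(raw)}


def diff_states(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {node: (old_state, new_state)} for every node whose state changed.

    Nodes that appear or disappear show up with None on the missing side.
    """
    changed = {}
    for name in old.keys() | new.keys():
        before, after = old.get(name), new.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


def poll(notify: Callable[[str, Optional[str], Optional[str]], None], interval: float = 5.0) -> None:
    """Poll the monitor forever and call notify() once per state change.

    On the first iteration every node is reported as a change from None.
    """
    previous: Dict[str, str] = {}
    while True:
        raw = subprocess.run(STATE_CMD, capture_output=True, text=True, check=True).stdout
        current = parse_states(raw)
        for name, (before, after) in diff_states(previous, current).items():
            notify(name, before, after)
        previous = current
        time.sleep(interval)
```

For example, `poll(lambda node, old, new: print(f"{node}: {old} -> {new}"))` prints a line for every transition, such as a primary being demoted during a failover; the `notify` callback is where mail or chat notifications would be plugged in.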

I would really like to have the cluster state visible in a dashboard (e.g. Grafana) and add alerting on top of it. What best practice do you have in mind?


DimCitus · 2022-11-02T11:26:11Z

See #958 for script hooks. Introducing an HTTP API would be nice too. Meanwhile, a cgi-bin script that calls the pg_autoctl binary with --json might be a good way to get there. Closing this one now because I believe the work in #958 addresses it.

We can revisit the HTTP idea later. When I last looked, integrating https://sqlite.org/althttpd/doc/trunk/althttpd.md seemed a good way forward. I would review a PR that integrates that library (vendoring it in) and exposes the information in JSON format over HTTP.
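The cgi-bin idea can be approximated with a small standalone HTTP server that shells out to the pg_autoctl binary on each request. This is a minimal sketch using only the Python standard library, not the embedded althttpd integration discussed above. The `/state` path and port 8080 are arbitrary choices, and it assumes pg_autoctl can locate the monitor from the server's environment.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List

# Assumption: pg_autoctl can locate the monitor from this process's
# environment (e.g. PGDATA is set); adjust the command for your setup.
STATE_CMD = ["pg_autoctl", "show", "state", "--json"]


def make_handler(cmd: List[str]) -> type:
    """Build a request handler that serves the output of `cmd` at GET /state."""

    class StateHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/state":
                self.send_error(404)
                return
            # Like a cgi-bin script: run the command once per request.
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                # A non-2xx status lets an external HTTP check alert on failures.
                self.send_error(503, "state command failed")
                return
            body = result.stdout.encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging in this sketch

    return StateHandler


def serve(port: int = 8080) -> None:
    """Serve GET /state on localhost; the default port is an arbitrary choice."""
    ThreadingHTTPServer(("127.0.0.1", port), make_handler(STATE_CMD)).serve_forever()
```

After calling `serve()`, `curl http://127.0.0.1:8080/state` returns the same JSON the CLI prints. Because every request spawns pg_autoctl and queries the monitor, scrape or poll it at a modest interval, and put an authenticating proxy in front of it before binding to anything other than localhost.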

s4ke · 2022-11-02T14:12:57Z

While I do think an embedded server would be quite nice and is the way to go down the road, it is pretty straightforward to set something like this up yourself. We have a small script for this at https://github.com/neuroforgede/pg_auto_failover_ansible/tree/master/tools/health_monitor

You would probably run this on the app server where your monitoring lives. The tool in the link is also quite configurable, and you can run arbitrary checks via HTTP.

If you want more granular monitoring, something like a Prometheus exporter for Postgres might be what you are looking for.

DimCitus closed this as completed Nov 2, 2022
