Healthcheck and stats for monitoring #181

ThomDietrich · 2024-01-25T23:12:06Z

Hey @djmaze and all,

Not sure if you remember me; we worked on some nice little improvements together a few years ago.
Since then, I've been a happy user of your image. One recurring problem I have had is a lack of visibility: backups could be stalled by an issue for months before I eventually noticed. This is of course partially my fault, but it is also the reason the observability industry is thriving :)

I would like to discuss how this image could provide users with actionable and informative data about the activity of their backup jobs. Specifically:

- A counter for consecutive errors (to delay warnings and notifications beyond a single hiccup)
- An indicator for permanent errors (if possible)
- A timestamp for the last successful sync (for downtime detection and notification)
- Performance stats (because 📊🤩)
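The first and third wishes boil down to a small piece of state that is updated after every run and kept between runs. Here is a minimal Python sketch, assuming a run counts as failed whenever the backup command exits non-zero. The class and field names are hypothetical, and the "permanent error" indicator is left out because it depends on how restic's exit codes should be interpreted.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BackupHealth:
    """Hypothetical per-job health record, updated after every backup run."""

    consecutive_errors: int = 0
    last_success_ts: Optional[float] = None  # Unix timestamp of the last good run
    last_exit_code: Optional[int] = None

    def record_run(self, exit_code: int, now: Optional[float] = None) -> None:
        """Update the counters after one run of the backup command."""
        now = time.time() if now is None else now
        self.last_exit_code = exit_code
        if exit_code == 0:
            self.consecutive_errors = 0
            self.last_success_ts = now
        else:
            self.consecutive_errors += 1

    def should_alert(self, max_errors: int = 3) -> bool:
        """Alert only once failures persist beyond a single hiccup."""
        return self.consecutive_errors >= max_errors

    def seconds_since_success(self, now: Optional[float] = None) -> Optional[float]:
        """Age of the last successful run, for downtime detection."""
        if self.last_success_ts is None:
            return None
        now = time.time() if now is None else now
        return now - self.last_success_ts

    def to_json(self) -> str:
        """Serialize, e.g. to persist the record in a state file between runs."""
        return json.dumps(asdict(self), sort_keys=True)
```

The `max_errors` threshold is what turns the counter into "delayed" notifications: a single failed night does not alert, three in a row do.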

How does that sound?

In a comment on #171 you mentioned your solution to some of these points: Healthchecks.io. The service looks good but does not (I believe) solve all of the above. Also, everyone tends to use different tools (I'd like to hook up Grafana); fortunately, most of them cater to the same needs.

Long story short, I propose to:

1. Generate a string of stats after each backup run. This string could be in any of the common formats, like JSON, Prometheus, or ..., or even be translated into several of them.
2. Provide the stats string as an env variable to POST_COMMANDS_SUCCESS etc., e.g. for healthchecks.io, telegraf, or prometheus (push principle).
3. Provide the stats string via an HTTP endpoint (pull principle).
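Points 1 and 2 could look roughly like the following Python sketch: flat numeric stats are rendered in the Prometheus text exposition format, and a post command string is run through a shell with the stats exposed as environment variables. The `resticker_` prefix and the variable names `RESTICKER_STATS_JSON` and `RESTICKER_STATS_PROM` are hypothetical names chosen for illustration, and running the command with `shell=True` is an assumption about how such a hook would be executed.

```python
import json
import os
import re
import subprocess
from typing import Mapping

# Valid Prometheus metric names, per the text exposition format.
_METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def to_prometheus(stats: Mapping[str, float], prefix: str = "resticker_") -> str:
    """Render flat numeric stats as Prometheus gauges, one sample per key."""
    lines = []
    for key in sorted(stats):
        name = prefix + key
        if not _METRIC_NAME.match(name):
            raise ValueError(f"invalid metric name: {name}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {stats[key]}")
    return "\n".join(lines) + "\n"


def run_post_command(command: str, stats: Mapping[str, float]) -> subprocess.CompletedProcess:
    """Run a post command with the stats in its environment (push principle)."""
    env = dict(os.environ)
    env["RESTICKER_STATS_JSON"] = json.dumps(dict(stats), sort_keys=True)  # hypothetical name
    env["RESTICKER_STATS_PROM"] = to_prometheus(stats)  # hypothetical name
    return subprocess.run(command, shell=True, env=env, capture_output=True, text=True, check=False)
```

A post command could then forward the data, for example by piping `$RESTICKER_STATS_PROM` to a Pushgateway or posting `$RESTICKER_STATS_JSON` to a webhook, without the image needing to know about either tool.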

What do you think? Cheers!


escoand · 2024-01-26T14:59:08Z

I was interested in something similar, in my case to monitor backups with Home Assistant. So every now and then I run `restic --no-lock stats latest --json` and `restic --no-lock snapshots latest --json`. Not that much information, but a first starting point.

The question is how to expose these values. I post these two JSON documents to an HTTP REST endpoint.
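A minimal Python sketch of this approach, assuming the JSON shapes recent restic versions print: `snapshots --json` returns a list of objects with a `time` field (RFC 3339, often with nanosecond precision) and `short_id`/`id`, and `stats --json` returns an object containing `total_size` and `total_file_count`. The payload keys and the endpoint URL passed to `push` are hypothetical.

```python
import json
import re
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def parse_restic_time(value: str) -> datetime:
    """Parse restic's RFC 3339 timestamps, which may carry nanoseconds."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Older datetime.fromisoformat() versions need exactly 6 fractional digits.
    value = re.sub(r"\.(\d+)", lambda m: "." + (m.group(1) + "000000")[:6], value, count=1)
    return datetime.fromisoformat(value)


def build_payload(snapshots_json: str, stats_json: str, now: Optional[datetime] = None) -> dict:
    """Combine the latest snapshot and the repository stats into one document."""
    now = now or datetime.now(timezone.utc)
    snapshots = json.loads(snapshots_json)
    stats = json.loads(stats_json)
    latest = max(snapshots, key=lambda s: parse_restic_time(s["time"]))
    ts = parse_restic_time(latest["time"])
    return {
        "snapshot_id": latest.get("short_id") or latest.get("id"),
        "snapshot_time": ts.isoformat(),
        "snapshot_age_seconds": (now - ts).total_seconds(),
        "total_size": stats.get("total_size"),
        "total_file_count": stats.get("total_file_count"),
    }


def push(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

The `snapshot_age_seconds` field is the piece that makes downtime detection possible on the receiving side: a simple threshold on it catches backups that silently stopped.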

ThomDietrich · 2024-01-26T22:11:26Z

Hey @escoand, the stats command is certainly a good start, but it does not offer enough detail about individual sync runs and snapshots, especially failures.

Regarding the transmission, I am inclined to say that I would love to see both strategies (push and pull) implemented. This seems rather trivial, after all.

The real issue here is retrieving the data from restic. I believe we all agree that human-readable output in the Docker logs is desired, so we can't simply switch the backup command to --json. I think this is the way to go: restic/restic#3274
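Until something like restic/restic#3274 lands, one possible workaround is a wrapper that runs `restic backup --json`, keeps the final `summary` message, and prints its own human-readable line to the Docker logs. A minimal Python sketch: the field names (`message_type`, `files_new`, `files_changed`, `data_added`, `total_duration`, `snapshot_id`) match restic's JSON backup output in recent versions, but verify them against the version in the image. Error counting is best effort, since restic may write `error` messages to stderr, so both streams would need to be passed in.

```python
import json
from typing import Iterable, Optional, Tuple


def summarize_backup_json(lines: Iterable[str]) -> Tuple[Optional[dict], int]:
    """Return the final summary message and the number of error messages."""
    summary: Optional[dict] = None
    errors = 0
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # ignore anything that is not a JSON object
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        kind = msg.get("message_type")
        if kind == "summary":
            summary = msg
        elif kind == "error":
            errors += 1
    return summary, errors


def human_readable(summary: dict) -> str:
    """Re-create a short, log-friendly line from the JSON summary."""
    return (
        f"snapshot {str(summary.get('snapshot_id', '?'))[:8]}: "
        f"{summary.get('files_new', 0)} new, "
        f"{summary.get('files_changed', 0)} changed, "
        f"{summary.get('data_added', 0)} bytes added "
        f"in {float(summary.get('total_duration', 0)):.1f}s"
    )
```

Progress messages (`message_type` `status`) are skipped, so the whole stream can be fed through; the summary dict itself is exactly the kind of stats string proposed above.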

spychodelics · 2024-02-16T12:58:12Z

I am new to restic and resticker. Right now I have added a "dirty" `restic ..... > filename_date.log 2>filename_error_date.log` at the end of the restic backup part of the backup script, so I get a human-readable log file.

Now I am able to send myself the two log files as email attachments via Apprise, but so far I haven't managed to send the logs as the email "Body" via apprise/curl.

The POST_COMMANDS_SUCCESS, POST_COMMANDS_FAILURE, and POST_COMMANDS_INCOMPLETE scripts help...
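For the "log as body" case, a minimal Python sketch that could be called from one of the post-command hooks: it joins both log files into one plain-text body, trims it to a size limit, and passes it to the Apprise CLI as a single argument, which sidesteps shell quoting entirely. It assumes the `apprise` CLI is installed in the container and uses its `-t` (title) and `-b` (body) options; the function names and the 20000-character limit are hypothetical.

```python
import subprocess
from pathlib import Path


def build_body(stdout_log: Path, stderr_log: Path, max_chars: int = 20000) -> str:
    """Concatenate both log files into one plain-text message body."""
    parts = []
    for label, path in (("stdout", stdout_log), ("stderr", stderr_log)):
        text = path.read_text(errors="replace") if path.exists() else "(file not found)"
        parts.append(f"===== {label}: {path.name} =====\n{text.rstrip()}")
    body = "\n\n".join(parts)
    if len(body) > max_chars:
        # Keep the tail: it holds the stderr section and the last stdout lines.
        body = "[truncated]\n" + body[-max_chars:]
    return body


def send_with_apprise(url: str, title: str, body: str) -> None:
    """Pass the body as one argv element, so no shell quoting is involved."""
    subprocess.run(["apprise", "-t", title, "-b", body, url], check=True)
```

Truncating matters because many mail and chat services reject very large bodies, while attachments usually have more generous limits.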

djmaze · 2024-02-17T17:59:37Z

Sorry for the late answer. @ThomDietrich Yeah, I remember :) Concerning your ideas: for me, Healthchecks.io more or less covers all of them except the last one (performance stats). But I get that other people want different tools and more detailed insights.

Points 1 and 2 could probably be implemented quite easily. Point 3 would mean running an additional HTTP service somehow, which I am not a fan of; it should also be entirely optional. (I see that there is also a request for improving restic itself.)

All that said, I am personally not really interested in using this additional functionality (I'm not doing advanced performance monitoring for my stuff currently), so I would be open to more detailed proposals or even PRs.
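To illustrate how small an optional pull endpoint (point 3) could be, here is a Python standard-library sketch that stays disabled unless a port is configured and only serves a stats file written after each run. The `STATS_HTTP_PORT` variable, the `/metrics` path, and the function names are hypothetical; the content type is the one used by the Prometheus text format.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Optional


def make_handler(stats_file: Path):
    class StatsHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metrics":
                self.send_error(404)
                return
            try:
                body = stats_file.read_bytes()
            except FileNotFoundError:
                self.send_error(503, "no backup stats written yet")
                return
            self.send_response(200)
            # Content type of the Prometheus text exposition format.
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep scrape requests out of the Docker logs

    return StatsHandler


def maybe_start_server(stats_file: Path, host: str = "0.0.0.0") -> Optional[ThreadingHTTPServer]:
    """Start the endpoint only if the (hypothetical) STATS_HTTP_PORT variable is set."""
    port = os.environ.get("STATS_HTTP_PORT")
    if not port:
        return None  # feature disabled by default
    server = ThreadingHTTPServer((host, int(port)), make_handler(stats_file))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the server only reads a file, the backup logic itself stays unchanged; with the variable unset, nothing listens at all.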

ThomDietrich mentioned this issue on Jan 26, 2024 in restic/restic#3274, "Add an option to send JSON output to a separate file descriptor" (Open)

djmaze added the enhancement label Apr 9, 2024

ThomDietrich mentioned this issue on Aug 30, 2024 in #215, "Introduce ENABLE_JSON_LOGGING" (Closed)
