Add supervisor specific metrics (#54)
tschaefer authored Jan 15, 2025
1 parent 4e3bd0f commit d4e0662
Showing 6 changed files with 86 additions and 16 deletions.
14 changes: 14 additions & 0 deletions README.md
@@ -21,6 +21,7 @@ Supervisor is a Docker GitOps service that allows you to manage `docker-compose`
 * [Delete Stack](#delete-stack)
 * [Get Stack Log](#get-stack-log)
 * [Control Stack](#control-stack)
+* [Metrics](#metrics)
 * [Dashboard](#dashboard)
 * [License](#license)
 * [Is it any good?](#is-it-any-good)
@@ -237,6 +238,19 @@ curl --request POST \
 https://supervisor.example.com/stacks/<stack_uuid>/control
 ```

+## Metrics
+
+To retrieve Prometheus metrics, you can access the
+`http://supervisor.example.com:9394/metrics` endpoint.
+
+* `supervisor_total_stacks`: The total number of stacks. (gauge)
+* `supervisor_total_healthy_stacks`: The total number of healthy stacks. (gauge)
+* `supervisor_total_unhealthy_stacks`: The total number of unhealthy stacks. (gauge)
+* `supervisor_jobs_execution_time`: The time taken to execute stack jobs, measured in seconds. (histogram)
+* `supervisor_jobs_executed_total`: The total number of stack jobs executed. (counter)
+* `supervisor_jobs_succeeded_total`: The total number of stack jobs that succeeded. (counter)
+* `supervisor_jobs_failed_total`: The total number of stack jobs that failed. (counter)
+
 ## Dashboard

 Supervisor provides a simple dashboard to view and monitor stacks. The
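
The Metrics section added to the README points at a Prometheus text endpoint. As a rough illustration of consuming it, the sketch below parses a payload in that text format into a hash. The `SAMPLE` payload and the `parse_metrics` helper are hypothetical and not part of the commit, and the parser assumes samples carry no trailing timestamp.

```ruby
# SAMPLE is hypothetical output, not captured from a real Supervisor instance.
SAMPLE = <<~METRICS
  # HELP supervisor_total_stacks The total number of stacks.
  # TYPE supervisor_total_stacks gauge
  supervisor_total_stacks 3
  supervisor_total_healthy_stacks 2
  supervisor_total_unhealthy_stacks 1
METRICS

# Hypothetical helper: maps each series (name plus optional labels) to a Float.
# Assumes every sample line ends with the value and has no timestamp field.
def parse_metrics(text)
  text.each_line.with_object({}) do |line, samples|
    line = line.strip
    next if line.empty? || line.start_with?('#') # skip HELP/TYPE comments

    # Split on the last space, since label values may contain spaces.
    series, _, value = line.rpartition(' ')
    samples[series] = Float(value)
  end
end

samples = parse_metrics(SAMPLE)
puts samples['supervisor_total_stacks']
```

Against a live instance the payload could be fetched with `Net::HTTP.get(URI('http://supervisor.example.com:9394/metrics'))` after `require 'net/http'`. The sample sticks to the three gauges on purpose: the initializer in this commit registers the job metrics as `stack_jobs_*` in the `supervisor` group, so the exported job metric names may differ from the ones listed in the README and are best checked against the endpoint itself.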
4 changes: 3 additions & 1 deletion app/jobs/stack_job.rb
@@ -21,7 +21,9 @@ def perform(stack)

   def execute
     script = render_script(@stack, @assets)
-    run_script(script)
+    Yabeda.supervisor.stack_jobs_execution_time.measure do
+      run_script(script)
+    end
     return if instance_of?(StackDestroyJob)

     stack_log
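
The hunk above wraps `run_script` in a `measure` block so that the elapsed time lands in the histogram. What such a wrapper does can be sketched in plain Ruby as follows. `DurationRecorder` is a hypothetical stand-in written for illustration; it is not Yabeda's implementation and only keeps the raw observations.

```ruby
# Hypothetical stand-in for a histogram: it only stores observed durations.
class DurationRecorder
  attr_reader :observations

  def initialize
    @observations = []
  end

  # Runs the block, records the elapsed seconds, and returns the block's result.
  # The monotonic clock is used so wall-clock adjustments cannot skew the value.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @observations << (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
    result
  end
end

recorder = DurationRecorder.new
output = recorder.measure { sleep(0.01); :done }
puts recorder.observations.size
```

In this sketch nothing is recorded when the block raises, because the observation is appended only after `yield` returns. Whether failed runs should still be timed is a design choice; wrapping the bookkeeping in `ensure` would be one way to include them.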
4 changes: 2 additions & 2 deletions app/jobs/stack_job/handles_execute_result.rb
@@ -26,11 +26,11 @@ def stack_log
   end

   def stack_stats
-    @stack.update_stats(failed: error?, action: __action.split.second)
+    @stack.update_stats(succeeded: success?, action: __action.split.second)
   end

   def stack_health
-    @stack.update(healthy: !error?)
+    @stack.update(healthy: success?)
   end

   def __action
34 changes: 24 additions & 10 deletions app/models/stack/has_stats.rb
@@ -12,17 +12,31 @@ def stats
     )
   end

-  def update_stats(failed: false, action: nil)
-    processed = self.processed + 1
-    failed = failed ? self.failed + 1 : self.failed
-    last_run = processed ? Time.current : self.last_run
-    last_action = action || 'unknown'
-
+  def update_stats(succeeded: false, action: nil)
     update(
-      processed: processed,
-      failed: failed,
-      last_run: last_run,
-      last_action: last_action
+      processed: processed + 1,
+      failed: succeeded ? failed : failed + 1,
+      last_run: Time.current,
+      last_action: action || 'unknown'
     )
+
+    status = succeeded ? 'succeeded' : 'failed'
+
+    Yabeda.supervisor.stack_jobs_executed_total.increment(
+      {
+        name: name,
+        action: last_action,
+        status: succeeded ? 'succeeded' : 'failed'
+      },
+      by: 1
+    )
+
+    Yabeda.supervisor.send(:"stack_jobs_#{status}_total").increment(
+      {
+        name: name,
+        action: last_action
+      },
+      by: 1
+    )
   end
 end
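
The new `update_stats` increments two counters per job: an overall one carrying a `status` label, and a dedicated one chosen from the outcome. The bookkeeping can be sketched without Rails or Yabeda as below. `COUNTERS`, `increment`, and `record_job` are hypothetical names used only for this illustration.

```ruby
# Hypothetical in-memory registry keyed by [metric name, label set].
COUNTERS = Hash.new(0)

def increment(metric, labels)
  COUNTERS[[metric, labels]] += 1
end

# Mirrors the counting in update_stats: one overall counter with a status
# label, plus one per-outcome counter whose name is derived from the status.
def record_job(name:, action:, succeeded:)
  status = succeeded ? 'succeeded' : 'failed'
  action ||= 'unknown' # same fallback the model applies to last_action

  increment(:stack_jobs_executed_total, { name: name, action: action, status: status })
  increment(:"stack_jobs_#{status}_total", { name: name, action: action })
end

record_job(name: 'web', action: 'deploy', succeeded: true)
record_job(name: 'web', action: 'deploy', succeeded: false)
record_job(name: 'web', action: nil, succeeded: false)

COUNTERS.each { |(metric, labels), count| puts "#{metric} #{labels} #{count}" }
```

Because the overall counter already carries the `status` label, the two per-outcome counters duplicate information that a query could derive from it. Keeping them is presumably a convenience for dashboards and alerts rather than a necessity.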
43 changes: 43 additions & 0 deletions config/initializers/metrics.rb
@@ -0,0 +1,43 @@
+require 'prometheus/client/support/puma'
+
+Prometheus::Client.configuration.pid_provider = Prometheus::Client::Support::Puma.method(:worker_pid_provider)
+
+Yabeda.configure do
+  group :supervisor do
+    counter :stack_jobs_executed_total do
+      comment 'The total number of stack jobs executed.'
+      tags %i[name action status]
+    end
+    counter :stack_jobs_failed_total do
+      comment 'The total number of stack jobs that failed.'
+      tags %i[name action]
+    end
+    counter :stack_jobs_succeeded_total do
+      comment 'The total number of stack jobs that succeeded.'
+      tags %i[name action]
+    end
+    histogram :stack_jobs_execution_time do
+      comment 'The time taken to execute stack jobs, measured in seconds.'
+      unit :seconds
+      buckets [
+        0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10,
+        30, 60, 120, 300, 1800, 3600, 86_400
+      ].freeze
+    end
+    gauge :total_stacks do
+      comment 'The total number of stacks.'
+    end
+    gauge :total_healthy_stacks do
+      comment 'The total number of healthy stacks.'
+    end
+    gauge :total_unhealthy_stacks do
+      comment 'The total number of unhealthy stacks.'
+    end
+
+    collect do
+      supervisor.total_stacks.set({}, Stack.count)
+      supervisor.total_healthy_stacks.set({}, Stack.where(healthy: true).count)
+      supervisor.total_unhealthy_stacks.set({}, Stack.where(healthy: false).count)
+    end
+  end
+end
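
The histogram above declares bucket boundaries from 5 ms up to one day. Prometheus histograms are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final `+Inf` bucket counts all of them. The sketch below applies that rule to the configured boundaries. `cumulative_buckets` and the sample durations are hypothetical, chosen only to show where values land.

```ruby
# Bucket boundaries copied from the initializer in this commit.
BUCKETS = [
  0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10,
  30, 60, 120, 300, 1800, 3600, 86_400
].freeze

# Hypothetical helper: cumulative count per upper bound, plus the +Inf bucket.
def cumulative_buckets(durations, bounds = BUCKETS)
  counts = bounds.to_h { |le| [le, durations.count { |d| d <= le }] }
  counts[Float::INFINITY] = durations.size
  counts
end

durations = [0.2, 4.0, 45.0, 100_000.0] # hypothetical job run times in seconds
counts = cumulative_buckets(durations)
counts.each { |le, count| puts "le=#{le} #{count}" }
```

With these sample values, a job that runs longer than a day is only visible in the `+Inf` bucket, which is why the largest explicit boundary matters when estimating high quantiles.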
3 changes: 0 additions & 3 deletions config/initializers/yabeda.rb

This file was deleted.
