This module tracks statistics about batch jobs: the number of items or records processed in each stage of a job, how long each stage takes, and when the job as a whole last succeeded. Because every job reports the same metrics, a single alert can cover all jobs that use them.
To run the tests successfully, you must set the PUSHGATEWAY environment variable:
export PUSHGATEWAY=http://localhost:9091
cpanm https://github.com/hathitrust/progress_tracker.git
Alternatively, you can install without running tests:
cpanm --notest https://github.com/hathitrust/progress_tracker.git
use ProgressTracker;
my $tracker = ProgressTracker->new();
while (my $line = <>) {
    # do some stuff..
    $tracker->inc();
}
$tracker->finalize();
This will report the number of lines processed to the push gateway every 1,000 lines.
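To make the "every 1,000 lines" behaviour concrete, here is a minimal, self-contained sketch of interval-based reporting. It is not ProgressTracker's implementation: make_counter and its callback are hypothetical names used only for illustration, and the callback stands in for a push to the gateway.

```perl
use strict;
use warnings;

# Hypothetical stand-in for interval-based reporting; NOT ProgressTracker's code.
# Returns a closure that counts records and calls $on_report whenever the count
# has advanced by at least $interval since the last report.
sub make_counter {
    my ($interval, $on_report) = @_;
    my ($total, $last_reported) = (0, 0);
    return sub {
        my ($n) = @_;
        $n = 1 unless defined $n;    # like inc() with no argument
        $total += $n;
        if ($total - $last_reported >= $interval) {
            $on_report->($total);    # a real tracker would push metrics here
            $last_reported = $total;
        }
        return $total;
    };
}

my @reports;
my $inc = make_counter(1000, sub { push @reports, $_[0] });
$inc->() for 1 .. 2500;
print "reported at: @reports\n";    # reported at: 1000 2000
```

In this sketch the last 500 records are never reported by the counter itself, which is why a scheme like this needs an explicit final step once the loop ends.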
To report to the push gateway, either set the PUSHGATEWAY environment variable or pass a pushgateway argument to ProgressTracker->new(). Otherwise, ProgressTracker will warn that no push gateway was provided.
$tracker->start_stage('first_stage');
foreach my $item (@items) {
    # do some stuff
    $tracker->inc();
}
$tracker->start_stage('second_stage');
foreach my $item (@other_items) {
    # do some other stuff
    $tracker->inc();
}
$tracker->finalize();
You can also use this functionality to report on (for example) the number of items processed by some external operation:
$tracker->start_stage('first_stage');
my $count = go_do_some_things();
$tracker->inc($count);
$tracker->start_stage('second_stage');
my $second_count = go_do_some_other_things();
$tracker->inc($second_count);
$tracker->finalize();
Each stage will be reported with a separate stage label to the push gateway.
Except for report_interval, all constructor parameters can also be set with environment variables. The URL of the push gateway must be specified either as a parameter or as an environment variable; all other parameters are optional.
my $tracker = ProgressTracker->new(
    job              => 'jobname.pl',
    pushgateway      => 'http://localhost:9091',
    namespace        => 'namespace',
    app              => 'app',
    report_interval  => 1000,
    success_interval => 65*60
);
- job: Name of the job. Defaults to the script name ($0) or the JOB_NAME env var. Populates the job label in metrics.
- pushgateway: URL to the push gateway. Must be provided here or via the PUSHGATEWAY env var.
- namespace: Optional. Populates the namespace label in metrics. Typically the Kubernetes namespace the job is running in.
- app: Optional. Populates the app label in metrics.
- report_interval: ProgressTracker will push metrics to the push gateway whenever this many records have been processed. Defaults to every 1,000 records processed. Ideally, set this to the number of records that can be processed between intervals when Prometheus scrapes your push gateway. By default, Prometheus scrapes the push gateway every 15 seconds, so setting this to the number of records you expect to process in 15 seconds would be reasonable.
- success_interval: If set, used to populate the job_expected_success_interval metric. Can be used with the generic JobCompletionTimeoutExceeded alert below. Set to the expected interval between completions of your job with some allowance for variance to prevent spurious alerts. For example, if this is a short-running job you expect to complete once per day, you might set this to 86700 seconds: 86400 seconds for one day, plus 5 minutes to cover variance in run time.
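The sizing advice for report_interval and success_interval comes down to simple arithmetic, sketched below. Both helper functions are hypothetical (they are not part of ProgressTracker), and the throughput of 200 records per second is an assumed figure you would replace with a measurement from your own job.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: number of records expected between two Prometheus scrapes.
sub suggested_report_interval {
    my ($records_per_second, $scrape_seconds) = @_;
    my $n = ceil($records_per_second * $scrape_seconds);
    return $n < 1 ? 1 : $n;    # never suggest an interval below one record
}

# Hypothetical helper: expected time between completions plus an allowance
# for variance in run time, both in seconds.
sub suggested_success_interval {
    my ($period_seconds, $allowance_seconds) = @_;
    return $period_seconds + $allowance_seconds;
}

# Assumed throughput of 200 records/s with a 15-second scrape interval.
my $report_interval = suggested_report_interval(200, 15);

# Daily job with a 5-minute allowance, as in the example above.
my $success_interval = suggested_success_interval(86400, 5 * 60);

print "report_interval=$report_interval success_interval=$success_interval\n";
# report_interval=3000 success_interval=86700
```

The resulting values could then be passed as the report_interval and success_interval parameters to ProgressTracker->new().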
All metrics are gauges, because they can reset between runs of the job.
- job_duration_seconds: Time spent running the job in seconds, or a particular stage if the stage label is present.
- job_expected_success_interval: Maximum expected time in seconds between job completions. Set once when the job starts.
- job_records_processed: Count of records processed by the job, or a particular stage if the stage label is set.
- job: The name of the process or script that is running. Defaults to the filename of the running script ($0). Can be set with the JOB_NAME environment variable or the job parameter to ProgressTracker->new(). Required.
- app: The name of the application the job belongs to. Can be set with the JOB_APP environment variable or the app parameter to ProgressTracker->new(). Optional.
- namespace: The Kubernetes namespace this job is running in. Can be set with the JOB_NAMESPACE environment variable or the namespace parameter to ProgressTracker->new(). Optional.
- stage: The part of the job to which the metrics pertain. Can be set by calling start_stage (as in the usage examples), which then starts tracking duration and record count for that particular stage.
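To show how these labels attach to a metric, the sketch below formats a single sample in the Prometheus text exposition format. The format_sample function is hypothetical and says nothing about how ProgressTracker itself talks to the push gateway; the label values and the count of 1000 are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical formatter for one sample in the Prometheus text exposition
# format: name{label="value",...} value. Not part of ProgressTracker.
sub format_sample {
    my ($name, $labels, $value) = @_;
    my @pairs;
    for my $key (sort keys %$labels) {
        my $v = $labels->{$key};
        next unless defined $v && length $v;    # skip unset optional labels
        $v =~ s/\\/\\\\/g;                      # escape backslashes
        $v =~ s/"/\\"/g;                        # escape double quotes
        $v =~ s/\n/\\n/g;                       # escape newlines
        push @pairs, qq{$key="$v"};
    }
    my $label_str = @pairs ? '{' . join(',', @pairs) . '}' : '';
    return "$name$label_str $value";
}

my $line = format_sample(
    'job_records_processed',
    {
        job       => 'jobname.pl',
        app       => 'app',
        namespace => 'namespace',
        stage     => 'first_stage',
    },
    1000,
);
print "$line\n";
# job_records_processed{app="app",job="jobname.pl",namespace="namespace",stage="first_stage"} 1000
```

A sample without the stage label would, by the descriptions above, refer to the job as a whole rather than to one stage.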
This alert fires when the time since a job's last success (job_last_success) exceeds its expected success interval (job_expected_success_interval).
- alert: JobCompletionTimeoutExceeded
  expr: "time() - job_last_success > job_expected_success_interval"
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Job {{$labels.job}} has not completed successfully"
    description: "Job {{$labels.job}} has not completed successfully\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
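The alert expression can be checked by hand with the same comparison in Perl. This is only a sketch of the arithmetic: completion_timeout_exceeded is a hypothetical helper, the timestamps are arbitrary, and the interval of 65*60 seconds is borrowed from the constructor example.

```perl
use strict;
use warnings;

# Mirrors the alert expression:
#   time() - job_last_success > job_expected_success_interval
# Hypothetical helper for illustration; Prometheus evaluates the real rule.
sub completion_timeout_exceeded {
    my ($now, $last_success, $expected_interval) = @_;
    return ($now - $last_success) > $expected_interval ? 1 : 0;
}

my $expected     = 65 * 60;          # 3900 seconds
my $last_success = 1_700_000_000;    # arbitrary epoch timestamp

# One hour after the last success: still within the 65-minute window.
print completion_timeout_exceeded($last_success + 3600, $last_success, $expected), "\n";    # 0

# 4000 seconds after the last success: the window has been exceeded.
print completion_timeout_exceeded($last_success + 4000, $last_success, $expected), "\n";    # 1
```

Because the comparison is strict, a job that succeeded exactly 3900 seconds ago would not yet trigger the alert in this sketch.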