Add CICD metrics #1681

Open
wants to merge 34 commits into base: main
Changes from 9 commits
Commits
72b225b
[cicd] add pipeline run duration metric
christophe-kamphaus-jemmic Dec 13, 2024
1b10459
[cicd] add cicd queue metrics
christophe-kamphaus-jemmic Dec 13, 2024
d820586
[cicd] add cicd worker count metric
christophe-kamphaus-jemmic Dec 13, 2024
a2edc16
[cicd] add cicd error count
christophe-kamphaus-jemmic Dec 13, 2024
3e1537b
Update vscode settings to align with markdown-toc --no-first-h1
christophe-kamphaus-jemmic Dec 13, 2024
7d4affa
[cicd] update examples of cicd.pipeline.result
christophe-kamphaus-jemmic Dec 13, 2024
70f2bd6
[cicd] add changelog entry
christophe-kamphaus-jemmic Dec 13, 2024
1c4e584
[cicd] update brief to add missing article
christophe-kamphaus-jemmic Dec 13, 2024
2f9b857
[cicd] improve metric brief
christophe-kamphaus-jemmic Dec 13, 2024
65b854f
Merge branch 'main' into 1600-cicd-metrics
christophe-kamphaus-jemmic Dec 14, 2024
a044341
[cicd] add skipped as possible cicd.pipeline.result value
christophe-kamphaus-jemmic Dec 14, 2024
7330043
[cicd] add down as possible cicd.worker.state value
christophe-kamphaus-jemmic Dec 14, 2024
a243084
[cicd] improve brief and add note for idle cicd.worker.state
christophe-kamphaus-jemmic Dec 14, 2024
3a31dd9
[cicd] add metric cicd.pipeline.run.executing
christophe-kamphaus-jemmic Dec 14, 2024
66732c5
[cicd] add cicd.worker.type container
christophe-kamphaus-jemmic Dec 16, 2024
2e8ab02
[cicd] mark all cicd metrics as recommended
christophe-kamphaus-jemmic Dec 16, 2024
1562dc6
Merge branch 'main' into 1600-cicd-metrics
christophe-kamphaus-jemmic Dec 25, 2024
dc09536
[cicd] improve brief of `cicd.worker.count` metric
christophe-kamphaus-jemmic Dec 25, 2024
26d3064
[cicd] renamed attribute `cicd.worker.type` to `cicd.worker.class`
christophe-kamphaus-jemmic Dec 25, 2024
a2202cd
[cicd] adapt metric brief following attribute rename
christophe-kamphaus-jemmic Dec 25, 2024
cc76185
Merge branch 'main' into 1600-cicd-metrics
joaopgrassi Dec 30, 2024
89f6a8f
[cicd] rename unit {pipeline_run} to {run}
christophe-kamphaus-jemmic Jan 9, 2025
81c4407
[cicd] use consistent naming for cicd.pipeline.result values
christophe-kamphaus-jemmic Jan 9, 2025
c1bc664
[cicd] add error.type attribute to cicd.pipeline.run.duration metric
christophe-kamphaus-jemmic Jan 9, 2025
226b835
[cicd] rename metric cicd.pipeline.run.executing to cicd.pipeline.run…
christophe-kamphaus-jemmic Jan 9, 2025
55414ee
[cicd] rename metric cicd.queue.latency to cicd.pipeline.run.time_in_…
christophe-kamphaus-jemmic Jan 9, 2025
fc48dd2
[cicd] rename metric cicd.errors to cicd.system.errors
christophe-kamphaus-jemmic Jan 9, 2025
0be8d4d
[cicd] Add metric cicd.pipeline.run.errors
christophe-kamphaus-jemmic Jan 9, 2025
996e809
[cicd] added attribute cicd.system.component, added it to cicd.system…
christophe-kamphaus-jemmic Jan 9, 2025
b332638
[cicd] remove cicd.worker.class attribute
christophe-kamphaus-jemmic Jan 9, 2025
cf67fd2
[cicd] adapt brief of cicd.worker.count metric
christophe-kamphaus-jemmic Jan 9, 2025
702ca16
[cicd] rename metric cicd.queue.length to cicd.pipeline.run.queued
christophe-kamphaus-jemmic Jan 9, 2025
b113166
[cicd] fix yamllint
christophe-kamphaus-jemmic Jan 9, 2025
adf29b4
[cicd] rename result value cancel to cancellation
christophe-kamphaus-jemmic Jan 9, 2025
26 changes: 26 additions & 0 deletions .chloggen/1600-cicd-metrics.yaml
@@ -0,0 +1,26 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: cicd

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add CICD metrics

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1600]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  Makes the following changes:

  - Add metrics `cicd.pipeline.run.duration`, `cicd.queue.latency`, `cicd.queue.length`, `cicd.worker.count`, `cicd.errors`.
  - The CICD attributes `cicd.pipeline.result`, `cicd.worker.state` and `cicd.worker.type` have been added to the registry.
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -14,5 +14,6 @@
"model/**/*.yaml"
]
},
- "json.schemaDownload.enable": true
+ "json.schemaDownload.enable": true,
+ "markdown.extension.toc.levels": "2..6"
}
33 changes: 33 additions & 0 deletions docs/attributes-registry/cicd.md
@@ -13,11 +13,26 @@ This group describes attributes specific to pipelines within a Continuous Integr
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| <a id="cicd-pipeline-name" href="#cicd-pipeline-name">`cicd.pipeline.name`</a> | string | The human readable name of the pipeline within a CI/CD system. | `Build and Test`; `Lint`; `Deploy Go Project`; `deploy_to_environment` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-result" href="#cicd-pipeline-result">`cicd.pipeline.result`</a> | string | The result of a pipeline run. | `success`; `failure`; `timeout` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-run-id" href="#cicd-pipeline-run-id">`cicd.pipeline.run.id`</a> | string | The unique identifier of a pipeline run within a CI/CD system. | `120912` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-task-name" href="#cicd-pipeline-task-name">`cicd.pipeline.task.name`</a> | string | The human readable name of a task within a pipeline. Task here most closely aligns with a [computing process](https://wikipedia.org/wiki/Pipeline_(computing)) in a pipeline. Other terms for tasks include commands, steps, and procedures. | `Run GoLang Linter`; `Go Build`; `go-test`; `deploy_binary` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-task-run-id" href="#cicd-pipeline-task-run-id">`cicd.pipeline.task.run.id`</a> | string | The unique identifier of a task run within a pipeline. | `12097` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-task-run-url-full" href="#cicd-pipeline-task-run-url-full">`cicd.pipeline.task.run.url.full`</a> | string | The [URL](https://wikipedia.org/wiki/URL) of the pipeline run providing the complete address in order to locate and identify the pipeline run. | `https://github.com/open-telemetry/semantic-conventions/actions/runs/9753949763/job/26920038674?pr=1075` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-pipeline-task-type" href="#cicd-pipeline-task-type">`cicd.pipeline.task.type`</a> | string | The type of the task within a pipeline. | `build`; `test`; `deploy` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-worker-state" href="#cicd-worker-state">`cicd.worker.state`</a> | string | The state of a CICD worker / agent. | `idle`; `busy` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="cicd-worker-type" href="#cicd-worker-type">`cicd.worker.type`</a> | string | The type of worker / agent used by the CICD system. | `vm`; `pod` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.pipeline.result` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `cancelled` | The pipeline run was cancelled, e.g. by a user manually cancelling the pipeline run. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `error` | The pipeline run failed due to an error in the CICD system, e.g. due to the worker being killed. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `failure` | The pipeline run did not finish successfully, e.g. due to a compile error or a failing test. Such failures are usually detected by non-zero exit codes of the tools executed in the pipeline run. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `success` | The pipeline run finished successfully. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `timeout` | A timeout caused the pipeline run to be interrupted. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

@@ -28,3 +43,21 @@ This group describes attributes specific to pipelines within a Continuous Integr
| `build` | build | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `deploy` | deploy | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `test` | test | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.worker.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `busy` | The worker is performing work for the CICD system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `idle` | The worker is not performing work for the CICD system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.worker.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

In many complex CI/CD systems, workers are categorized by other attributes, such as the platform they run on (ubuntu, windows), the tooling they have installed (java, go), or other attributes.

I lean towards leaving this attribute open (without pre-defined values). Also consider class as an alternative name to reflect its intention to group the workers into categories.

Contributor Author

I have renamed this attribute to cicd.worker.class in 26d3064.

Contributor Author

I have now actually removed cicd.worker.class in b332638.
See the discussion #1681 (comment) for reference.


| Value | Description | Stability |
|---|---|---|
| `pod` | One or more containers deployed together. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `vm` | A virtual machine or bare-metal host. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
194 changes: 194 additions & 0 deletions docs/cicd/cicd-metrics.md
@@ -10,6 +10,12 @@ linkTitle: CICD metrics

<!-- toc -->

- [CICD Metrics](#cicd-metrics)
  - [Metric: `cicd.pipeline.run.duration`](#metric-cicdpipelinerunduration)
  - [Metric: `cicd.queue.latency`](#metric-cicdqueuelatency)
  - [Metric: `cicd.queue.length`](#metric-cicdqueuelength)
  - [Metric: `cicd.worker.count`](#metric-cicdworkercount)
  - [Metric: `cicd.errors`](#metric-cicderrors)
- [VCS Metrics](#vcs-metrics)
  - [Metric: `vcs.change.count`](#metric-vcschangecount)
  - [Metric: `vcs.change.duration`](#metric-vcschangeduration)
@@ -23,6 +29,193 @@ linkTitle: CICD metrics

<!-- tocstop -->

## CICD Metrics

The conventions described in this section are specific to Continuous Integration / Continuous Deployment (CICD) systems.

**Disclaimer:** These are initial CICD metrics and attributes
but more may be added in the future.

### Metric: `cicd.pipeline.run.duration`

This metric is [required][MetricRequired].

<!-- semconv metric.cicd.pipeline.run.duration -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `cicd.pipeline.run.duration` | Histogram | `s` | Duration of a pipeline run grouped by pipeline and result. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`cicd.pipeline.name`](/docs/attributes-registry/cicd.md) | string | The human readable name of the pipeline within a CI/CD system. | `Build and Test`; `Lint`; `Deploy Go Project`; `deploy_to_environment` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`cicd.pipeline.result`](/docs/attributes-registry/cicd.md) | string | The result of a pipeline run. | `success`; `failure`; `timeout` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.pipeline.result` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `cancelled` | The pipeline run was cancelled, e.g. by a user manually cancelling the pipeline run. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `error` | The pipeline run failed due to an error in the CICD system, e.g. due to the worker being killed. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `failure` | The pipeline run did not finish successfully, e.g. due to a compile error or a failing test. Such failures are usually detected by non-zero exit codes of the tools executed in the pipeline run. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `success` | The pipeline run finished successfully. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `timeout` | A timeout caused the pipeline run to be interrupted. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
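
As a rough illustration of how an instrumentation might populate this metric, the sketch below maps a finished run to one of the well-known `cicd.pipeline.result` values and records its duration under the two required attributes. It uses a plain in-memory dictionary as a stand-in for a histogram instrument, not the OpenTelemetry SDK. The `PipelineRun` fields, the `classify_result` helper, and the precedence order (cancelled, then timeout, then system error) are assumptions made for this example and are not defined by the convention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PipelineRun:
    """Hypothetical record of a finished pipeline run (not part of the convention)."""
    name: str
    duration_s: float
    exit_code: int = 0
    cancelled: bool = False
    timed_out: bool = False
    system_error: bool = False


def classify_result(run: PipelineRun) -> str:
    """Map a run to one of the well-known cicd.pipeline.result values.

    The precedence order below is an assumption of this sketch.
    """
    if run.cancelled:
        return "cancelled"
    if run.timed_out:
        return "timeout"
    if run.system_error:
        return "error"
    return "success" if run.exit_code == 0 else "failure"


# Stand-in for a histogram: attribute set -> recorded durations in seconds.
durations = defaultdict(list)


def record_run_duration(run: PipelineRun) -> None:
    """Record one duration sample under the two required attributes."""
    attributes = (
        ("cicd.pipeline.name", run.name),
        ("cicd.pipeline.result", classify_result(run)),
    )
    durations[attributes].append(run.duration_s)


record_run_duration(PipelineRun(name="Build and Test", duration_s=312.5))
record_run_duration(PipelineRun(name="Build and Test", duration_s=48.0, exit_code=2))
```

Keeping both attributes on every sample is what allows a backend to group durations by pipeline and by result, as the metric description states.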

### Metric: `cicd.queue.latency`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.cicd.queue.latency -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `cicd.queue.latency` | Histogram | `s` | The duration a pipeline run takes from being triggered to the start of execution. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`cicd.pipeline.name`](/docs/attributes-registry/cicd.md) | string | The human readable name of the pipeline within a CI/CD system. | `Build and Test`; `Lint`; `Deploy Go Project`; `deploy_to_environment` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
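
The value recorded for this metric is the gap between the trigger time and the start of execution, expressed in seconds. A minimal sketch of that calculation follows; the function name and the choice to reject negative gaps are assumptions of this example, and the timestamps are made up.

```python
from datetime import datetime, timezone


def queue_latency_seconds(triggered_at: datetime, started_at: datetime) -> float:
    """Return the seconds a run spent waiting between trigger and start of execution."""
    latency = (started_at - triggered_at).total_seconds()
    if latency < 0:
        # A negative gap usually points at clock skew or swapped arguments.
        raise ValueError("started_at precedes triggered_at")
    return latency


triggered = datetime(2024, 12, 13, 9, 0, 0, tzinfo=timezone.utc)
started = datetime(2024, 12, 13, 9, 0, 42, tzinfo=timezone.utc)
latency = queue_latency_seconds(triggered, started)
```

Using timezone-aware timestamps avoids mixing local and UTC times when the trigger and the start are observed on different hosts.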

### Metric: `cicd.queue.length`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.cicd.queue.length -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `cicd.queue.length` | UpDownCounter | `{pipeline_run}` | The number of pipeline runs waiting for their start of execution. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`cicd.pipeline.name`](/docs/attributes-registry/cicd.md) | string | The human readable name of the pipeline within a CI/CD system. | `Build and Test`; `Lint`; `Deploy Go Project`; `deploy_to_environment` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
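
Because the instrument is an UpDownCounter, the queue length can be maintained with paired increments and decrements rather than by sampling. The sketch below shows that bookkeeping with a plain `collections.Counter` standing in for the instrument; the two event-handler names are hypothetical.

```python
from collections import Counter

# Stand-in for an UpDownCounter keyed by the cicd.pipeline.name attribute.
queue_length = Counter()


def on_run_triggered(pipeline_name: str) -> None:
    """A run entered the queue: add +1 for its pipeline."""
    queue_length[pipeline_name] += 1


def on_run_started(pipeline_name: str) -> None:
    """A run left the queue and began executing: add -1 for its pipeline."""
    queue_length[pipeline_name] -= 1


on_run_triggered("Build and Test")
on_run_triggered("Build and Test")
on_run_triggered("Lint")
on_run_started("Build and Test")
```

A run that is cancelled while still queued would also need a decrement, otherwise the counter drifts upward over time.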

### Metric: `cicd.worker.count`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.cicd.worker.count -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `cicd.worker.count` | UpDownCounter | `{count}` | The number of workers available to the CICD system and/or busy. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`cicd.worker.state`](/docs/attributes-registry/cicd.md) | string | The state of a CICD worker / agent. | `idle`; `busy` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`cicd.worker.type`](/docs/attributes-registry/cicd.md) | string | The type of worker / agent used by the CICD system. | `vm`; `pod` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.worker.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `busy` | The worker is performing work for the CICD system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `idle` | The worker is not performing work for the CICD system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

---

`cicd.worker.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `pod` | One or more containers deployed together. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `vm` | A virtual machine or bare-metal host. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
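
One way to keep this UpDownCounter consistent is to treat a state change as moving a single worker from one attribute series to another. The following sketch assumes that newly registered workers start out `idle`; the handler names are hypothetical and a `collections.Counter` stands in for the instrument.

```python
from collections import Counter

# Stand-in for an UpDownCounter: attribute set -> current number of workers.
worker_count = Counter()


def _add(state: str, worker_type: str, delta: int) -> None:
    key = (("cicd.worker.state", state), ("cicd.worker.type", worker_type))
    worker_count[key] += delta


def on_worker_registered(worker_type: str) -> None:
    """Assumption of this sketch: a new worker starts in the idle state."""
    _add("idle", worker_type, +1)


def on_worker_state_change(worker_type: str, old_state: str, new_state: str) -> None:
    """Move one worker between state series so the total stays the fleet size."""
    _add(old_state, worker_type, -1)
    _add(new_state, worker_type, +1)


on_worker_registered("vm")
on_worker_registered("vm")
on_worker_state_change("vm", "idle", "busy")
```

Summing the series over `cicd.worker.state` then yields the total number of workers of a given type.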

### Metric: `cicd.errors`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.cicd.errors -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `cicd.errors` | Counter | `{error}` | The number of errors in the controller of the CICD system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Required` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

**[1] `error.type`:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

When `error.type` is set to a type (e.g., an exception type), its
canonical class name identifying the type within the artifact SHOULD be used.

Instrumentations SHOULD document the list of errors they report.

The cardinality of `error.type` within one instrumentation library SHOULD be low.
Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
should be prepared for `error.type` to have high cardinality at query time when no
additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
it's RECOMMENDED to:

- Use a domain-specific attribute
- Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not.

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
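
To illustrate the `error.type` guidance above, the sketch below derives a low-cardinality value from an exception's class name and falls back to `_OTHER` when no exception object is available. The helper names are hypothetical, a `collections.Counter` stands in for the Counter instrument, and dropping the `builtins.` prefix for built-in exceptions is a choice made for this example.

```python
from collections import Counter
from typing import Optional

# Stand-in for a Counter instrument keyed by the error.type attribute.
error_count = Counter()


def error_type_of(exc: Optional[BaseException]) -> str:
    """Return a class-based error.type, or the _OTHER fallback."""
    if exc is None:
        return "_OTHER"
    cls = type(exc)
    if cls.__module__ == "builtins":
        # Built-in exceptions are reported by their bare class name.
        return cls.__qualname__
    return f"{cls.__module__}.{cls.__qualname__}"


def record_error(exc: Optional[BaseException] = None) -> None:
    """Count one error under its derived error.type."""
    error_count[error_type_of(exc)] += 1


record_error(TimeoutError("scheduler did not respond"))
record_error()
```

Deriving the value from the class rather than from the message keeps the cardinality bounded by the number of exception types the instrumentation can raise.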

## VCS Metrics

The conventions described in this section are specific to Version Control Systems.
@@ -429,3 +622,4 @@ the same backends.
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[MetricOptIn]: /docs/general/metric-requirement-level.md#opt-in
[MetricRecommended]: /docs/general/metric-requirement-level.md#recommended
[MetricRequired]: /docs/general/metric-requirement-level.md#required
55 changes: 55 additions & 0 deletions model/cicd/metrics.yaml
@@ -0,0 +1,55 @@
groups:
  - id: metric.cicd.pipeline.run.duration
    type: metric
    metric_name: cicd.pipeline.run.duration
    brief: 'Duration of a pipeline run grouped by pipeline and result.'
    instrument: histogram
    unit: "s"
    stability: experimental
    attributes:
      - ref: cicd.pipeline.name
        requirement_level: required
      - ref: cicd.pipeline.result
        requirement_level: required
  - id: metric.cicd.queue.latency
    type: metric
    metric_name: cicd.queue.latency
    brief: 'The duration a pipeline run takes from being triggered to the start of execution.'
    instrument: histogram
    unit: "s"
    stability: experimental
    attributes:
      - ref: cicd.pipeline.name
        requirement_level: recommended
  - id: metric.cicd.queue.length
    type: metric
    metric_name: cicd.queue.length
    brief: 'The number of pipeline runs waiting for their start of execution.'
    instrument: updowncounter
    unit: "{pipeline_run}"
    stability: experimental
    attributes:
      - ref: cicd.pipeline.name
        requirement_level: recommended
  - id: metric.cicd.worker.count
    type: metric
    metric_name: cicd.worker.count
    brief: 'The number of workers available to the CICD system and/or busy.'
    instrument: updowncounter
    unit: "{count}"
    stability: experimental
    attributes:
      - ref: cicd.worker.type
        requirement_level: required
      - ref: cicd.worker.state
        requirement_level: required
  - id: metric.cicd.errors
    type: metric
    metric_name: cicd.errors
    brief: 'The number of errors in the controller of the CICD system.'
Contributor

@lmolkova lmolkova Jan 8, 2025

The cicd.errors name seems too broad - it seems it would cover all kinds of errors in all CI/CD system parts and then it stops being practical.

The first time I read it, I assumed it to count pipeline run errors, but it's probably something else.

I wonder if we could limit the scope of this metric to something reasonable. E.g., should it be cicd.controller.errors? I could envision we'd want to differentiate per component, e.g. have a cicd.scheduler.errors, etc.

Contributor Author

I intended to count the errors from the CICD system (controller, scheduler, agent) in this metric and not errors from the pipeline run.

It is indeed good to be able to distinguish the different system components as well as errors from the pipeline runs.
Adriel also made a good point that it would be good to be able to count pipeline run errors separately from just the cicd.pipeline.run.duration with result=error because a single pipeline run might encounter several different errors (some recoverable and others not).

Most likely this will be a derived metric, e.g. by using a span metrics connector on the pipeline run spans or a count connector on the controller logs.

I will define the following metrics to cover these:
- cicd.pipeline.run.errors for errors encountered as part of the pipeline run execution
- cicd.system.errors for errors encountered in CICD system components, with an attribute component (e.g. scheduler, agent, controller, …)
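
The split proposed above could look roughly like the following sketch, in which run-level errors and system-level errors go to separate counters and system errors carry a component attribute. The attribute key `cicd.system.component` is taken from the title of commit 996e809; the function names are hypothetical, and `collections.Counter` objects stand in for real Counter instruments.

```python
from collections import Counter

# Stand-ins for the two proposed Counter instruments.
pipeline_run_errors = Counter()  # cicd.pipeline.run.errors
system_errors = Counter()        # cicd.system.errors


def record_pipeline_run_error(pipeline_name: str, error_type: str) -> None:
    """Count an error raised while executing a pipeline run."""
    key = (("cicd.pipeline.name", pipeline_name), ("error.type", error_type))
    pipeline_run_errors[key] += 1


def record_system_error(component: str, error_type: str) -> None:
    """Count an error raised by a CICD system component such as the scheduler."""
    key = (("cicd.system.component", component), ("error.type", error_type))
    system_errors[key] += 1


# A single run may hit several errors, which result=error on the duration metric cannot express.
record_pipeline_run_error("Build and Test", "timeout")
record_pipeline_run_error("Build and Test", "timeout")
record_system_error("scheduler", "_OTHER")
```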

Contributor Author

Done in fc48dd2, 0be8d4d and 996e809.

    instrument: counter
    unit: "{error}"
    stability: experimental
    attributes:
      - ref: error.type
        requirement_level: required