MON-3858: Add support to sysctl node-exporter collector #2339

Open. Wants to merge 1 commit into base: master
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -5,6 +5,7 @@
- [#2302](https://github.com/openshift/cluster-monitoring-operator/issues/2302) Enable feature `extra-scrape-metrics` in Prometheus user-workload
- [#2319](https://github.com/openshift/cluster-monitoring-operator/pull/2319) Allow read-only access to the Alertmanager API (use `monitoring-alertmanager-view`).
- [#2078](https://github.com/openshift/cluster-monitoring-operator/pull/2078) Support exporting VPA metrics from KSM.
- [#2339](https://github.com/openshift/cluster-monitoring-operator/pull/2339) Add support for the node-exporter `sysctl` collector
Review comment (contributor):

I'm afraid the sysctl entry should go under 4.19 now, since the master branch now points to 4.19.


## 4.15

Expand Down
19 changes: 19 additions & 0 deletions Documentation/api.md
Expand Up @@ -34,6 +34,7 @@ Configuring Cluster Monitoring is optional. If the config does not exist or is e
* [NodeExporterCollectorNetClassConfig](#nodeexportercollectornetclassconfig)
* [NodeExporterCollectorNetDevConfig](#nodeexportercollectornetdevconfig)
* [NodeExporterCollectorProcessesConfig](#nodeexportercollectorprocessesconfig)
* [NodeExporterCollectorSysctlConfig](#nodeexportercollectorsysctlconfig)
* [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig)
* [NodeExporterCollectorTcpStatConfig](#nodeexportercollectortcpstatconfig)
* [NodeExporterConfig](#nodeexporterconfig)
Expand Down Expand Up @@ -250,6 +251,7 @@ The `NodeExporterCollectorConfig` resource defines settings for individual colle
| mountstats | [NodeExporterCollectorMountStatsConfig](#nodeexportercollectormountstatsconfig) | Defines the configuration of the `mountstats` collector, which collects statistics about NFS volume I/O activities. Disabled by default. |
| ksmd | [NodeExporterCollectorKSMDConfig](#nodeexportercollectorksmdconfig) | Defines the configuration of the `ksmd` collector, which collects statistics from the kernel same-page merger daemon. Disabled by default. |
| processes | [NodeExporterCollectorProcessesConfig](#nodeexportercollectorprocessesconfig) | Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system. Disabled by default. |
| sysctl | [NodeExporterCollectorSysctlConfig](#nodeexportercollectorsysctlconfig) | Defines the configuration of the `sysctl` collector, which exposes selected sysctl values as metrics. Disabled by default. |
| systemd | [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig) | Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default. |

[Back to TOC](#table-of-contents)
Expand Down Expand Up @@ -345,6 +347,23 @@ The `NodeExporterCollectorProcessesConfig` resource works as an on/off switch fo

[Back to TOC](#table-of-contents)

## NodeExporterCollectorSysctlConfig

#### Description

The `NodeExporterCollectorSysctlConfig` resource enables or disables the `sysctl` collector of the `node-exporter` agent and selects which sysctl values it exposes. Caution: exposing a value that changes on every read, such as `kernel.random.uuid`, can disrupt Prometheus because each scrape creates a new time series. Choose the exposed sysctls with care. By default, the `sysctl` collector is disabled.


<em>appears in: [NodeExporterCollectorConfig](#nodeexportercollectorconfig)</em>

| Property | Type | Description |
| -------- | ---- | ----------- |
| enabled | bool | A Boolean flag that enables or disables the `sysctl` collector. |
| includeSysctlMetrics | []string | A list of sysctls whose numeric values are exposed as metrics. Note that a sysctl can contain multiple values, for example: `net.ipv4.tcp_rmem = 4096 131072 6291456`. Using `includeSysctlMetrics: ['net.ipv4.tcp_rmem']` the collector will expose: `node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096`, `node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072`, `node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456`. If the indexes have a defined meaning, as in this case, the values can be mapped to separate metrics: `includeSysctlMetrics: ['net.ipv4.tcp_rmem:min,default,max']`. The collector will then expose: `node_sysctl_net_ipv4_tcp_rmem_min 4096`, `node_sysctl_net_ipv4_tcp_rmem_default 131072`, `node_sysctl_net_ipv4_tcp_rmem_max 6291456`. |
| includeInfoSysctlMetrics | []string | A list of sysctls whose string values are exposed as info metrics. For example, given `kernel.core_pattern = core` and `kernel.seccomp.actions_avail = kill_process kill_thread ...`, using `includeInfoSysctlMetrics: ['kernel.core_pattern', 'kernel.seccomp.actions_avail']` the collector will expose: `node_sysctl_info{name="kernel.core_pattern", value="core"} 1`, `node_sysctl_info{name="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1`, `node_sysctl_info{name="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1`, ... |

[Back to TOC](#table-of-contents)
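
The naming scheme described for `includeSysctlMetrics` can be illustrated with a small Go sketch. The helper `sysctlMetricNames` is hypothetical: it is not part of node-exporter or the Cluster Monitoring Operator, and it only models the multi-value case documented above (an `index` label without a mapping, one metric per suffix with a mapping). Single-valued sysctls are out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// sysctlMetricNames is a hypothetical helper that mirrors the documented
// naming of multi-valued numeric sysctls. spec is one includeSysctlMetrics
// entry, such as "net.ipv4.tcp_rmem" or "net.ipv4.tcp_rmem:min,default,max";
// n is the number of values the sysctl holds.
func sysctlMetricNames(spec string, n int) ([]string, error) {
	name, mapping, hasMapping := strings.Cut(spec, ":")
	base := "node_sysctl_" + strings.ReplaceAll(name, ".", "_")

	if hasMapping {
		// With a mapping, each value becomes its own metric with a suffix.
		suffixes := strings.Split(mapping, ",")
		if len(suffixes) != n {
			return nil, fmt.Errorf("%s: mapping names %d values, sysctl has %d", name, len(suffixes), n)
		}
		names := make([]string, 0, n)
		for _, s := range suffixes {
			names = append(names, base+"_"+s)
		}
		return names, nil
	}

	// Without a mapping, the values share one metric name and differ by index.
	names := make([]string, 0, n)
	for i := 0; i < n; i++ {
		names = append(names, fmt.Sprintf("%s{index=\"%d\"}", base, i))
	}
	return names, nil
}

func main() {
	for _, spec := range []string{"net.ipv4.tcp_rmem", "net.ipv4.tcp_rmem:min,default,max"} {
		names, err := sysctlMetricNames(spec, 3)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(spec, "->", strings.Join(names, ", "))
	}
}
```

The sketch rejects a mapping whose length does not match the number of values; whether the real collector behaves the same way is an assumption and should be checked against the node-exporter documentation.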

## NodeExporterCollectorSystemdConfig

#### Description
Expand Down
1 change: 1 addition & 0 deletions Documentation/openshiftdocs/index.adoc
Expand Up @@ -54,6 +54,7 @@ The configuration file itself is always defined under the `config.yaml` key in t
* link:modules/nodeexportercollectornetclassconfig.adoc[NodeExporterCollectorNetClassConfig]
* link:modules/nodeexportercollectornetdevconfig.adoc[NodeExporterCollectorNetDevConfig]
* link:modules/nodeexportercollectorprocessesconfig.adoc[NodeExporterCollectorProcessesConfig]
* link:modules/nodeexportercollectorsysctlconfig.adoc[NodeExporterCollectorSysctlConfig]
* link:modules/nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]
* link:modules/nodeexportercollectortcpstatconfig.adoc[NodeExporterCollectorTcpStatConfig]
* link:modules/nodeexporterconfig.adoc[NodeExporterConfig]
Expand Down
Expand Up @@ -34,6 +34,8 @@ Appears in: link:nodeexporterconfig.adoc[NodeExporterConfig]

|processes|link:nodeexportercollectorprocessesconfig.adoc[NodeExporterCollectorProcessesConfig]|Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system. Disabled by default.

|sysctl|link:nodeexportercollectorsysctlconfig.adoc[NodeExporterCollectorSysctlConfig]|Defines the configuration of the `sysctl` collector, which exposes selected sysctl values as metrics. Disabled by default.

|systemd|link:nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]|Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default.

|===
Expand Down
@@ -0,0 +1,29 @@
// DO NOT EDIT THE CONTENT IN THIS FILE. It is automatically generated from the
// source code for the Cluster Monitoring Operator. Any changes made to this
// file will be overwritten when the content is re-generated. If you wish to
// make edits, read the docgen utility instructions in the source code for the
// CMO.
:_content-type: ASSEMBLY

== NodeExporterCollectorSysctlConfig

=== Description

The `NodeExporterCollectorSysctlConfig` resource enables or disables the `sysctl` collector of the `node-exporter` agent and selects which sysctl values it exposes. Caution: exposing a value that changes on every read, such as `kernel.random.uuid`, can disrupt Prometheus because each scrape creates a new time series. Choose the exposed sysctls with care. By default, the `sysctl` collector is disabled.



Appears in: link:nodeexportercollectorconfig.adoc[NodeExporterCollectorConfig]

[options="header"]
|===
| Property | Type | Description
|enabled|bool|A Boolean flag that enables or disables the `sysctl` collector.

|includeSysctlMetrics|[]string|A list of sysctls whose numeric values are exposed as metrics. Note that a sysctl can contain multiple values, for example: `net.ipv4.tcp_rmem = 4096 131072 6291456`. Using `includeSysctlMetrics: ['net.ipv4.tcp_rmem']` the collector will expose: `node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096`, `node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072`, `node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456`. If the indexes have a defined meaning, as in this case, the values can be mapped to separate metrics: `includeSysctlMetrics: ['net.ipv4.tcp_rmem:min,default,max']`. The collector will then expose: `node_sysctl_net_ipv4_tcp_rmem_min 4096`, `node_sysctl_net_ipv4_tcp_rmem_default 131072`, `node_sysctl_net_ipv4_tcp_rmem_max 6291456`.

|includeInfoSysctlMetrics|[]string|A list of sysctls whose string values are exposed as info metrics. For example, given `kernel.core_pattern = core` and `kernel.seccomp.actions_avail = kill_process kill_thread ...`, using `includeInfoSysctlMetrics: ['kernel.core_pattern', 'kernel.seccomp.actions_avail']` the collector will expose: `node_sysctl_info{name="kernel.core_pattern", value="core"} 1`, `node_sysctl_info{name="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1`, `node_sysctl_info{name="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1`, ...

|===

link:../index.adoc[Back to TOC]
5 changes: 5 additions & 0 deletions pkg/manifests/config.go
Expand Up @@ -217,6 +217,11 @@ func defaultClusterMonitoringConfiguration() ClusterMonitoringConfiguration {
Systemd: NodeExporterCollectorSystemdConfig{
Enabled: false,
},
Sysctl: NodeExporterCollectorSysctlConfig{
Enabled: false,
IncludeSysctlMetrics: []string{},
IncludeInfoSysctlMetrics: []string{},
},
},
},
}
Expand Down
31 changes: 31 additions & 0 deletions pkg/manifests/manifests.go
Expand Up @@ -876,6 +876,25 @@ func (f *Factory) updateNodeExporterArgs(args []string) ([]string, error) {
args = setArg(args, "--no-collector.tcpstat", "")
}

	if f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Sysctl.Enabled {
		includeSysctlMetrics := f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Sysctl.IncludeSysctlMetrics
		includeInfoSysctlMetrics := f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Sysctl.IncludeInfoSysctlMetrics

		args = setArg(args, "--collector.sysctl", "")

		sysctlSet := uniqueSet(includeSysctlMetrics)
		for _, sysctl := range sysctlSet {
			args = append(args, fmt.Sprintf("--collector.sysctl.include=%s", sysctl))
		}

		sysctlSet = uniqueSet(includeInfoSysctlMetrics)
		for _, sysctl := range sysctlSet {
			args = append(args, fmt.Sprintf("--collector.sysctl.include-info=%s", sysctl))
		}
	} else {
		args = setArg(args, "--no-collector.sysctl", "")
	}

var excludedDevices string
if f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.NetDev.Enabled ||
f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.NetClass.Enabled {
Expand Down Expand Up @@ -2488,6 +2507,18 @@ func setArg(args []string, argName string, argValue string) []string {
return args
}

// uniqueSet returns the distinct strings of input, keeping the order of
// their first occurrence.
func uniqueSet(input []string) []string {
	uniqueMap := make(map[string]struct{})
	var unique []string
	for _, str := range input {
		if _, ok := uniqueMap[str]; !ok {
			uniqueMap[str] = struct{}{}
			unique = append(unique, str)
		}
	}
	return unique
}

func (f *Factory) PrometheusRuleValidatingWebhook() (*admissionv1.ValidatingWebhookConfiguration, error) {
return f.NewValidatingWebhook(f.assets.MustNewAssetSlice(AdmissionWebhookRuleValidatingWebhook))
}
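
The sysctl branch of `updateNodeExporterArgs` and the `uniqueSet` helper from this file's diff can be tried in isolation with the sketch below. `sysctlArgs` is a hypothetical stand-in written for illustration: it does not use the operator's `setArg` or `Factory` types and starts from an empty argument list, so it only shows which flags the sysctl settings contribute.

```go
package main

import "fmt"

// uniqueSet is copied from the diff: order-preserving de-duplication.
func uniqueSet(input []string) []string {
	uniqueMap := make(map[string]struct{})
	var unique []string
	for _, str := range input {
		if _, ok := uniqueMap[str]; !ok {
			uniqueMap[str] = struct{}{}
			unique = append(unique, str)
		}
	}
	return unique
}

// sysctlArgs is a hypothetical, standalone version of the sysctl branch.
// It returns only the flags derived from the sysctl collector settings.
func sysctlArgs(enabled bool, include, includeInfo []string) []string {
	if !enabled {
		return []string{"--no-collector.sysctl"}
	}
	args := []string{"--collector.sysctl"}
	// Each entry becomes its own repeated flag, so entries are appended.
	for _, s := range uniqueSet(include) {
		args = append(args, fmt.Sprintf("--collector.sysctl.include=%s", s))
	}
	for _, s := range uniqueSet(includeInfo) {
		args = append(args, fmt.Sprintf("--collector.sysctl.include-info=%s", s))
	}
	return args
}

func main() {
	args := sysctlArgs(true,
		[]string{"net.ipv4.tcp_mem", "net.ipv4.tcp_mem", "vm.swappiness"},
		[]string{"kernel.core_pattern"})
	for _, a := range args {
		fmt.Println(a)
	}
}
```

Because `uniqueSet` compares whole strings, `net.ipv4.tcp_rmem` and `net.ipv4.tcp_rmem:min,default,max` count as different entries and would both be passed to node-exporter.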
Expand Down
34 changes: 34 additions & 0 deletions pkg/manifests/manifests_test.go
Expand Up @@ -3480,6 +3480,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
name: "default config",
config: "",
argsPresent: []string{"--no-collector.cpufreq",
"--no-collector.sysctl",
"--no-collector.tcpstat",
"--collector.netdev",
"--collector.netclass",
Expand All @@ -3492,6 +3493,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
"--no-collector.systemd",
},
argsAbsent: []string{"--collector.cpufreq",
"--collector.sysctl",
"--collector.tcpstat",
"--no-collector.netdev",
"--no-collector.netclass",
Expand Down Expand Up @@ -3657,6 +3659,38 @@ nodeExporter:
"--collector.systemd.unit-include=^(network.+|nss.+)$"},
argsAbsent: []string{"--no-collector.systemd"},
},
		{
			name: "disable sysctl collector",
			config: `
nodeExporter:
  collectors:
    sysctl:
      enabled: false
`,
			argsPresent: []string{"--no-collector.sysctl"},
			argsAbsent:  []string{"--collector.sysctl"},
		},
		{
			name: "enable sysctl collector",
			config: `
nodeExporter:
  collectors:
    sysctl:
      enabled: true
      includeSysctlMetrics:
        - net.ipv4.tcp_rmem:min,default,max
        - net.ipv4.tcp_mem
      includeInfoSysctlMetrics:
        - kernel.core_pattern
        - kernel.seccomp.actions_avail
`,
			argsPresent: []string{"--collector.sysctl",
				"--collector.sysctl.include=net.ipv4.tcp_rmem:min,default,max",
				"--collector.sysctl.include=net.ipv4.tcp_mem",
				"--collector.sysctl.include-info=kernel.core_pattern",
				"--collector.sysctl.include-info=kernel.seccomp.actions_avail"},
			argsAbsent: []string{"--no-collector.sysctl"},
		},
}

for _, test := range tests {
Expand Down
35 changes: 35 additions & 0 deletions pkg/manifests/types.go
Expand Up @@ -351,6 +351,9 @@ type NodeExporterCollectorConfig struct {
// Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system.
// Disabled by default.
Processes NodeExporterCollectorProcessesConfig `json:"processes,omitempty"`
// Defines the configuration of the `sysctl` collector, which exposes selected sysctl values as metrics.
// Disabled by default.
Sysctl NodeExporterCollectorSysctlConfig `json:"sysctl,omitempty"`
// Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services.
// Disabled by default.
Systemd NodeExporterCollectorSystemdConfig `json:"systemd,omitempty"`
Expand All @@ -376,6 +379,38 @@ type NodeExporterCollectorTcpStatConfig struct {
Enabled bool `json:"enabled,omitempty"`
}

// The `NodeExporterCollectorSysctlConfig` resource enables or disables
// the `sysctl` collector of the `node-exporter` agent and selects which sysctl values it exposes.
// Caution: exposing a value that changes on every read, such as `kernel.random.uuid`, can disrupt Prometheus because each scrape creates a new time series. Choose the exposed sysctls with care.
// By default, the `sysctl` collector is disabled.
type NodeExporterCollectorSysctlConfig struct {
	// A Boolean flag that enables or disables the `sysctl` collector.
	Enabled bool `json:"enabled,omitempty"`
	// A list of sysctls whose numeric values are exposed as metrics.
	// Note that a sysctl can contain multiple values, for example:
	// `net.ipv4.tcp_rmem = 4096 131072 6291456`.
	// Using `includeSysctlMetrics: ['net.ipv4.tcp_rmem']` the collector will expose:
	// `node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096`,
	// `node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072`,
	// `node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456`.
	// If the indexes have a defined meaning, as in this case, the values can be mapped to separate metrics:
	// `includeSysctlMetrics: ['net.ipv4.tcp_rmem:min,default,max']`.
	// The collector will then expose:
	// `node_sysctl_net_ipv4_tcp_rmem_min 4096`,
	// `node_sysctl_net_ipv4_tcp_rmem_default 131072`,
	// `node_sysctl_net_ipv4_tcp_rmem_max 6291456`.
	IncludeSysctlMetrics []string `json:"includeSysctlMetrics,omitempty"`
	// A list of sysctls whose string values are exposed as info metrics.
	// For example, given `kernel.core_pattern = core` and
	// `kernel.seccomp.actions_avail = kill_process kill_thread ...`, using
	// `includeInfoSysctlMetrics: ['kernel.core_pattern', 'kernel.seccomp.actions_avail']`
	// the collector will expose:
	// `node_sysctl_info{name="kernel.core_pattern", value="core"} 1`,
	// `node_sysctl_info{name="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1`,
	// `node_sysctl_info{name="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1`,
	// ...
	IncludeInfoSysctlMetrics []string `json:"includeInfoSysctlMetrics,omitempty"`
}

// The `NodeExporterCollectorNetDevConfig` resource works as an on/off switch for
// the `netdev` collector of the `node-exporter` agent.
// By default, the `netdev` collector is enabled.
Expand Down
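
To see how the JSON tags on `NodeExporterCollectorSysctlConfig` behave when fields are omitted, the following sketch decodes two inputs into a local copy of the struct. It is an illustration under stated assumptions: the operator reads its configuration from YAML, while this example uses the standard library's `encoding/json` as a stand-in, and `parseSysctlConfig` is a hypothetical helper that does not exist in the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeExporterCollectorSysctlConfig is a local copy of the struct from the diff.
type NodeExporterCollectorSysctlConfig struct {
	Enabled                  bool     `json:"enabled,omitempty"`
	IncludeSysctlMetrics     []string `json:"includeSysctlMetrics,omitempty"`
	IncludeInfoSysctlMetrics []string `json:"includeInfoSysctlMetrics,omitempty"`
}

// parseSysctlConfig is a hypothetical helper that decodes a JSON document
// into the struct. Missing keys keep their Go zero values.
func parseSysctlConfig(raw []byte) (NodeExporterCollectorSysctlConfig, error) {
	var cfg NodeExporterCollectorSysctlConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	inputs := []string{
		`{"enabled": true, "includeSysctlMetrics": ["net.ipv4.tcp_mem"]}`,
		`{}`,
	}
	for _, in := range inputs {
		cfg, err := parseSysctlConfig([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("enabled=%t include=%d includeInfo=%d\n",
			cfg.Enabled, len(cfg.IncludeSysctlMetrics), len(cfg.IncludeInfoSysctlMetrics))
	}
}
```

An omitted list decodes to a nil slice rather than the empty slice set in `defaultClusterMonitoringConfiguration`; ranging over either yields no iterations, so the argument-building loop treats both the same way.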