Merge pull request #13 from NethServer/bundle
Bundle loki, prometheus and grafana
gsanchietti authored Mar 7, 2024
2 parents e66b5aa + 9a7a28d commit f0967ca
Showing 24 changed files with 11,334 additions and 744 deletions.
131 changes: 108 additions & 23 deletions README.md
@@ -4,8 +4,16 @@ Setup and start an instance of [nethsecurity-controller](https://github.com/Neth

Each node can host multiple controller instances.

**Note**
This module is not yet present inside the NS8 repository, so please install it manually following the instructions below.
The module is composed of the following containers:
- [nethsecurity-api](#api-server): Python REST API server that manages nethsec-vpn clients
- [nethsecurity-vpn](#vpn): OpenVPN server; it authenticates the machines and creates routes for the proxy
- [nethsecurity-ui](#proxy-and-ui): lighttpd instance serving static UI files
- [nethsecurity-proxy](#proxy-and-ui): Traefik instance that forwards requests to the connected machines using the machine name as path prefix
- [promtail](#promtail): log collector for Loki; it listens for syslog messages on the VPN address and forwards them to Loki
- [prometheus](#prometheus): metrics collector; it scrapes metrics from the connected machines
- [loki](#loki): log storage; it stores the logs received from Promtail
- [grafana](#grafana): metrics visualization; it displays metrics from Prometheus and logs from Loki


**Note**
This module implements the backup but not the restore procedure.
@@ -35,49 +43,126 @@ Launch `configure-module`, by setting the following parameters:
- `ovpn_cn`: OpenVPN Certificate CN
- `api_user`: controller admin user
- `api_password`: controller admin password
- `loki_retention`: Loki retention period in days (default: ``180`` days)
- `prometheus_retention`: Prometheus retention period in days (default: ``15`` days)

Example:

api-cli run module/nethsecurity-controller1/configure-module --data '{"host": "nscontroller.nethserver.org", "lets_encrypt": false, "ovpn_network": "172.19.64.0", "ovpn_netmask": "255.255.255.0", "ovpn_cn": "nethsec", "api_user": "admin", "api_password": "password"}'
api-cli run module/nethsecurity-controller1/configure-module --data '{"host": "mycontroller.nethsecurity.org", "lets_encrypt": false, "ovpn_network": "172.19.64.0", "ovpn_netmask": "255.255.255.0", "ovpn_cn": "nethsec", "api_user": "admin", "api_password": "password", "loki_retention": 180, "prometheus_retention": 15}'
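
The same command line can be assembled programmatically, which avoids quoting mistakes in the JSON payload. The following is a minimal sketch, not part of the module: `build_configure_command` is a hypothetical helper, and the retention keys are sent as integers named `loki_retention` and `prometheus_retention`, following the `validate-input.json` schema changed in this commit.

```python
import json
import shlex


def build_configure_command(module_id: str, host: str, api_password: str) -> str:
    """Return an api-cli command line for the configure-module action.

    Hypothetical helper for illustration; the payload keys mirror the
    parameters listed in the README and in validate-input.json.
    """
    payload = {
        "host": host,
        "lets_encrypt": False,
        "ovpn_network": "172.19.64.0",
        "ovpn_netmask": "255.255.255.0",
        "ovpn_cn": "nethsec",
        "api_user": "admin",
        "api_password": api_password,
        "loki_retention": 180,        # days, integer as required by the schema
        "prometheus_retention": 15,   # days, integer as required by the schema
    }
    data = json.dumps(payload)
    # shlex.quote protects the JSON document from shell interpretation
    return f"api-cli run module/{module_id}/configure-module --data {shlex.quote(data)}"


print(build_configure_command("nethsecurity-controller1",
                              "mycontroller.nethsecurity.org",
                              "password"))
```

The printed line can be pasted into a shell on the NS8 node; the module instance name and host are the example values used above.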

The above command will:
- start and configure the nethsecurity-controller instance
- setup a route inside traefik to reach the controller
- setup a syslog receiver
- setup prometheus for metrics scraping
- setup the following routes inside traefik:
  - one route to reach the controller, based on the `host` parameter, like `https://mycontroller.nethsecurity.org/`
  - one route to reach Prometheus, with a randomly generated URL like `https://mycontroller.nethsecurity.org/f0365996-c1b3-4252-9cf3-c2e7e86ed617/`
  - one route to reach Loki, with a randomly generated URL like `https://mycontroller.nethsecurity.org/3e3e3e3e-3e3e-3e3e-3e3e-3e3e3e3e3e3e/`
  - one route to reach Grafana, with a well-known URL like `https://mycontroller.nethsecurity.org/grafana/`
- setup Promtail syslog receiver
- setup Prometheus for metrics scraping
- setup Loki for receiving logs
- setup Grafana for metrics visualization
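
The route setup listed above can be sketched as plain data. This is an illustration under stated assumptions, not the module's code: `build_routes` and `SERVICE_PORT_INDEX` are hypothetical names, the port indexes are the ones visible in the `20configure` script of this commit, and the actual script sends each payload to Traefik through the `set-route` action instead of returning it.

```python
import uuid

# Port indexes as they appear in the configure script of this commit.
SERVICE_PORT_INDEX = {"controller": 3, "grafana": 8, "loki": 5, "prometheus": 7}


def build_routes(module_id, host, ports, lets_encrypt=False, saved_paths=None):
    """Return (routes, paths): four route payloads and the random paths used.

    Random paths are generated once and reused when passed back in
    `saved_paths`, so the Loki and Prometheus URLs stay stable.
    """
    paths = dict(saved_paths or {})
    for key in ("loki_path", "prometheus_path"):
        if not paths.get(key):
            paths[key] = f"/{uuid.uuid4()}"

    def route(suffix, service, path=None):
        data = {
            "instance": module_id + suffix,
            "url": f"http://127.0.0.1:{ports[SERVICE_PORT_INDEX[service]]}",
            "http2https": True,
            "lets_encrypt": lets_encrypt,
            "host": host,
        }
        if path is not None:
            data["path"] = path
        return data

    routes = [
        route("", "controller"),                      # reached via the host name only
        route("_grafana", "grafana", "/grafana"),     # well-known path
        route("_loki", "loki", paths["loki_path"]),   # random path
        route("_prometheus", "prometheus", paths["prometheus_path"]),
    ]
    return routes, paths
```

Keeping the generated paths in the saved configuration is what makes a second `configure-module` run produce the same URLs.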

Once the controller is configured, you can access the controller URL, e.g. `mycontroller.nethsecurity.org`, and manage NethSecurity units.

## Module overview

The module is composed of the following systemd units:
- controller.service: runs the container pod, all containers are part of the same pod; it can start and stop all the containers at once
- api.service: runs the nethsecurity-api container
- vpn.service: runs the nethsecurity-vpn container
- ui.service: runs the nethsecurity-ui container
- proxy.service: runs the nethsecurity-proxy container
- promtail.service: runs the promtail container
- prometheus.service: runs the prometheus container
- loki.service: runs the loki container
- grafana.service: runs the grafana container
- metrics-exporter.path: watches for VPN connections from vpn.service and starts metrics-exporter.service; each time a new client connects, the vpn container creates a file inside the `prometheus.d/` directory
- metrics-exporter.service: executes the `metrics_exporter_handler` script to create a new prometheus target for the connected machine

Send a test HTTP request to the nethsecurity-controller backend service:
### API Server

curl https://nscontroller.nethserver.org/
The [api server](https://github.com/NethServer/nethsecurity-controller/tree/master/api) gives NethSecurity the ability to register itself to NS8 (through [`ns-plug`](https://dev.nethsecurity.org/nethsecurity/packages/ns-plug/)) and gives access to the on-demand generated credentials for the VPN.

All logs are grouped by the controller name (`ovpn_cn`). To query the logs, use:
```
logcli query '{controller_name="nethsec"}' --tail
```
The API also registers the endpoints for the [Traefik Proxy](#proxy-and-ui) that allows direct interaction with the firewall, even if it's not in the same network.

## Module Overview
### VPN

This multi-container module allows to connect the NS8 cluster to NethSecurity installations.
The [OpenVPN container](https://github.com/NethServer/nethsecurity-controller/tree/master/vpn) tunnels connections from NethSecurity to NS8 through a VPN. Due to the [firewall configuration](https://github.com/NethServer/ns8-nethsecurity-controller/blob/main/imageroot/actions/configure-module/20configure#L87) in NS8, no client can be reached from other clients; only client-server communication is allowed.

Here are all the module features:
The module uses the NS8 [TUN feature](https://dev.nethsecurity.org/ns8-core/core/tun/) to create a new network interface and assign it to the VPN container.

### API Server
### Proxy and UI

The [api server](https://github.com/NethServer/nethsecurity-controller/tree/master/api) gives NethSecurity the ability to register itself to NS8 (through [`ns-plug`](https://nethserver.github.io/nethsecurity/packages/ns-plug/)) and gives access to the on-demand generated credentials for the VPN.
The [UI](https://github.com/NethServer/nethsecurity-controller/tree/master/ui) lets you browse the interface of a NethSecurity installation directly. This is possible thanks to the [Traefik Proxy](https://github.com/NethServer/nethsecurity-controller/tree/master/proxy) server, which routes the URLs to the correct IP inside the VPN.

The API even registers the endpoints for the [Traefik Proxy](#proxy-and-ui) that allows the interaction directly with the firewall even if it's not in the same network.
### Promtail

### VPN
[Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) is a log collector for Loki; it listens for syslog messages on the VPN address and forwards them to Loki. The configuration is available at `/home/nethsecurity-controller1/.config/etc/promtail.yml`.
Promtail sets the following labels:
- `job` fixed to `syslog`
- unit `hostname`
- log `level`
- `application` name
- syslog `facility` name
- `controller_name` the name of the controller as configured in `ovpn_cn`
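
The labels above are what a LogQL stream selector filters on. As a small sketch, the hypothetical helper `build_selector` below turns a dictionary of label values into a selector string that can be passed to `logcli query`; the label names come from the list above, while the helper itself is an assumption made for illustration.

```python
def build_selector(labels: dict) -> str:
    """Build a LogQL stream selector such as {controller_name="nethsec"}.

    Values are double-quoted, with backslashes and quotes escaped.
    Labels are sorted by name so the output is deterministic.
    """
    def quote(value: str) -> str:
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    matchers = [f"{name}={quote(str(value))}" for name, value in sorted(labels.items())]
    return "{" + ", ".join(matchers) + "}"


# Select the logs of one unit handled by the controller named "nethsec"
print(build_selector({"controller_name": "nethsec", "hostname": "NethSec"}))
```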

The [OpenVPN container](https://github.com/NethServer/nethsecurity-controller/tree/master/vpn) tunnels connections from NethSecurity to NS8 through a VPN. Due to the [firewall configuration](https://github.com/NethServer/ns8-nethsecurity-controller/blob/main/imageroot/actions/configure-module/20configure#L87) in NS8, no client can be reached from other clients; only client-server communication is allowed.
### Prometheus

### Proxy and UI
[Prometheus](https://prometheus.io/) is a metrics collector; it scrapes metrics from the connected machines. The configuration is available at `/home/nethsecurity-controller1/.config/state/prometheus.yml` and is regenerated every time the `configure-module` action runs.
It has the following targets:
- static target with job_name `loki` that scrapes Loki metrics
- dynamic targets with job_name `node` that scrape metrics from the connected machines, read from the `prometheus.d/` directory under the state directory (e.g. `/home/nethsecurity-controller1/.config/state/prometheus.d`)

The [UI](https://github.com/NethServer/nethsecurity-controller/tree/master/ui) lets you browse the interface of a NethSecurity installation directly. This is possible thanks to the [Traefik Proxy](https://github.com/NethServer/nethsecurity-controller/tree/master/proxy) server, which routes the URLs to the correct IP inside the VPN.
Each dynamic target is created by the `metrics-exporter` and has the following labels:

- `instance` the VPN IP of the connected machine with the netdata port (eg. `172.19.64.3:19999`)
- `job` fixed to `node`
- `node` the VPN IP of the connected machine
- `unit` the unit unique name of the connected machine
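
A dynamic target with these labels could be rendered as a Prometheus file-based service discovery entry. The sketch below is an assumption about the file layout, not the actual `metrics_exporter_handler` script, which is not shown in this commit; `render_target` is a hypothetical name. In Prometheus, the `job` label comes from the scrape configuration and `instance` defaults to the target address, so only `node` and `unit` are written explicitly.

```python
import json


def render_target(vpn_ip: str, unit: str, port: int = 19999) -> str:
    """Render one file_sd entry as YAML text for a connected machine.

    json.dumps is used only to produce double-quoted strings, which are
    also valid YAML scalars for plain ASCII values.
    """
    target = json.dumps(f"{vpn_ip}:{port}")
    lines = [
        "- targets:",
        f"    - {target}",
        "  labels:",
        f"    node: {json.dumps(vpn_ip)}",
        f"    unit: {json.dumps(unit)}",
    ]
    return "\n".join(lines) + "\n"


# Example with the address used above and a made-up unit name
print(render_target("172.19.64.3", "example-unit"), end="")
```

A file with this content, saved with a `.yml` extension inside `prometheus.d/`, would match the `file_sd_configs` glob written by `configure-module`.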

### Loki

[Loki](https://grafana.com/oss/loki/) is a log storage system; it stores the logs received from Promtail. The configuration is available at `/home/nethsecurity-controller1/.config/etc/loki.yml`.

It uses TSDB as storage and it's configured to store logs for `loki_retention` days.

You can use `logcli` to query the logs.
First, access the module with `runagent` and source the environment file `loki.env` to set the `LOKI_HTTP_PORT` variable:
```
runagent -m nethsecurity-controller1 /bin/bash
. loki.env
```

List labels:
```
LOKI_ADDR=http://127.0.0.1:${LOKI_HTTP_PORT} logcli labels
```

You can do the same with curl: `curl -v http://127.0.0.1:${LOKI_HTTP_PORT}/loki/api/v1/labels`

Query logs:
```
LOKI_ADDR=http://127.0.0.1:${LOKI_HTTP_PORT} logcli series --analyze-labels '{hostname="NethSec"}'
LOKI_ADDR=http://127.0.0.1:${LOKI_HTTP_PORT} logcli query '{hostname="NethSec"}'
```
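
The same requests can be made from Python against the Loki HTTP API. This is a minimal sketch using only the standard library: `loki_url` and `fetch_json` are hypothetical helpers, `/loki/api/v1/labels` is the endpoint used by the curl command above, and `/loki/api/v1/query_range` is the Loki endpoint for range queries.

```python
import json
import urllib.parse
import urllib.request


def loki_url(port, endpoint, **params):
    """Build a URL for the local Loki HTTP API, e.g. endpoint='labels'."""
    url = f"http://127.0.0.1:{port}/loki/api/v1/{endpoint}"
    if params:
        # urlencode escapes the braces and quotes of a LogQL selector
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=10):
    """Perform a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Inside the module environment, where `LOKI_HTTP_PORT` is set, `fetch_json(loki_url(os.environ["LOKI_HTTP_PORT"], "labels"))` would list the labels, and `fetch_json(loki_url(port, "query_range", query='{hostname="NethSec"}', limit=10))` would return recent log lines for that unit.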

### Grafana

[Grafana](https://grafana.com/grafana/) is a metrics visualization tool; it displays metrics from Prometheus and logs from Loki. It's configured via environment variables and the configuration is available at `/home/nethsecurity-controller1/.config/state/grafana.env`.

The module already has two pre-configured datasources: the Loki and Prometheus containers.
It also has some pre-configured dashboards:

- nethsecurity.json: a dashboard with the most important metrics from the connected machines, like CPU, memory, disk, network, and system load
- logs.json: a dashboard where you can visualize the logs from all the connected machines and filter them by hostname, application, and priority
- loki.json: a dashboard with the most important metrics from Loki, like the number of logs ingested, the number of logs dropped, and the status of queriers

Default credentials are `admin`/`admin`. You can change them on the first login.

### Promtail and Metrics

Using [`ns-plug`](https://nethserver.github.io/nethsecurity/packages/ns-plug/) the module automatically provides Prometheus and Loki endpoints so that NS8 can have all the data in the same place. You can browse the logs with [the command provided in configuration](#configure) while prometheus will most likely be already scraping off the NethSecurity data shortly after the first connection using [Service Discovery](https://github.com/NethServer/ns8-prometheus/#service-discovery).
Using [`ns-plug`](https://dev.nethsecurity.org/nethsecurity/packages/ns-plug/) the module automatically provides Prometheus and Loki endpoints so that NS8 can have all the data in the same place. You can browse the logs with [the command provided in configuration](#configure) while prometheus will most likely be already scraping off the NethSecurity data shortly after the first connection using [Service Discovery](https://github.com/NethServer/ns8-prometheus/#service-discovery).

## Uninstall

8 changes: 6 additions & 2 deletions build-images.sh
@@ -10,6 +10,10 @@ repobase="${REPOBASE:-ghcr.io/nethserver}"
# Configure the image name
reponame="nethsecurity-controller"
tag=${IMAGE_TAG:-0.0.1}
promtail_version=2.7.1
loki_version=2.9.4
prometheus_version=2.50.1
grafana_version=10.3.3

# Create a new empty container image
container=$(buildah from scratch)
@@ -33,8 +37,8 @@ buildah add "${container}" ui/dist /ui
# Setup the entrypoint, ask to reserve one TCP port with the label and set a rootless container
buildah config --entrypoint=/ \
--label="org.nethserver.authorizations=traefik@any:routeadm node:tunadm" \
--label="org.nethserver.tcp-ports-demand=5" \
--label="org.nethserver.images=ghcr.io/nethserver/nethsecurity-vpn:$tag ghcr.io/nethserver/nethsecurity-api:$tag ghcr.io/nethserver/nethsecurity-ui:$tag ghcr.io/nethserver/nethsecurity-proxy:$tag docker.io/grafana/promtail:2.7.1" \
--label="org.nethserver.tcp-ports-demand=9" \
--label="org.nethserver.images=ghcr.io/nethserver/nethsecurity-vpn:$tag ghcr.io/nethserver/nethsecurity-api:$tag ghcr.io/nethserver/nethsecurity-ui:$tag ghcr.io/nethserver/nethsecurity-proxy:$tag docker.io/grafana/promtail:$promtail_version docker.io/grafana/loki:$loki_version docker.io/prom/prometheus:v$prometheus_version docker.io/grafana/grafana:$grafana_version" \
"${container}"
# Commit the image
buildah commit "${container}" "${repobase}/${reponame}"
128 changes: 106 additions & 22 deletions imageroot/actions/configure-module/20configure
@@ -11,38 +11,87 @@ import agent
import agent.tasks
import os
import hashlib
import uuid

request = json.load(sys.stdin)

ports = os.environ["TCP_PORTS"].split(',')
(start,end) = os.environ["TCP_PORTS_RANGE"].split('-')
ports = [*range(int(start), int(end)+1)]

try:
with open('config.json', 'r') as tmp:
config = json.load(tmp)
except:
config = request

for path in ['loki_path', 'prometheus_path']:
if not config.get(path):
config[path] = f'/{uuid.uuid4()}'

# Configure Traefik to route requests to the nethsec-controller service
response = agent.tasks.run(
agent_id=agent.resolve_agent_id('traefik@node'),
action='set-route',
data={
'instance': os.environ['MODULE_ID'],
'url': 'http://127.0.0.1:' + ports[3],
'url': f'http://127.0.0.1:{ports[3]}',
'http2https': True,
'lets_encrypt': request["lets_encrypt"],
'host': request["host"],
},
)
agent.assert_exp(response['exit_code'] == 0)
response = agent.tasks.run(
agent_id=agent.resolve_agent_id('traefik@node'),
action='set-route',
data={
'instance': os.environ['MODULE_ID'] + '_grafana',
'url': f'http://127.0.0.1:{ports[8]}',
'http2https': True,
'lets_encrypt': request["lets_encrypt"],
'host': request["host"],
'path': '/grafana'
},
)
agent.assert_exp(response['exit_code'] == 0)
response = agent.tasks.run(
agent_id=agent.resolve_agent_id('traefik@node'),
action='set-route',
data={
'instance': os.environ['MODULE_ID'] + '_loki',
'url': f'http://127.0.0.1:{ports[5]}',
'http2https': True,
'lets_encrypt': request["lets_encrypt"],
'host': request["host"],
'path': config['loki_path']
},
)
agent.assert_exp(response['exit_code'] == 0)
response = agent.tasks.run(
agent_id=agent.resolve_agent_id('traefik@node'),
action='set-route',
data={
'instance': os.environ['MODULE_ID'] + '_prometheus',
'url': f'http://127.0.0.1:{ports[7]}',
'http2https': True,
'lets_encrypt': request["lets_encrypt"],
'host': request["host"],
'path': config['prometheus_path']
},
)
agent.assert_exp(response['exit_code'] == 0)


# Replace password if passed as parameter, otherwise read the old one
if 'api_password' in request and request['api_password'] != '':
request['api_password'] = hashlib.sha256(request['api_password'].encode('utf-8')).hexdigest()
else:
with open('config.json', 'r') as tmp:
tmp = json.load(tmp)
request['api_password'] = tmp['api_password']
request['api_password'] = config['api_password']

# Check if traefik configuration has been successful
agent.assert_exp(response['exit_code'] == 0)

# Save configuration to JSON for later use and backup
with open('config.json', 'w') as config:
config.write(json.dumps(request))
with open('config.json', 'w') as cfp:
cfp.write(json.dumps(config | request))

with open('config.env', 'w') as env:
env.write(f'ADMIN_USER={request["api_user"]}\n')
@@ -54,20 +103,55 @@ with open('config.env', 'w') as env:

server_address = request["ovpn_network"].removesuffix('.0') + '.1'
with open('promtail.env', 'w') as promtail:
redis_client = agent.redis_connect()
loki_instance = redis_client.get('cluster/default_instance/loki')
loki_addr = redis_client.hget(f'module/{loki_instance}/environment', 'LOKI_ADDR')
loki_http_port = redis_client.hget(f'module/{loki_instance}/environment', 'LOKI_HTTP_PORT')
loki_logs_ingress_token = redis_client.hget(f'module/{loki_instance}/environment', 'LOKI_LOGS_INGRESS_TOKEN')
promtail_address = server_address
promtail_port = ports[4]

promtail.write(f'LOKI_ADDR={loki_addr}\n')
promtail.write(f'LOKI_HTTP_PORT={loki_http_port}\n')
promtail.write(f'LOKI_LOGS_INGRESS_TOKEN={loki_logs_ingress_token}\n')
promtail.write(f'PROMTAIL_ADDRESS={promtail_address}\n')
promtail.write(f'PROMTAIL_PORT={promtail_port}\n')
promtail.write('LOKI_ADDR=127.0.0.1\n')
promtail.write(f'LOKI_HTTP_PORT={ports[5]}\n')
promtail.write(f'PROMTAIL_ADDRESS={server_address}\n')
promtail.write(f'PROMTAIL_PORT={ports[4]}\n')

with open('loki.env', 'w') as lfp:
lfp.write(f"LOKI_HTTP_PORT={ports[5]}\n")
lfp.write(f"LOKI_GRPC_PORT={ports[6]}\n")
lfp.write(f"LOKI_RETENTION={request.get('loki_retention', '180')}d\n") # retention in days

with open('grafana.env', 'w') as gfp:
gfp.write(f"GF_DEFAULT_INSTANCE_NAME={request['host']}\n")
gfp.write(f"GF_SERVER_ROOT_URL=https://{request['host']}/grafana\n")
gfp.write("GF_SERVER_SERVE_FROM_SUB_PATH=true\n")
gfp.write(f"GF_SERVER_HTTP_PORT={ports[8]}\n")
gfp.write("GF_SERVER_HTTP_ADDR=127.0.0.1\n")

with open('prometheus.env', 'w') as pfp:
pfp.write(f"PROMETHEUS_PORT={ports[7]}\n")
pfp.write(f"PROMETHEUS_PATH={config['prometheus_path']}\n")
pfp.write(f"PROMETHEUS_RETENTION={request.get('prometheus_retention', '15')}d\n")

with open('prometheus.yml', 'w', encoding='utf-8') as fp:
fp.write("global:\n")
fp.write("scrape_configs:\n")
fp.write(' - job_name: "node"\n')
fp.write(' file_sd_configs:\n')
fp.write(' - files:\n')
fp.write(' - "/prometheus/prometheus.d/*.yml"\n')
fp.write(' - job_name: "loki"\n')
fp.write(' static_configs:\n')
fp.write(' - targets:\n')
fp.write(f' - 127.0.0.1:{ports[5]}\n')

# Grafana configuration
with open('local.yml', 'w') as fp:
fp.write("apiVersion: 1\n")
fp.write("datasources:\n")
fp.write(' - name: Local Prometheus\n')
fp.write(' type: prometheus\n')
fp.write(' uid: prometheus\n')
fp.write(' access: proxy\n')
fp.write(f' url: http://127.0.0.1:{ports[7]}{config["prometheus_path"]}\n')

fp.write(' - name: Local Loki\n')
fp.write(' type: loki\n')
fp.write(' uid: loki\n')
fp.write(' access: proxy\n')
fp.write(f' url: http://127.0.0.1:{ports[5]}\n')

network = agent.read_envfile('network.env')
tun = network.get('OVPN_TUN')
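
The port indexes used throughout this script can be summarized in one place. The sketch below is an illustration, not part of the commit: `parse_port_range`, `PORT_ROLES` and `describe_ports` are hypothetical names, the role descriptions are inferred from how each index is used in the hunks above, and indexes 0 to 2 are left out because their use is not visible in this diff.

```python
def parse_port_range(value: str) -> list:
    """Expand a 'start-end' string, as found in TCP_PORTS_RANGE, into a list."""
    start, end = value.split("-")
    return list(range(int(start), int(end) + 1))


# Index -> role, as inferred from the configure script shown in this diff.
PORT_ROLES = {
    3: "controller",   # target of the main Traefik route
    4: "promtail",     # syslog receiver on the VPN address
    5: "loki_http",
    6: "loki_grpc",
    7: "prometheus",
    8: "grafana",
}


def describe_ports(value: str) -> dict:
    """Map each known role to its allocated port."""
    ports = parse_port_range(value)
    if len(ports) < 9:
        # build-images.sh in this commit demands 9 TCP ports
        raise ValueError(f"expected at least 9 ports, got {len(ports)}")
    return {role: ports[index] for index, role in PORT_ROLES.items()}


print(describe_ports("20001-20009"))
```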
14 changes: 13 additions & 1 deletion imageroot/actions/configure-module/validate-input.json
@@ -11,7 +11,9 @@
"lets_encrypt": true,
"ovpn_network": "127.2.10.0",
"ovpn_netmask": "255.255.0.0",
"ovpn_cn": "nethsec"
"ovpn_cn": "nethsec",
"loki_retention": 180,
"prometheus_retention": 15
}
],
"type": "object",
Expand Down Expand Up @@ -51,6 +53,16 @@
"type": "string",
"description": "Controller name, it must be a valid CN of x509 certificate'",
"minLength": 2
},
"loki_retention": {
"type": "integer",
"description": "Retention policy for Loki logs, default is 180 days",
"minimum": 1
},
"prometheus_retention": {
"type": "integer",
"description": "Retention policy for Prometheus metrics, default is 15 days",
"minimum": 1
}
}
}
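
The two retention properties added to the schema can be checked and turned into the `<N>d` strings that the configure script writes to the env files. The following is a minimal sketch with hypothetical names (`DEFAULTS`, `retention_settings`); it mirrors the schema constraints (integer, minimum 1) and the defaults stated in the descriptions, without using a JSON Schema library.

```python
DEFAULTS = {"loki_retention": 180, "prometheus_retention": 15}


def retention_settings(request: dict) -> dict:
    """Validate retention values and format them as day durations."""
    settings = {}
    for key, default in DEFAULTS.items():
        value = request.get(key, default)
        # bool is a subclass of int in Python, but is not a JSON integer
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be an integer >= 1, got {value!r}")
        settings[key] = f"{value}d"
    return settings


print(retention_settings({"loki_retention": 30}))
```

A string such as `"180"` is rejected here, just as it would be by the schema.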