A data analytics pipeline for pSSID that receives, stores, and visualizes WiFi test metrics gathered by Raspberry Pi WiFi probes.
Left image: Overview of the entire pSSID architecture with the data analytics pipeline highlighted. The pipeline receives test results (metrics) from probes, stores them, and provides visualization.
Right image: Architecture of the pipeline itself, leveraging the ELK stack concept with OpenSearch replacing Elasticsearch and Grafana replacing Kibana.
- Ubuntu 22 virtual machine
- Docker and Docker Compose installed
Reference the official Docker install documentation and follow their steps to install Docker Engine.
Verify your Docker installation:
# in home directory
sudo docker run hello-world
You might need to start Docker (this will also make it automatically run on system boot):
# in home directory
sudo systemctl enable --now docker
Clone this repository to your host machine. Each service has its own docker-compose file for better modularization, so when you scale, you can simply provision more nodes without touching other components of the pipeline.
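For example (a sketch; <REPO-URL> is a placeholder for this repository's clone URL):
# clone into your home directory
cd ~
git clone <REPO-URL> pssid-data-pipeline
cd pssid-data-pipeline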
OpenSearch requires passwords since version 2.12.0. Set up environment variables by adding these lines to your .bashrc file (this documentation uses admin as the username and OpensearchInit2024 as the password for demonstration):
nano ~/.bashrc
export OPENSEARCH_HOST=https://opensearch-node1:9200
export OPENSEARCH_INITIAL_ADMIN_PASSWORD=OpensearchInit2024
export OPENSEARCH_USER=admin
export OPENSEARCH_PASSWORD=OpensearchInit2024
⚠️ Note: These variable names are used by opensearch-one-node.yml and logstash.yml. You can freely change their values, but do not edit the names unless you have a good reason.
Then reload the environment variables:
source ~/.bashrc
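To confirm the new variables are visible in your shell (a quick sanity check):
# should list the four OPENSEARCH_* variables defined above
env | grep OPENSEARCH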
Add your user to the docker group to avoid using sudo
(since the root user cannot read the environment variables defined by non-root users):
sudo usermod -aG docker ${USER} && newgrp docker
Ensure that Docker is working without root:
docker run hello-world
⚠️ Important: Running with sudo prevents access to user environment variables.
OpenSearch requires vm.max_map_count
of at least 262144.
Check current value:
sysctl vm.max_map_count
If it's too low, edit /etc/sysctl.conf with sudo and add:
vm.max_map_count=262144
Apply changes:
sudo sysctl -p
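Alternatively, you can apply the setting immediately without rebooting (the /etc/sysctl.conf entry above is still needed for the value to persist):
# one-off change for the running kernel
sudo sysctl -w vm.max_map_count=262144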
- Use the provided logstash.conf in the logstash-pipeline directory of this repository as a starting point. You can revise this file later to meet your testing needs.
- Edit logstash.yml (from the root of the cloned repo) to mount your pipeline directory:
# TODO: mount your pipeline directory into the container. USE ABSOLUTE PATH!
# abs path example: /home/uniqname/pssid-data-pipeline/logstash-pipeline
- <ABS_PATH_TO_YOUR_PIPELINE_DIRECTORY>:/usr/share/logstash/pipeline
💡 Tip: To disable SSO or email alerting, comment out variables starting with GF_AUTH_ or GF_SMTP_ in grafana.yml.
⚠️ Important: Even if you skip this step, be sure to edit GF_SERVER_ROOT_URL=https://<PIPELINE-HOSTNAME>
- Register with Google: Follow Grafana's Google Authentication guide
- Create a .env file in the same directory as grafana.yml with:
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
- Configure following Grafana's email alert documentation
- For Gmail, see Google's app password guide
- Add SMTP credentials to the .env file
⚠️ Note: If using version control, please use the .env file method and add .env to your .gitignore! GitHub doesn't allow pushing commits that contain secrets.
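To check that Docker Compose actually picks up the values from .env, you can render the interpolated config before starting anything (a quick sanity check using the standard docker compose config command):
# prints grafana.yml with environment variables substituted in
docker compose -f grafana.yml --env-file .env config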
💡 Tip: To disable Grafana HTTPS, remove the nginx and certbot sections under services in the cloned grafana.yml, and remove nginx-html and certbot-etc under volumes.
Before you start, open nginx/conf.d/grafana.conf and make the following edits:
server_name <YOUR-PIPELINE-HOSTNAME-HERE>;
Then, open nginx/conf.d/grafana.conf.https
and make the following edits:
server_name <YOUR-PIPELINE-HOSTNAME-HERE>;
...
server_name <YOUR-PIPELINE-HOSTNAME-HERE>;
ssl_certificate /etc/letsencrypt/live/<YOUR-PIPELINE-HOSTNAME-HERE>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<YOUR-PIPELINE-HOSTNAME-HERE>/privkey.pem;
If this is your first time running grafana.yml, run this command to create the network:
# From your cloned repository's directory:
docker network create pssid-data-pipeline_opensearch-net
After that, you can run nginx:
# From your cloned repository's directory:
docker compose -f grafana.yml up -d nginx
Then run certbot to generate certificates:
# From your cloned repository's directory:
docker compose -f grafana.yml run --rm --entrypoint="" certbot \
certbot certonly --webroot -w /var/www/certbot \
-d <PIPELINE-HOSTNAME> \
--email [email protected] --agree-tos --no-eff-email
If you're getting the error Certbot failed to authenticate some domains (authenticator: webroot), use docker ps to check that your nginx container is running without errors. You can troubleshoot using docker logs -f <nginx-container-id>.
After successfully running the command above, rename the grafana.conf file to grafana.conf.old and rename grafana.conf.https to grafana.conf.
# From your cloned repository's directory:
mv nginx/conf.d/grafana.conf nginx/conf.d/grafana.conf.old
mv nginx/conf.d/grafana.conf.https nginx/conf.d/grafana.conf
Then run:
# From your cloned repository's directory:
docker exec <nginx-container-name> wget -O /etc/letsencrypt/options-ssl-nginx.conf https://raw.githubusercontent.com/certbot/certbot/master/certbot-nginx/certbot_nginx/_internal/tls_configs/options-ssl-nginx.conf
docker exec <nginx-container-name> wget -O /etc/letsencrypt/ssl-dhparams.pem https://raw.githubusercontent.com/certbot/certbot/master/certbot/certbot/ssl-dhparams.pem
💡 Tip: To check what your nginx container name is, run docker ps | grep nginx; the name is the last item listed in the row.
Verify that the certificate and key files are in the right place:
docker exec -it <nginx-container-name> \
ls -l /etc/letsencrypt/live/<PIPELINE-HOSTNAME>
If that returns a list of .pem files, go ahead and test the nginx config:
# From your cloned repository's directory:
docker exec <nginx-container-name> nginx -t
If you see nginx: configuration file /etc/nginx/nginx.conf test is successful
, then run:
# From your cloned repository's directory:
docker exec <nginx-container-name> nginx -s reload
Use curl to test HTTPS access:
# From your cloned repository's directory:
curl -I https://<PIPELINE-HOSTNAME>
Or navigate to https://<PIPELINE-HOSTNAME>
in a web browser and it should redirect you to Grafana's login page.
💡 Before running Grafana: If this is your first time running grafana.yml (having skipped the HTTPS setup step), create the network first:
# From your cloned repository's directory:
docker network create pssid-data-pipeline_opensearch-net
Also, give Grafana permission to access your plugins directory:
# From your cloned repository's directory:
# Set ownership to the Grafana user (UID 472)
sudo chown -R 472:472 ./plugins
# Set appropriate permissions
sudo chmod -R 755 ./plugins
Bring up OpenSearch, Logstash, and Grafana:
# From your cloned repository's directory:
docker compose -f opensearch-one-node.yml up -d
docker compose -f logstash.yml up -d
# Run this if you had set up SMTP or SSO for Grafana:
docker compose -f grafana.yml --env-file .env up -d
# Otherwise, run this:
docker compose -f grafana.yml up -d
(Optional) Start OpenSearch Dashboard:
docker compose -f opensearch-dashboard.yml up -d
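Once the containers are up, you can confirm that OpenSearch is reachable from the host (a quick check using the credentials from the env variables above; --insecure skips verification of the self-signed certificate):
# expect a JSON response with a "green" or "yellow" cluster status
curl -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" --insecure \
  "https://localhost:9200/_cluster/health?pretty"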
💡 For debugging: To check Logstash output, run this command after starting the logstash service:
docker logs -f logstash
💡 Common Error: When restarting the Grafana container, you might occasionally see KeyError: 'Container Config'. To resolve this issue, use docker ps and then run docker rm -f <container-id> for each container in the list, then rerun all the commands above to start the services again.
| Service | Port | Purpose |
|---|---|---|
| Logstash | 9400 | Filebeat input |
| OpenSearch | 9200 | Logstash input |
| Grafana | 3000 | Web dashboard |
| Nginx | 80 | HTTP (ACME challenges + HTTPS redirects) |
| Nginx | 443 | HTTPS (secure web dashboard) |
| OpenSearch Dashboard | 5601 | Web dashboard (optional) |
🔥 Firewall (example commands below):
- If with HTTPS: Ensure ports 80, 443, 9400, and 5601 are open for external traffic
- If without HTTPS: Ensure ports 9400, 3000, and 5601 are open for external traffic
- Note: Port 9200 is for internal container communication only
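For example, with ufw (a sketch assuming the HTTPS setup; swap in the "without HTTPS" port list from above if that's your case):
# open the pipeline's public ports
sudo ufw allow 80/tcp    # HTTP (ACME challenges + redirects)
sudo ufw allow 443/tcp   # HTTPS (secure web dashboard)
sudo ufw allow 9400/tcp  # Filebeat -> Logstash
sudo ufw allow 5601/tcp  # OpenSearch Dashboard (optional)
sudo ufw reload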
Use the Ansible playbook to install Filebeat on probes. Ensure SSH access from your Ansible control node to all target probes in inventory/hosts.ini.
For configuration changes:
- Clone the Ansible role into the playbook directory
- Edit defaults/main.yml (a run example follows the snippet below):
# add your pipeline hostname under this list variable
# you can add multiple hosts
filebeat_output_logstash_hosts:
- "<PIPELINE-HOSTNAME>:9400"
Contains input sources, custom filters, and output destinations. Most customization happens in the filter
section.
Ruby parsing scripts sourced from perfSONAR logstash repository.
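For orientation, here is a heavily trimmed sketch of the file's overall shape (illustrative only, not the shipped config; the port and environment variable names follow the values used elsewhere in this README):
input {
  beats {
    port => 9400    # Filebeat input port, matching the port table above
  }
}

filter {
  # most customization happens here (parsing, field extraction, etc.)
}

output {
  opensearch {
    hosts    => ["${OPENSEARCH_HOST}"]
    user     => "${OPENSEARCH_USER}"
    password => "${OPENSEARCH_PASSWORD}"
    index    => "pscheduler_..."    # placeholder; see the provided logstash.conf
    ssl_certificate_verification => false
  }
}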
Access at <pipeline-hostname>:5601
- Default credentials: admin / OpensearchInit2024 (as defined in the env variables above)
- Use Dev Tools to inspect indices and output: GET <index-name>/_search
- Use Dev Tools to delete indices from old probes that are no longer sending data: DELETE pscheduler_*_<probe-name>_* (this command may need to be adjusted if you change the index naming scheme specified in the output field of Logstash)
- Use this Index State Management guide to configure custom policies that manage your indices (for example, creating a policy to delete an index that hasn't been updated for a few days; a sketch follows below).
- If HTTPS configured: Navigate to https://<pipeline-hostname>/
- If HTTPS not configured: Navigate to <pipeline-hostname>:3000
- Default credentials: admin / admin
- Google SSO available for view-only access (if configured)
- Navigate to Connections → Data Sources → Add New Data Source
- Select OpenSearch from available sources
- Configure as shown:
Configuration details:
- URL: Use https://opensearch-node1:9200 (Docker hostname)
- Auth: Enable Basic auth and Skip TLS Verify
- Credentials: Use your OpenSearch username/password (default credentials: admin / OpensearchInit2024 as defined in env variables above)
- Index: Use wildcards (e.g., pscheduler_*)
To list available indices:
# From your cloned repository's directory:
curl -u <OPENSEARCH_USER>:<OPENSEARCH_PASSWORD> --insecure \
"https://localhost:9200/_cat/indices?v"
Or you can use Dev Tools on OpenSearch Dashboard to check. This repository's provided logstash.conf has all indices configured with a pscheduler_*
index pattern.
- Navigate to Dashboards → New → Import
- Drag and drop JSON file from the exported-grafana-json folder
After configuring data sources, you can create custom visualization panels and dashboards using Grafana's query builder with your OpenSearch indices.
⚠️ Note: After each time you start OpenSearch on Docker, you have to manually refresh each panel in Grafana by clicking 'Edit' and then 'Refresh'; the mass-refresh button on Grafana only works after this first manual refresh. This is because the id of the datasource gets changed, and manually refreshing the panel updates it.