numtide/nix-fleet

Nix(OS) fleet management

This project aims to build robust and user-friendly fleet management tooling, tailored for asynchronously managing devices that are capable of and intended to run NixOS. It inherits its motivational roots from NITS.

Features & Rationale

The logical model encompasses Coordinators, Agents, and Admins. CI/CD systems are merged into the Admin category.

flowchart TD;
    Ag[Agents]
    Ad[Admins]
    C[Coordinator]
    DB[(State)]

    C --- DB

    Ag -- report status --> C
    C -- send updates --> Ag

    Ad -- submit updates --> C
  • Asynchronous update chain: Admin -> Coordinator -> Agent
  • Overview of all managed devices and their update status, as close to real time as technically possible
  • No evaluation during update procedure
  • Capability model suitable for organizations
  • Audit trail of administrative actions
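The roles and the asynchronous update chain listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the project's implementation: the names `Coordinator`, `Update`, `Status`, `AuditEntry`, and all method names are hypothetical, string identifiers stand in for whatever identity the real system uses, and the persistent state is reduced to plain maps.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical identifiers; the real project may use different identity types.
type AgentId = String;
type AdminId = String;

/// An update is only a reference to a pre-built closure, never source code.
#[derive(Debug, Clone, PartialEq)]
pub struct Update {
    pub store_path: String,
}

/// Last known state of an agent as seen by the coordinator.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Pending(Update),
    Applied(Update),
    Failed { update: Update, reason: String },
}

/// One record in the audit trail of administrative actions.
#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub admin: AdminId,
    pub agent: AgentId,
    pub update: Update,
}

/// In-memory stand-in for the coordinator and its state.
#[derive(Default)]
pub struct Coordinator {
    queues: HashMap<AgentId, VecDeque<Update>>,
    statuses: HashMap<AgentId, Status>,
    audit_log: Vec<AuditEntry>,
}

impl Coordinator {
    /// Admin -> Coordinator: queue an update; the agent need not be online.
    pub fn submit_update(&mut self, admin: &str, agent: &str, update: Update) {
        self.audit_log.push(AuditEntry {
            admin: admin.to_string(),
            agent: agent.to_string(),
            update: update.clone(),
        });
        self.statuses
            .insert(agent.to_string(), Status::Pending(update.clone()));
        self.queues
            .entry(agent.to_string())
            .or_default()
            .push_back(update);
    }

    /// Coordinator -> Agent: handed out whenever the agent next makes contact.
    pub fn fetch_update(&mut self, agent: &str) -> Option<Update> {
        self.queues.get_mut(agent)?.pop_front()
    }

    /// Agent -> Coordinator: report the outcome of applying an update.
    pub fn report_status(&mut self, agent: &str, status: Status) {
        self.statuses.insert(agent.to_string(), status);
    }

    /// Overview of all known agents and their latest status.
    pub fn overview(&self) -> &HashMap<AgentId, Status> {
        &self.statuses
    }

    /// Read-only view of the audit trail.
    pub fn audit_log(&self) -> &[AuditEntry] {
        &self.audit_log
    }
}
```

In this sketch an admin calls `submit_update` at any time, and the update waits in the queue until the agent calls `fetch_update`, which is what makes the chain asynchronous. The capability model is deliberately left out, since the source does not describe it.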

Why Asynchronous/Pull-Based?

This targets machines that are not always reachable via a direct SSH connection, or may never be, e.g. because they are behind NAT or even Carrier-Grade NAT.

Why Agent-based?

Having an intelligent process on-site allows for more sophisticated requests towards the update repository, as well as more sophisticated update execution and reporting.

Why no on-device evaluation?

The assumption is that the devices are either not capable of building their configurations, or that building on them is undesired. Hence, there is a need for a build cache and a trusted signature, as well as something like a CI/CD pipeline, or even just an admin, that builds, signs, and pushes the binaries.

In this scenario it's redundant to evaluate again on the device, and a trusted signature is already needed for the binary cache. It's therefore low-hanging fruit to make the final closure the update payload and to transmit metadata to the devices that enables them to download and apply the update.
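As a rough illustration of what such metadata could look like, here is a minimal sketch. Everything in it is an assumption rather than the project's actual protocol: the `UpdateMetadata` struct, its field names, the `MetadataError` type, and the choice of commands are hypothetical. The commands themselves (`nix-store --realise`, `nix-env --profile ... --set`, and the `switch-to-configuration` script inside a NixOS system closure) are standard Nix/NixOS tooling, but whether the agent applies updates this way is not stated in the source.

```rust
/// Hypothetical metadata a coordinator could send instead of source code.
#[derive(Debug, Clone, PartialEq)]
pub struct UpdateMetadata {
    /// Store path of the pre-built NixOS system closure.
    pub store_path: String,
    /// Binary cache from which the closure can be substituted.
    pub substituter: String,
    /// Public key the binary cache signs its store paths with.
    pub trusted_public_key: String,
}

#[derive(Debug, PartialEq)]
pub enum MetadataError {
    NotAStorePath,
    MalformedPublicKey,
}

/// Small helper turning string slices into an owned argument vector.
fn cmd(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

impl UpdateMetadata {
    /// Shallow sanity checks; real validation would be stricter.
    pub fn validate(&self) -> Result<(), MetadataError> {
        let name = self
            .store_path
            .strip_prefix("/nix/store/")
            .ok_or(MetadataError::NotAStorePath)?;
        // Require a top-level store entry, not a file inside one.
        if name.is_empty() || name.contains('/') {
            return Err(MetadataError::NotAStorePath);
        }
        // Nix public keys have the form "<name>:<base64 key>".
        match self.trusted_public_key.split_once(':') {
            Some((name, key)) if !name.is_empty() && !key.is_empty() => Ok(()),
            _ => Err(MetadataError::MalformedPublicKey),
        }
    }

    /// Command lines an agent could run; note that nothing is evaluated.
    pub fn apply_commands(&self) -> Vec<Vec<String>> {
        let path = self.store_path.as_str();
        let switch = format!("{path}/bin/switch-to-configuration");
        vec![
            // 1. Download the signed closure from the binary cache.
            cmd(&[
                "nix-store", "--realise", path,
                "--option", "extra-substituters", &self.substituter,
                "--option", "extra-trusted-public-keys", &self.trusted_public_key,
            ]),
            // 2. Register the closure as the new system generation.
            cmd(&["nix-env", "--profile", "/nix/var/nix/profiles/system", "--set", path]),
            // 3. Activate it.
            cmd(&[&switch, "switch"]),
        ]
    }
}
```

The point of the sketch is that the device only needs a store path and the means to verify its signature; no flake, Git checkout, or evaluation step appears anywhere in the command list.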

Why a new tool?

Nix-Fleet continues the closely related NITS experiment and puts a different technological spin on its principles by swapping Go and NATS for Rust and Iroh, for a few reasons. To start with the most subjective one, Rust is the general-purpose programming language that the initial author has enjoyed most in recent years for application development. More objectively, it promises easier integration with the Rust-based Iroh, Snix, and NixOps4, all of which are promising integrations at various points down the line. Iroh has been selected for its native support for endpoint discovery in any network topology without relying on external overlay networking, and for the ease of building custom protocols on top of it.

Looking at the wider Nix ecosystem, there are open-source tools for pull-based updates that can provide valuable inspiration. The following table gives an analysis with the contraindications that prevent each project from being a viable base for the architecture this project aims for. Please raise an issue or pull request if you notice incorrect or missing information.

| Project | Evaluation | Admin | Server | Agent |
|---|---|---|---|---|
| Bento | on-device | Shell script | SFTP | Same script as Admin on a systemd timer |
| Comin | on-device | git commit/push | Git repository | Golang Agent Daemon periodically polls Git repositories |
| npcnix | on-device | Rust CLI "packs" Nix Flake source and uploads it to S3 (AWS) | S3 | Rust Agent Daemon polls Nix Flake from S3 |
| NixOS' native system.autoUpgrade | on-device | All supported Flake URL types | depends on flake storage | Shell script on a timer |

Contributing

This project heavily relies on Nix to provide a uniform developer workflow. It can be installed from here.

Development Environment

There's a Nix devShell definition with all Rust dependencies available at .#devShells.${system}.rust:

nix develop .#rust

From here tools like cargo are provided and can be used with your favorite IDE. For an integrated experience there's a direnv configuration provided that automatically loads the Rust development shell.

CI Tests

This project is set up to build in Numtide's buildbot-nix instance. All tests that are run on CI can be run locally with Nix:

nix flake check

Rust native testing

There's a Rust test suite which can of course be run outside of Nix for quicker development iterations. Rust users will most likely be familiar with the vanilla cargo test command.

In addition, the development shell comes with cargo-nextest, which can be used via cargo nextest run. The latter is used on CI in a Nix build context, so using it locally comes closer to what runs on CI, with the notable exception that CI doesn't have access to the internet during test runtime.
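Because CI runs tests without internet access, network-dependent tests need to be kept out of the default run. One common Rust convention for this is the `#[ignore]` attribute; the sketch below assumes that convention and is not taken from this repository. The helper `parse_endpoint` and both test names are hypothetical.

```rust
/// Hypothetical helper under test: splits "host:port" into its parts.
pub fn parse_endpoint(s: &str) -> Option<(String, u16)> {
    let (host, port) = s.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some((host.to_string(), port.parse().ok()?))
}

#[cfg(test)]
mod tests {
    use super::*;

    // Pure logic: runs everywhere, including a sandbox without network.
    #[test]
    fn parses_endpoint_offline() {
        assert_eq!(
            parse_endpoint("coordinator.example:4433"),
            Some(("coordinator.example".to_string(), 4433))
        );
        assert_eq!(parse_endpoint("missing-port"), None);
    }

    // Needs DNS, so it is skipped unless ignored tests are requested.
    #[test]
    #[ignore = "requires internet access, which CI does not provide"]
    fn resolves_public_host() {
        use std::net::ToSocketAddrs;
        let addrs = ("numtide.cachix.org", 443)
            .to_socket_addrs()
            .expect("DNS resolution failed");
        assert!(addrs.count() > 0);
    }
}
```

With this layout, a plain cargo test skips the ignored test, and cargo test -- --ignored runs it locally when a network connection is available.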

Nix Binary Cache

The CI publishes its build outputs to a public HTTP binary cache instance.

Configuring your local Nix to use it can speed up deployment as well as some actions in local development by downloading pre-built dependencies.

The substituter URL is https://numtide.cachix.org and the public key is numtide.cachix.org-1:2ps1kLBUWjxIneOy1Ik6cQjb41X0iXVXeHigGmycPPE=.

The cache has also been specified in the nixConfig attribute of flake.nix for the sake of communication. This only takes practical effect if you run nix as a trusted user, which has considerable security risks and is not recommended. Please carefully read the warning in the linked documentation for more context.

In a non-trusted user setup, the binary cache therefore has to be configured on the system level.

If you're on NixOS, you can use the following snippet in your configuration accordingly:

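# /etc/nixos/configuration.nix
{
  # ...

  nix = {
    settings = {
      # Additional binary cache to fetch pre-built store paths from.
      substituters = [
        "https://numtide.cachix.org"
      ];
      # Public key used to verify signatures of paths from that cache.
      trusted-public-keys = [
        "numtide.cachix.org-1:2ps1kLBUWjxIneOy1Ik6cQjb41X0iXVXeHigGmycPPE="
      ];
    };
  };

  # ...
}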

Otherwise, or if you choose to configure the cache outside of the NixOS configuration, the system's Nix configuration at /etc/nix/nix.conf can be extended with the following:

extra-substituters = https://numtide.cachix.org
extra-trusted-public-keys = numtide.cachix.org-1:2ps1kLBUWjxIneOy1Ik6cQjb41X0iXVXeHigGmycPPE=

There is some more guidance on this in the nix.dev binary cache recipe.

You could also use the Cachix CLI to configure the binary cache as described at the cache site itself.

Repository Layout

The code is grouped by language or framework name.

Nix

This repository uses the blueprint structure.

/flake.nix
/flake.lock
/nix/ # blueprint set up underneath here.

Rust

/Cargo.toml
/Cargo.lock
/rust/ # all rust code lives here.
/rust/common/Cargo.toml
/rust/common/src/lib.rs


Funding

This project is currently funded through the NGI Fediversity Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.


License

SPDX-License-Identifier: MIT OR Apache-2.0

