This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

Prototype #1

Merged
merged 87 commits into from
Oct 19, 2023
87 commits
b128bdd
Basic keys working
richfitz Oct 13, 2023
d919036
Fix lint
richfitz Oct 13, 2023
322ed5e
Correct format for known_hosts
richfitz Oct 13, 2023
e840e54
Relint
richfitz Oct 13, 2023
94df3e0
Check a configuration
richfitz Oct 13, 2023
674e38c
Tidy up to work locally
richfitz Oct 16, 2023
c0d4144
Add simple server commands
richfitz Oct 16, 2023
d151bea
Smalll pull utility
richfitz Oct 16, 2023
3aa477f
Add ssh configuration
richfitz Oct 16, 2023
0ee8303
Allow basic backup
richfitz Oct 16, 2023
efb6ff9
Basic support for restore
richfitz Oct 16, 2023
57f0b31
Tidy up
richfitz Oct 16, 2023
dbe078d
Better handling of input commands
richfitz Oct 16, 2023
86fd3c1
Support data export
richfitz Oct 16, 2023
3f9cf45
Big tidyup, simpler now
richfitz Oct 16, 2023
d93b3cd
Add manual import path
richfitz Oct 16, 2023
409931e
Reformat
richfitz Oct 16, 2023
4f6f43a
Ignore more files
richfitz Oct 16, 2023
b265da9
Add docker images
richfitz Oct 13, 2023
a095405
Add volume marker
richfitz Oct 16, 2023
3d12d0d
Update ssh config
richfitz Oct 16, 2023
28b2ccc
Basic config validation
richfitz Oct 16, 2023
bde6225
Fix tests
richfitz Oct 16, 2023
777ddec
Expand config examples
richfitz Oct 16, 2023
e25da6d
Add dev notes
richfitz Oct 16, 2023
3357c88
Drop extra spaces in cli usage
richfitz Oct 17, 2023
156cce6
Generate all keys at once
richfitz Oct 17, 2023
21d533c
Use correct vault
richfitz Oct 17, 2023
c910290
Compatibility tweaks
richfitz Oct 17, 2023
bd3a78e
Start moving to having an identity file
richfitz Oct 17, 2023
835b277
Test for unconfigured
richfitz Oct 17, 2023
3fc461e
Better error with invalid name given
richfitz Oct 17, 2023
6d4cbff
Expand testing
richfitz Oct 17, 2023
9889dde
Expand testing
richfitz Oct 17, 2023
6ce7f86
Fix lint
richfitz Oct 17, 2023
357a70b
Basic tests for server
richfitz Oct 17, 2023
54b341c
Bunch more mocks
richfitz Oct 17, 2023
8e89f5b
A little utility testing
richfitz Oct 17, 2023
729daee
Expand server control
richfitz Oct 17, 2023
ca229c3
Tidy up docs
richfitz Oct 17, 2023
9bf458d
Basic dry run tests for backup/restore
richfitz Oct 17, 2023
faa1cf9
Interaction tests of running backup/restore
richfitz Oct 17, 2023
ce7b332
Tidy up docs
richfitz Oct 17, 2023
c293b94
Expand testing
richfitz Oct 17, 2023
1c11d19
More utility tests
richfitz Oct 18, 2023
e536299
Share approach from containers to volumes
richfitz Oct 18, 2023
da9b625
New docker fixture
richfitz Oct 18, 2023
be20b85
Isolate tests with fixtures
richfitz Oct 18, 2023
99aedd5
Add connection test
richfitz Oct 18, 2023
9594d3e
Expand development docs
richfitz Oct 18, 2023
3290070
Expand main docs
richfitz Oct 18, 2023
1eea639
Add extra cli test
richfitz Oct 18, 2023
eba7ca6
Drop archive creation for now
richfitz Oct 18, 2023
173a17c
Shut down server
richfitz Oct 18, 2023
dbb4ff9
Try installing vault from git
richfitz Oct 18, 2023
d5f6246
Correct package-from-github syntax
richfitz Oct 18, 2023
dafd0b5
Another attempt at versions
richfitz Oct 18, 2023
1b00827
Install vault if required
richfitz Oct 18, 2023
10e2397
Add tmate
richfitz Oct 18, 2023
58ea371
Skip test for now
richfitz Oct 18, 2023
89e16be
Correct skip syntax
richfitz Oct 18, 2023
59f5f39
Fix lint
richfitz Oct 18, 2023
f16b601
Replace integration test with unit test
richfitz Oct 18, 2023
10c61a8
Fix lint
richfitz Oct 18, 2023
a8ddeb0
Change mount point
richfitz Oct 18, 2023
9f0a6b7
Use different structure for storage
richfitz Oct 18, 2023
5145c13
Move keys too
richfitz Oct 18, 2023
ab1eab6
Add buildkite configuration
richfitz Oct 18, 2023
4ce714c
Fix lint
richfitz Oct 18, 2023
22890cf
Fix docker build
richfitz Oct 18, 2023
71898cd
Relax naming requirement
richfitz Oct 18, 2023
fba1963
Use correct branch for images
richfitz Oct 18, 2023
f71a507
Pull images in action
richfitz Oct 18, 2023
63b3d65
Better docs
richfitz Oct 18, 2023
16fbbdf
Remove dep from docker
richfitz Oct 18, 2023
c8e28e6
Remove redundant call
richfitz Oct 18, 2023
b44e61b
Add missing arg to backup
richfitz Oct 18, 2023
1f15bbd
Don't store things double-secret
richfitz Oct 18, 2023
c7478c1
Don't hardcode branch in tests
richfitz Oct 18, 2023
569b308
Use literal branch name
richfitz Oct 18, 2023
43e3648
Allow ommitting the path
richfitz Oct 18, 2023
4e14809
Apply suggestions from code review
richfitz Oct 19, 2023
6bc9e1c
Fix fstring, with test
richfitz Oct 19, 2023
6765c34
Use more descriptive name for utility
richfitz Oct 19, 2023
859dae2
Apply suggestions from code review
richfitz Oct 19, 2023
d5e2233
Update README.md
richfitz Oct 19, 2023
c876866
Refactor to use a common service approach
richfitz Oct 19, 2023
5 changes: 5 additions & 0 deletions .github/workflows/test.yml
@@ -17,6 +17,7 @@ jobs:
  run:

    runs-on: ubuntu-latest
    timeout-minutes: 5
    strategy:
      fail-fast: false
      matrix:
@@ -32,6 +33,10 @@ jobs:
        run: |
          python -m pip install --upgrade pip
          pip install hatch
      - name: Pull images
        run: |
          docker pull mrcide/privateer-server:prototype
          docker pull mrcide/privateer-client:prototype
      - name: Test
        run: |
          hatch run cov-ci
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
*.pyc
__pycache__
dist/
.coverage
coverage.xml
*.tar
.privateer_identity
tmp/
.coverage.*
90 changes: 87 additions & 3 deletions README.md
@@ -5,10 +5,94 @@

-----

## The idea

We need a way of synchronising some docker volumes from a machine to some backup server, incrementally, using `rsync`. We previously used [`offen/docker-volume-backup`](https://github.com/offen/docker-volume-backup) to back up volumes in their entirety to another machine as tar files, but the space and time this required made it hard to use in practice.

### The setup

We assume some number of **server** machines, which receive data, and some number of **client** machines, which send data to the server(s). A client can back up any number of volumes to any number of servers, and a server can receive and serve any number of volumes to any number of clients.

A typical setup for us has a "production" machine that backs up to one or more servers, plus a set of "staging" machines that receive data from those servers but in practice never send any.

Because we use ssh for transport, the ssh keys need to be stored somewhere; we assume the existence of a [HashiCorp Vault](https://www.vaultproject.io/) server to store these secrets.

### Configuration

The system is configured via a single `json` document, `privateer.json`, which contains information about all the moving parts: servers, clients, volumes and the vault configuration. See [`example/`](example/) for some examples.

We imagine that your configuration will exist in some repo, and that that repo will be checked out on all involved machines. Please add `.privateer_identity` to your `.gitignore` for this repo.
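As an illustration of how a tool might consume this file, here is a minimal Python sketch that loads `privateer.json` and looks up a machine by name. It assumes, purely for illustration, that `servers` and `clients` are lists of objects with a `name` field; the real schema is whatever the files in `example/` use, and `load_config` and `find_machine` are hypothetical helpers rather than part of privateer2.

```python
import json
from pathlib import Path


def load_config(root):
    """Read privateer.json from the directory `root` (hypothetical helper)."""
    with open(Path(root) / "privateer.json") as f:
        return json.load(f)


def find_machine(config, name):
    """Return (role, entry) for the machine called `name`.

    Assumes `servers` and `clients` are lists of objects that each carry a
    `name` field; this layout is illustrative, not the documented schema.
    """
    for role in ("servers", "clients"):
        for entry in config.get(role, []):
            if entry["name"] == name:
                return role, entry
    valid = [e["name"] for role in ("servers", "clients") for e in config.get(role, [])]
    raise ValueError(f"Invalid name '{name}'; expected one of: {', '.join(valid)}")
```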

### Setup

After writing a configuration, on any machine run

```
privateer2 keygen --all
```

Review comment (Member): Is this going to replace privateer eventually?

Reply (Member Author): yeah, I'll do some sort of merge into that package source

This will generate ssh keypairs for all machines and put them in the vault. This only needs to be done once, but you might need to run it again if

* you add more machines to your system
* you want to rotate keys
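On the server side, the keys end up in an `authorized_keys` file (the server image's `sshd_config` reads `/privateer/keys/authorized_keys`), which needs the public key of every client that connects to it, whether to push or to pull. A sketch of assembling that file, assuming public keys are available as a simple mapping from machine name to an OpenSSH public key line (how they are actually laid out in the vault is not shown here, and `authorized_keys` is a hypothetical helper):

```python
def authorized_keys(clients, public_keys):
    """Build the text of a server's authorized_keys file.

    `clients` names the machines allowed to connect and `public_keys` maps a
    machine name to its OpenSSH public key line ("ssh-rsa AAAA..."). The
    mapping stands in for whatever is read back from the vault.
    """
    missing = [name for name in clients if name not in public_keys]
    if missing:
        raise KeyError(f"No public key for {', '.join(missing)}; rerun keygen")
    # Append the client name as the key comment so the file is easy to audit
    return "".join(f"{public_keys[name]} {name}\n" for name in clients)
```

Rotating keys then amounts to regenerating the keypairs and rebuilding this file on each server.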

Once keys are written to the vault, on each machine run

```
privateer2 configure <name>
```

replacing `<name>` with the name of the machine within either the `servers` or `clients` section of your configuration. This sets up a special docker volume that will persist ssh keys and configurations so that communication between clients and servers is straightforward and secure. It also leaves a file `.privateer_identity` at the same location as the configuration file, which is used as the default identity for subsequent commands. Typically this is what you want.
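The precedence between an explicit `--as` argument (used in the development notes) and the `.privateer_identity` file can be sketched as follows. It assumes the identity file holds just the machine name as plain text; `resolve_identity` is a hypothetical helper, not the package's actual code.

```python
from pathlib import Path


def resolve_identity(root, as_name=None):
    """Return the name of the machine to act as.

    An explicit name (as given via --as) wins; otherwise read the
    .privateer_identity file that sits next to privateer.json, which is
    assumed here to contain only the machine name.
    """
    if as_name is not None:
        return as_name
    identity = Path(root) / ".privateer_identity"
    if not identity.exists():
        raise RuntimeError(
            f"No identity given and {identity} not found; "
            "run 'privateer2 configure <name>' or pass --as"
        )
    return identity.read_text().strip()
```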

### Manual backup

```
privateer2 backup <volume>
```

richfitz marked this conversation as resolved.

Add `--dry-run` to print the commands instead, so that you can run them yourself.
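To give a feel for what such a command might look like, here is a sketch that builds (but does not run) an rsync-over-docker command for backing up a volume. Several details are assumptions made for illustration: the key volume name `privateer_keys`, the client-side mount point `/privateer/local/<volume>`, and the server-side layout `/privateer/volumes/<source>/<volume>`. The server name is used as an ssh host alias, which assumes a matching `Host` entry in the `config` file that the client image includes from `/privateer/keys/config`. The real commands are whatever `--dry-run` reports.

```python
import shlex


def backup_command(volume, source, server,
                   image="mrcide/privateer-client:prototype"):
    """Build, without running, a docker + rsync command that pushes `volume`.

    Hypothetical layout: keys live in a volume called `privateer_keys`, the
    data volume is mounted read-only at /privateer/local/<volume>, and the
    server keeps it under /privateer/volumes/<source>/<volume>.
    """
    local = f"/privateer/local/{volume}"
    return [
        "docker", "run", "--rm",
        "-v", "privateer_keys:/privateer/keys",
        "-v", f"{volume}:{local}:ro",
        image,
        # -a preserves permissions, ownership and times; --delete mirrors removals
        "rsync", "-av", "--delete",
        f"{local}/",  # trailing slash: copy the contents, not the directory
        f"{server}:/privateer/volumes/{source}/{volume}",
    ]


if __name__ == "__main__":
    print(shlex.join(backup_command("data", source="bob", server="alice")))
```

The trailing slash on the source path makes rsync copy the directory's contents rather than the directory itself, and `--delete` removes files on the server that no longer exist locally, so repeated runs behave as an incremental mirror.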

### Restore

Restoration is always manual:

```
privateer2 restore <volume> [--server=NAME] [--source=NAME]
```

where `--server` controls the server you are pulling from (useful if you have more than one configured) and `--source` controls the original machine that backed the data up (if more than one machine is pushing backups).

For example, if you are on a "staging" machine, connecting to the "backup" server, and want to pull the "user_data" volume that was backed up from a machine called "production", you would type

```
privateer2 restore user_data --server=backup --source=production
```
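The defaulting behaviour implied above can be sketched like this: `--server` is only required when more than one server is configured, and the source defaults to the machine doing the restore (as in the development notes, where `bob` restores its own backup without `--source`). Both defaults are assumptions about the CLI, and `pick_restore_target` is a hypothetical helper.

```python
def pick_restore_target(me, servers, server=None, source=None):
    """Resolve the (server, source) pair for a restore.

    `me` is the machine doing the restore and `servers` lists the servers it
    can reach. A lone server is used by default; otherwise --server must be
    given. The source defaults to `me`, i.e. restoring this machine's own
    backup.
    """
    if server is None:
        if not servers:
            raise ValueError("No servers configured")
        if len(servers) > 1:
            raise ValueError(
                f"More than one server configured ({', '.join(servers)}); "
                "use --server to pick one"
            )
        server = servers[0]
    elif server not in servers:
        raise ValueError(f"Unknown server '{server}'")
    return server, (source if source is not None else me)
```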


## What's the problem anyway?

[Docker volumes](https://docs.docker.com/storage/volumes/) are useful for abstracting away persistent storage for an application. They're much nicer to use than bind mounts because they don't pollute the host system with files that are hard to move or delete (docker containers often run as root, or with a uid different from that of the user running docker). The docker [docs](https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes) describe some approaches to backup and restore, but these ignore many practical issues, especially when the volumes are large or off-site backup is important.

We want to be able to synchronise a volume to another volume on a different machine; our setup looks like this:

```
         bob                             alice
+-------------------+          +-----------------------+
|                   |          |                       |
|    application    |          |                       |
|         |         |          |                       |
|      volume1      |          |         volume2       |
|         |         |   ssh/   |            |          |
| privateer-client--=----------=---> privateer-server  |
|         |         |  rsync   |            |          |
|       keys        |          |          keys         |
|                   |          |                       |
+-------------------+          +-----------------------+
```

so in this case `bob` runs a privateer client, which sends data over ssh+rsync to a server running on `alice`, so that the data in `volume1` on `bob` is eventually replicated to `volume2` on `alice`. This process uses a set of ssh keys that each client and server holds in a `keys` volume, so neither interacts with the host's own ssh setup. Note that if `alice` is also running sshd, this backup process uses a *second*, independent ssh server (the one in the privateer container), not the host's.
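Keeping ssh state in the `keys` volume works because the client image's `ssh_config` reads `/privateer/keys/known_hosts` and includes `/privateer/keys/config`. A sketch of rendering those two files for a client is below. The port numbers, logging in as `root`, and the helper name `ssh_client_files` are assumptions; the `[host]:port` form is how OpenSSH records hosts reached on a non-default port in `known_hosts`.

```python
def ssh_client_files(servers):
    """Render the `config` and `known_hosts` files for a client's keys volume.

    `servers` maps a server name to (hostname, port, host_public_key), where
    host_public_key is an OpenSSH public key line such as "ssh-rsa AAAA...".
    Logging in as root is an assumption made for this sketch.
    """
    config, known_hosts = [], []
    for name, (hostname, port, host_key) in servers.items():
        config += [
            f"Host {name}",
            f"    HostName {hostname}",
            f"    Port {port}",
            "    User root",
            "",
        ]
        # OpenSSH stores non-default ports as [host]:port in known_hosts
        host = hostname if port == 22 else f"[{hostname}]:{port}"
        known_hosts.append(f"{host} {host_key}")
    return "\n".join(config), "\n".join(known_hosts) + "\n"
```

With a `Host alice` block in place, the client can simply address the server as `alice`, and host key checking uses the pinned key rather than anything in the host's own `~/.ssh`.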

In addition, we will support point-in-time backups on `alice`, creating `tar` files of the volume on disk that can easily be restored onto any host.
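A point-in-time archive of this kind could be as simple as a timestamped, gzipped tar of the volume's directory. The sketch below uses Python's standard `tarfile` module; it is not the project's implementation, and the naming scheme and the `archive_volume` helper are assumptions.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_volume(volume_dir, dest_dir, now=None):
    """Write a gzipped tar of `volume_dir` into `dest_dir`, named by UTC time."""
    volume_dir, dest_dir = Path(volume_dir), Path(dest_dir)
    now = now or datetime.now(timezone.utc)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{volume_dir.name}-{now:%Y%m%d-%H%M%S}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname="." stores paths relative to the volume root, so the archive
        # can be extracted straight into a fresh volume on any host
        tar.add(str(volume_dir), arcname=".")
    return target
```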

## Installation

3 changes: 3 additions & 0 deletions buildkite/pipeline.yml
@@ -0,0 +1,3 @@
steps:
  - label: ":whale: Build and push"
    command: docker/build
115 changes: 115 additions & 0 deletions development.md
@@ -0,0 +1,115 @@
# Development notes

Because this uses docker and vault, and depends on hostnames, it is hard to test properly without a lot of mocking. We'll update this as our strategy improves.

## Vault server for testing

We use [`vault-dev`](https://github.com/vimc/vault-dev) to bring up vault in testing mode. You can also do this manually (e.g., to match the configuration in [`example/simple.json`](example/simple.json)) by running

```
vault server -dev -dev-kv-v1
```

If you need to interact with this on the command line, use:

```
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_TOKEN=$(cat ~/.vault-token)
```

within the hatch environment before running any commands.
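Python code that talks to this dev server can pick up the same two variables. A stdlib-only sketch of that lookup, falling back to the `~/.vault-token` file read above (`vault_settings` is a hypothetical helper, and the default address simply mirrors the dev server's):

```python
import os
from pathlib import Path


def vault_settings(env=None, home=None):
    """Return (address, token) for a local dev vault.

    Mirrors the exports above: VAULT_ADDR falls back to the dev server's
    default address, and VAULT_TOKEN falls back to ~/.vault-token. The token
    is None if neither is available.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    addr = env.get("VAULT_ADDR", "http://127.0.0.1:8200")
    token = env.get("VAULT_TOKEN")
    if token is None:
        token_file = home / ".vault-token"
        if token_file.exists():
            token = token_file.read_text().strip()
    return addr, token
```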

## Worked example

We need to replace the globally resolvable address for alice (`alice.example.com`) with the hostname of the machine this is being tested on:

```
mkdir -p tmp
sed "s/alice.example.com/$(hostname)/" example/local.json > tmp/privateer.json
```

Create a set of keys:

```
privateer2 --path tmp keygen --all
```

You could also do this for a single machine:

```
privateer2 --path tmp keygen alice
```

Set up the key volumes:

```
privateer2 --path tmp configure alice
privateer2 --path tmp configure bob
```

Start the server as a background process. (If `alice` and `bob` were on different machines, the `privateer2 configure <name>` step would have written `.privateer_identity` automatically, so the `--as` argument would not be needed.)

```
privateer2 --path tmp --as=alice server start
```

Once `alice` is running, we can test this connection from `bob`:

```
privateer2 --path tmp --as=bob check --connection
```

This command is simpler if run from within the `tmp` directory, which would be the usual situation in a multi-machine setup:

```
privateer2 check --connection
```

For all other commands below, you can drop the `--path` and `--as` arguments if you change directory.

Create some random data within the `data` volume (this is the one that we want to send from `bob` to `alice`):

```
docker volume create data
docker run -it --rm -v data:/data ubuntu bash -c "base64 /dev/urandom | head -c 100000 > /data/file1.txt"
```

We can now back up from `bob` to `alice`:

```
privateer2 --path tmp --as=bob backup data
```

or see what commands you would need in order to try this yourself:

```
privateer2 --path tmp --as=bob backup data --dry-run
```

Delete the volume:

```
docker volume rm data
```

We can now restore it:

```
privateer2 --path tmp --as=bob restore data
```

or see the commands to do this ourselves:

```
privateer2 --path tmp --as=bob restore data --dry-run
```

Tear down the server with:

```
privateer2 --path tmp --as=alice server stop
```

## Writing tests

We use a lot of global resources, so it's easy to leave behind volumes and containers (often exited) after running tests. At best this is lazy and messy, but at worst it creates hard-to-diagnose dependencies between tests. Try to create names for auto-cleaned volumes and containers using the `managed_docker` fixture (see [`tests/conftest.py`](tests/conftest.py) for details).
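The pattern behind a fixture like `managed_docker` is that tests ask it for names and everything it handed out is removed at teardown, even when the test fails. The sketch below shows that pattern with a plain context manager; it is not the fixture in `tests/conftest.py`, and `NameRegistry`, `managed_names` and the injected removal callables (in practice, calls into a docker client) are hypothetical.

```python
import uuid
from contextlib import contextmanager


class NameRegistry:
    """Hand out unique docker resource names and remember them for cleanup."""

    def __init__(self, prefix="privateer_test"):
        self.prefix = prefix
        self.volumes, self.containers = [], []

    def volume(self):
        name = f"{self.prefix}_{uuid.uuid4().hex[:8]}"
        self.volumes.append(name)
        return name

    def container(self):
        name = f"{self.prefix}_{uuid.uuid4().hex[:8]}"
        self.containers.append(name)
        return name


@contextmanager
def managed_names(remove_container, remove_volume):
    """Yield a registry; on exit remove everything it handed out.

    Containers are removed before volumes, because docker refuses to delete
    a volume that is still referenced by a container.
    """
    registry = NameRegistry()
    try:
        yield registry
    finally:
        for name in registry.containers:
            remove_container(name)
        for name in registry.volumes:
            remove_volume(name)
```

In `conftest.py` this would typically be wrapped in a pytest `yield` fixture that passes in functions removing the named container or volume.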
10 changes: 10 additions & 0 deletions docker/Dockerfile.client
@@ -0,0 +1,10 @@
FROM ubuntu

RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            openssh-client \
            rsync && \
        mkdir -p /root/.ssh

COPY ssh_config /etc/ssh/ssh_config
VOLUME /privateer/keys
17 changes: 17 additions & 0 deletions docker/Dockerfile.server
@@ -0,0 +1,17 @@
FROM ubuntu

RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            openssh-client \
            openssh-server \
            rsync && \
        mkdir -p /var/run/sshd && \
        mkdir -p /root/.ssh

COPY sshd_config /etc/ssh/sshd_config

VOLUME /privateer/keys
VOLUME /privateer/volumes
EXPOSE 22

ENTRYPOINT ["/usr/sbin/sshd", "-D", "-E", "/dev/stderr"]
30 changes: 30 additions & 0 deletions docker/build
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -exu

HERE=$(dirname "$0")
. "$HERE/common"

docker build --pull \
       --tag "$TAG_SERVER_SHA" \
       -f "$HERE/Dockerfile.server" \
       "$HERE"

docker build --pull \
       --tag "$TAG_CLIENT_SHA" \
       -f "$HERE/Dockerfile.client" \
       "$HERE"

docker push "$TAG_SERVER_SHA"
docker push "$TAG_CLIENT_SHA"

docker tag "$TAG_SERVER_SHA" "$TAG_SERVER_BRANCH"
docker push "$TAG_SERVER_BRANCH"
docker tag "$TAG_CLIENT_SHA" "$TAG_CLIENT_BRANCH"
docker push "$TAG_CLIENT_BRANCH"

if [ "$GIT_BRANCH" = "main" ]; then
    docker tag "$TAG_SERVER_SHA" "$TAG_SERVER_LATEST"
    docker push "$TAG_SERVER_LATEST"
    docker tag "$TAG_CLIENT_SHA" "$TAG_CLIENT_LATEST"
    docker push "$TAG_CLIENT_LATEST"
fi
24 changes: 24 additions & 0 deletions docker/common
@@ -0,0 +1,24 @@
# -*-sh-*-
DOCKER_ROOT=$(realpath "$HERE/..")
PACKAGE_ORG=mrcide
CLIENT_NAME=privateer-client
SERVER_NAME=privateer-server

# Buildkite doesn't check out a full history from the remote (just the
# single commit) so you end up with a detached head and git rev-parse
# doesn't work. The leading `false` currently disables this branch, so
# the git commands below are always used.
if false && [ "${BUILDKITE:-}" = "true" ]; then
    GIT_SHA=${BUILDKITE_COMMIT:0:7}
    GIT_BRANCH=$BUILDKITE_BRANCH
else
    GIT_SHA=$(git -C "$DOCKER_ROOT" rev-parse --short=7 HEAD)
    GIT_BRANCH=$(git -C "$DOCKER_ROOT" symbolic-ref --short HEAD)
fi

TAG_CLIENT_SHA="${PACKAGE_ORG}/${CLIENT_NAME}:${GIT_SHA}"
TAG_CLIENT_BRANCH="${PACKAGE_ORG}/${CLIENT_NAME}:${GIT_BRANCH}"
TAG_CLIENT_LATEST="${PACKAGE_ORG}/${CLIENT_NAME}:latest"

TAG_SERVER_SHA="${PACKAGE_ORG}/${SERVER_NAME}:${GIT_SHA}"
TAG_SERVER_BRANCH="${PACKAGE_ORG}/${SERVER_NAME}:${GIT_BRANCH}"
TAG_SERVER_LATEST="${PACKAGE_ORG}/${SERVER_NAME}:latest"
6 changes: 6 additions & 0 deletions docker/ssh_config
@@ -0,0 +1,6 @@
PasswordAuthentication no
IdentityFile /privateer/keys/id_rsa
SendEnv LANG LC_*
HashKnownHosts no
UserKnownHostsFile /privateer/keys/known_hosts
Include /privateer/keys/config
28 changes: 28 additions & 0 deletions docker/sshd_config
@@ -0,0 +1,28 @@
Include /etc/ssh/sshd_config.d/*.conf

PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 1
#MaxSessions 10

PubkeyAuthentication yes

AuthorizedKeysFile /privateer/keys/authorized_keys
HostKey /privateer/keys/id_rsa

PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM no

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding no
# PermitTTY no
PrintMotd no

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

# override default of no subsystems
# Subsystem sftp /usr/lib/openssh/sftp-server