docs: ks1 install (#411)
Setting up new server for images.

---------

Co-authored-by: Pierre Slamich <[email protected]>
alexgarel and teolemon authored Oct 17, 2024
1 parent 099eead commit 62c3151
Showing 6 changed files with 283 additions and 19 deletions.
4 changes: 4 additions & 0 deletions .github/labeler.yml
@@ -29,6 +29,10 @@ off1:
- changed-files:
- any-glob-to-any-file: '**/*off1*'

ks1:
- changed-files:
- any-glob-to-any-file: '**/*ks1*'

ovh1:
- changed-files:
- any-glob-to-any-file: '**/*ovh1*'
5 changes: 3 additions & 2 deletions docs/mail.md
@@ -137,7 +137,8 @@ We normally keeps a standard `/etc/aliases`.
We have specific groups to receive emails: `[email protected]` and `[email protected]`

You may add some redirections for non standard users to one of those groups.
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper).
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper)
and restart the postfix service (`postfix.service` and/or `[email protected]`).

### Postfix configuration

@@ -159,7 +160,7 @@ Run: `dpkg-reconfigure postfix`:
**IMPORTANT:**
On some systems, the real daemon is not `postfix.service` but `[email protected]`

(so eg., if you touch `/etc/alias` (with after `sudo newaliases`) you need to `systemctl reload [email protected]`
So, for example, if you edit `/etc/aliases` (and then run `sudo newaliases`), you need to `systemctl reload [email protected]`

### Exim4 configuration

2 changes: 1 addition & 1 deletion docs/reports/2024-06-05-off1-reverse-proxy-install.md
@@ -26,7 +26,7 @@ Network: name=eth0,bridge=vmbr1,ip=10.1.0.100/24,gw=10.0.0.1

I then simply install `nginx` using apt.

I also [configure postfix](../mail#postfix-configuration) and tested it.
I also [configured postfix](../mail.md#postfix-configuration) and tested it.

### Adding the IP

240 changes: 236 additions & 4 deletions docs/reports/2024-09-24-kimsufi-stor-ks1-installation.md
@@ -1,28 +1,260 @@
# Kimsufi STOR - ks1 installation

## Rationale for new server

We have performance issues on off1 and off2 that are becoming unbearable, in particular disk usage on off2 is so high that 60% of processes are in iowait state.

We just moved image serving from off2 to off1 today (24/09/2024), but that just moves the problem to off1.

We are thus installing a new cheap Kimsufi server to see if we can move the serving of images to it.

## Server specs

KS-STOR - Intel Xeon-D 1521 - 4 c / 8 t - 16 Gb RAM - 4x 6 Tb HDD + 500 Gb SSD

## Install

We create an A record ks1.openfoodfacts.org pointing to the IP of the server: 217.182.132.133.
In the OVH console, we rename the server to ks1.openfoodfacts.org.

In the OVH console, we install Debian 12 Bookworm on the SSD.

**IMPORTANT:** this was not an optimal choice, we should have reserved part of the SSD to use it as a cache drive for the ZFS pool.

Once the install is complete, OVH sends the credentials by email.

We add users for the admin(s) and give sudo access:

```bash
sudo usermod -aG sudo [username]
```

Set the hostname: `hostnamectl hostname ks1`

I also manually ran the usual commands found in ct_postinstall.

I also followed [How to have server config in git](../how-to-have-server-config-in-git.md)

I also added the email on failure systemd unit.

I edited `/etc/netplan/50-cloud-init.yaml` to add a default search domain:
```yaml
network:
  version: 2
  ethernets:
    eno3:
      (...)
      nameservers:
        search: [openfoodfacts.org]
```
and ran `netplan try`.
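
To confirm that the search domain is in effect, something like the following can be used. It is only a sketch: the helper name `has_search_domain` is hypothetical, and it assumes the resolver configuration ends up in a resolv.conf-style file (the exact path depends on how the host resolves names). The demo runs on a throwaway file with a placeholder nameserver address.

```bash
# Hypothetical helper: succeed if DOMAIN is listed on a "search" line of FILE.
has_search_domain() {
    file=$1
    domain=$2
    awk -v d="$domain" '
        $1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Demo on a temporary file so the sketch runs anywhere.
sample=$(mktemp)
printf 'nameserver 192.0.2.53\nsearch openfoodfacts.org\n' > "$sample"
if has_search_domain "$sample" openfoodfacts.org; then
    echo "search domain present"
else
    echo "search domain missing"
fi
rm -f "$sample"
```

On the server itself, the call would be along the lines of `has_search_domain /etc/resolv.conf openfoodfacts.org`.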

## Email

Email is important to send alerts on service failure.

I also configured email by removing exim4 and installing postfix.
```bash
sudo apt purge exim4-base exim4-config && \
sudo apt install postfix bsd-mailx
```
and following [Server, postfix configuration](../mail.md#postfix-configuration).

I also had to add the ks1 IP address to the [forwarding rules on ovh1 to the mail gateway](../mail.md#redirects).
```bash
iptables -t nat -A PREROUTING -s 217.182.132.133 -d pmg.openfoodfacts.org -p tcp --dport 25 -j DNAT --to 10.1.0.102:25
iptables-save > /etc/iptables/rules.v4.new
# control
diff /etc/iptables/rules.v4{,.new}
mv /etc/iptables/rules.v4{.new,}
etckeeper commit "added rule for ks1 email"
```
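
The save / diff / move sequence above stages the new rules next to the live file, so the change can be reviewed before it replaces anything. The same pattern is sketched below on a throwaway file; the file contents are placeholders, not real iptables rules.

```bash
# Sketch of the "write to .new, review, then swap" pattern on a temporary file.
workdir=$(mktemp -d)
rules="$workdir/rules.v4"

printf 'old rule\n' > "$rules"                 # stands in for the currently saved rules
printf 'old rule\nnew rule\n' > "$rules.new"   # stands in for the iptables-save output

# Review: diff exits with status 1 when the files differ, which is expected here.
diff "$rules" "$rules.new" || true

# Swap the reviewed file into place (a rename within one filesystem is atomic).
mv "$rules.new" "$rules"
final=$(cat "$rules")
printf '%s\n' "$final"

rm -rf "$workdir"
```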

Test from ks1:
```bash
echo "test message from ks1" |mailx -s "test root ks1" -r [email protected] root
```

## Install and setup ZFS

### Install ZFS
```bash
sudo apt install zfsutils-linux
sudo /sbin/modprobe zfs
```

I added the `zfs.conf` file to `/etc/modprobe.d`,
then ran `update-initramfs -u -k all`.

### Create ZFS pool

`lsblk` shows the existing disks: the 4 HDDs are available, and the system is installed on the NVMe SSD.

So I created the pool with them (see [How to create a zpool](../zfs-overview.md#how-to-create-a-zpool))

```bash
zpool create zfs-hdd /dev/sd{a,b,c,d}
```

### Setup compression

We want to enable compression on the pool.

```bash
zfs set compression=on zfs-hdd
```

Note: in reality it was not enabled from the start. I enabled it after the first snapshot sync,
as I saw it was taking much more space than on the original server.

### Fine tune zfs

Set `atime=off` and `relatime=no` on the ZFS dataset `zfs-hdd/off/images` to avoid writing access times on every read.
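
For reference, applying this could look like the sketch below. `zfs set` and `zfs get` are the standard commands; the wrapper function `disable_atime` and the `ZFS` variable are hypothetical, added only so the commands can be previewed with `ZFS=echo` before running them for real. Only `atime` is set here, since `relatime` only matters while `atime` is on.

```bash
# Hypothetical wrapper; set ZFS=echo to print the commands instead of running them.
ZFS=${ZFS:-zfs}

disable_atime() {
    dataset=$1
    # Stop recording access times, so reads no longer trigger metadata writes.
    "$ZFS" set atime=off "$dataset"
    # Show the resulting values to confirm the change.
    "$ZFS" get atime,relatime "$dataset"
}

# Usage (on ks1, as root):
#   disable_atime zfs-hdd/off/images
```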

## Install sanoid / syncoid

I installed the sanoid.deb that I got from the off1 server.

```bash
apt install libcapture-tiny-perl libconfig-inifiles-perl
apt install lzop mbuffer pv
dpkg -i /home/alex/sanoid_2.2.0_all.deb
```

## Sync data

After installing sanoid, I am ready to sync data.

I first create an `off` dataset to have the same structure as on other servers:
```bash
zfs create zfs-hdd/off
```

I'll sync the data from ovh3 since it's in the same data center.

I created a ks1operator user on ovh3, following [creating operator on PROD_SERVER](../sanoid.md#creating-operator-on-prod_server)

I also had to make a `ln -s /usr/sbin/zfs /usr/bin/zfs` on ovh3

Then I used:

```bash
time syncoid --no-sync-snap --no-privilege-elevation [email protected]:rpool/off/images zfs-hdd/off/images
```

It took 3594 minutes, that is about 60 hours, or 2.5 days.
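
The conversion can be checked with shell integer arithmetic (a quick sketch; the variable names are arbitrary):

```bash
minutes=3594
hours=$((minutes / 60))                  # whole hours
rem_minutes=$((minutes % 60))            # minutes left over
days=$((minutes / 1440))                 # 1440 minutes in a day
rem_hours=$(( (minutes % 1440) / 60 ))   # hours left over after whole days

echo "${hours}h ${rem_minutes}m in total, i.e. ${days}d ${rem_hours}h ${rem_minutes}m"
# prints: 59h 54m in total, i.e. 2d 11h 54m
```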

I removed the old-style snapshots from ks1, as they are not needed here:
```bash
for f in $(zfs list -t snap -o name zfs-hdd/off/images|grep "images@202");do zfs destroy $f;done
```
The other snapshots will normally be pruned by sanoid.
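
Since `zfs destroy` is irreversible, it can help to preview what the `grep` pattern matches before running the loop. The sketch below applies the same filter to a made-up list of snapshot names (illustrative only, not copied from ks1) and prints the commands instead of running them.

```bash
# Illustrative snapshot names: two old-style dated ones and one sanoid-style one.
snapshots='zfs-hdd/off/images@20230912-1200
zfs-hdd/off/images@20240105-0000
zfs-hdd/off/images@autosnap_2024-09-26_00:00:01_daily'

# Same filter as in the loop above, but echo instead of destroying.
to_destroy=$(printf '%s\n' "$snapshots" | grep "images@202")
for snap in $to_destroy; do
    echo "would run: zfs destroy $snap"
done
```

On the server, the sample list would be replaced by the output of `zfs list -H -t snap -o name zfs-hdd/off/images`.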

## Configure sanoid

I created the sanoid and syncoid configuration.

I added ks1operator on off2.

Finally, I installed the standard sanoid / syncoid systemd units and the sanoid_check unit, and enabled them:

```bash
systemctl enable --now sanoid.timer
systemctl enable syncoid.service
systemctl enable --now sanoid_check.timer
```

## Firewall

As the setting will be simple (no masquerading / forwarding), we will use ufw.
```bash
apt install ufw

ufw allow OpenSSH
ufw allow http
ufw allow https
ufw default deny incoming
ufw default allow outgoing

# verify
ufw show added
# go
ufw enable
```

fail2ban is already installed, but failing with:
```
Failed during configuration: Have not found any log file for sshd jail
```
This is because the sshd daemon logs into systemd-journald, not in a log file.
To fix that, I modified `/etc/fail2ban/jail.d/defaults-debian.conf` to be:
```ini
[sshd]
enabled = true
backend = systemd
```

Addendum: after Christian installed Munin node, I added port 4949

## NGINX

### Install

I installed nginx and certbot:
```bash
apt install nginx
apt install python3-certbot python3-certbot-nginx
```

I also added the nginx.service.d override to email on failure.

### Configure

Created `confs/ks1/nginx/sites-available/images-off` akin to off1 configuration.

`ln -s /opt/openfoodfacts-infrastructure/confs/ks1/nginx/sites-available/images-off /etc/nginx/sites-enabled/images-off`

### Certificates

As I can't use certbot until the DNS points to this server,
I copied the certificates from off1.

```bash
ssh -A off1
sudo -E bash
# see active certificates
ls -l /etc/letsencrypt/live/images.openfoodfacts.org/
# here it's 19, copy them
scp /etc/letsencrypt/archive/images.openfoodfacts.org/*19* [email protected]:

exit
exit
```

On ks1:
```bash
mkdir -p /etc/letsencrypt/{live,archive}/images.openfoodfacts.org
mv /home/alex/*19* /etc/letsencrypt/archive/images.openfoodfacts.org/
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/cert19.pem /etc/letsencrypt/live/images.openfoodfacts.org/cert.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/chain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/chain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/fullchain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/fullchain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/privkey19.pem /etc/letsencrypt/live/images.openfoodfacts.org/privkey.pem
chown -R root:root /etc/letsencrypt/
chmod go-rwx /etc/letsencrypt/{live,archive}
```
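
Before reloading nginx, it may be worth checking that every link in `live/` resolves. A minimal sketch follows; the function name `check_live_links` is hypothetical, and the demo builds a fake tree under a temporary directory rather than touching `/etc/letsencrypt`.

```bash
# Hypothetical helper: verify the four usual links exist and point to real files.
check_live_links() {
    live_dir=$1
    status=0
    for name in cert chain fullchain privkey; do
        link="$live_dir/$name.pem"
        # -L: the path is a symlink; -e: the link target exists.
        if [ -L "$link" ] && [ -e "$link" ]; then
            echo "ok      $link"
        else
            echo "BROKEN  $link"
            status=1
        fi
    done
    return $status
}

# Demo on a temporary tree mimicking the archive/live layout (version 19).
root=$(mktemp -d)
mkdir -p "$root/archive" "$root/live"
for name in cert chain fullchain privkey; do
    : > "$root/archive/${name}19.pem"
    ln -s "$root/archive/${name}19.pem" "$root/live/$name.pem"
done
check_live_links "$root/live"
```

On ks1 the call would be `check_live_links /etc/letsencrypt/live/images.openfoodfacts.org`.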

## Testing

On my host I modified /etc/hosts to have:
```hosts
217.182.132.133 images.openfoodfacts.org
```
and visited the website with my browser, with developer tools open.

I can also use curl:
```bash
curl --resolve images.openfoodfacts.org:443:217.182.132.133 https://images.openfoodfacts.org/images/products/087/366/800/2989/front_fr.3.400.jpg --output /tmp/front_fr.jpg -v
xdg-open /tmp/front_fr.jpg
```
10 changes: 7 additions & 3 deletions docs/sanoid.md
@@ -152,12 +153,13 @@ mkdir /home/$OPERATOR/.ssh
vim /home/$OPERATOR/.ssh/authorized_keys
# copy BACKUP_SERVER root public key

chown -R /home/$OPERATOR
chown $OPERATOR:$OPERATOR -R /home/$OPERATOR
chmod go-rwx -R /home/$OPERATOR/.ssh
```

Adding needed permissions to pull zfs syncs
```bash
# choose the right dataset according to your needs
zfs allow $OPERATOR hold,send zfs-hdd
zfs allow $OPERATOR hold,send zfs-nvme
zfs allow $OPERATOR hold,send rpool
@@ -169,7 +170,7 @@ On BACKUP_SERVER, test ssh connection:

```bash
OPERATOR=${BACKUP_SERVER}operator
ssh $OPERATOR@<ip for server>
ssh $OPERATOR@<ip or host>
```

#### config syncoid
@@ -187,4 +188,7 @@ Use `--recursive` to also backup subdatasets.

Don't forget to create a sane retention policy (with `autosnap=no`) in sanoid on $BACKUP_SERVER to remove old data.

**Note:** because of the 6h timeout, if you have big datasets, you may want to do the first synchronization before enabling the service.

**Important:** try to keep a clean hierarchy of datasets, and separate data that belongs to the server from data coming from other servers.
Normally we put other servers' backups in an `off-backups` dataset. It's important not to mix it with the `backups` dataset, which is for the server itself.