Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Litestream seemingly stops replicating before "waking up" when it receives a SIGTERM #592

Open
RaghavSood opened this issue Jun 24, 2024 · 0 comments

Comments

@RaghavSood
Copy link

I'm using litestream with a systemd service (on NixOS

The systemd unit file is:

[nix-shell:~]# cat /etc/systemd/system/litestream.service
[Unit]
After=networking.target
Description=Litestream

[Service]
Environment="LOCALE_ARCHIVE=/nix/store/9fadhkswwnhfzkadsfvs3rh7g8cv92i5-glibc-locales-2.39-52/lib/locale/locale-archive"
Environment="PATH=/nix/store/i7qhgc0bs725qw3wdanznfkdna4z2ns2-coreutils-9.5/bin:/nix/store/rr1yixvn0z63mgq9s04ig9j9qlz23s2g-findutils-4.9.0/bin:/nix/store/md9apn3290h7kv0x198ihaaa3k6icg4b-gnugrep-3.11/bin:/nix/store/ks6c62g0m3gqrs5i7m0cv6d6aqhdvirn-gnused-4.9/bin:/nix/store/cym39cl4v79bmbng4bvafkxagz96f4hw-systemd-255.6/bin:/nix/store/i7qhgc0bs725qw3wdanznfkdna4z2ns2-coreutils-9.5/sbin:/nix/store/rr1yixvn0z63mgq9s04ig9j9qlz23s2g-findutils-4.9.0/sbin:/nix/store/md9apn3290h7kv0x198ihaaa3k6icg4b-gnugrep-3.11/sbin:/nix/store/ks6c62g0m3gqrs5i7m0cv6d6aqhdvirn-gnused-4.9/sbin:/nix/store/cym39cl4v79bmbng4bvafkxagz96f4hw-systemd-255.6/sbin"
Environment="TZDIR=/nix/store/xfj6pakd03j0vq3qp7gyhsympb735a8r-tzdata-2024a/share/zoneinfo"
EnvironmentFile=/run/keys/litestream
ExecStart=/nix/store/fm1n1bhdxdh9611fvih1qv558q4rhpkx-litestream-0.3.13/bin/litestream replicate
Group=litestream
Restart=always
User=litestream

[Install]
WantedBy=multi-user.target

I'm replicating to Cloudflare R2 using the S3 type as follows:

[nix-shell:~]# cat /etc/litestream.yml 
dbs:
- path: /var/lib/btcsupply/prod.sqlite
  replicas:
  - bucket: burned-money
    endpoint: https://something.r2.cloudflarestorage.com
    path: btcsupply
    type: s3

The correct credentials are provided as environment values.

Every now and then, Litestream seemingly just gives up on replication, with no log output. As far as I can tell, this happens after repeated database is locked errors.

If I then call systemctl stop litestream or systemctl restart litestream, the SIGTERM sent by systemd suddenly seems to "wake up" litestream and it races to catch on on WAL uploads before being killed.

Usually, it can't catch up before systemd force-kills it. Subsequent starts of the litestream service result in it complaining about a WAL header mismatch and force generating a snapshot, before continuing with WAL uploads until it gets stuck again.


[nix-shell:~]# journalctl --output cat -fu litestream
time=2024-06-24T14:34:24.507Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:35:29.054Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:40:45.160Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:40:48.479Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:40:49.491Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:40:50.503Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:41:47.092Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:55:39.625Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=RESTART err=database is locked"
time=2024-06-24T14:57:34.006Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
time=2024-06-24T14:59:24.856Z level=ERROR msg="sync error" db=/var/lib/btcsupply/prod.sqlite error="checkpoint: mode=PASSIVE err=database is locked"
Stopping Litestream...
time=2024-06-24T15:02:23.573Z level=INFO msg="signal received, litestream shutting down"
time=2024-06-24T15:02:23.589Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000377:222129832
time=2024-06-24T15:02:55.358Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000377:222129832 elapsed=31.769224893s sz=1770701840
time=2024-06-24T15:02:55.358Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000378:0
time=2024-06-24T15:02:55.850Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000378:0 elapsed=491.823896ms sz=5022312
time=2024-06-24T15:02:55.850Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000379:0
time=2024-06-24T15:02:59.568Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000379:0 elapsed=3.718017742s sz=179063472
time=2024-06-24T15:02:59.568Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037a:0
time=2024-06-24T15:03:03.617Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037a:0 elapsed=4.048626554s sz=183278232
time=2024-06-24T15:03:03.617Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037b:0
time=2024-06-24T15:03:07.691Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037b:0 elapsed=4.073859418s sz=172360232
time=2024-06-24T15:03:07.692Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037c:0
time=2024-06-24T15:03:09.860Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037c:0 elapsed=2.16823659s sz=54668312
time=2024-06-24T15:03:09.860Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037d:0
time=2024-06-24T15:03:13.367Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037d:0 elapsed=3.506619997s sz=120398792
time=2024-06-24T15:03:13.367Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037e:0
time=2024-06-24T15:03:16.189Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037e:0 elapsed=2.821972233s sz=101932952
time=2024-06-24T15:03:16.189Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037f:0
time=2024-06-24T15:03:19.087Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000037f:0 elapsed=2.898215394s sz=113444232
time=2024-06-24T15:03:19.087Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000380:0
time=2024-06-24T15:03:21.882Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000380:0 elapsed=2.794396049s sz=119924992
time=2024-06-24T15:03:21.882Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000381:0
time=2024-06-24T15:03:24.095Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000381:0 elapsed=2.213006245s sz=71968192
time=2024-06-24T15:03:24.095Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000382:0
time=2024-06-24T15:03:26.471Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000382:0 elapsed=2.375819593s sz=63715832
time=2024-06-24T15:03:26.471Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000383:0
time=2024-06-24T15:03:30.445Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000383:0 elapsed=3.97403464s sz=184287632
time=2024-06-24T15:03:30.446Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000384:0
time=2024-06-24T15:03:32.385Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000384:0 elapsed=1.939251933s sz=52138632
time=2024-06-24T15:03:32.385Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000385:0
time=2024-06-24T15:03:35.558Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000385:0 elapsed=3.17241464s sz=123970832
time=2024-06-24T15:03:35.558Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000386:0
time=2024-06-24T15:03:40.869Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000386:0 elapsed=5.31052176s sz=252230552
time=2024-06-24T15:03:40.869Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000387:0
time=2024-06-24T15:03:43.108Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000387:0 elapsed=2.239154383s sz=59851272
time=2024-06-24T15:03:43.108Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000388:0
time=2024-06-24T15:03:45.696Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000388:0 elapsed=2.587942412s sz=88163912
time=2024-06-24T15:03:45.696Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000389:0
time=2024-06-24T15:03:48.755Z level=INFO msg="wal segment written" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/00000389:0 elapsed=3.058842434s sz=114383592
time=2024-06-24T15:03:48.755Z level=INFO msg="write wal segment" db=/var/lib/btcsupply/prod.sqlite replica=s3 position=4ecedef7797ccf65/0000038a:0
litestream.service: State 'stop-sigterm' timed out. Killing.
litestream.service: Killing process 175336 (litestream) with signal SIGKILL.
litestream.service: Failed to kill control group /system.slice/litestream.service, ignoring: Invalid argument
litestream.service: Main process exited, code=killed, status=9/KILL
litestream.service: Failed with result 'timeout'.
Stopped Litestream.
litestream.service: Consumed 1h 5min 31.922s CPU time, received 167.1M IP traffic, sent 99.6G IP traffic.

This is a fairly large database (~60GB), with busy_timeout = 30000 and synchronous=NORMAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant