
Fatal: unable to create lock in backend: repository is already locked by PID 55 on docker by root (UID 0, GID 0) #171

Open
benjamin051000 opened this issue Aug 9, 2023 · 17 comments

@benjamin051000

Running resticker latest, trying to back up both my Docker volumes and a folder in my home directory to Backblaze B2. I also use Immich, so I dump the Immich database with a pre-command (PRE_COMMANDS) and exclude some folders I don't want backed up.

docker-compose (running as a Portainer stack):

version: "3.3"

services:
  backup:
    image: mazzolino/restic
    hostname: docker
    # restart: unless-stopped
    environment:
      RUN_ON_STARTUP: "true"
      BACKUP_CRON: "0 0 3 * * *"
      RESTIC_REPOSITORY: b2:hpmelab-backup:/restic-repo
      RESTIC_PASSWORD: [redacted]
      RESTIC_BACKUP_SOURCES: /backup/
      RESTIC_BACKUP_ARGS: --tag docker-volumes --exclude "/backup/services/immich/backup/encoded-video" --exclude "/backup/services/immich/backup/thumbs"
      RESTIC_FORGET_ARGS: --keep-daily 7 --keep-weekly 4
      B2_ACCOUNT_ID: [redacted]
      B2_ACCOUNT_KEY: [redacted]
      PRE_COMMANDS: |-
        docker exec -t immich_postgres pg_dumpall -c -U postgres | gzip > "/backup/services/immich/db_dumps/dump.sql.gz"

      TZ: America/Chicago
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # for pre commands
      - /var/lib/docker/volumes:/backup/volumes:ro
      - /home/bw/services:/backup/services
      

  check:
    image: mazzolino/restic
    hostname: docker
    # restart: unless-stopped
    environment:
      RUN_ON_STARTUP: "false"
      CHECK_CRON: "0 0 4 * * *"  # 1hr after resticker backup/upload
      RESTIC_CHECK_ARGS: >-
        --read-data-subset=10%
      RESTIC_REPOSITORY: b2:hpmelab-backup:/restic-repo
      RESTIC_PASSWORD: [redacted]
      B2_ACCOUNT_ID: [redacted]
      B2_ACCOUNT_KEY: [redacted]
      TZ: America/Chicago

log:

Checking configured repository 'b2:hpmelab-backup:/restic-repo' ...
Repository found.
Executing backup on startup ...
docker exec -t immich_postgres pg_dumpall -c -U postgres | gzip > "/backup/services/immich/db_dumps/dump.sql.gz"
Starting Backup at 2023-08-08 23:36:02
no parent snapshot found, will read all files
Files:       43987 new,     0 changed,     0 unmodified
Dirs:         5203 new,     0 changed,     0 unmodified
Added to the repository: 30.791 GiB (29.821 GiB stored)
processed 43987 files, 33.560 GiB in 17:24
snapshot 5ebdeacd saved
Backup successful
Forget about old snapshots based on RESTIC_FORGET_ARGS = --keep-daily 7 --keep-weekly 4
Fatal: unable to create lock in backend: repository is already locked by PID 55 on docker by root (UID 0, GID 0)
lock was created at 2023-08-08 23:35:05 (18m25.400514635s ago)
storage ID f8c80095

Any ideas on why it fails after the backup is complete? I can even see the repo in Backblaze. It seems like the step it's failing on is scheduling the cron task. Any help would be greatly appreciated, thanks!

@benjamin051000
Author

Looks like this is the spot where it's failing, line 94:

resticker/backup

Lines 93 to 94 in 65e361d

echo Forget about old snapshots based on RESTIC_FORGET_ARGS = "${RESTIC_FORGET_ARGS[@]}"
restic forget "${tag_options[@]}" "${RESTIC_FORGET_ARGS[@]}"

@benjamin051000
Author

Version 1.6.0 doesn't appear to have this issue and makes it past the forget step. I'll stay on that version until I hear about a fix.
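
For anyone wanting to do the same, pinning the image tag in the compose file above should be enough (a sketch, assuming a 1.6.0 tag is published for the mazzolino/restic image):

services:
  backup:
    image: mazzolino/restic:1.6.0  # pin instead of the implicit :latest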

@benjamin051000
Author

restic/restic#3491 may be related?

@razaqq

razaqq commented Sep 6, 2023

Same issue here. No idea how to downgrade, though, once the repo is already in the newer state.

@djmaze
Owner

djmaze commented Sep 27, 2023

I'm not using B2 myself. Does anyone want to try out mazzolino/restic:latest, which now contains restic 0.16.0?

@djmaze djmaze added the bug label Sep 27, 2023
@littlegraycells

Has there been any progress on this issue? Checking my restic logs today, I realized my backups haven't been working for several weeks and found this error in my logs. Following the comment above, I pulled version 1.7.1 and this error seems to still be there.

Forget about old snapshots based on RESTIC_FORGET_ARGS = --keep-last 10 --keep-daily 7 --keep-weekly 5 --keep-monthly 12

repo already locked, waiting up to 0s for the lock

unable to create lock in backend: repository is already locked by PID 29 on restic_server by root (UID 0, GID 0)

lock was created at 2023-12-07 01:25:45 (922h55m39.463073982s ago)

storage ID 08cb065a

the `unlock` command can be used to remove stale locks

Based on restic/restic#2736, it appears the current guidance is basically to run the unlock command before running other commands.
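
For reference, the command in question can be run from any machine that has access to the repository (a sketch; RESTIC_REPOSITORY and the backend credentials are assumed to be set in the environment):

# removes stale locks left behind by processes that no longer exist
restic unlock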

@djmaze
Owner

djmaze commented Jan 15, 2024

@littlegraycells Yes, manually unlocking is still the suggested advice.

To be more precise, I would suggest always having a monitoring solution for backups (which you do not seem to have), for example by sending emails in error cases (as shown in the documentation) or, even better, by using something like Healthchecks to make sure failures are not missed. (I can warmly recommend the latter; you can also host it yourself!)

For me, having about 15 different servers, this procedure works very well.

That said, I can see that an auto-unlock solution as proposed in the restic issue could work. But that should be implemented there.

@littlegraycells

@djmaze Thanks. I do run a self-hosted version of healthchecks.io currently.

Would you recommend running the unlock command with PRE_COMMANDS in the backup container?

@ThomDietrich
Contributor

ThomDietrich commented Jan 15, 2024

Hey @djmaze, that's an interesting comment. I use Uptime Kuma to observe services directly, as well as containers and their healthcheck status. I did not know that Healthchecks can be self-hosted!

Irrespective of that, what I found challenging with resticker is:

  • I do not want a notification if there is just a one-time sync issue. That can happen and is not a problem.
  • I want a warning after x consecutive unsuccessful backup attempts.
  • I want a warning if the backup did not run for x days.

Do you have a solution to this? Cheers!

@djmaze
Owner

djmaze commented Jan 15, 2024

  • I do not want a notification if there is just a one-time sync issue. That can happen and is not a problem.

That's what POST_COMMANDS_INCOMPLETE is for. (Tbh, I personally do not (yet) use it because I have very few failures and it does not bug me.)

I want a warning after x consecutive unsuccessful backup attempts.
I want a warning if the backup did not run for x days.

Mhh.. We could implement this in resticker. But if you are using Healthchecks, you could also solve it by pinging Healthchecks from POST_COMMANDS_SUCCESS and setting the check's grace period to x days. Healthchecks will then notify you once the grace period has been exceeded.
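
A minimal sketch of that approach, assuming a self-hosted Healthchecks instance at hc.example.com and a placeholder check UUID:

environment:
  # Ping Healthchecks only on success; a missed ping past the check's
  # grace period is what triggers the notification.
  POST_COMMANDS_SUCCESS: |-
    curl -fsS -m 10 --retry 3 https://hc.example.com/ping/your-check-uuid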

@djmaze
Owner

djmaze commented Jan 15, 2024

Would you recommend running the unlock command with PRE_COMMANDS in the backup container?

If you have only one host using the repository, this might make sense. But if there is more than one (as is the case e.g. when running prunes on a separate, bigger server, like I do), in my opinion it is too dangerous.

(I could agree with a solution which automatically removes locks that are e.g. > 24 hours old. But as I said I would prefer this to be solved upstream.)
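
For illustration only, the single-host variant discussed here would look something like this (a sketch, assuming PRE_COMMANDS runs inside the backup container with the same repository environment; do not use it if any other host shares the repository):

environment:
  PRE_COMMANDS: |-
    # remove stale locks before each run (single-host setups only)
    restic unlock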

@razaqq

razaqq commented Jan 16, 2024

Well, currently resticker is completely unusable for many people because of the issue detailed above. Every time it tries to back up, it goes into an infinite loop trying to lock the repo.

@djmaze
Owner

djmaze commented Jan 16, 2024

@razaqq Well, afaics there is still no reproducible test case.

As another workaround, you could also remove RESTIC_FORGET_ARGS and run the forget manually from time to time.
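
As an illustration, a manual forget could be run through the existing backup service (the service name and retention flags here are taken from the compose file at the top of this issue; adjust to your own setup):

docker compose exec backup restic forget --keep-daily 7 --keep-weekly 4 --prune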

@thierrybla

I am also running into this issue.

PRE_COMMANDS with restic unlock also doesn't seem to work for me.

If I unlock the repository from another machine it starts backing up again for a while only to get locked again.
See:
2024-05-02 13:58:07.091559+02:00 Checking configured repository 'rclone:google-drive:backups/restic' ...
2024-05-02 13:58:12.984656+02:00 unable to create lock in backend: repository is already locked exclusively by PID 1425 on restic-backup-custom-app-859c787754-w8n9n by root (UID 0, GID 0)
2024-05-02 13:58:12.984702+02:00 lock was created at 2024-04-10 05:15:04 (536h43m8.169813168s ago)
2024-05-02 13:58:12.984710+02:00 storage ID de9659b9
2024-05-02 13:58:12.984716+02:00 the unlock command can be used to remove stale locks
2024-05-02 13:58:12.984741+02:00 Could not access the configured repository.
2024-05-02 13:58:12.984748+02:00 Trying to initialize (in case it has not been initialized yet) ...
2024-05-02 13:58:14.908435+02:00 Fatal: create repository at rclone:google-drive:backups/restic failed: config file already exists
2024-05-02 13:58:14.908483+02:00 2024-05-02T13:58:14.908483735+02:00
2024-05-02 13:58:14.908621+02:00 Initialization failed. Please see error messages above and check your configuration. Exiting.

@djmaze
Owner

djmaze commented May 9, 2024

@thierrybla It would help if you could identify the original container / job that the lock came from. In your example the lock is quite old; maybe it was a prune which did not finish (because of lack of memory or similar)?

@thierrybla

@thierrybla It would help if you could identify the original container / job that the lock came from. In your example the lock is quite old; maybe it was a prune which did not finish (because of lack of memory or similar)?

It should not be lack of memory; I am running 128 GB of RAM and it is not near full at any time.

@cyl3x

cyl3x commented Sep 20, 2024

@djmaze

That's what POST_COMMANDS_INCOMPLETE is for.

I just discovered that I have had a stale lock for a while now. I use the POST_COMMANDS_* hooks to get Discord notifications; all of them call the same script. While SUCCESS and ERROR work fine, I didn't get a notification about the stale lock, which I suspect should have been triggered as INCOMPLETE.
In conclusion, it seems that a stale lock doesn't trigger the incomplete or error flow.
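
For context, a minimal sketch of such a notification script (the webhook URL is a placeholder and the script shape is assumed, not the one actually used above):

#!/bin/sh
# Hypothetical helper called from POST_COMMANDS_SUCCESS / _INCOMPLETE / _ERROR.
# Usage: notify.sh "Backup succeeded"
DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/<id>/<token>"  # placeholder
curl -fsS -H "Content-Type: application/json" \
  -d "{\"content\": \"$1\"}" \
  "$DISCORD_WEBHOOK_URL"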
