fly pg backup list/restore failed with deadline_exceeded sometimes #4194

Open
kubosuke opened this issue Jan 29, 2025 · 1 comment
Labels
bug Something isn't working

kubosuke commented Jan 29, 2025

Please only report specific issues with flyctl behavior. Anything like a support request for your application should go to https://community.fly.io. More people watch that space and can help you faster!

Describe the bug

When we run fly pg backup list, it sometimes fails with deadline_exceeded:

❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)
  • Operating system
❯ sw_vers
ProductName:		macOS
ProductVersion:		14.5
BuildVersion:		23F79
  • fly version
❯ f version
fly v0.3.70 darwin/arm64 Commit: 239bc529874fd0d24276eb3fdee0d79722ad0a34 BuildDate: 2025-01-28T18:39:08Z

** Paste your fly.toml

# fly.toml app configuration file generated for ***on 2023-11-17T10:14:15+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "***"
primary_region = "fra"
kill_signal = "SIGTERM"

[build]

[deploy]
  release_command = "/app/bin/migrate"
  strategy = "bluegreen"

[env]
  DNS_CLUSTER_QUERY = "***"
  PHX_HOST = "***"
  PORT = "8080"
  PRIMARY_REGION = "fra"
  RELEASE_COOKIE = "***"
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]
  [http_service.concurrency]
    type = "connections"
    hard_limit = 1000
    soft_limit = 1000
  [[http_service.checks]]
    grace_period = "60s"
    interval = "30s"
    method = "GET"
    timeout = "5s"
    path = "/_healthy"
    tls_skip_verify = false

** Command output: **

❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)

fyi, Tigris and backup config

Image

❯ f pg backup config show -a ***
  ArchiveTimeout = 60s
  RecoveryWindow = 30d
  FullBackupFrequency = 1h
  MinimumRedundancy = 3
@kubosuke kubosuke added the bug Something isn't working label Jan 29, 2025

kubosuke commented Jan 30, 2025

flexctl backup list sets a 10-second timeout, which is too tight:
https://github.com/fly-apps/postgres-flex/blob/bb46120d4617bef3b7c4cb0a8e21998e37cf87d7/cmd/flexctl/backups.go#L150

When I ran barman-cloud-backup-list directly, it took 144 seconds. Can we increase the timeout? It would also be nice if we could specify the timeout from the CLI.

root@2871961a5dd468:/# start_time=$(date +%s)
root@2871961a5dd468:/# barman-cloud-backup-list --cloud-provider aws-s3 --endpoint-url https://fly.storage.tigris.dev --profile barman s3://*** *** > /dev/null
root@2871961a5dd468:/# end_time=$(date +%s)
root@2871961a5dd468:/# echo "Time taken: $((end_time - start_time)) seconds"
Time taken: 144 seconds

The same happens when we run fly pg backup restore.

@kubosuke kubosuke changed the title fly pg backup list failed with deadline_exceeded sometimes fly pg backup list/restore failed with deadline_exceeded sometimes Jan 30, 2025