Filesystem backup fails with "error to initialize data path" #8767

Open
darnone opened this issue Mar 7, 2025 · 2 comments

darnone commented Mar 7, 2025

What steps did you take and what happened:
I have Velero deployed to a cluster using the helm chart and ArgoCD. I have a test example with 2 deployments - one for EBS and one for EFS. Backups and restores work with the following commands:

velero backup create backup-fs --include-namespaces snapshot --default-volumes-to-fs-backup=true
velero restore create restore-fs --from-backup backup-fs --include-namespaces snapshot

Backups and restores complete successfully. /backups, /kopia, and /restores appear in S3.

Then I tried to back up kube-prometheus-stack. It has 2 EBS volumes - one for Prometheus and one for Grafana. But the backup partially failed with:

Errors:
  Velero:    message: /pod volume backup failed: error to initialize data path: error to boost backup repository connection velero-backup-storage-location-monitoring-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: repository not initialized in the provided storage
             (the same message is repeated for each of the remaining pod volume backups)
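The error suggests the Kopia repository for the monitoring namespace was never initialized in the object store. For reference, the state of the backup repositories can be inspected with the Velero CLI and the BackupRepository custom resources (assuming Velero is installed in the velero namespace):

# show the repositories Velero knows about and their status
velero repo get

# the underlying custom resources carry the same information
kubectl -n velero get backuprepositories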

What did you expect to happen:
I expected everything to work as in my test example, since the same storage class is used, etc.

bundle-2025-03-07-15-42-09.tar.gz

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.
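For example, with the backup and restore names used in the working test above (illustrative only; substitute the names of the failing backup as needed):

# bundle the Velero server logs plus logs and describe output for the named backup and restore
velero debug --backup backup-fs --restore restore-fs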

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:
The cluster has multiple node groups. Everything is deployed to a specific node group, including kube-prometheus-stack and Velero (both into the same one). The node group has 4 nodes labeled:

nodeSelector:
  node: infra

The node agents do not have a node selector, so a Velero node agent is running on all 9 nodes of the cluster.

So what am I doing wrong?

My velero configuration is also listed here:

nodeSelector:
  node: infra
  
image:
  repository: velero/velero
  tag: v1.15.2
  pullPolicy: IfNotPresent

configuration:
  features: EnableCSI
  uploaderType: kopia
  backupStorageLocation:
  - name: velero-backup-storage-location
    #bucket: {{ .Values.velero_backups_bucket }}
    bucket: gts-argocd-ci-velero-dev
    default: true
    provider: aws
    config:
      region: us-east-1
  volumeSnapshotLocation:
  - name: velero-volume-storage-location
    provider: aws
    config:
      region: us-east-1

initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.11.1
  volumeMounts:
  - mountPath: /target
    name: plugins

credentials:
  useSecret: false

resources:
  requests:
    cpu: 500m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi

deployNodeAgent: true

nodeAgent:
  podVolumePath: /var/lib/kubelet/pods

  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 1000m
      memory: 1024Mi

Environment:

  • Velero version (use velero version): 1.15.2
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version: terraform 1.30.9
  • Cloud provider or hardware configuration: AWS EKS 1.30
  • OS (e.g. from /etc/os-release): Amazon Linux 2 optimized

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

darnone commented Mar 7, 2025

If I describe one of the failed jobs I see:

Warning BackoffLimitExceeded 3m5s job-controller Job has reached the specified backoff limit

but I don't know what to do to fix it, or whether there is a chart configuration to change. What I don't understand is that I have another cluster with the same config (in the same AWS account) that is running without problems.
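For reference, Velero runs repository maintenance as Kubernetes Jobs in its own namespace; the jobs hitting the backoff limit can be listed and inspected like this (the job name below is a placeholder):

# list the jobs Velero created for repository maintenance
kubectl -n velero get jobs

# describe a failed job and its pods to see the failure reason
kubectl -n velero describe job <maintenance-job-name>
kubectl -n velero get pods --field-selector=status.phase=Failed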

@darnone
Copy link
Author

darnone commented Mar 7, 2025

So what I found and did to fix this was to set:

configuration:
  repositoryMaintenanceJob:
    requests:
      cpu: 1000m
      memory: 1024Mi
    limits:
      memory: 2048Mi

I don't know if these resources are overkill, but the backup now completes with no errors.
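A quick way to confirm the change took effect (assuming Velero runs in the velero namespace) is to check that subsequent repository maintenance jobs complete and the repository reports Ready:

# maintenance jobs should now show COMPLETIONS 1/1 instead of hitting the backoff limit
kubectl -n velero get jobs

# the backup repository status should be Ready
velero repo get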
