rpc error: code = Unavailable desc = error reading from server: EOF #6780

Closed
manneymc opened this issue Sep 6, 2023 · 5 comments
Labels: Area/Cloud/Azure, Needs info (Waiting for information)

manneymc commented Sep 6, 2023

What steps did you take and what happened:

I created a test backup schedule:

velero create schedule velero-scheduled-test --schedule="0,15,30,45 * * * *" --include-namespaces velero-test --ttl 24h

  • The backups run at the expected times, but they are always marked as Failed.

velero get backups
NAME                                   STATUS   ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
velero-scheduled-test-20230906104549   Failed   0        0          2023-09-06 11:45:49 +0100 BST   23h       azure-uksouth
velero-scheduled-test-20230906103049   Failed   0        0          2023-09-06 11:30:50 +0100 BST   23h       azure-uksouth

velero describe backup velero-scheduled-test-20230906104549
Name:         velero-scheduled-test-20230906104549
Namespace:    velero
Labels:       velero.io/schedule-name=velero-scheduled-test
              velero.io/storage-location=azure-uksouth
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.25.11
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=25

  Phase:  Failed (run `velero backup logs velero-scheduled-test-20230906104549` for more information)
  
  
  Namespaces:
    Included:  velero-test
    Excluded:  <none>
  
  Resources:
    Included:        *
    Excluded:        <none>
    Cluster-scoped:  auto
  
  Label selector:  <none>
  
  Storage Location:  azure-uksouth
  
  Velero-Native Snapshot PVs:  auto
  
  TTL:  24h0m0s
  
  CSISnapshotTimeout:    10m0s
  ItemOperationTimeout:  1h0m0s
  
  Hooks:  <none>
  
  Backup Format Version:  1.1.0
  
  Started:    2023-09-06 11:45:49 +0100 BST
  Completed:  <n/a>
  
  Expiration:  2023-09-07 11:45:49 +0100 BST
  
  Total items to be backed up:  4
  Items backed up:              4
  
  Velero-Native Snapshots: <none included>
  
The output of velero backup logs velero-scheduled-test-20230906104549 contains no errors or warnings.

If I check the storage account, I can see that the backups are being created there successfully. However, the Velero pod's log in Kubernetes shows the following:

velero time="2023-09-06T10:30:52Z" level=info msg="Collected 4 items matching the backup spec from the Kubernetes API (actual number of items backed up may be more or less depending on velero.io/exclude-from-backup annotation, plugins returning additional related items to back up, etc.)" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:280" progress=
velero time="2023-09-06T10:30:52Z" level=info msg="Processing item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:365" name=velero-test namespace= progress= resource=namespaces
velero time="2023-09-06T10:30:52Z" level=info msg="Backing up item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/item_backupper.go:173" name=velero-test namespace= resource=namespaces
velero time="2023-09-06T10:30:52Z" level=info msg="Backed up 1 items out of an estimated total of 4 (estimate will change throughout the backup)" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:405" name=velero-test namespace= progress= resource=namespaces
velero time="2023-09-06T10:30:52Z" level=info msg="Processing item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:365" name=default namespace=velero-test progress= resource=serviceaccounts
velero time="2023-09-06T10:30:52Z" level=info msg="Backing up item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/item_backupper.go:173" name=default namespace=velero-test resource=serviceaccounts
velero time="2023-09-06T10:30:52Z" level=info msg="Executing custom action" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/item_backupper.go:325" name=default namespace=velero-test resource=serviceaccounts
velero time="2023-09-06T10:30:52Z" level=info msg="Running ServiceAccountAction" backup=velero/velero-scheduled-test-20230906103049 cmd=/velero logSource="pkg/backup/service_account_action.go:77" pluginName=velero
velero time="2023-09-06T10:30:52Z" level=info msg="Done running ServiceAccountAction" backup=velero/velero-scheduled-test-20230906103049 cmd=/velero logSource="pkg/backup/service_account_action.go:120" pluginName=velero
velero time="2023-09-06T10:30:52Z" level=info msg="Backed up 2 items out of an estimated total of 4 (estimate will change throughout the backup)" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:405" name=default namespace=velero-test progress= resource=serviceaccounts
velero time="2023-09-06T10:30:52Z" level=info msg="Processing item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:365" name=arcus-backend-service namespace=velero-test progress= resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Backing up item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/item_backupper.go:173" name=arcus-backend-service namespace=velero-test resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Backed up 3 items out of an estimated total of 4 (estimate will change throughout the backup)" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:405" name=arcus-backend-service namespace=velero-test progress= resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Processing item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:365" name=kube-root-ca.crt namespace=velero-test progress= resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Backing up item" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/item_backupper.go:173" name=kube-root-ca.crt namespace=velero-test resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Backed up 4 items out of an estimated total of 4 (estimate will change throughout the backup)" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:405" name=kube-root-ca.crt namespace=velero-test progress= resource=configmaps
velero time="2023-09-06T10:30:52Z" level=info msg="Skipping resource customresourcedefinitions.apiextensions.k8s.io, because it's cluster-scoped and only specific namespaces or namespace scope types are included in the backup." backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/util/collections/includes_excludes.go:155"
velero time="2023-09-06T10:30:52Z" level=info msg="Backed up a total of 4 items" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/backup/backup.go:436" progress=
velero time="2023-09-06T10:30:52Z" level=info msg="Setting up backup store to persist the backup" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/controller/backup_controller.go:771"
velero time="2023-09-06T10:30:52Z" level=info msg="Using storage account key: true" backup=velero/velero-scheduled-test-20230906103049 cmd=/plugins/velero-plugin-for-microsoft-azure logSource="/go/src/velero-plugin-for-microsoft-azure/velero-plugin-for-microsoft-azure/object_store.go:364" pluginName=velero-plugin-for-microsoft-azure
velero time="2023-09-06T10:30:52Z" level=info msg="Plugin process exited - restarting." backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/plugin/clientmgmt/process/restartable_process.go:155"
velero time="2023-09-06T10:30:53Z" level=info msg="Using storage account key: true" backup=velero/velero-scheduled-test-20230906103049 cmd=/plugins/velero-plugin-for-microsoft-azure logSource="/go/src/velero-plugin-for-microsoft-azure/velero-plugin-for-microsoft-azure/object_store.go:364" pluginName=velero-plugin-for-microsoft-azure
velero time="2023-09-06T10:30:53Z" level=info msg="Backup completed" backup=velero/velero-scheduled-test-20230906103049 logSource="pkg/controller/backup_controller.go:785"
velero time="2023-09-06T10:30:53Z" level=error msg="backup failed" backuprequest=velero/velero-scheduled-test-20230906103049 controller=backup error="[rpc error: code = Unavailable desc = error reading from server: EOF, rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin2309637438: connect: connection refused\"]" logSource="pkg/controller/backup_controller.go:290"
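
The backup log is clean while the server pod log carries the failure, so it can help to filter the server log down to the lines that matter. The following is a minimal sketch, not part of Velero: it assumes the server log has been saved as text (for example from kubectl logs deployment/velero -n velero), the helper names parse_line and suspicious are hypothetical, and the parser only handles the key=value and key="quoted value" shapes seen in the log above. The sample lines are abbreviated copies of lines from that log.

```python
import re

# key=value pairs; the value is either a double-quoted string (backslash
# escapes allowed) or a run of non-space characters, which may be empty.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_line(line):
    """Parse one logfmt-style log line into a dict (hypothetical helper)."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def suspicious(lines):
    """Yield parsed entries that are errors or that report a plugin exit."""
    for line in lines:
        entry = parse_line(line)
        is_error = entry.get("level") == "error"
        plugin_exit = "Plugin process exited" in entry.get("msg", "")
        if is_error or plugin_exit:
            yield entry


# Abbreviated lines taken from the server log above.
sample = [
    'velero time="2023-09-06T10:30:52Z" level=info msg="Backed up a total of 4 items" backup=velero/velero-scheduled-test-20230906103049',
    'velero time="2023-09-06T10:30:52Z" level=info msg="Plugin process exited - restarting." backup=velero/velero-scheduled-test-20230906103049',
    'velero time="2023-09-06T10:30:53Z" level=error msg="backup failed" backuprequest=velero/velero-scheduled-test-20230906103049 controller=backup',
]

for entry in suspicious(sample):
    print(entry["time"], entry["level"], entry["msg"])
```

On the sample this keeps the plugin restart and the final "backup failed" entry and drops the routine progress line. Note that the plugin restart is logged at level=info, which is why filtering on level=error alone would miss it.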

What did you expect to happen:
I expected the backup job to complete successfully.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle, and attach to this issue, more options please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

The same happens even if I create a normal (i.e. not scheduled) backup job.

Environment:

  • Velero version (use velero version): velero:v1.11.1, installed using helm chart velero-4.1.4
  • Velero features (use velero client config get features): velero-plugin-for-microsoft-azure:v1.7.1
  • Kubernetes version (use kubectl version): 1.25.11
  • Kubernetes installer & version: aks
  • Cloud provider or hardware configuration: Microsoft Azure
  • OS (e.g. from /etc/os-release):

qiuming-best (Contributor) commented Sep 7, 2023

@manneymc There are some earlier issues (1, 2) involving the error "transport: error while dialing: dial unix /tmp/plugin2309637438: connect: connection refused". In those cases the process had been killed for running out of memory (OOM); this may be a similar case.

If lack of memory is not the cause, please provide more detailed logs using the velero debug command.
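
One way to test the out-of-memory hypothesis is to look at the last termination reason Kubernetes recorded for each container in the velero namespace. The following is a minimal sketch under stated assumptions: the input is the JSON produced by kubectl get pods -n velero -o json, the field path status.containerStatuses[].lastState.terminated.reason comes from the Kubernetes Pod API, and the helper name and the sample pod data are hypothetical.

```python
import json


def oom_killed_containers(pod_list):
    """Return (pod, container, restart count) for containers whose last
    recorded termination reason is OOMKilled (hypothetical helper)."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            terminated = status.get("lastState", {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append(
                    (pod_name, status["name"], status.get("restartCount", 0))
                )
    return hits


# Hypothetical sample shaped like `kubectl get pods -n velero -o json` output;
# the pod names and counts are made up for illustration.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "velero-abc"},
      "status": {"containerStatuses": [
        {"name": "velero", "restartCount": 3,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
      ]}
    },
    {
      "metadata": {"name": "node-agent-xyz"},
      "status": {"containerStatuses": [
        {"name": "node-agent", "restartCount": 0, "lastState": {}}
      ]}
    }
  ]
}
""")

print(oom_killed_containers(sample))
```

Against a real cluster the same function could be fed json.load(sys.stdin) with the kubectl output piped in. An empty result does not rule out memory pressure: if only a plugin child process was killed and the container itself kept running, no OOMKilled termination may be recorded, and the node's kernel log would have to be checked instead.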

qiuming-best self-assigned this Sep 12, 2023
qiuming-best added the Needs info (Waiting for information) label Sep 12, 2023
manneymc (Author) commented

Hi, thanks for replying. I have attached the debug output.

bundle-2023-09-13-17-14-23.tar.gz

ksudarsh00 commented

I am also facing the same error. Is there any update on this?

qiuming-best (Contributor) commented

Most probably the node-agent pod is running out of memory (OOM). The fix is to increase the memory available to node-agent.

qiuming-best (Contributor) commented

I am closing this issue because there has been no response for a long time.
