Restore From S3 Backup Doesn't Work #333

Open
lots0logs opened this issue Mar 11, 2022 · 0 comments
lots0logs commented Mar 11, 2022

I'm using v1.3.0 of the provider. The snapshot exists on S3 and the zip includes the RKE state file.
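For reference, the restore is triggered through the rke_cluster resource roughly like this (a minimal sketch; the bucket name, credentials, variable names, and node details are placeholders, not my actual values):

resource "rke_cluster" "cluster" {
  kubernetes_version = "v1.21.7-rancher1-1"

  nodes {
    address = "<node-ip>"          # redacted in the logs below
    user    = "<ssh-user>"
    role    = ["controlplane", "etcd", "worker"]
    ssh_key = file("<path-to-key>")
  }

  services {
    etcd {
      backup_config {
        interval_hours = 24
        retention      = 7
        s3_backup_config {
          access_key  = var.spaces_access_key   # placeholder variable names
          secret_key  = var.spaces_secret_key
          bucket_name = "<bucket>"
          region      = "sfo3"
          endpoint    = "sfo3.digitaloceanspaces.com"
        }
      }
    }
  }

  # This is the block that triggers the failing restore:
  restore {
    restore       = true
    snapshot_name = "2022-03-11T00:03:29Z_etcd.zip"
  }
}

Here is the output with debug logging enabled for the provider: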

Error: 
============= RKE outputs ==============
time="2022-03-11T20:53:08Z" level=info msg="[rke_provider] rke cluster changed arguments: map[addons:true nodes:true restore:true]"
time="2022-03-11T20:53:08Z" level=debug msg="[rke_provider] addons values old: ---
apiVersion: v1
kind: Secret
metadata:
    name: digitalocean
    namespace: kube-system
stringData:
    access-token: "*redacted plain-text key*"
 new: ---
apiVersion: v1
kind: Secret
metadata:
    name: digitalocean
    namespace: kube-system
stringData:
    access-token: "*redacted plain-text key*"
"
time="2022-03-11T20:53:08Z" level=debug msg="[rke_provider] nodes values old: [map[address:*redacted* docker_socket: hostname_override: internal_address: labels:map[] node_name: port: role:[controlplane etcd worker] roles: ssh_agent_auth:false ssh_cert: ssh_cert_path: ssh_key: ssh_key_path: taints:[] user:*redacted*]] new: [map[address:*redacted* docker_socket: hostname_override: internal_address:*redacted* labels:map[] node_name: port:*redacted* role:[controlplane etcd worker] roles: ssh_agent_auth:false ssh_cert: ssh_cert_path: ssh_key:*redacted*
 ssh_key_path: taints:[] user:*redacted*]]"
time="2022-03-11T20:53:08Z" level=debug msg="[rke_provider] restore values old: [map[restore:false snapshot_name:]] new: [map[restore:true snapshot_name:2022-03-11T00:03:29Z_etcd.zip]]"
time="2022-03-11T20:53:08Z" level=info msg="Updating RKE cluster..."
time="2022-03-11T20:53:08Z" level=debug msg="audit log policy found in cluster.yml"
time="2022-03-11T20:53:09Z" level=info msg="Checking if state file is included in snapshot file for [2022-03-11T00:03:29Z_etcd.zip]"
time="2022-03-11T20:53:09Z" level=debug msg="No DNS provider configured, setting default based on cluster version [1.21.7-rancher1-1]"
time="2022-03-11T20:53:09Z" level=debug msg="DNS provider set to [coredns]"
time="2022-03-11T20:53:09Z" level=debug msg="Checking if cluster version [1.21.7-rancher1-1] needs to have kube-api audit log enabled"
time="2022-03-11T20:53:09Z" level=debug msg="Cluster version [1.21.7-rancher1-1] needs to have kube-api audit log enabled"
time="2022-03-11T20:53:09Z" level=debug msg="Enabling kube-api audit log for cluster version [v1.21.7-rancher1-1]"
time="2022-03-11T20:53:09Z" level=debug msg="Host: *redacted* has role: controlplane"
time="2022-03-11T20:53:09Z" level=debug msg="Host: *redacted* has role: etcd"
time="2022-03-11T20:53:09Z" level=debug msg="Host: *redacted* has role: worker"
time="2022-03-11T20:53:09Z" level=info msg="[dialer] Setup tunnel for host [*redacted*]"
time="2022-03-11T20:53:09Z" level=debug msg="Connecting to Docker API for host [*redacted*]"
time="2022-03-11T20:53:09Z" level=debug msg="Docker Info found for host [*redacted*]: types.Info{ID:"*redacted*", Containers:1, ContainersRunning:0, ContainersPaused:0, ContainersStopped:1, Images:1, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "extfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:25, OomKillDisable:true, NGoroutines:34, SystemTime:"2022-03-11T20:53:09.937546774Z", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"*redacted*", OperatingSystem:"*redacted*", OSVersion:"*redacted*", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc000958070), NCPU:4, MemTotal:8343859200, GenericResources:[]swarm.GenericResource(nil), Docker*redacted*Dir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"*redacted*", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"*redacted*", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "io.containerd.runtime.v1.linux":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, RuncCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, InitCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, SecurityOptions:[]string{"name=*redacted*", "name=*redacted*,profile=*redacted*"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}"
time="2022-03-11T20:53:09Z" level=debug msg="Extracted version [v0.1.78] from image [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:09Z" level=debug msg="Extracted version [v0.1.78] from image [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:09Z" level=debug msg="[etcd] Image used for etcd snapshot is: [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:09Z" level=info msg="[etcd] Snapshots configured to S3 compatible backend at [sfo3.digitaloceanspaces.com] to bucket [*redacted*] using accesskey [*redacted plain-text key*] and using region [sfo3]"
time="2022-03-11T20:53:09Z" level=debug msg="[remove/etcd-extract-statefile] Checking if container is running on host [*redacted*]"
time="2022-03-11T20:53:09Z" level=debug msg="[remove/etcd-extract-statefile] Removing container on host [*redacted*]"
time="2022-03-11T20:53:09Z" level=info msg="Removing container [etcd-extract-statefile] on host [*redacted*], try #1"
time="2022-03-11T20:53:10Z" level=info msg="[remove/etcd-extract-statefile] Successfully removed container on host [*redacted*]"
time="2022-03-11T20:53:10Z" level=debug msg="Checking if image [rancher/rke-tools:v0.1.78] exists on host [*redacted*], try #1"
time="2022-03-11T20:53:10Z" level=info msg="Image [rancher/rke-tools:v0.1.78] exists on host [*redacted*]"
time="2022-03-11T20:53:10Z" level=info msg="Starting container [etcd-extract-statefile] on host [*redacted*], try #1"
time="2022-03-11T20:53:10Z" level=info msg="Successfully started [etcd-extract-statefile] container on host [*redacted*]"
time="2022-03-11T20:53:10Z" level=info msg="Waiting for [etcd-extract-statefile] container to exit on host [*redacted*]"
time="2022-03-11T20:53:10Z" level=info msg="Waiting for [etcd-extract-statefile] container to exit on host [*redacted*]"
time="2022-03-11T20:53:10Z" level=info msg="Container [etcd-extract-statefile] is still running on host [*redacted*]: stderr: [time="2022-03-11T20:53:10Z" level=info msg="invoking set s3 service client" s3-accessKey="*redacted base64-encoded key*" s3-bucketName=*redacted* s3-endpoint=sfo3.digitaloceanspaces.com s3-endpoint-ca= s3-folder= s3-region=sfo3
], stdout: []"
time="2022-03-11T20:53:11Z" level=info msg="Waiting for [etcd-extract-statefile] container to exit on host [*redacted*]"
time="2022-03-11T20:53:11Z" level=debug msg="Exit code for [etcd-extract-statefile] container on host [*redacted*] is [1]"
time="2022-03-11T20:53:11Z" level=info msg="Could not extract state file from snapshot [2022-03-11T00:03:29Z_etcd.zip] on host [*redacted*]"
time="2022-03-11T20:53:11Z" level=info msg="Could not extract state file from snapshot [2022-03-11T00:03:29Z_etcd.zip] on any host, falling back to local state file: Unable to find statefile in snapshot [2022-03-11T00:03:29Z_etcd.zip]"
time="2022-03-11T20:53:11Z" level=info msg="Restoring etcd snapshot 2022-03-11T00:03:29Z_etcd.zip"
time="2022-03-11T20:53:11Z" level=debug msg="No DNS provider configured, setting default based on cluster version [1.21.7-rancher1-1]"
time="2022-03-11T20:53:11Z" level=debug msg="DNS provider set to [coredns]"
time="2022-03-11T20:53:11Z" level=debug msg="Checking if cluster version [1.21.7-rancher1-1] needs to have kube-api audit log enabled"
time="2022-03-11T20:53:11Z" level=debug msg="Cluster version [1.21.7-rancher1-1] needs to have kube-api audit log enabled"
time="2022-03-11T20:53:11Z" level=debug msg="Enabling kube-api audit log for cluster version [v1.21.7-rancher1-1]"
time="2022-03-11T20:53:11Z" level=debug msg="Host: *redacted* has role: controlplane"
time="2022-03-11T20:53:11Z" level=debug msg="Host: *redacted* has role: etcd"
time="2022-03-11T20:53:11Z" level=debug msg="Host: *redacted* has role: worker"
time="2022-03-11T20:53:11Z" level=debug msg="[state] previous state found, this is not a legacy cluster"
time="2022-03-11T20:53:11Z" level=info msg="Successfully Deployed state file at [/terraform/rke-cluster/prod/terraform-provider-rke-tmp-089023307/cluster.rkestate]"
time="2022-03-11T20:53:11Z" level=info msg="[dialer] Setup tunnel for host [*redacted*]"
time="2022-03-11T20:53:11Z" level=debug msg="Connecting to Docker API for host [*redacted*]"
time="2022-03-11T20:53:12Z" level=debug msg="Docker Info found for host [*redacted*]: types.Info{ID:"*redacted*", Containers:1, ContainersRunning:0, ContainersPaused:0, ContainersStopped:1, Images:1, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "extfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:26, OomKillDisable:true, NGoroutines:35, SystemTime:"2022-03-11T20:53:12.256869829Z", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"*redacted*", OperatingSystem:"*redacted*", OSVersion:"*redacted*", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc000958070), NCPU:4, MemTotal:8343859200, GenericResources:[]swarm.GenericResource(nil), Docker*redacted*Dir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"*redacted*", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"*redacted*", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "io.containerd.runtime.v1.linux":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, RuncCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, InitCommit:types.Commit{ID:"*redacted*", Expected:"*redacted*"}, SecurityOptions:[]string{"name=*redacted*", "name=*redacted*,profile=*redacted*"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}"
time="2022-03-11T20:53:12Z" level=info msg="Checking if container [cert-deployer] is running on host [*redacted*], try #1"
time="2022-03-11T20:53:12Z" level=debug msg="Checking if image [rancher/rke-tools:v0.1.78] exists on host [*redacted*], try #1"
time="2022-03-11T20:53:12Z" level=info msg="Image [rancher/rke-tools:v0.1.78] exists on host [*redacted*]"
time="2022-03-11T20:53:12Z" level=info msg="Starting container [cert-deployer] on host [*redacted*], try #1"
time="2022-03-11T20:53:12Z" level=debug msg="[certificates] Successfully started Certificate deployer container: cert-deployer"
time="2022-03-11T20:53:12Z" level=info msg="Checking if container [cert-deployer] is running on host [*redacted*], try #1"
time="2022-03-11T20:53:17Z" level=info msg="Checking if container [cert-deployer] is running on host [*redacted*], try #1"
time="2022-03-11T20:53:17Z" level=info msg="Removing container [cert-deployer] on host [*redacted*], try #1"
time="2022-03-11T20:53:17Z" level=debug msg="Extracted version [v0.1.78] from image [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:17Z" level=debug msg="Extracted version [v0.1.78] from image [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:17Z" level=debug msg="[etcd] Image used for etcd snapshot is: [rancher/rke-tools:v0.1.78]"
time="2022-03-11T20:53:17Z" level=info msg="[etcd] etcd s3 backup configuration found, will use s3 as source"
time="2022-03-11T20:53:17Z" level=info msg="[etcd] Snapshot [2022-03-11T00:03:29Z_etcd.zip] will be downloaded on host [*redacted*] from S3 compatible backend at [sfo3.digitaloceanspaces.com] from bucket [*redacted*] using accesskey [*redacted plain-text key*] and using region [sfo3]"
time="2022-03-11T20:53:17Z" level=debug msg="[remove/etcd-download-backup] Checking if container is running on host [*redacted*]"
time="2022-03-11T20:53:17Z" level=debug msg="[remove/etcd-download-backup] Container doesn't exist on host [*redacted*]"
time="2022-03-11T20:53:17Z" level=debug msg="Checking if image [rancher/rke-tools:v0.1.78] exists on host [*redacted*], try #1"
time="2022-03-11T20:53:17Z" level=info msg="Image [rancher/rke-tools:v0.1.78] exists on host [*redacted*]"
time="2022-03-11T20:53:18Z" level=info msg="Starting container [etcd-download-backup] on host [*redacted*], try #1"
time="2022-03-11T20:53:18Z" level=info msg="[etcd] Successfully started [etcd-download-backup] container on host [*redacted*]"
time="2022-03-11T20:53:18Z" level=info msg="Waiting for [etcd-download-backup] container to exit on host [*redacted*]"
time="2022-03-11T20:53:18Z" level=info msg="Container [etcd-download-backup] is still running on host [*redacted*]: stderr: [time="2022-03-11T20:53:18Z" level=info msg="invoking set s3 service client" s3-accessKey="*redacted base64-encoded key*" s3-bucketName=*redacted* s3-endpoint=sfo3.digitaloceanspaces.com s3-endpoint-ca= s3-folder= s3-region=sfo3
], stdout: []"
time="2022-03-11T20:53:19Z" level=info msg="Waiting for [etcd-download-backup] container to exit on host [*redacted*]"
time="2022-03-11T20:53:19Z" level=debug msg="Exit code for [etcd-download-backup] container on host [*redacted*] is [1]"
time="2022-03-11T20:53:19Z" level=info msg="Removing container [etcd-download-backup] on host [*redacted*], try #1"

Failed restoring cluster err:Failed to download etcd snapshot from s3, exit code [1]: time="2022-03-11T20:53:18Z" level=fatal msg="failed to download s3 backup: no backups found"
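To be clear, the snapshot is definitely present in the bucket root (RKE uploaded it with no s3-folder set, as shown above). For example, listing the Space with the AWS CLI pointed at the S3-compatible endpoint, e.g. aws s3 ls s3://<bucket>/ --endpoint-url https://sfo3.digitaloceanspaces.com (bucket name redacted), shows 2022-03-11T00:03:29Z_etcd.zip, yet the etcd-download-backup container reports "no backups found".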

cc: @snasovich
