Backup not functional #2270

Open · JimMadge opened this issue Oct 29, 2024 · 19 comments · May be fixed by #2272

JimMadge (Member) commented Oct 29, 2024

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a problem observed when managing a Data Safe Haven.
  • I can reproduce this with the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

💻 System information

  • Operating System:
  • Data Safe Haven version: 5.0.1

📦 Packages

List of packages
Paste list of packages here

🚫 Describe the problem

Backup is not functional, and the workaround in the docs doesn't work.

A major problem here is that the storage account types we have chosen (for good reasons: performance, and the ability to upload/download data to/from them) are not compatible with Azure backup services.

🚂 Workarounds or solutions

Because of the incompatibility, I think our two options are

  1. Change the type of storage accounts we use to ones which are compatible
    • This will mean we lose nice features like performance, nice integration with NFS, and upload/download using Azure Storage Explorer
    • Quite a significant infrastructure change to make
  2. Implement our own backup
    • Requires writing/supporting our own solution
    • There are off-the-shelf tools we could use (I think Borg/Borgmatic)
    • Should be fairly simple to implement with a VM or container, as we can mount all of our storage as NFS
    • I have a proposal for how this would look using Borg/Borgmatic in the comments below
@JimMadge added the bug (Problem when deploying a Data Safe Haven) and hotfix (An issue that should be fixed on a hotfix branch, with a point release) labels on Oct 29, 2024
@JimMadge changed the title from "<short description of issue>" to "Backup not functional" on Oct 29, 2024
JimMadge (Member, Author) commented Oct 29, 2024

Following the docs, fixing the backup instance fails (after 22 minutes 😱)

Fixing protection error for BlobBackupSensitiveData
UserErrorMissingRequiredPermissions: Appropriate permissions to perform the operation is missing.

Possibly because my user is missing some backup roles, although I am an Owner and Storage Blob Data Owner.

JimMadge (Member, Author) commented

The problem may be that the backup vault doesn't have the correct role assigned in the target storage account.

@JimMadge JimMadge self-assigned this Oct 29, 2024
JimMadge (Member, Author) commented

Assigning the needed role to the backup vault fixed that issue.
Should be easy to do in Pulumi.

Next problem is

UserErrorUnsupportedStorageAccountType: The storage account type is not supported for backup.
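
As a rough sketch of what the role assignment mentioned above could look like in Pulumi (assuming pulumi-azure-native in Python; the function name, the way the vault and storage account are passed in, and the choice of role are illustrative assumptions, not the project's actual code):

```python
from pulumi import Input, ResourceOptions
from pulumi_azure_native import authorization, dataprotection, storage


def grant_backup_vault_access(
    name: str,
    backup_vault: dataprotection.BackupVault,
    storage_account: storage.StorageAccount,
    role_definition_id: Input[str],  # e.g. the ID of "Storage Account Backup Contributor"
) -> authorization.RoleAssignment:
    """Give the backup vault's managed identity the role it needs on the storage account."""
    return authorization.RoleAssignment(
        f"{name}_backup_vault_role_assignment",
        # System-assigned identity of the backup vault
        principal_id=backup_vault.identity.principal_id,
        principal_type="ServicePrincipal",
        # Full role definition resource ID, looked up elsewhere (e.g. via `az role definition list`)
        role_definition_id=role_definition_id,
        # Scope the assignment to the target storage account only
        scope=storage_account.id,
        opts=ResourceOptions(parent=backup_vault),
    )
```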

JimMadge (Member, Author) commented

Storage account kind is BlockBlobStorage

operational backup supports block blobs in standard general-purpose v2 storage accounts only

You can back up only block blobs in a standard general-purpose v2 storage account using the vaulted backup solution for blobs.

from here

JimMadge (Member, Author) commented

Is this the correct storage account to be backing up? The target is sensitivedata, which holds the inputs and outputs. That feels like the least important thing to back up. In fact, I could see a strong case for not making copies of the input data when you are acting as a data processor.

jemrobinson (Member) commented

I don't think we're ever acting as a data processor - pretty much all useful research is going to involve the researchers making significant-enough decisions that they are a data controller.

@Davsarper : can you remember what we should be backing up for DSPT-compatibility?

Davsarper commented

Agreed on us being data controllers.

We refer to backups when asked about our business continuity plan, and we currently answer that after a failure we would recover 'as much as possible'. At first glance, I don't see a hard description of what needs to be recovered for our organisation.

I think the critical things to recover would be those necessary for offering (healthcare) services; since we don't offer any, it is largely up to us to decide what is key to back up.

I will read up on business continuity plans, which we should develop: https://www.digitalcarehub.co.uk/resource/creating-and-testing-a-business-continuity-plan-for-data-and-cyber-security/

JimMadge (Member, Author) commented

I think we should be careful not to focus too much on ourselves. I think the use case for an ephemeral TRE strictly for data processing is strong.

I feel we might have tried to back up the inverse of what we really want:

  • Backup
    • working data (/home, /shared)
    • databases
    • state data? (like container data)
  • Don't backup
    • input data
    • staged outputs

@JimMadge JimMadge linked a pull request Oct 30, 2024 that will close this issue
JimMadge (Member, Author) commented

Also,

Azure Backup currently doesn't support NFS shares.

🫠

JimMadge (Member, Author) commented

So, backing up either the block blobs or the NFS shares will require one of the following:

  • Major infra changes, using different storage account and container/share types to use Azure Backup Vaults and Recovery Services
  • Implementing our own solution

My feeling is that backing up to some redundant storage using Borg is the most flexible option and the easiest to implement.
The biggest downside is that restoring backups would involve running commands rather than clicking buttons in the portal.

JimMadge (Member, Author) commented

Maybe the best way to implement our own is,

  • Container or VM (configured by templated files or Ansible)
    • user data (/home, /shared) mounted as rw
    • backup share mounted as rw
    • Borgmatic + Borg
      • configured to make incremental, encrypted backups every x hours
      • retention rules for e.g. 6 monthly checkpoints, 4 weekly checkpoints, 7 daily checkpoints, 24 hourly checkpoints
    • Script templates/commands for backup restoration

That way you don't need to worry about multiple workspaces running conflicting backup jobs.
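
To make the Borgmatic part of that proposal concrete, a configuration along these lines could work (a minimal sketch only: the paths, repository location and retention numbers are assumptions, and the exact keys depend on the Borgmatic version; recent releases use this flat layout rather than the older location:/storage:/retention: sections):

```yaml
# /etc/borgmatic/config.yaml -- illustrative sketch, not a tested configuration
source_directories:
  - /home
  - /shared

repositories:
  # Borg repository on the backup share mounted into the VM/container
  - path: /backup/borg
    label: backup-share

# In practice the passphrase would be injected from a secret store,
# not written into the file
encryption_passphrase: "<secret>"

# Retention: 24 hourly, 7 daily, 4 weekly and 6 monthly checkpoints
keep_hourly: 24
keep_daily: 7
keep_weekly: 4
keep_monthly: 6

# Periodic consistency checks of the repository and the archives
checks:
  - name: repository
  - name: archives
```

Backups would then be driven by running borgmatic on a schedule (cron or a systemd timer), and restoration would use borgmatic extract (or borg extract directly) against the repository on the backup share.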

JimMadge (Member, Author) commented

@jemrobinson @craddm I think we need to reach a consensus for the change we want to make here.

jemrobinson (Member) commented Oct 31, 2024

I would like to see all the data/state necessary to recreate the environment backed up.

Imagine, for example, that an SRE has been compromised by a ransomware attack 4 years into a 5 year project. Would our backups be sufficient to deploy an identical (or updated) new environment that results in only days/weeks of work being lost? We should also ensure that we have tiered backups (e.g. X days of daily backups + Y weeks of weekly backups + Z months of monthly backups) in case of longer-term problems.

I'm agnostic as to the method used to achieve this: using Azure built-in solutions would be nice, but it's more important to have something that works than something with a point-and-click interface.

@martintoreilly do you agree with this?

JimMadge (Member, Author) commented

Agreed. In terms of Borg + Borgmatic as a solution:

Backups are incremental, encrypted, and hashed. That gives good protection against the data becoming corrupted (you can schedule regular integrity checks) while reducing the space needed.
There are flexible and convenient retention rules, so declaring something like "keep three annual snapshots, the last 12 monthly snapshots, ..." is easy.

For restoration/disaster recovery, I think what we need to back up is /mnt/shared and /home, and possibly /mnt/input and /mnt/output. As we can mount all of those into a VM, I think that should be easy. I think none of the other data is needed: either we can recreate it with desired state, or it is cache.

JimMadge (Member, Author) commented

@craddm @jemrobinson Any objection before I start along the lines I've posted here?

jemrobinson (Member) commented

Sounds good to me. It would be worth thinking about how we could back up/restore /mnt/input, since this is mounted as read-only. Possibly we'd restore it to /shared and then manually move it across with Storage Explorer?

JimMadge (Member, Author) commented

For container instances, it is possible to mount Azure file shares but not blobs.

jemrobinson (Member) commented

This doesn't work for NFS file shares - it's just the standard kind that you can browse in the portal or with Storage Explorer.

JimMadge (Member, Author) commented

In that case, I think the best way forward is to add a VM and configure it with cloud-init and Ansible.
We already know how to mount all of the storage that way.

The workload should fit a small burstable size.
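
As an illustration of that approach (a hypothetical cloud-init fragment: the storage account and share names are placeholders, and it assumes the user-data shares are NFS shares mounted with options along the lines of the Azure Files NFS documentation, so treat it as a sketch rather than the project's configuration):

```yaml
#cloud-config
# Sketch of a backup VM: install Borg/Borgmatic and mount the storage over NFS.
package_update: true
packages:
  - nfs-common
  - borgbackup
  - borgmatic

# fstab entries: [source, mount point, filesystem, options, dump, pass]
mounts:
  - ["<account>.file.core.windows.net:/<account>/home", /home, nfs, "vers=4,minorversion=1,sec=sys,_netdev", "0", "0"]
  - ["<account>.file.core.windows.net:/<account>/shared", /shared, nfs, "vers=4,minorversion=1,sec=sys,_netdev", "0", "0"]
  - ["<backup-account>.file.core.windows.net:/<backup-account>/backup", /backup, nfs, "vers=4,minorversion=1,sec=sys,_netdev", "0", "0"]

runcmd:
  # The mounts module may run before nfs-common is installed,
  # so retry mounting everything once packages are in place
  - mount -a
```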

@JimMadge removed the hotfix (An issue that should be fixed on a hotfix branch, with a point release) label on Nov 12, 2024
@JimMadge JimMadge added this to the Release 5.1.0 milestone Nov 12, 2024
@JimMadge JimMadge moved this to Ready to Work in Data Safe Haven Nov 18, 2024
@JimMadge JimMadge mentioned this issue Nov 18, 2024