Need Guidance on Backing Up Running Database #65

Open
rebaz94 opened this issue Nov 23, 2023 · 7 comments


@rebaz94

rebaz94 commented Nov 23, 2023

Hey there,

I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.

I'd love to know the best way to back up the database while it's running. Can you share some guidance or tips on how to ensure a proper backup process without disrupting ongoing operations? We're wondering whether it's possible to copy the database folder directly and expect everything to work if we restore that folder onto another machine.

Thank you

@EmmanuelOga

I was wondering the same thing. It seems to me that just copying the data folder would be a bad idea if a Pogreb process was still writing to it. It sounds like any writer process would have to halt operations until the copy is done: call db.Sync(), stop writing to the DB until the copy finishes, then resume operations.

The main caveat is that depending on the data size this could take a few seconds, even on a system with a quick SSD.

Thoughts?
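
The sequence described above (sync, pause writers, copy, resume) could look roughly like the following. This is only a sketch: `gate`, `copyDir`, and `copyFile` are hypothetical helpers, not part of Pogreb, and the only database call assumed is `db.Sync`, passed in as a function value.

```go
package main

import (
	"io"
	"os"
	"path/filepath"
	"sync"
)

// gate is a hypothetical helper that serializes writers against a backup.
// Writers share the read lock; a backup takes the write lock, so it waits
// for in-flight writes and blocks new ones until the copy has finished.
type gate struct{ mu sync.RWMutex }

// Write runs fn (for example a db.Put call) while no backup is in progress.
func (g *gate) Write(fn func() error) error {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return fn()
}

// Backup blocks writers, calls flush (for example db.Sync), then copyFiles.
func (g *gate) Backup(flush, copyFiles func() error) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if err := flush(); err != nil {
		return err
	}
	return copyFiles()
}

// copyDir copies every regular file in srcDir into dstDir (not recursive).
func copyDir(srcDir, dstDir string) error {
	if err := os.MkdirAll(dstDir, 0o755); err != nil {
		return err
	}
	entries, err := os.ReadDir(srcDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		src := filepath.Join(srcDir, e.Name())
		dst := filepath.Join(dstDir, e.Name())
		if err := copyFile(src, dst); err != nil {
			return err
		}
	}
	return nil
}

// copyFile copies src to dst and flushes dst to disk before closing it.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Sync(); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}
```

With a real database this might be called as `g.Backup(db.Sync, func() error { return copyDir(src, dst) })`, with every write wrapped in `g.Write`. Writers stall for the whole copy, which is the caveat mentioned above.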

@rebaz94
Author

rebaz94 commented Nov 28, 2023

Indeed, the library should offer a simple method for backing up data. It's possible to copy the database while it's running, but on restore it might need to rebuild the index, which could be time-consuming.

Since I use it as a caching database, before backing up I create a new database in a separate directory, switch to it, and then upload the original's files. In my case I upload each file of the database to cloud storage, and nearly 1 GB takes about 15 seconds. After the upload, I copy the new data into the original database and switch back to it. This method works seamlessly, but it required me to write nearly 600 lines of code to manage backup and restoration.

@EmmanuelOga

Let me see if I get it right...

You create a db TMP and switch from ORIGINAL to handle writes, while you back up ORIGINAL? Do you still read from ORIGINAL while doing the backup? It sounds like during the backup you need to look up first in TMP, then in ORIGINAL, to handle reads...

And finally, you need to dump anything in TMP to ORIGINAL and get back to normal. Is that right?

@rebaz94
Author

rebaz94 commented Nov 30, 2023

I set up a TMP db to handle both reads and writes while backing up the ORIGINAL one. TMP starts out empty, so while the backup is running any read that misses has to go to Redis or the MySQL database, because the local data isn't there. Whatever we fetch from Redis or MySQL then gets cached in TMP.

Once the backup is done, I transfer all the data from TMP back to the ORIGINAL database. That way we're back to our regular setup.
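
The TMP/ORIGINAL switch described here could be sketched as follows. Everything in this block is an assumption made for illustration: `store`, `memStore`, `cache`, and `BackupWith` are hypothetical names, `Each` stands in for whatever iteration the real database offers, and the in-memory store only exists so the sketch runs on its own. The fallback to Redis or MySQL on a miss is left out.

```go
package main

import "sync"

// store is the subset of a key-value database this sketch relies on.
// Each is a hypothetical helper that would wrap the database's iterator.
type store interface {
	Put(key, value []byte) error
	Get(key []byte) ([]byte, error)
	Each(fn func(key, value []byte) error) error
}

// memStore is an in-memory stand-in for a real database.
type memStore struct {
	mu sync.Mutex
	m  map[string][]byte
}

func newMemStore() *memStore { return &memStore{m: map[string][]byte{}} }

func (s *memStore) Put(key, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[string(key)] = value
	return nil
}

// Get returns a nil value and a nil error when the key is missing.
func (s *memStore) Get(key []byte) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[string(key)], nil
}

func (s *memStore) Each(fn func(key, value []byte) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range s.m {
		if err := fn([]byte(k), v); err != nil {
			return err
		}
	}
	return nil
}

// cache routes reads and writes to whichever store is currently active.
type cache struct {
	mu     sync.RWMutex
	active store
}

func (c *cache) Put(key, value []byte) error {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.active.Put(key, value)
}

func (c *cache) Get(key []byte) ([]byte, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.active.Get(key)
}

// BackupWith switches traffic to tmp, runs backup against the now-idle
// original, copies everything cached in tmp back into the original, and
// finally makes the original active again.
func (c *cache) BackupWith(tmp store, backup func(original store) error) error {
	c.mu.Lock() // waits for in-flight operations on the original
	original := c.active
	c.active = tmp
	c.mu.Unlock()

	backupErr := backup(original)

	// Block traffic during the merge so nothing lands in tmp after it starts.
	c.mu.Lock()
	defer c.mu.Unlock()
	mergeErr := tmp.Each(original.Put)
	c.active = original
	if backupErr != nil {
		return backupErr
	}
	return mergeErr
}
```

The `backup` callback is where the file upload would go. Requests are only blocked during the two short switches and the merge, not during the upload itself.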

@EmmanuelOga

EmmanuelOga commented Dec 1, 2023

I think I get it.

One question is whether the iterator can continue normally in the face of inserts, deletes, and updates. If it can, then it might be OK to back up like this while CRUD operations are still going on:

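backupDB, err := pogreb.Open("new/db/for/backup", nil)
if err != nil {
    log.Fatal(err)
}

// Flush pending writes before iterating.
if err := existingDB.Sync(); err != nil {
    log.Fatal(err)
}
it := existingDB.Items()
for {
    key, val, err := it.Next()
    if err == pogreb.ErrIterationDone {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    // Pogreb's write method is Put.
    if err := backupDB.Put(key, val); err != nil {
        log.Fatal(err)
    }
}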

I also wonder if compaction would affect this somehow. Perhaps compaction needs to be paused while iterating? If the above snippet could work without needing to stop writes, perhaps it is a better way to back up, since the backup would end up fully compacted right away.

@akrylysov
Owner

hi!

> I wanted to start by saying a big thank you for your library—it's been a real game-changer for us! The speed it provides is just incredible.

Thank you!

> copying the data folder would be a bad idea if there was still some Pogreb process writing to them. It sounds like any writer process would have to halt operations until the copy is done

You are correct: you can't just copy the database files and expect the copied database to work unless nothing is writing to the database while the files are copied.

If the database is not large, forcing a recovery and rebuilding the index when the backup is opened for the first time doesn't sound that terrible.

> A question is if iterator would be able to continue normally in the face of inserts, deletes and updates

It's safe to insert, delete or run a compaction during iteration. The only drawback is that this backup method is going to take longer compared to just copying files.

Adding a proper backup mechanism that preserves the index might be tricky; let me think about this more. I may start by adding a backup method that requires rebuilding the index on first start from a backup, which is cheaper than iterating the entire database every time to make a backup. I assume creating a backup is a more frequent operation than restoring from one.

@visopsys

visopsys commented Nov 1, 2024

I might be late to this discussion, but why can't we back up by copying the WAL? The WAL is just a list of segment files, and there is only one active segment for writing at any moment. You can copy the rest of the segments.

Even for the active segment, you can query how long the segment is and read up to that offset. And since segments are immutable, you can do incremental backups (you don't have to back up segments again if they were backed up before).

If we have the list of segments (aka the WAL), we can reconstruct the index table from scratch. Just curious: why would this approach be complicated?
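
To make the incremental idea concrete, here is a minimal planning sketch under the assumptions stated in this comment: segments are append-only, and the backup job can list them with their current sizes. The `segment` and `copyTask` types, the `planIncremental` function, and the segment names in the example are all hypothetical; nothing here is an existing Pogreb API.

```go
package main

import "sort"

// segment describes one log file as seen by the backup job (hypothetical).
type segment struct {
	Name string
	Size int64 // length observed when the backup starts
}

// copyTask asks for Length bytes of a segment, starting at Offset.
type copyTask struct {
	Name   string
	Offset int64 // first byte not yet in the backup
	Length int64
}

// planIncremental returns the copies needed to bring a backup up to date.
// done maps a segment name to the number of bytes already backed up.
// Fully copied segments are skipped; a segment that has grown since the
// last run (the active one, for instance) only contributes its new tail.
func planIncremental(segs []segment, done map[string]int64) []copyTask {
	var tasks []copyTask
	for _, s := range segs {
		copied := done[s.Name] // zero for segments never backed up
		if s.Size > copied {
			tasks = append(tasks, copyTask{
				Name:   s.Name,
				Offset: copied,
				Length: s.Size - copied,
			})
		}
	}
	// Sort by name so the plan is deterministic.
	sort.Slice(tasks, func(i, j int) bool { return tasks[i].Name < tasks[j].Name })
	return tasks
}
```

For example, with `seg-0` fully backed up, 40 of 100 bytes of `seg-1` backed up, and a new 50-byte `seg-2`, the plan contains two tasks: bytes 40 to 99 of `seg-1` and all of `seg-2`. The sketch does not account for segments that a compaction may rewrite or delete between runs.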
