
Create tool to vacate drive #55

Open
jayroos opened this issue Dec 15, 2017 · 14 comments

jayroos commented Dec 15, 2017

It would be helpful to have a tool that could be used to vacate a drive in a pool to prepare it for removal or replacement. Assuming the pool has sufficient capacity to hold the files from the target drive, the utility would redistribute the files in a similar manner to mergerfs.balance.

If there is already built-in support for this, I'm happy to hear about it.

Thanks!

trapexit (Owner) commented Dec 15, 2017

I suppose a tool could be created. I've always just created a second pool which excludes the drive in question and then rsync'd into that pool. That allows for picking the specific policy, which may be necessary since you probably want to keep the existing policy while you continue to use the pool but want data spread differently when writing. That becomes far more complicated with an app, since it'd need to replicate all that built-in logic. mergerfs.balance ignores the pool's policies and just rsyncs files around underneath mergerfs.
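
For illustration, a minimal sketch of that workflow, assuming a hypothetical pool built from /mnt/disk1 through /mnt/disk3 (mounted at /mnt/pool) where /mnt/disk3 is the drive being vacated, and an mfs create policy; adjust the options to match your real pool:

```sh
mkdir -p /mnt/pool2

# Temporary pool that excludes the drive being vacated, mounted with
# whatever create policy you want the redistributed files to follow.
mergerfs -o use_ino,category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/pool2

# Copy (not move) the excluded drive's contents into the temporary
# pool; mergerfs applies its policies to each incoming file.
rsync -avHAXS /mnt/disk3/ /mnt/pool2/

# Verify, tear down the temporary pool, then drop /mnt/disk3 from the
# main pool's branch list.
fusermount -u /mnt/pool2
```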

jayroos commented Dec 15, 2017

I don't know enough about how everything works, but here is one thought.

What if the vacate tool were a variant of balance that created large zero-fill temp files on the drive to be vacated, which would be skipped during the balance phase? Then the cycle would look like this (roughly sketched after the list):

  1. Fill drive with temp files
  2. Balance
  3. Return to step 1 until no non-temp files remain.
  4. Exit loop and delete temp files.
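
To make the idea concrete, a hypothetical sketch of step 1, assuming /mnt/disk3 is the drive to vacate and assuming the balance pass could be told to skip the placeholder directory:

```sh
# Fill the drive with large placeholder files until no space remains,
# so a balance pass won't choose it as a destination.
mkdir -p /mnt/disk3/.vacate
i=0
while fallocate -l 10G "/mnt/disk3/.vacate/tmp$i" 2>/dev/null; do
  i=$((i + 1))
done

# ...balance, repeat until only placeholders remain, then clean up:
rm -rf /mnt/disk3/.vacate
```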

One could probably do this manually, but having it all in one tool would be slick.

Maybe not the best approach, but would it work?

trapexit (Owner) commented Dec 15, 2017

I really don't follow. The balance tool works independently of mergerfs, outside of querying it for the drives. It completely ignores the pool's policies. It's a simple way to redistribute files when you don't care at all which drive they end up on.

Vacating a drive is different. At least it always is for me. I more often than not want to obey the policies. Which means I need to use mergerfs to make the decisions. That can only be done by creating another mount and moving files into it from the drive in question.

If you don't care where files end up, then changing the balance tool to use the drive as the source rather than the drive with the most used space is easy. But in my opinion, so is removing the drive from the pool and running rsync. My point is that creating a tool to do something that is somewhat custom for each user, and not all that difficult to do one's self, means creating a somewhat complicated tool. I'm not opposed to creating a tool; I'm just not clear on the specific use case.

jayroos commented Dec 15, 2017

I'm trying to find a way to do it while "online". Removing the drive from the pool effectively takes that data "offline" until the rsync is done.

trapexit (Owner) commented Dec 15, 2017

That's why I said to use a second pool with the drive missing and then just rsync between the drive in question and the new pool.

jayroos commented Dec 15, 2017

Let me see if I understand:

  1. Create pool2 excluding the drive to be removed. Both pools are active simultaneously.
  2. Rsync from pool1 to pool2
  3. Remove pool2
  4. Remove drive to be removed from pool1

trapexit (Owner):

No. Not rsync from pool1 to pool2. Rsync from excluded drive to pool2.

trapexit (Owner) commented Dec 15, 2017

This is generally documented in my extra docs.

https://github.com/trapexit/backup-and-recovery-howtos/blob/master/docs/recovery_(mergerfs).md

That's for replacing a drive straight up. If you want files distributed in some particular way, it requires the second pool.

jayroos commented Dec 15, 2017

Thanks. I missed that because I didn't think of it as recovery.

eduncan911 commented Feb 4, 2018

I would like to vote for this as well.

I've read the link above and understand how to add another drive, and move data over.

I suspect that to just "vacate" a drive, I would (sketched below):

  • first, remove it from the pool
  • second, copy/rsync the data from the drive I just removed back into the pool via the normal mergerfs endpoint (to continue the use of its policy settings)
  • third, once the copy is completed, remove the physical drive
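
A rough sketch of those three steps with hypothetical paths (/mnt/pool as the mergerfs mount, /mnt/disk3 as the drive being removed); the runtime branch removal below uses the mergerfs 2.x xattr interface, so double-check it against your version's docs:

```sh
# 1. Drop the drive from the pool's branch list at runtime (editing
#    fstab and remounting works just as well).
setfattr -n user.mergerfs.srcmounts -v '-/mnt/disk3' /mnt/pool/.mergerfs

# 2. Copy its data back in through the pool mount so the pool's own
#    create policy decides where each file lands.
rsync -avHAXS /mnt/disk3/ /mnt/pool/

# 3. After verifying the copy, unmount and physically remove the drive.
umount /mnt/disk3
```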

I think the OP's original thought was for mergerfs itself to handle this if you remove the drive, like an option or flag to move all the data.

However, I also see the author's point: that mergerfs is just a proxy service of sorts, and these tools here in this repo exist for more out-of-band functionality. One could write a simple tool (and PR it here) for exactly that; maybe mergerfs.remove or mergerfs.vacate or something.

I think it would come down to this:

If mergerfs is intended to be responsible for maintenance of the pool at all, then yes, this feature should be added.

But right now, reading the mergerfs docs and reviewing these tools, it would seem that:

  • mergerfs is intended to be a simple proxy, perhaps to keep the code simple (it's got a lot of files already!).
  • And maintenance of the pool remains dedicated to these tools here in this repo.

Disclaimer: I just found out about mergerfs and think it finally ticks most of the boxes I've been looking for to move off of DrivePool and back to full Linux servers again. Especially for Plex and media. I'm sitting on 15 or so disks and nearly 60 TB of space, largely duplicated 2x to 3x over. I'm still reading up on it, but I think I'm ready to take the plunge!

celogeek commented Aug 4, 2020

I'm doing this right now.
I have three 1 TB disks and one 2 TB disk.
I want to split my 2 TB disk into two 1 TB partitions.
What I'm doing, in a non-efficient way, is:

  • remove the disk from the pool
  • rsync the 2 TB disk to the pool
  • repartition the disk (see the sketch below)
  • add the 2 partitions back to the pool

During the process, part of my data is missing.
I could also have done it a bit more manually by moving the data from this disk to the others.

A tool to empty a disk onto the other disks of the pool would be helpful.
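
For the repartitioning step, a hypothetical sketch (assuming the 2 TB disk is /dev/sdX and has already been emptied; this destroys whatever is left on it):

```sh
# Two equal GPT partitions on the vacated 2 TB disk.
parted -s /dev/sdX mklabel gpt \
  mkpart data1 ext4 0% 50% \
  mkpart data2 ext4 50% 100%
mkfs.ext4 /dev/sdX1
mkfs.ext4 /dev/sdX2
```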

trapexit (Owner) commented Aug 4, 2020

The biggest thing that needs to be done to make this practical is the ability to get all of a pool's options and turn them into a list suitable for mounting, since the only good way to manage this behavior is to create a second, temporary pool without the drive in question and then rsync. It shouldn't be hard; it just needs to be done.
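
For reference, mergerfs already exposes its runtime options through its control file, which is presumably where such a tool would start (hypothetical mount point /mnt/pool):

```sh
# Dump all runtime options; each appears as a user.mergerfs.* xattr
# that a tool could translate back into a -o option string for the
# temporary mount.
getfattr -d /mnt/pool/.mergerfs

# Or read a single option:
getfattr -n user.mergerfs.category.create /mnt/pool/.mergerfs
```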

jscoys commented Oct 26, 2021

Hello!

Well I understand the need here. Being a long-time user of Stablebit DrivePool on Windows, and currently in the process of moving to Linux with MergerFS, it's indeed a solution that would be awesome!

In Stablebit DrivePool you have a list of your disks, you click "remove" on one of them, and the software empties the drive (checking at the same time that there's enough space on the others, of course), then runs the process of moving all the files off the disk you want to remove. In the meanwhile, all files remain available during the whole process, ensuring a kind of 99.999% uptime for your storage system.

If that's not possible, I would say that at least the process described here...

  1. Create pool2 excluding the drive to be removed. Both pools are active simultaneously.
  2. Rsync from excluded drive to pool2
  3. Remove pool2
  4. Remove drive to be removed from pool1

...should be managed by MergerFS. Because let's say someone or some process is writing to pool1 during step 2, and MergerFS is then writing to the disk you're trying to remove? In the end, only MergerFS could say "I'm in the middle of moving files, so I'm locking this disk against writing/adding files; read-only for now". I know that doing a small rsync after step 2, like "2.5) rsync --append-verify from the excluded drive to pool2" (sketched below), could do the trick, but it's not a clean solution IMO, especially on disks where files are very often written.
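
That catch-up pass would look something like this, reusing the hypothetical paths from the sketches above (/mnt/disk3 as the excluded drive, /mnt/pool2 as the temporary pool):

```sh
# Step 2.5: re-copy anything that changed during the bulk rsync;
# --append-verify checksums previously copied data and redoes any
# file that no longer matches.
rsync -avHAXS --append-verify /mnt/disk3/ /mnt/pool2/
```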

Btw @trapexit, I know you've heard it a million times, but I'll tell you: MergerFS is awesome and well written! That's why we want to add features to it; the "price of success" :-)

trapexit (Owner):

> Well I understand the need here.

As do I. I'm in the process of doing this very thing and have done so many times over the years. The issue is that 1) mergerfs itself does not have any active logic like Stablebit does. It does not actively monitor filesystems and act; it is entirely reactive. So it would require building all that logic. And 2) I believe it is risky and bad practice to do moves like that. Often people vacate drives that are under duress or damaged. Removing data from such a drive is riskier than simply copying it.

> should be managed by MergerFS

You're mistaken about the workflow. You absolutely can keep mergerfs from writing to drives; that's what the "RO" setting for branches is for. You update the main pool and set the branch RO. And if you're concerned about writes to existing files, then remount the underlying filesystem read-only. Having mergerfs attempt to synchronously manage this workflow just doesn't make sense to me. It is perfectly straightforward to do out of band.
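
For example, marking the branch read-only at mount time keeps new writes off the drive being vacated while everything on it stays readable through the pool (hypothetical paths; branch modes are RW, RO, and NC):

```sh
# /mnt/disk3 stays in the pool but is no longer selected for writes.
mergerfs -o use_ino,category.create=mfs \
  /mnt/disk1:/mnt/disk2:/mnt/disk3=RO /mnt/pool
```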

It is straightforward and trivial to automate the process. I've just not gotten to it, as there are a bunch of other things I've been working on and this process is pretty straightforward to do manually.
