Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard links are used to snapshot RRun workspace #50

Open
Xarthisius opened this issue Dec 30, 2022 · 3 comments
Open

Hard links are used to snapshot RRun workspace #50

Xarthisius opened this issue Dec 30, 2022 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@Xarthisius
Copy link
Contributor

Xarthisius commented Dec 30, 2022

We're using self.snapshotRecursive on version/workspace to create run/workspace. That's wrong cause it makes a hardlink instead of a copy. Can you guess the result?

How to reproduce?

  1. On your local machine, create a file:
    echo "This and nothing else" > /tmp/file_that_should_have_one_line
    
  2. On your local machine, create run.sh with the following content:
    #!/bin/sh
    echo "This shouldn't be here..." >> file_that_should_have_one_line
    echo "Nor this. I'm a very sad panda now." >> file_that_should_have_one_line
    
  3. Create a Tale, upload file_that_should_have_one_line and run.sh to Tale's workspace, perform Recorded Run

Actual Result

file_that_should_have_one_line in 1) Tale's active workspace, 2) version's immutable workspace and 3) RRun's workspace contains:

This and nothing else
This shouldn't be here...
Nor this. I'm a very sad panda now.

Expected Result

file_that_should_have_one_line in 1) Tale's active workspace, 2) version immutable workspace contains:

This and nothing else

whereas file_that_should_have_one_line has 3 lines in RRun's workspace.

@Xarthisius Xarthisius added the bug Something isn't working label Dec 30, 2022
@Xarthisius Xarthisius self-assigned this Dec 30, 2022
@hategan
Copy link
Collaborator

hategan commented Dec 30, 2022

That's wrong cause it makes a hardlink instead of a copy. Can you guess the result?

Hmm, once we ran tales in containers which mounted workspace through davfs. It looks like that was essential for the overall scheme to work. The design doc (https://docs.google.com/document/d/1b2xZtIYvgVXz7EVeV-C18So_a7QLGg59dPQMxvBcA5o/edit#) mentions this (and limitations thereof) explicitly. In particular, the pseudocode uses a copy operation that could be replaced with a hard link for efficiency but only if copy-on-write semantics are in place when writing to files in workspace during normal operations. Copy-on-write was more or less an accident with our webdav client, but it did fit the bill.

Under the assumption that workspace only contains source-like files, it would still be significantly more efficient to do copy-on-write when modifying workspace files than when creating versions/runs, since it is unlikely that every workspace file is modified with every version/run.

@Xarthisius
Copy link
Contributor Author

Xarthisius commented Dec 30, 2022

Yeah, that's definitely my fault. I didn't realize the implication of switching to bind mounts. I see 3 options:

  1. Revert to WebDav (which I'm doing right now to prevent data corruption in versions)
  2. Keep the usage of hardlinks in versions, but implement mounts with COW (overlayfs with syncing back works quite nicely for that)
  3. Change versions backing to something that's not using hard links (git?)

1. is definitely out of the question long term. I'm only bringing it back as an emergency fix. We can ponder 2 or 3 once the dust settles.

@hategan
Copy link
Collaborator

hategan commented Dec 30, 2022

Yeah, that's definitely my fault. I didn't realize the implication of switching to bind mounts.

Not my point. Because I didn't see it either. And we might want to leave a warning sign in a more conspicuous place than some dusty doc in the googles.

[...]

  1. Revert to WebDav (which I'm doing right now to prevent data corruption in versions)
  2. Keep the usage of hardlinks in versions, but implement mounts with COW (overlayfs with syncing back works quite nicely for that)
  3. Change versions backing to something that's not using hard links (git?)

1. is definitely out of the question long term. I'm only bringing it back as an emergency fix. We can ponder 2 or 3 once the dust settles.

  1. Use copy instead of hard link in snapshotRecursive.
  2. Use (4) when creating a run, but hard link for snapshotting versions.

I agree that (1) is not ideal long term. I like (2). We are ultimately doing trickery for performance reasons. So, while using git as a backing store might work, I'm not sure (3) will work well. Even the current scheme is not quite right, since every snapshot is O(n) in the number of files in the workspace even if no file has been touched, but any solution based on a generic FS will probably have to be at least that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants