ProjFS + Copy-on-Write #96

Open
forrestthewoods opened this issue Jul 9, 2024 · 2 comments

Comments

@forrestthewoods

Microsoft Dev Drive supports copy-on-write semantics, which is super cool. I'd like to combine ProjFS + CoW so that large binary assets can be shared.

Currently, the way ProjFS works is that you supply a file's contents by calling PrjWriteFileData. What I'd like to do instead is perform a block clone.

Is there any way to do this today? I believe the answer is no. But I wanted to check.

In case it helps, my intended use case is an experimental version control system. I'd like to support monorepos with very large files which are deduplicated via block cloning.

Thanks!

@cgallred
Contributor

It isn't quite clear to me what you're doing.

  • Are you thinking of using a block clone to recall the data instead of PrjWriteFileData?
  • Or are you thinking of using block clone to preserve deduplication state that is in the backing store? That is, your backing store contains large files that have been deduplicated, and when you open such a file on the client you want the provider application to be able to preserve that deduplication on the client?

@forrestthewoods
Author

> It isn't quite clear to me what you're doing.

Imagine a version control system with a CAS cache. The cache may contain a very large file, tens of gigabytes in size, containing AI model weights. I don't want to make a copy of those bytes on the projected client. Similarly, consider a cache containing large files like LLVM toolchains. I want multiple repos to be deduplicated and share the same bytes from the CAS cache. And I want a virtual file system so that large repos only need to remotely fetch files into the cache if and as they are needed.
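
To make the use case concrete, here is a minimal Python sketch of the idea. It is not ProjFS code and not a block clone: it uses a hard link as a stand-in for "share the same bytes", which differs from copy-on-write because a write through one path would be visible through the other. The `CasCache` class, the `fetch` callback, and the `demo` helper are all hypothetical names made up for illustration.

```python
import hashlib
import os
import tempfile
from typing import Callable


class CasCache:
    """Toy content-addressed store: blobs are keyed by their SHA-256 digest."""

    def __init__(self, root: str, fetch: Callable[[str], bytes]):
        self.root = root
        self.fetch = fetch  # hypothetical stand-in for a remote download
        self.fetch_count = 0
        os.makedirs(root, exist_ok=True)

    def ensure(self, digest: str) -> str:
        """Fetch the blob only if it is not already in the cache."""
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):
            data = self.fetch(digest)
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError("fetched data does not match digest")
            with open(path, "wb") as f:
                f.write(data)
            self.fetch_count += 1
        return path

    def materialize(self, digest: str, dest: str) -> None:
        """Place the blob at dest without copying its bytes."""
        src = self.ensure(digest)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Assumption: a hard link stands in for the block clone asked about
        # above. A real block clone would give copy-on-write semantics.
        os.link(src, dest)


def demo():
    """Materialize one blob into two repos; return (shared?, fetch count)."""
    weights = b"pretend these are model weights" * 1024
    digest = hashlib.sha256(weights).hexdigest()
    remote = {digest: weights}  # fake remote store
    with tempfile.TemporaryDirectory() as tmp:
        cache = CasCache(os.path.join(tmp, "cas"), remote.__getitem__)
        repo_a = os.path.join(tmp, "repo_a", "model.bin")
        repo_b = os.path.join(tmp, "repo_b", "model.bin")
        cache.materialize(digest, repo_a)
        cache.materialize(digest, repo_b)
        return os.path.samefile(repo_a, repo_b), cache.fetch_count
```

In this sketch both repos end up pointing at the same on-disk data and the remote fetch happens once, on first use. The open question in this issue is whether a ProjFS provider can get the same effect with a block clone at hydration time instead of writing the bytes.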

Does that help?
