fs: support version_aware ? #207

Open
efiop opened this issue Mar 15, 2023 · 5 comments

Comments

@efiop
Contributor

efiop commented Mar 15, 2023

Currently, gitfs works on a single revision, but we could make a version_aware version (similar to s3fs, gcsfs, adlfs) and support revisions as version_id. The implementation is fairly straightforward (just use the Tree for a particular version_id; the rest is the same). This seems to make a lot of sense in the dvc context of unifying get/import with get-url/import-url and getting rid of DependencyRepo.
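
To make the idea concrete, here is a rough sketch of what "revisions as version_id" could look like. It is only an illustration, not gitfs code: `ToyVersionedGitFS`, its methods, and the dict-based `Tree` stand-in are all hypothetical, and a real implementation would resolve each revision to an actual git tree.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a git tree: path -> file contents.
Tree = Dict[str, bytes]


class ToyVersionedGitFS:
    """Toy filesystem over several revisions (illustrative names only)."""

    def __init__(
        self, trees: Dict[str, Tree], rev: str, version_aware: bool = False
    ) -> None:
        self.trees = trees  # revision name -> tree
        self.rev = rev  # default revision, like the single rev gitfs uses today
        self.version_aware = version_aware

    def _tree(self, version_id: Optional[str] = None) -> Tree:
        # Without a version_id, behave exactly like a single-revision fs.
        if version_id is None:
            return self.trees[self.rev]
        if not self.version_aware:
            raise ValueError("version_id requires version_aware=True")
        # With a version_id, pick that revision's tree; the rest is the same.
        return self.trees[version_id]

    def cat_file(self, path: str, version_id: Optional[str] = None) -> bytes:
        return self._tree(version_id)[path]

    def ls(self, version_id: Optional[str] = None) -> List[str]:
        return sorted(self._tree(version_id))


trees = {
    "rev1": {"data.csv": b"a,b\n"},
    "rev2": {"data.csv": b"a,b,c\n", "README": b"hi\n"},
}
fs = ToyVersionedGitFS(trees, rev="rev2", version_aware=True)
print(fs.cat_file("data.csv"))  # default revision
print(fs.cat_file("data.csv", version_id="rev1"))  # explicit revision
print(fs.ls(version_id="rev1"))
```

The only revision-specific step in this sketch is the tree lookup in `_tree`; every other method is unchanged, which is the sense in which the change would be small.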

@skshetry
Member

What’s the difference between opening a new filesystem and this?

@efiop
Contributor Author

efiop commented Mar 15, 2023

@skshetry No difference in behavior, but it becomes possible to use everything in one place. Similar to how s3fs/etc supports it directly and not as a new fs instance (though a new instance obviously wouldn't make any sense for them, since only files can be versioned and not directories).

Having it all in one fs will allow us to treat it the way we treat versioned filesystems. For example, to check whether updates are available, we could do it the same way we do for version_aware dependencies, through the same filesystem instance instead of having to create two.
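
A rough sketch of that update check, under stated assumptions: `ToyRepoFS`, `info`, and `is_outdated` are hypothetical names made up for illustration, the dict of revisions stands in for a real repo, and the SHA-1 of the contents stands in for whatever checksum the real filesystem would report.

```python
import hashlib
from typing import Dict, Optional


class ToyRepoFS:
    """Toy version_aware fs: one instance serves every revision (hypothetical)."""

    def __init__(self, revisions: Dict[str, Dict[str, bytes]], head: str) -> None:
        self.revisions = revisions  # revision name -> {path: contents}
        self.head = head  # revision used when no version_id is given

    def info(self, path: str, version_id: Optional[str] = None) -> dict:
        rev = version_id if version_id is not None else self.head
        data = self.revisions[rev][path]
        return {
            "name": path,
            "size": len(data),
            "version_id": rev,
            "checksum": hashlib.sha1(data).hexdigest(),
        }


def is_outdated(fs: ToyRepoFS, path: str, locked_rev: str) -> bool:
    # Compare the locked revision with the latest one through the same
    # fs instance, instead of opening one filesystem per revision.
    locked = fs.info(path, version_id=locked_rev)
    latest = fs.info(path)
    return locked["checksum"] != latest["checksum"]


repo_fs = ToyRepoFS(
    {
        "rev1": {"model.pkl": b"old", "params.yaml": b"lr: 0.1\n"},
        "rev2": {"model.pkl": b"new", "params.yaml": b"lr: 0.1\n"},
    },
    head="rev2",
)
print(is_outdated(repo_fs, "model.pkl", locked_rev="rev1"))
print(is_outdated(repo_fs, "params.yaml", locked_rev="rev1"))
```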

@efiop
Contributor Author

efiop commented Mar 15, 2023

> Similar to how s3fs/etc supports it directly and not as a new fs instance

Obviously this is kinda awkward when you start dealing with it initially, but it makes more sense the more you use it. Since we have s3fs/gcsfs/adlfs already and have to deal with those, it makes sense to consider the same for gitfs since they have a lot of similarities.

Maybe it could be done with two filesystems: GitTreeFileSystem (like our current one) and GitFileSystem (the one that works on the whole git repo, so version_aware makes sense for it). We were also talking about GitIndexFileSystem before, but that's a whole other story 🙂

@skshetry
Member

With version_aware in gitfs, we are effectively talking about a completely new revision/instance.

@skshetry
Member

> Maybe it could be done with two filesystems GitTreeFileSystem(like our current one) and GitFileSystem(the one that works on the whole git repo and so version_aware makes sense for it).

I have been thinking of something similar on the dvcfs side: TheOneFileSystem that can traverse between multiple revisions. But I just want to avoid filesystems in general. It has already gone too far. :)

efiop added a commit to efiop/dvc that referenced this issue Mar 15, 2023
Stepping stone to simplifying `dvc fetch/pull` by using index.

Fetch handles regular imports through index already, but not repo
imports because their processing is much more involved (e.g. chained
imports) in the current arch.

With `FileStorage` support introduced into `DataIndex` and `datafs`
supporting imports overall, `dvcfs` can now handle repo imports (even
chained ones). This will soon allow us to handle repo imports the same
way we handle regular ones, improve performance and get rid of a lot of
messy code (e.g. DependencyRepo).

Related iterative/scmrepo#207
Related iterative/dvc-data#315
Related https://github.com/iterative/studio/issues/5261
efiop added a commit to iterative/dvc that referenced this issue Mar 15, 2023