S3 backend and some Win32 cherry-picks from upstream #28

Open
wants to merge 5 commits into
base: master

willkelleher

We're using git-fat at our company, and we found the two binary mode commits necessary for the smudge filter to work on Windows.

I also made a few changes that abstract the backend push/pull operations to allow for different implementations because we tend to store our files in S3.
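
For readers unfamiliar with the idea, a backend abstraction of this kind could look roughly like the sketch below. It is not the code from this PR: the class names `Backend` and `DirectoryBackend`, the `push`/`pull` signatures, and the toy directory-copy transport are all hypothetical and only illustrate how rsync and S3 transports could sit behind one interface.

```python
import os
import shutil


class Backend(object):
    """Hypothetical interface that every storage backend would implement."""

    def __init__(self, objdir):
        self.objdir = objdir  # local directory holding the fat objects

    def push(self, names):
        raise NotImplementedError

    def pull(self, names):
        raise NotImplementedError


class DirectoryBackend(Backend):
    """Toy backend: the 'remote' is just another local directory."""

    def __init__(self, objdir, remote_dir):
        super(DirectoryBackend, self).__init__(objdir)
        self.remote_dir = remote_dir

    def _copy(self, names, src, dst):
        # Create the destination on demand, then copy each named object.
        if not os.path.isdir(dst):
            os.makedirs(dst)
        for name in names:
            shutil.copyfile(os.path.join(src, name), os.path.join(dst, name))

    def push(self, names):
        self._copy(names, self.objdir, self.remote_dir)

    def pull(self, names):
        self._copy(names, self.remote_dir, self.objdir)
```

An rsync or S3 backend would override the same two methods, so the rest of the tool never needs to know which transport is in use.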

jedbrown and others added 3 commits April 7, 2014 17:41
Python-3 needs binary mode so that it doesn't try to read into unicode
strings. Python-2 just uses bytes on Linux, but needs the mode set to binary
on Windows.

The smudge filter must also read binary because we can have files with a
"managed" extension that is not actually managed by git-fat. In that
case, we get raw binary data on stdin. It will not match our cookie, but
we must not corrupt its contents in the working tree, thus we have to
treat it as binary.
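
A minimal sketch of what this commit message describes, assuming a filter that reads everything from stdin: the helper names `binary_stdio` and `passthrough_unmanaged` are made up for illustration, and `COOKIE` is a placeholder rather than git-fat's actual marker.

```python
import os
import sys

# Placeholder marker for illustration only; not necessarily git-fat's cookie.
COOKIE = b'#$# example-cookie '


def binary_stdio():
    """Return (stdin, stdout) as byte streams on Python 2 and Python 3."""
    if sys.platform == 'win32':
        # Windows opens the standard streams in text mode, which rewrites
        # line endings; switch the underlying descriptors to binary mode.
        import msvcrt
        for stream in (sys.stdin, sys.stdout):
            msvcrt.setmode(stream.fileno(), os.O_BINARY)
    # Python 3 exposes the byte streams as .buffer; Python 2 streams
    # already carry bytes, so fall back to the stream itself.
    return (getattr(sys.stdin, 'buffer', sys.stdin),
            getattr(sys.stdout, 'buffer', sys.stdout))


def passthrough_unmanaged(instream, outstream):
    """Copy input unchanged when it does not start with the cookie.

    Returns True if the data was passed through, False if it looked managed.
    """
    data = instream.read()
    if data.startswith(COOKIE):
        return False
    outstream.write(data)
    return True
```

A smudge filter built this way would call `passthrough_unmanaged(*binary_stdio())` and only look up the real object when the function returns False.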

Git is encoding-agnostic in the sense that it interprets file contents,
commit messages and paths as binary. In the case of paths, this means
that the non-NUL bytes returned from readdir(2) are stored and later
passed to lstat(2) and creat(2). See git-commit(1) for details.

To be compatible with Git's mode of operation, we also use raw bytes
whenever possible. hashlib's hexdigest returns Python 'str', which we
immediately encode as ASCII so that it can be used with path component
and cleaned bytes to be committed.
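
As a small illustration of the encoding step, assuming SHA-1 digests and a bytes-valued object directory (the function names `object_name` and `object_path` are hypothetical):

```python
import hashlib
import os


def object_name(data):
    """Return the hex digest of `data` as ASCII bytes rather than str."""
    return hashlib.sha1(data).hexdigest().encode('ascii')


def object_path(objdir, data):
    """Join a bytes directory with the bytes digest; no str is mixed in."""
    return os.path.join(objdir, object_name(data))
```

Keeping both the directory and the digest as bytes avoids mixing `str` and `bytes` in `os.path.join`, which raises `TypeError` on Python 3.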

Renamed variable 'bytes' to 'bytecount' due to conflict with type

Includes contributions from: Stephen Miller <[email protected]>
@@ -14,6 +14,8 @@ import itertools
 import threading
 import time
 import collections
+from boto.s3.connection import S3Connection
+from boto.s3.key import Key

I think this needs to be guarded: not everyone using git-fat will (want to) have boto installed.
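
One common way to guard such an import is sketched below. The `HAVE_BOTO` flag and the `require_boto` helper are hypothetical names, not part of this PR; the two imported names are the ones from the diff above.

```python
try:
    # Optional dependency: only needed when an S3 backend is configured.
    from boto.s3.connection import S3Connection
    from boto.s3.key import Key
    HAVE_BOTO = True
except ImportError:
    S3Connection = Key = None
    HAVE_BOTO = False


def require_boto():
    """Fail with a clear message only when S3 support is actually requested."""
    if not HAVE_BOTO:
        raise RuntimeError('The S3 backend requires boto, which is not installed')
```

With this shape, users of the rsync backend never hit the import error, and an S3 backend would call `require_boto()` before touching `S3Connection`.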

@jedbrown
Owner

Sorry about the silence. I'm concerned about backward compatibility for .gitfat files. If compatibility must be broken, it should happen only once ever (I think named remotes are the way to go). The test suite currently fails:

$ ./test.sh 
+ git init fat-test
Reinitialized existing Git repository in /home/jed/src/git-fat/fat-test/.git/
+ cd fat-test
+ git fat init
Traceback (most recent call last):
  File "/home/jed/bin/git-fat", line 659, in <module>
    fat = GitFat()
  File "/home/jed/bin/git-fat", line 273, in __init__
    self.backend = self.get_backend(self.objdir)
  File "/home/jed/bin/git-fat", line 304, in get_backend
    raise RuntimeError('No supported backends specified in %s' % cfgpath)
RuntimeError: No supported backends specified in /home/jed/src/git-fat/fat-test/.gitfat

@shakaran

Any progress on this? I am trying to use S3 with git-fat, but this PR is still not merged and has conflicts. My only option at the moment is git-media.

@judgeaxl

judgeaxl commented Dec 2, 2014

Have you considered just using s3cmd for the S3 version? It's almost equivalent to rsync, but for S3. I've got a first test in my fork. I initially started playing with boto too, but it felt too much like reinventing the wheel.

@willkelleher
Author

@jedbrown After seeing the HN discussion, I'll work on getting this PR updated. We've made some progress on our fork since this was created.

Regarding the config file format, do you have any specific preferences?

@jedbrown
Owner

jedbrown commented Apr 9, 2015

@willkelleher That's great news. For the config file, I really want named remotes in every way analogous to Git remotes. I think the syntax could be like

[remote "foo"]
  url = rsync://user@host/path/
[remote "bar"]
  url = s3://.......

I don't know if I've talked with you about it, but I think it's worth making a Git backend for fat files (just tagged objects so they can be pushed and pulled independently) and I feel like `url = user@host:path/storer.git` should have that meaning (for consistency with normal Git remotes).
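
To make the proposal concrete, here is a rough sketch of how named remotes in that syntax could be read and dispatched by URL scheme. Nothing here is git-fat code: `parse_remotes` is a hypothetical helper, the `s3://example-bucket/fat` URL is an invented stand-in for the elided one above, and treating a scheme-less URL as a Git remote is only the convention suggested in this comment.

```python
import configparser
import re
from urllib.parse import urlsplit

# Example configuration; the S3 URL and the "baz" remote are made up.
SAMPLE = '''\
[remote "foo"]
  url = rsync://user@host/path/
[remote "bar"]
  url = s3://example-bucket/fat
[remote "baz"]
  url = user@host:path/storer.git
'''


def parse_remotes(text):
    """Map remote name -> (backend kind, url) for every [remote "name"] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    remotes = {}
    for section in parser.sections():
        match = re.fullmatch(r'remote "(.+)"', section)
        if match is None:
            continue  # ignore sections that are not named remotes
        url = parser.get(section, 'url')
        # scp-like URLs such as user@host:path have no scheme; treat them
        # as Git remotes. A URL like host:path (no user@) would need extra
        # handling, because urlsplit would read "host" as the scheme.
        kind = urlsplit(url).scheme or 'git'
        remotes[match.group(1)] = (kind, url)
    return remotes
```

A caller could then pick a backend from the first tuple element, for example `parse_remotes(SAMPLE)['foo'][0]` to select the rsync transport.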

What do you think?

@willkelleher
Author

@jedbrown Do you mean you want to evolve git-fat into a custom Git object backend that would store the large file objects in some other backend but leave other files in the flat file object store?

I'm not very familiar with Git's support for custom backends, but it looks like libgit2 enables something relevant. How would you 'tag' the specific objects that you want to use custom storage vs. the traditional backend?

Let me know if I totally missed your point, but this sounds interesting.

@jedbrown
Owner

@willkelleher I have noticed people using git-fat to store somewhat compressible files, such that the "dumb" object store takes a lot more space than a git packfile (and more network bandwidth). I also see an overhead associated with maintaining two access control lists (one for the Git repository and one for the fat object store). The mechanics can be done with Git's core command line tools or with libgit2. There is more discussion in Issue #1.

@magec

magec commented Dec 3, 2015

Hi guys, I'm really interested in this feature. Is it usable as is? I would like to migrate an existing repo that uses git-fat to S3. Is that possible right now with this branch, without too much hacking?

@gitfoxi

gitfoxi commented Dec 17, 2015

+1

@zelonght

@magec You can try https://github.com/PersonifyInc/git-fat (a fork). We have used it for years, and it pushes and pulls files to and from S3 rather well. It also supports large files.
