Consider improving transcrypt's handling of large files #85

Open
jmurty opened this issue May 21, 2020 · 5 comments

@jmurty
Collaborator

jmurty commented May 21, 2020

As @perost-l14 mentioned in these comments on #78, transcrypt currently does some things that hinder its use for encrypting many and/or large files.

This ticket is to draw out suggested improvements so they don't get lost in the broader discussion in #78.

In particular, as paraphrased by me (@jmurty):

  • encrypted files are Base64-encoded via the -a flag to openssl enc, which takes up roughly a third more space than binary data. This textual encoding may not be necessary, since encrypted content isn't human-readable or diffable as text.

  • adding -delta to the .gitattributes lines for encrypted files disables Git's delta compression when packing those files, which avoids wasteful work when pushing encrypted files that don't compress well. Example:

    *.jpg filter=crypt diff=crypt -delta
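
To put a rough number on the Base64 point above: Base64 emits 4 output characters for every 3 input bytes, so the encoded form is about a third larger before any line breaks are counted. The following is a minimal sketch, not transcrypt's own code. It assumes the standalone `base64` utility is an acceptable stand-in for the encoding that `openssl enc -a` applies, and `sample` is a hypothetical temporary file.

```shell
# Compare the raw size of some random data against its Base64-encoded size.
# Assumption: `base64` stands in for the encoding openssl performs with -a.
sample=$(mktemp)
head -c 3000 /dev/urandom > "$sample"

raw_size=$(wc -c < "$sample" | tr -d ' ')
# Strip line breaks so the count reflects only the encoded characters.
encoded_size=$(base64 < "$sample" | tr -d '\n' | wc -c | tr -d ' ')

echo "raw: $raw_size bytes, base64: $encoded_size bytes"
rm -f "$sample"
```

For 3000 input bytes this reports 4000 encoded characters. Line breaks in the real output add slightly more on top of that.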
    

Would it make sense to update transcrypt to use binary data instead of base64, and set or recommend -delta in .gitattributes by default?

What would be the implications of doing these things, for both new transcrypt'ed repos and existing ones?
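
For anyone who wants to try the `-delta` setting in an existing repository before any default changes, `git check-attr` shows how Git interprets an attribute line. This is only a sketch in a throwaway repository. The path `photo.jpg` is a hypothetical file name, and the file does not need to exist for the check to work.

```shell
# Throwaway repository to confirm how Git parses the attribute line.
repo=$(mktemp -d)
git init -q "$repo"
printf '%s\n' '*.jpg filter=crypt diff=crypt -delta' > "$repo/.gitattributes"

# check-attr reports "unset" for an attribute written with a leading dash.
delta_state=$(git -C "$repo" check-attr delta -- photo.jpg)
filter_state=$(git -C "$repo" check-attr filter -- photo.jpg)
echo "$delta_state"
echo "$filter_state"
rm -rf "$repo"
```

The first line printed should be `photo.jpg: delta: unset`, meaning Git will not attempt delta compression for matching paths, while the `filter` attribute is still reported as `crypt`.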

@ZhymabekRoman

What is the status of this improvement? I think it's a necessary feature even for small files. Using Base64 is not necessary for Git repos.

@ZhymabekRoman

I'll try to improve that, and I'll also try to optimise transcrypt, because I have a large Git repository (over 500 MB) and decryption is very slow.

@jmurty
Collaborator Author

jmurty commented Feb 16, 2023

Work on improving the efficiency of transcrypt would be welcome, though be warned that using it to encrypt large amounts of data or files isn't really the expected use-case – it's intended for a few small secret files that are part of a larger repo.

That said, there might be some easy wins that would improve things without requiring a major rewrite or breaking changes.

I'd encourage you to start by looking at the building block git_clean (encrypt) and git_smudge (decrypt) functions in the script. You can run these separately to simulate the steps taken behind the scenes by Git, and testing the performance and correctness of these atomic pieces is likely to be much easier than working with a real repository.

Examples of this based on the current main branch, run within this project's repository:

    # Manual and minimal transcrypt config in repository
    git config --local transcrypt.cipher aes-256-cbc
    git config --local transcrypt.password 'correct horse battery staple'
    git config --local transcrypt.openssl-path openssl

    # Decrypt the encrypted sensitive_file
    cat sensitive_file | ./transcrypt smudge context=default sensitive_file

    # Encrypt the decrypted sensitive_file
    cat sensitive_file | ./transcrypt clean context=default sensitive_file 2>/dev/null
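
Building on the commands above, a rough way to measure one of these steps in isolation is to time it against a generated file. The sketch below is only an illustration. `time_filter` is a hypothetical helper, not part of transcrypt, the 10 MiB size is arbitrary, and `date +%s` only gives whole-second resolution, so small inputs will report 0.

```shell
# Hypothetical helper: run a filter command with a file on stdin, discard its
# output and errors, and print the elapsed wall-clock time in whole seconds.
time_filter() {
  input_file=$1
  shift
  start=$(date +%s)
  "$@" < "$input_file" > /dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# Build a 10 MiB file of random bytes as stand-in test data.
sample=$(mktemp)
head -c 10485760 /dev/urandom > "$sample"

# Baseline: `cat` does no real work, so this measures I/O overhead only.
baseline=$(time_filter "$sample" cat)
echo "baseline: ${baseline}s"

# Inside a configured transcrypt checkout, the encrypt step could be timed with
# (assumption: the same arguments as in the example above):
#   time_filter "$sample" ./transcrypt clean context=default "$sample"
rm -f "$sample"
```

Comparing the transcrypt timing against the `cat` baseline gives a first estimate of how much of the cost comes from the script and openssl rather than from reading the file.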

@natew

natew commented Jun 27, 2024

Was just wondering, is transcrypt meant to be pretty slow? I'm noticing really slow operations even on smaller files, and it's especially painful if you move a lot of files around. We have only ~300 encrypted files, but it takes my M3 Pro something like 5-10 minutes for some operations when moving all of them around.

Could help sponsor speeding this up if there's some interest there.

@jmurty
Collaborator Author

jmurty commented Jul 7, 2024

Hi @natew, as mentioned in prior comments, transcrypt as currently implemented isn’t intended for, and isn’t good at, handling large numbers of files or files of large size.

I don’t have time to work on this even if sponsored, and to be honest I’m not sure how much faster it could be given transcrypt is at bottom a series of bash scripts invoked by Git that in turn call other shell commands to do the work.

Perhaps someone else would be able to do some investigation? The first place I would suggest researching is whether Git’s smudge and clean commands run in series or in parallel and, if they don’t already run in parallel, whether they can be made to.
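
One possible starting point for that investigation is to count how often Git launches a filter. The sketch below uses a throwaway repository and a hypothetical pass-through filter named `count` that logs each run. It does not involve transcrypt at all, and it only shows how many times the clean command is started, not whether the runs overlap in time.

```shell
# Throwaway repository with a pass-through filter that logs each invocation.
repo=$(mktemp -d)
log="$repo.log"
git init -q "$repo"
: > "$log"

# Hypothetical filter "count": record one line per run, then pass data through.
git -C "$repo" config filter.count.clean "echo run >> '$log'; cat"
git -C "$repo" config filter.count.smudge cat
printf '%s\n' '*.txt filter=count' > "$repo/.gitattributes"

for i in 1 2 3 4 5; do
  echo "secret $i" > "$repo/file$i.txt"
done
git -C "$repo" add .

invocations=$(wc -l < "$log" | tr -d ' ')
echo "clean filter ran $invocations times for 5 files"
rm -rf "$repo" "$log"
```

If the count is at least one per file, each encrypted file costs a separate process launch, which would matter for a shell-based tool. Git's gitattributes documentation also describes a long-running `filter.<driver>.process` protocol that keeps a single filter process alive across files. Whether transcrypt could adopt that is an open question.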
