Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable Security Improvements & OpenSSL 3.x Support & Python API #132

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

Erotemic
Copy link

@Erotemic Erotemic commented May 6, 2022

Closes #133 #55

Supersedes #126

Introduction

This PR seeks to increase the security of transcrypt while also maintaining backwards compatibility (two opposing goals that made this tricky).

Currently transcrypt is using some weak encryption settings that contain several vulnerabilities with design goals that make solving them challenging:

  • OpenSSL recommends encrypting with pbkdf2 which severely hinds brute force attacks, but older openssl versions we want to support do not have this option.
  • The MD5 digest is not secure, but changing it would break backwards compat.
  • Using the password to derive the salt allows for a known plaintext attack, but we need determinism, so we cannot use a random salt.
  • OpenSSL 3.x is now shipping in Ubuntu 22.04 LTS and breaks the salt format transcript depends on.

To address these issues I've added the following config / CLI variables

  • --use-pbkdf2=<0|1> lets the user enable pbkdf2 if they have it. The default is 0 for compatibility (although we should consider changing that in the future).
  • --digest=<openssl-digest> lets the user switch from MD5 to SHA512 (or any other openssl digest)
  • --salt-method=<password|configured> lets the user switch to "configured" salt mode, which will check if a .transcrypt/config exists and if it does it will read the salt from there. If it doesn't exist, we generate a random salt and write it to that file. We then add that file into the repo as plaintext.

Additionally, I've added support for these in the display, gpg import/export. Configuration salt gets re-randomized on a rekey. There is a --config-salt method the user can also use to specify their salt manually.

In addition to these security improvements I've added support for OpenSSL V3.x, which requires transcrypt to manually prepend the salt encoding before we convert to base64 and save. Decryption still works normally. I simply introduced a wrapper that checks the version.

Lastly, to test these new features I implemented a Python API around transcrypt. There are still a few things that could be improved, but the basic functionality is there. I use this along with the PythonGit, gpg_lite, and ubelt Python packages to implement new tests of these features. We could use this to publish a package in pypi that will allow users to install trancrypt via pip. Personally, I like the idea of pip install transcrypt and having it install a python package as well as the primary bash script itself.

New Features

This PR adds:

  • Bumps the transcrypt version number to 3.0.0-pre (which can be changed to whatever, but this is a major change)
  • support for OpenSSL 3.x
  • support for the user to enable pbkdf2
  • support for the user to configure a specific hash digest
  • support for a new "salt method", which defaults to "password", which is the existing scheme.
  • support for a "versioned" .transcrypt/config, containing public information that can be committed in plaintext to make the configuration for the repo transparent (although this is not strictly necessary)

Details

@elasticdog @jmurty

I have a proof of concept that reworks the transcrypt config in a backwards compatible way. This should be considered a work in progress, but it is nearly complete and feedback at this stage would be helpful.

I've also written a small Python wrapper API to that allows me to test a wide range of the above configurations to make sure they all work correctly.

The main difference from a default user's perspective is that they will now be asked to configure more settings when they run transcrypt as is. All of the new options can be specified on the command line so running:

transcrypt -c 'aes-256-cbc' -p 'correct horse battery staple' -md 'md5' --use-pbkdf2 '0' -sm 'password' -cs '' -y

would decrypt the repo (assuming the correct password) without any interaction.

I made a lot of changes to the main file in order to support this. I tried to consolidate duplicated code that needed the entire parameter configuration. Previously this wasn't so bad because it was only 2 variables. Now there are 6 and I'm not 100% set on them as they are, so I wanted to write it in such a way that it wouldn't be too hard to change the details of the scheme I'm using.

I've prefixed the new helper functions I've written with an underscore (e.g. _get_user_input) to help make writing to help make get_cipher and corresponding other functions easier. The new code for those functions looks like:

# ensure we have a cipher to encrypt with
get_cipher() {
    local prompt
    prompt=$(printf 'Encrypt using which cipher? [%s] ' "$DEFAULT_CIPHER")
    _get_user_input cipher "$DEFAULT_CIPHER" "validate_cipher" "$prompt"
}

I'm not sure if there is any security issue passing around variables by name in bash like this. My bash is pretty good, but I'm a bit shy of an expert.

There was also a place that was using "current_" instead of just "" and it wrote custom code to load the entire configuration. I think just using <varname> everywhere and putting all of the code to load the state into one function (i.e. _load_transcrypt_config_vars) that can be called wherever it is needed should also work and be easier to work with. But I wanted to flag this in case I missed something and "current_password" and "current_cipher" had those local variable names for a reason.

Note that there is also a bash helper file that is also checked in. This is just a scratchpad I was using to ensure my bash was correct. I plan to remove it once I finish this.

I still need to write more tests, clean everything up, and ensure my tab/space usage is consistent with the existing script, but any general comments on structure or obvious issues would be helpful.

Discussion

On the salt method, I'm still somewhat unhappy with this method, but I can't think of anything better. I think we need to have some partial configuration file that ships with the repo.

@Erotemic Erotemic force-pushed the enhance-configuration branch from fdee722 to a7bb0fa Compare May 6, 2022 12:02
@jmurty
Copy link
Collaborator

jmurty commented May 7, 2022

Hi @Erotemic thanks for starting work on this, I'll aim to help out when I can though I don't tend to have a lot of time.

One suggestion: could you update this PRs description with a brief summary of the specific problem(s) this PR will solve? I'm familiar with your goals from prior discussions across tickets, PRs, email etc but I think it would be helpful to give some context up front, so others can follow along and contribute.

From my perspective, this PR is working towards enabling PBKDF2 for stronger encryption similar to #126 but with a different salting mechanism to avoid potential weaknesses discussed in #55 where the strength (and value) of using PBKDF2 could be bypassed, plus some refactoring and cleanup.

@Erotemic
Copy link
Author

Erotemic commented May 7, 2022

@jmurty Thanks for taking a look.

Yes, I'll make a high level motivation write-up. I can make single issues that this PR will close.

On the technical side, my recent pushes add support for OpenSSL 3.x (which was a huge pain, they removed the salted prefix) while still being backwards compatible with 3.x.

I've refactored the bash_helpers.sh file into transcript_library.sh that contains most of the new helper functions I've written with unit tests and more documentation. Stripped versions of these functions are also in the transcrypt executable itself. It would be nice to have a set of library calls to keep the size of the main transcrypt logic smaller, but having more than one file in a bash project seems like a packaging nightmare.

I'm not sure how to run the .bats style tests, so I've written all new tests in Python. I hope adding that as a test time dependency is ok. It's easy enough to get a Python env on any CI, so I hope it's not a problem. The Python tests let me define a grid:

    basis = {
        'cipher': ['aes-256-cbc', 'aes-128-ecb'],
        'password': ['correct horse battery staple'],
        'digest': ['md5', 'sha256'],
        'use_pbkdf2': ['0', '1'],
        'salt_method': ['password', 'configured'],
        'config_salt': ['', 'mylittlecustomsalt'],
    }

And then I can execute each test (currently encryption round trip, export/import gpg key, and rekey - which covers most functionality), over each of these 32 configuration settings.

Also, while writing the Python tests, I basically ended up writing Python bindings for the transcrypt API. It might make sense to distribute this as a Python package. It also wouldn't be hard for me to get that setup without breaking compatibility if you'd like.

@Erotemic Erotemic changed the title Enhance configuration Configurable Security Improvements & OpenSSL 3.x Support May 7, 2022
@Erotemic Erotemic changed the title Configurable Security Improvements & OpenSSL 3.x Support Configurable Security Improvements & OpenSSL 3.x Support & Python API May 7, 2022
@Erotemic
Copy link
Author

Erotemic commented May 7, 2022

I think this PR is ready. Can @elasticdog or @jmurty approve this for dashboards?

@jmurty
Copy link
Collaborator

jmurty commented May 8, 2022

Hi, there's quite a lot here to digest, I'll start to work my way through it.

One thing I'm concerned about is rushing to an implementation, when we might benefit from stepping back and thinking through the algorithm and techniques in a more abstract way, to arrive at the best future-proof approach given Transcrypt's requirements and desirable characteristics (strong encryption with fast performance).

I'll open another ticket to raise this question and ask for feedback.

For example, I'm wondering if it might make sense to decouple password key derivation from the per-file encryption step, such that we pre-generate the actual key and IV from the password once – at init time – using a strong and slow technique, then provide the key (-K) and IV (-iv) to the OpenSSL enc command instead of the original password with the pbkdf2 flag? Or is there a benefit to re-deriving the key from the password using relatively slower pbkdf2 every time a file is encrypted?

@Erotemic
Copy link
Author

Erotemic commented May 8, 2022

I think a step back makes sense, having an abstracted description of the algorithm will be critical for determining the correctness of the implementation. I'll see if I can pull something together.

I think it's also worth getting feedback on the public API I've chosen. I don't think --use-pbkdf2=1 --salt-method=configured --custom-salt='' are the best CLI names in the world. This first pass was just to get somI ething clear, usable, and it should be written in a way that it won't be hard to modify names or behaviors of the flags.

I think your idea about generating the key and iv securely once and then getting faster encrypt / decrypt times is a good one. There isn't much of a downside that I can see. Security against brute force should be the same. If they have your local git config you've already lost anyway, so it would only improve speed (unless I am missing something). But the downside is that would make this implementation more complex than it is, it wouldn't fundamentally change the functionality implemented here, and I've been using pbkdf2 on my fork for over a year and I've never had issues with slowness. Perhaps it is critical for some workflows, but my thought is encrypted files aren't changing that often, so the added complexity isn't worth it here (I do need to finish up on this and move onto other tasks). But I do think that feature should be on the roadmap. It might be a good idea to brainstorm about what that roadmap though.

The other thing to note is that transcrypt will break on everyone who recently upgraded to Ubuntu 22.04 LTS. So the sooner a patch that fixes that issue goes out the better. If necessary it shouldn't be hard to port that change to a separate PR, but I'd rather not do that myself. What I will do is fix the tests and try to boil the logic down to a simplified document.

@jmurty
Copy link
Collaborator

jmurty commented May 8, 2022

The breaking OpenSSL change was a nasty surprise to me, thanks for finding and fixing that. I'll backport your fix to make it releasable sooner.

And if you could write up an issue to discuss the new PBKDF2 + salt approach as a spec, that would be great.

I'm out of my depth with the intricacies of crypto algorithms and risks / trade-offs, so the more eyes and feedback we can get the better.

@Erotemic
Copy link
Author

@elasticdog @jmurty

I've put together a draft of an algorithmic writeup and checked it into this PR under docs/algorithm.rst Please take a look here: https://github.com/Erotemic/transcrypt/blob/enhance-configuration/docs/algorithm.rst

Also, feel free to directly commit any edits to this document. I did try to focus on the algorithms, but my thinking always tends to drift towards implementation, so you may notice me doing that here and there (I always feel compelled to make the names in the implementation and docs line up).

Now that I've written this draft my thought is that we may want to morph it into some sort of "transcrypt enhancement proposal" that finalizes the details of the open question I've outlined, from which I or someone else can write the implementation that agrees with it.

I will note that I've spent quite a bit of my vacation on this, and while I do truly enjoy this sort of work, I will only have a limited amount of time to dedicate to this (among the other FOSS projects I work on) moving forward, so any help with finalizing details on:

  • The names and valid settings of the new user configurations to be added
  • What is allowed in the new "versioned" .transcrypt/config file, and what the precedence of reading/writing to key/values shared between that and .git/config is

would be helpful as I could remove the open questions, write the candidate implementation, and ensure my fork will likely match whatever will (hopefully) be accepted into the main repo would be greatly helpful to me in my day-to-day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenSSL 3 will break transcrypt
2 participants