Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPG advanced key management #446

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SlugFiller
Copy link
Contributor

@SlugFiller SlugFiller commented Sep 14, 2023

This is a big one. Let me try to cover this:

  • Separated trezor-gpg init to trezor-gpg init for creating the homedir and trezor-gpg add for creating keys. Allows adding multiple keys, instead of having multiple identities for the same key.
  • Removed the abuse of the Policy URI packet for a custom marker. Like a "License URI" entry in a package manager, just because no program parses it, doesn't mean it's free for arbitrary use. It also only identifies whether a keys made with the program, and not whether it was made with the same wallet/master key. Removing the marker also improves privacy, since the marker is currently stored with the public key sent to others.
  • Instead, generated keys now save the generation parameters in ${GNUPGHOME}/private-keys-v1.d. This allows keys to be freely created, and deleted as well (Without this PR, keys cannot be deleted). Since key generation is deterministic, the key can be recreated even if the key is deleted, by simply recreating it with the same parameters.
  • Added an option to set key derivation path for both primary keys and subkeys with --derivation-path. This allows having a derivation path that is different from the user id, as well as having different derivation paths for primary and subkeys if created separately (see flags below). The derivation path is stored with the rest of the key parameters in ${GNUPGHOME}/private-keys-v1.d. The same derivation path must be used if recreating a deleted key (Obviously).
  • Added the option to specify a different home directory for primary and subkeys when creating a subkey using --primary-homedir. This allows a subkey and primary key to use different wallets, either combining software and hardware keys, or using two hardware wallets. The subkey is created using the current hardware wallet, while the primary key used for signing is looked up from the specified home directory, and may belong to a software keystore, or a different hardware wallet.
  • Added options to configure the flags for keys, using --no-sign and --encrypt parameters. This includes making a certify-only (no signing) primary, creating one key instead of two (sign+encrypt) at once, and limiting encryption to communication or storage. By creating one key at a time, it's also possible to set a different derivation path for each.

This is a pretty big change, but it makes GPG far more usable and versatile. One aspect missing is the ability to add a software subkey with a hardware primary. GPG's default keystore doesn't have a "two home directories" key generator. Some clever design is necessary to allow signing a foreign key using a primary key from the current hardware wallet. Perhaps by adding a certify command, or somehow implementing the GPG key creation operations in a way that cooperates with an existing keystore.

Fixes #341. Makes #358 redundant.

Note: I've tested various combinations of hardware primary keys and subkey, but did not test non-hardware primary key with hardware subkey, which should work, but might not work when signing/encrypting, due to not being able to run commands on the primary key that's in another folder. This requires testing.

@SlugFiller
Copy link
Contributor Author

Okay, I have not a single clue why CI is failing. pydocstyle libagent is working fine for me both locally and inside docker. Any hints welcome.

@SlugFiller SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 239df93 to c0357c0 Compare September 14, 2023 02:00
@SlugFiller
Copy link
Contributor Author

Nevermind, I figured it out

@romanz
Copy link
Owner

romanz commented Sep 16, 2023

Many thanks for the contribution!
Would it please be possible to split it into separate PRs?

@SlugFiller
Copy link
Contributor Author

I'm... not sure how. I know it's big, but it's all part of the same thing. I don't think I could even split it into several commits in any way that makes sense and/or compiles.

Just as an example, removing the policy URI packet (which is also done in #298) would prevent the ability to trezor-gpg add without adding the 100 subpacket, since there'd be no way to tell apart a Trezor key (and derivation path) from a non-Trezor one, which changes which signer you need to use to certify it.

It's all interconnected.

@romanz
Copy link
Owner

romanz commented Sep 16, 2023

Maybe it would be please possible to split the following parts into separate commits/PRs?

  • Changed the way SIGINT is handled, so the agent can be stopped either with SIGINT or with a KILLAGENT message. Tested Linux version using Docker, but couldn't test if it works on Darwin as well. Also removed --timeout, as it no longer relies on busy-looping to listen for connections. This is rather ugly, and I have half a mind to make a PR converting everything to asyncio, so "waiting for a thing to happen" does not block the built-in signal handling.
  • Allow the GPG agent to handle concurrent requests, similar to the SSH agent. Also, it no longer crashes if a single request experiences an error.

@SlugFiller
Copy link
Contributor Author

@romanz Sorry for the delayed response. I wanted to be 100% certain about my answer.

I'm rather unhappy with the changes you've mentioned. They were mostly hacked together to make testing the GPG changes easier. To a certain extent, the GPG depends on them, but not vice versa, so you are correct in that it can be separated to another PR.

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux. Although I've only done basic tests and didn't try spawning a server, it's showing signs of having less gotchas than asyncio and Python itself. And it's certainly a better solution that abusing ctypes and hoping it's not broken on certain platform variants in unexpected ways.

It would be a big patch, though, since it's essentially replacing all the IO.

Your thoughts?

(P.S. In the mean time, I've finished another PR that's a bit simpler)

@romanz
Copy link
Owner

romanz commented Sep 17, 2023

It would be a big patch, though, since it's essentially replacing all the IO.

IIRC, woudn't that require trezorlib to be async-friendly, to prevent blocking the event loop?

@SlugFiller
Copy link
Contributor Author

No. You just put all the device stuff in a thread. There are three ways to handle the concurrent calls it would inevitably create.

First, avoid them at the caller level, since you can (or rather, have to) just await the thread. If you have concurrent tasks, you can do locks between them.

Second, have a regular lock inside the thread. It won't influence the awaiting, since that would allow other tasks to run.

Third, and this is one I've done in not-Python a lot, spawn a single thread that acts as a job queue, receiving tasks from the other threads, and executing them one at a time in order.

TL;DR You just use trio.to_thread

@romanz
Copy link
Owner

romanz commented Sep 18, 2023

However, with your permission, I would like to try porting trezor-agent to trio. I've tested, and it seems to be able to handle signals in a sane way on both Windows and Linux.

Just to be sure - is the main issue cross-platform signal handling?

@SlugFiller
Copy link
Contributor Author

Just to be sure - is the main issue cross-platform signal handling?

It's not so much the signal handling itself. Rather, it's the lack of a way to wait for the first of multiple blocking actions, and signals is where it manifests the worst.

Another way to look at it is that there's no straightforward way to wait for the first of two threading.Events to be set. However, in this case there's a(n ugly) workaround: Create two threads, each waiting for one of the signals. When the thread ends, trigger a common event that represents the combination of the two. When the program ends, force-trigger the two events, in order to kill the threads.

For something like sockets, you can create a thread per socket, and you can close the socket from another thread to cancel the operation (which is how you'd usually use a cancel, anyway). Not efficient, but works.

With signals, such a workaround does not exist. The main reason is that Python's signal handler doesn't actually run registered callbacks, but simply sets a flag that makes the handler run after the next IO/blocking function ends in the main thread. This means, the following would freeze:

hup_event = threading.Event()
def handler(_):
  hup_event.set()
  
  signal.signal(SIGHUP, handler)
  hup_event.wait()

Even if SIGHUP is sent, the handler will not be called until hup_event.wait returns, which creates a deadlock. This wouldn't be an issue if Python had a signal.signal_spawn_thread which spawns a thread for the handler, instead of queuing it on the main thread. But it doesn't, so you have to use an async library instead.

Even if signal.pause was available on al platforms (it isn't), unlike events and sockets, there's no action you can take from another thread to force the pause to break on another action. So, for example, if you're waiting for a SIGINT to close the program, but then a KILLAGENT is received over a GPG socket, you can close the server, but the main thread would still be stuck waiting for a SIGINT that will never come. And there's no programmatic way to break it out of that wait.

Mind you, this issue is unique to Python (specifically, CPython), due to the way it designed its signal handling and GIL.

@romanz
Copy link
Owner

romanz commented Sep 20, 2023

Unfortunately I won't be available to review this and the suggested PR in the upcoming weeks...
Maybe you can try to rewrite the existing code to use trio and if it doesn't make it more complex and doesn't introduce too much external dependencies, I will be happy to review it :)

@SlugFiller
Copy link
Contributor Author

SlugFiller commented Sep 24, 2023

Rebased on top of #456 and added support for deleting keys (requires --delete-secret-and-public-keys)

I'm still contemplating extending derivation path setting to primary keys, and refactoring the way the derivation path is extracted for a key.

@SlugFiller
Copy link
Contributor Author

@romanz Okay, I've had a thought about taking this PR in a completely different direction. And I want your opinion.

Normally, GPG-agent is supposed to manage private/secret keys. It stores them, plain or encrypted, in ${GNUPGHOME}/private-keys-v1.d/{bin2hex(keygrip)}.key. Meanwhile, GPG itself manages the store of public keys, and certification signatures of said keys.

Trezor-GPG naturally doesn't store any private keys. Instead, it pretends to store them, by querying the public keys stored, and claiming it has a private key for any public key which has a specific metadata attached to it. This method creates two problems:

  1. Trezor-GPG can't respond to secret key deletions, since it never keeps a record of stored secret keys in the first place.
  2. Public keys are marked in a way that makes it easily detectable if a key was produced by Trezor-GPG or a different agent. This signature is needed by Trezor-GPG in order to recreate the matching private key, but it is visible not only to Trezor-GPG and GPG itself, but also to any third parties that need to verify the public key.

My suggestion is therefore to copy GPG-agent's behavior, but replace the "secret key" with a derivation path. Trezor-GPG would deterministically create "secret key" files named after the matching keygrip, and containing the derivation path. This method gives the following advantages:

  1. The public key doesn't need to contain a single hint about what the derivation path is. The derivation path can therefore be arbitrary, and simply default to the user id if not specified.
  2. Trezor-GPG can delete a secret key without deleting the macthing public key, and assert it as being deleted, if queried by a GPG client.
  3. It can also recreate it at any time if called with the same parameters used originally, since the secret key is deterministic.
  4. It requires fewer calls to GPG while running. Specifically, it doesn't need to call gpg --export just to find a key by keygrip, as it can simply look for a file with a matching name, and check that it contains a valid derivation path.

What do you think?

@doolio
Copy link
Contributor

doolio commented Jan 9, 2024

Any further thought given to this?

@SlugFiller SlugFiller force-pushed the advanced-gpg branch 2 times, most recently from 7f88219 to 03ccef3 Compare January 11, 2024 00:52
@SlugFiller
Copy link
Contributor Author

Since I didn't get any response in a while, so I decided to go ahead and implement key storage in private-keys-v1.d. Now it's possible to separately delete public or secret keys.

@Midar
Copy link

Midar commented Sep 21, 2024

This sounds very desirable, is this still being worked on? I'd love to have a subkey for signing instead of the main key.

@SlugFiller
Copy link
Contributor Author

I've updated the OP to better represent the current status of the PR. It's merge-ready, and I'm currently using it "in production". So you can feel free to use it while it's waiting to be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subkey generation cannot proceed past GNUPGHOME directory check
4 participants