Proposal: New Mediatype - Container Image Encryption #747

Open
lumjjb opened this issue Jul 9, 2018 · 63 comments

@lumjjb

lumjjb commented Jul 9, 2018

Overview

We would like to propose a new media type for encrypted layers of a container image. This addition would support an ecosystem for encrypted container images. It lets users with stricter trust requirements ensure end-to-end encryption from build to runtime. It also lets them use a centrally managed repository (e.g. Docker Hub) without the registry operator being able to read their image contents.

Outdated Design Doc (Look at PR for updated):
https://docs.google.com/document/d/146Eaj7_r1B0Q_2KylVHbXhxcuogsnlSbqjwGTORB8iw/edit?usp=sharing

Link to presentations:
Dockercon US 2019: https://www.youtube.com/watch?v=9LyPUy4XYbs&list=PLkA60AVN3hh-XtoZ8zoZir6wnpaVXGUgk&index=28
Kubecon CN 2019: https://www.youtube.com/watch?v=bzHPnlSfM_8

Tracking implementations:

containerd
https://github.com/containerd/cri/blob/master/docs/decryption.md
https://github.com/containerd/imgcrypt

crio
https://github.com/cri-o/cri-o/blob/master/tutorials/decryption.md

buildah
containers/buildah#2271

skopeo
containers/skopeo#732

Call for Contribution:

  • Quay
  • podman
  • Kaniko
  • Docker CLI

Links:

Details of updated changes can be viewed in the PR.
Implementation of those changes can be viewed in this PR.

Proposal written and discussed by:
Brandon Lum (@lumjjb), Dimitrios Pendarakis, Hani Jamjoom (@jamjoom), James Bottomley (@jejb), Phil Estes (@estesp), Stefan Berger (@stefanberger), Alaa Youssef within IBM.

This is a follow-up from a discussion with @stevvooe and @dmcgowan at DockerCon.

Goals

In coming up with a proposal, we considered the following:

  • Application creators should not need to worry about details of encryption. Flows in the DevOps process and the runtime orchestration would be responsible for facilitating encryption.
  • Re-usability of layer de-duplication among container images
  • Encryption should be end-to-end, from the build step to when the container is run on the worker machine (i.e. the registry owner should not be able to view the plaintext contents).
  • Encrypted images created can be for multiple trusted entities, and key management is responsible for managing that trust.
  • The specification should integrate easily and generically with, and across, key management systems.
  • Secondary experimental goal: Allow more granular control of security, to provide the possibility of desegregation and more fine-grained security controls (e.g. base OS, middleware, and application can be encrypted separately).

Proposed Changes

Details of updated changes can be viewed in the PR.
Implementation of those changes can be viewed in this PR.

When creating a container runtime bundle from the encrypted images, the runtime would perform an additional decryption step based on the information given in the annotations.

Additional Details

We include some of the interesting discussions we have had about the proposal.

Encryption Standard

Encryption and decryption will follow the OpenPGP standard as specified in RFC 4880.

  • The layer data will consist of a Symmetrically Encrypted Data Packet as in RFC 4880 Section 5.7
  • The wrapped keys (org.opencontainers.image.pgp.keys) will be an array of Public-Key Encrypted Session Key Packets as in RFC4880 Section 5.1
  • Decryption will be done by processing the wrapped key packets followed by the encrypted data packets.
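As a rough illustration of the packet ordering described above, the following Python sketch reassembles an OpenPGP message from a layer descriptor and its blob. It assumes, purely for illustration, that the org.opencontainers.image.pgp.keys annotation holds a JSON array of base64-encoded packets (the proposal does not fix a serialization), and that the media type carries a "+enc" suffix. The function name reassemble_openpgp_message is hypothetical. The resulting byte stream would still have to be handed to a real OpenPGP implementation for decryption.

```python
import base64
import json

KEYS_ANNOTATION = "org.opencontainers.image.pgp.keys"


def reassemble_openpgp_message(descriptor: dict, layer_blob: bytes) -> bytes:
    """Concatenate the wrapped-key packets and the encrypted layer data.

    Assumption: the annotation value is a JSON array of base64 strings,
    one per Public-Key Encrypted Session Key packet.
    """
    encoded_packets = json.loads(descriptor["annotations"][KEYS_ANNOTATION])
    key_packets = [base64.b64decode(p) for p in encoded_packets]
    # Key packets come first, then the Symmetrically Encrypted Data Packet.
    return b"".join(key_packets) + layer_blob


# Illustrative descriptor; packet bytes are placeholders, not real OpenPGP data.
example_packets = [b"\x01packet-for-alice", b"\x01packet-for-bob"]
example_descriptor = {
    # Hypothetical media type name, shown only to mark the layer as encrypted.
    "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip+enc",
    "annotations": {
        KEYS_ANNOTATION: json.dumps(
            [base64.b64encode(p).decode("ascii") for p in example_packets]
        )
    },
}
example_message = reassemble_openpgp_message(example_descriptor, b"ENCRYPTED-LAYER")
```

Keeping the key packets in the descriptor rather than in the blob means the blob digest does not change when recipients are added, although the manifest that embeds the descriptor does.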

Key Management

The purpose of Key Management is to assist in performing distribution, storage and use of keys. We note that this is important to be able to ensure that container runtimes are able to obtain the keys for decryption, and for encrypted container image creators to pass keys to their kubernetes/docker runtimes. However, we note that Key Management in itself can be seen as a separate component that assists the use of Encrypted Container Images. Therefore, we treat the process of Key Management as separate from the design of the Encrypted Container Images itself.

We note however, that Key Management is important in ensuring technology adoption. We discuss Key Management briefly. We have two models of key management that we consider, they are not exclusive and can probably be used in tandem.

Own Key Management

Key management is handled by the operator of the cloud, and plugins are provided to allow interfacing with existing Key Management Solutions. In this case, the Key Management Solution (e.g. Vault, Azure Vault, IBM KeyProtect) needs to be trusted by the company (run by themselves or by a trusted service).

An example:

Company A has an existing internal Vault service. To build an image, the build machine or developer generates a symmetric key through Vault to encrypt the image. This symmetric key is then stored in the Vault service via an Encrypted Container Image Vault plugin. To run the image, the administrator configures its Kubernetes cluster with a Vault token to use the internal Vault service via an Encrypted Container Image Vault plugin.

Fully Centralized Untrusted Key Distribution

Key management is handled by users and container runtimes interacting with an untrusted Key Distribution server (alongside container image registry). Private keys are still managed individually by users but no additional external party's trust is required (if server is compromised, no keys are lost). In addition, it provides a central location for users to manage. The trust model is similar to that of Docker Notary server.

An example:

To build an image, the build machine or developer generates a symmetric key and performs the encryption of the image. The symmetric key is then wrapped with the public key of each entity/recipient that is allowed access, and the wrapped keys are registered with the server.

We spin up a new cluster (that operates under a Docker ID of orgcluster). The Kubernetes cluster runtime downloads an encrypted image and finds that it does not have the keys needed to decrypt it. The runtime sends a request for the layer key to the FCUKD server. The approver (image owner or delegate) is notified of the request, verifies and approves it, and submits a wrapped key to the system. Thereafter, the requestor (the cluster) can download, unwrap, and use the symmetric key to decrypt the container image. The approval process may be automated through an Access Control List set by the key owner.

We note that it is possible to provide the symmetric key to the Fully Centralized Untrusted Key Distribution system and perform auto-approvals, but it is highly discouraged since it weakens the security of the system.
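The request/approval flow above can be sketched as simple bookkeeping. This is a minimal Python model, not a proposed API: the class UntrustedKeyServer, the function approve_pending, and the wrap callable are all hypothetical names. The actual wrapping (encrypting the symmetric key to the requester's public key) is left to the caller, so the server object only ever holds wrapped keys.

```python
from dataclasses import dataclass, field


@dataclass
class UntrustedKeyServer:
    """Stores only wrapped (encrypted) keys; never sees a plaintext key."""

    wrapped: dict = field(default_factory=dict)  # (layer, recipient) -> wrapped key
    pending: list = field(default_factory=list)  # outstanding (layer, recipient) requests

    def register(self, layer: str, recipient: str, wrapped_key: bytes) -> None:
        """Store a key wrapped for one recipient and clear any matching request."""
        self.wrapped[(layer, recipient)] = wrapped_key
        if (layer, recipient) in self.pending:
            self.pending.remove((layer, recipient))

    def request(self, layer: str, recipient: str):
        """Return the wrapped key if present, otherwise queue a request."""
        key = self.wrapped.get((layer, recipient))
        if key is None and (layer, recipient) not in self.pending:
            self.pending.append((layer, recipient))
        return key


def approve_pending(server: UntrustedKeyServer, acl: dict, wrap) -> list:
    """Owner-side step: wrap the key for every pending requester on the ACL.

    `acl` maps a layer digest to the set of recipients allowed to receive it.
    `wrap(layer, recipient)` is a caller-supplied function that returns the
    symmetric key encrypted to that recipient's public key.
    """
    approved = []
    for layer, recipient in list(server.pending):
        if recipient in acl.get(layer, set()):
            server.register(layer, recipient, wrap(layer, recipient))
            approved.append((layer, recipient))
    return approved
```

In this model the approval runs on the key owner's side, which matches the point that a compromised server leaks no keys. Handing the server the symmetric key for auto-approval would break that property.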

@stevvooe
Contributor

In general, I think this proposal looks good, although, I would recommend namespacing the annotation names, per https://github.com/opencontainers/image-spec/blob/master/annotations.md.

This might need more review from security experts. I'd be worried about defining an algo/keyid schema and would prefer to use something predefined, if possible.

Is there a reference implementation we can use to test this approach with?

@cyphar
Member

cyphar commented Jul 10, 2018

In general, the OCI deals with already-tested-in-the-field technologies (at least that's what recent discussions about distribution have led me to believe). While having a discussion with us is definitely encouraged (since we probably will have opinions on the design), OCI is extensible specifically to allow vendors to try out their ideas before standardisation.

Re-usability of layer de-duplication among container images.

I'm a little bit worried about this goal for multiple reasons.

  • Deduplication tables act like a form of compression and we know from BEAST that this general construction can result in attacks on most crypto primitives (I'm not sure if this is actually true for dedup tables but it is something to keep in mind). In addition, this also allows for fingerprinting of encrypted data (something that would not normally be possible -- because ideally you want the same cleartext to have different ciphertext).

  • This might make it difficult for us to implement content-defined-chunking for files, as well as removing the sequential archive concept from our format (though of course we would probably just re-design it in that case, but it is something to keep in mind).

Encrypted backup tools like restic do have very interesting designs for chunk-deduplicated snapshot-based filesystem images -- have you taken a look at how they work (because that design should influence image-spec if we want to purge tar archives from the format)?

I'm currently writing a blog post that details how we can improve image-spec to not use tar archives anymore, and to add chunk-deduplicated snapshot-based filesystem images. But of course that will take quite a while to make into a proposal (since it also requires having a realworld implementation).

@wking
Contributor

wking commented Jul 10, 2018

Re-usability of layer de-duplication among container images.

I'm a little bit worried about this goal for multiple reasons.

Deduplication tables act like a form of compression and we know from BEAST that this general construction can result in attacks on most crypto primitives (I'm not sure if this is actually true for dedup tables but it is something to keep in mind).

I don't think this applies to this proposal. BEAST is based on attacker-provided (or attacker-modified) plaintext. For example, an HTTPS response which contains data from a user-supplied form would be one way for the attacker to influence the plaintext. But in this case, the encrypter is only encrypting layers that they've decided to build themselves. If you're publishing builds of something you develop yourself, there would be no attacker-influenced plaintext. You'd only run into trouble with BEAST if you were encrypting attacker-influenced layers. And if you have attacker-influenced layers, you have bigger problems than BEAST ;).

This might make it difficult for us to implement content-defined-chunking for files, as well as removing the sequential archive concept from our format (though of course we would probably just re-design it in that case, but it is something to keep in mind).

This approach (append +enc to the media type and stuff in some annotations) seems pretty generic. You could even have a Merkle tree where each node was encrypted, although only folks with access to the key would be able to walk that tree for garbage collection and such. Where the registry-walkable trees are required, you'd have to shuffle things around a bit to encrypt the payloads but not the Merkle links. So, as you say, potentially some impacts, but nothing that seems difficult to work around.

enc.keyid - The reference to the key to use to perform the decryption of the layer.

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

@cyphar
Member

cyphar commented Jul 10, 2018

@wking

I don't think this applies to this proposal.

I think you missed the second part of that point, which is that (unrelated to BEAST), taking advantage of deduplication of layers opens you up to fingerprinting attacks (unless the plan is to use a hash of the ciphertext -- in which case you should get zero deduplication because ciphertext should appear random and is not consistent when regenerated).
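To make the "not consistent when regenerated" point concrete, here is a small Python sketch. It uses a toy stream cipher built from SHA-256 with a random nonce, which is an illustration only and NOT a secure or proposed cipher. All names (toy_encrypt, toy_decrypt, digest) are hypothetical. The only thing it is meant to show is that re-encrypting the same layer yields a different blob digest each time.

```python
import hashlib
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce; the nonce is prepended to the output."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce, regenerate the keystream and undo the XOR."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def digest(blob: bytes) -> str:
    """Content address of a blob, in the sha256:<hex> form used by registries."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


layer = b"app v1.0 layer contents"
key = secrets.token_bytes(32)
first = toy_encrypt(key, layer)
second = toy_encrypt(key, layer)
# Same plaintext and key, but the two blobs have different content addresses,
# so a registry cannot deduplicate them or link them by digest.
digests_differ = digest(first) != digest(second)
```

Under this assumption, deduplication by ciphertext digest only happens when the very same encrypted blob is reused, not when a layer is encrypted again.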

@wking
Contributor

wking commented Jul 10, 2018

I think you missed the second part of that point, which is that (unrelated to BEAST), taking advantage of deduplication of layers opens you up to fingerprinting attacks...

Can you link docs for "fingerprinting attacks"? Searching is turning up things like TOR traffic analysis, which doesn't sound like what you mean.

Going back to the original:

Re-usability of layer de-duplication among container images

For example, if Alice distributes her sensitive application with:

  • Image 1
    • Unencrypted base layer(s) (e.g. from an Alpine 3.7 image)
    • Encrypted app v1.0
  • Image 2
    • Different unencrypted base layer(s) (e.g. from an Alpine 3.8 image)
    • Encrypted app v1.0 (same blob as for image 1)

That's de-duping, because there's only one blob for the encrypted app. I don't see how it would increase your exposure to attacks, although perhaps the fact that the encrypted app presumably works with both Alpine 3.7 and 3.8 would give you some knowledge of the encrypted content which could be used for a known-plaintext approach. If you were concerned about that sort of thing, you could always encrypt all of your layers.

And obviously folks will be exposed to ciphertext-only attacks, so you shouldn't be publishing anything that you don't want your grandkids to read ;).

But as long as you're using per-blob session keys (as in the OpenPGP approach I linked above) and periodically rotate your main key (which is only used for signing the short, random session keys), I think this approach seems pretty solid.
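The Alice example can be sketched with a tiny content-addressed store. This is only a model in Python: the push helper and the in-memory dict standing in for a registry are assumptions for illustration, and the blob contents are placeholders.

```python
import hashlib


def digest(blob: bytes) -> str:
    """Content address of a blob, in the sha256:<hex> form used by registries."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def push(store: dict, layers: list) -> list:
    """Store each layer under its digest and return the list of digests."""
    manifest_layers = []
    for blob in layers:
        d = digest(blob)
        store.setdefault(d, blob)  # an already-present blob is not stored again
        manifest_layers.append(d)
    return manifest_layers


store = {}
alpine_37 = b"unencrypted alpine 3.7 base layer"
alpine_38 = b"unencrypted alpine 3.8 base layer"
encrypted_app = b"ciphertext of app v1.0"  # encrypted once, reused as-is

image_1 = push(store, [alpine_37, encrypted_app])
image_2 = push(store, [alpine_38, encrypted_app])
# Four layer references across two images, but only three stored blobs.
```

The store sees only the ciphertext digest of the app layer, so the de-duplication here needs no access to the key.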

@stefanberger

@wking

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

Do the 'more public keys' in OpenPGP belong to all the people you want to communicate with?
I would say our current thinking is that one would place an encrypted symmetric layer encryption key on some server and decrypt it once an entitled user places a request for the layer decryption key. The user would have to pass along their public key (certificate), and we would wrap the key with that public key and pass it back. This allows adding users to an access control list for a particular key in the future and passing an individually wrapped key back to each of them.

@wking
Contributor

wking commented Jul 10, 2018

Do the 'more public keys' in OpenPGP belong to all the people you want to communicate with?

They could. You could also encrypt to a key shared by the QA team, and a key shared by the production team, etc.

I would say our current thinking is that one would place an encrypted symmetric layer encryption key on some server and decrypt it once an entitled user places a request for the layer decryption key.

That works too, it's just about whether the encrypted session key is stored on the blob or independently. If it's on the blob, it's easier to mirror, because it's transmitted through whatever channel you already use for blob mirroring. If it's in a separate system, you can add/remove recipients without adjusting the blob and rootwards Merkle tree. I expect that the optimal solution will depend on the individual users and use cases. And either way we go with this, the other approach is only a new media-type extension away, so recovery is possible if our initial best-guess is wrong.

@lumjjb
Author

lumjjb commented Jul 10, 2018

@stevvooe

I would recommend namespacing the annotation names, per https://github.com/opencontainers/image-spec/blob/master/annotations.md.

Modified the proposal to reflect org.opencontainers.image namespace.

This might need more review from security experts. I'd be worried about defining an algo/keyid schema and would prefer to use something predefined, if possible.

Just to clarify, are you referring to just using a specific algorithm? Or are you looking more for being able to point to an RFC reference of some sort such as https://tools.ietf.org/html/rfc5116 ?

Is there a reference implementation we can use to test this approach with?

Not at the moment. This is our next step. Our current thoughts are to prototype something in containerd for the runtime. We are most definitely open to suggestions on this.

@wking

I think you articulated very clearly what we wanted out of the "de-duplication". Thanks :).

Another approach to key distribution would be to encrypt to multiple public keys. For example, OpenPGP encrypts the payload with a random symmetric key, and then encrypts that symmetric key to one or more public keys. I don't know how that approach would fit into your enc.algo property, but you could always use additional enc.* properties to support it. If we want to support that use case, we may want to make enc.keyid an array of strings, instead of making it a single string.

I really like the idea of wrapping the keys! This would simplify some of the key management problems and infrastructure a lot.

However, I am a little unsure about the scenario where we would want to dynamically allow new users/parties to use our image. A specific case I had in mind was bootstrapping a cluster, i.e. when we set up a new Kubernetes cluster, the cluster would have a CA. If we wanted to give a new cluster access to an encrypted image, we would wrap the symmetric key with the public key of the cluster CA.

I am in favor of using the wrapped keys along with the image as you proposed, and of also providing the option to interface with a key management system in the event that none of the wrapped keys in the registry are for the user. I am on board with making a wrapped key array.

@stefanberger

I suppose the next question is how to implement this and what command line parameters to pass. I suppose docker commit should be instrumented to support this first. docker build may be another later candidate.

Would we want to manage symmetric keys internally somehow with a new set of commands? We could pass the key directly with docker commit --encryption-key file:<path>, or reference it by name or ID if it is internally managed: docker commit --encryption-key name:<keyname>?

Also, how do we pass our one or multiple friends' public key via command line? docker commit --wrapping-pubkey file:<pubkey> and allow multiple of those be passed? Or pass in a config file that references those public keys/certs?

Do we want to give control over the individual layers or just the last one? docker commit --encrypt-layers=<last|all>. And we would refuse to re-encrypt already encrypted layers (in the base image)?

@wking
Contributor

wking commented Jul 12, 2018

Also, how do we pass our one or multiple friends' public key via command line? docker commit --wrapping-pubkey file:<pubkey> and allow multiple of those be passed?

gpg uses --symmetric <key> for symmetric keys
and --recipient <name|email|key-id?> (which can be given multiple times) for wrapped session keys.

@stefanberger

Do we want to tie this in with gpg in some way or manage recipients in some way ourselves?

@stefanberger

The pgp public key server may come in handy...

@jejb

jejb commented Jul 12, 2018 via email

@wking
Contributor

wking commented Jul 13, 2018

Pretty much everything is following the spirit (if not the letter) of the s/mime spec PKCS#7: https://tools.ietf.org/html/rfc2315 for Enveloped-data...

That section talks about the same random-session-key-encrypted-to-each-recipient approach. Do you see a difference between PKCS#7 and OpenPGP on that score?

@stefanberger

PGP seems to have its own message format.
Section 5.1 (https://tools.ietf.org/html/rfc4880#section-5) describes the support for multiple recipients:

5.1.  Public-Key Encrypted Session Key Packets (Tag 1)

   A Public-Key Encrypted Session Key packet holds the session key used
   to encrypt a message.  Zero or more Public-Key Encrypted Session Key
   packets and/or Symmetric-Key Encrypted Session Key packets may
   precede a Symmetrically Encrypted Data Packet, which holds an
   encrypted message.  The message is encrypted with the session key,
   and the session key is itself encrypted and stored in the Encrypted
   Session Key packet(s).  The Symmetrically Encrypted Data Packet is
   preceded by one Public-Key Encrypted Session Key packet for each
   OpenPGP key to which the message is encrypted.  The recipient of the
   message finds a session key that is encrypted to their public key,
   decrypts the session key, and then uses the session key to decrypt
   the message.

So this sounds good for supporting multiple recipients if we were to just use OpenPGP tools for creating the encrypted layers. We would follow a standard ... Though, it also ties us into the PGP message format. If we wanted to extend the system with more dynamic handling of the users that can decrypt the layers of an image, we would later have to be able to parse the OpenPGP message to find the encrypted message and also find an ID of the symmetric key.

Ideally we should be able to find the following information somehow either in the OpenPGP data stream or separate the encrypted message and the metadata to decrypt the message in a form like this one here extending the above proposed image annotations:

"annotations": {
    "enc.keyid": "0x12345678",
    "enc.keyid_owner_account": "image-author",
    "enc.wrapped": [{
        "key_owner": "[email protected]",
        "key_id": "0x11223344",
        "wrapped_key": "0x76923749238749286565..."
    }, {
        "key_owner": "[email protected]",
        "key_id": "0x44332211",
        "wrapped_key": "0x983r093275765"
    }]
}

If I am [email protected], I will use my key 0x11223344 to decrypt the wrapped_key part to get to the symmetric key. If I don't find myself in the enc.wrapped key list, I can go to the server and ask for enc.keyid 0x12345678 under account image-author, and I will get the symmetric key back encrypted with my public key, assuming I am on the ACL for this key.
Would we want to try an OpenPGP type of decryption of the layer first (assuming the layer is in OpenPGP format) and, if this fails, fall back to asking the server for the key? I am just wondering whether OpenPGP is suitable for this or whether there is some tool implementing PKCS#7-type messages that would be more suitable.
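The lookup-then-fallback order described here could look roughly like the following Python sketch. It assumes the annotation layout from the example above has already been parsed into a dict. The names find_wrapped_key and ask_server are hypothetical, and unwrapping the returned key with the recipient's private key is out of scope.

```python
def find_wrapped_key(annotations: dict, my_key_ids: set, ask_server):
    """Return a wrapped symmetric key for one of the caller's key IDs.

    First look for an entry in enc.wrapped that matches a local key ID.
    If none matches, fall back to `ask_server(keyid, owner_account)`, a
    caller-supplied function standing in for the key server request.
    """
    for entry in annotations.get("enc.wrapped", []):
        if entry["key_id"] in my_key_ids:
            return entry["wrapped_key"]
    return ask_server(
        annotations["enc.keyid"], annotations["enc.keyid_owner_account"]
    )


# Values copied from the example annotations; key_owner is omitted here.
EXAMPLE_ANNOTATIONS = {
    "enc.keyid": "0x12345678",
    "enc.keyid_owner_account": "image-author",
    "enc.wrapped": [
        {"key_id": "0x11223344", "wrapped_key": "0x76923749238749286565..."},
        {"key_id": "0x44332211", "wrapped_key": "0x983r093275765"},
    ],
}
```

A caller holding key 0x11223344 gets the first wrapped key without contacting any server. A caller with an unlisted key triggers exactly one request for 0x12345678 under image-author.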

@wking
Contributor

wking commented Jul 13, 2018

enc.keyid: "0x12345678",
enc.keyid_owner_account: "image-author",

I don't think we need an enc.keyid_owner_account. Will the symmetric key-store really need to shard these by author? If you're concerned about garbage collection, I think you want the key-store listening for blob-deletion, so it can remove key 123 when the last blob using that key is deleted.

Similarly, I think we can drop nominal-owner info from the wrapped array. Owner info will be accessible via the recipient ID (e.g. attached to an OpenPGP key or X.509 cert) where the key<->owner relationship can be signed by others. I'd be concerned about folks giving unsigned owner assertions here more weight than they deserve.

And if you want, the key-store could have its own public key, and go into the wrapped array too. Will folks really use the same session key for multiple layers? Encrypting a random session key to keys.example.com seems safer. Users not directly authorized (i.e. able to decrypt one of the wrapper payloads) would notice the keys.example.com wrapper and apply for decryption. As a bonus, this allows one key-store architecture to be "these users are authorized for all blobs", in which case it only needs to store its own key, and not maintain any layer <-> authorized-users mapping.

Also, annotation values must be strings (previous discussion starting here), so you'll need to serialize to a string, mint a new descriptor property for the wrapped array, or make an encrypted media type as a separate blob:

  • manifest's layers[] descriptor points at the encrypted blob
  • encrypted has wrapped keys in an array, and a data descriptor pointing at the encrypted layer.
  • encrypted layer
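The first option, serializing the wrapped array to a string, might look like this in Python. This is only a sketch: the annotation name enc.wrapped follows the earlier example, and the choice of compact JSON with sorted keys is an assumption, made so that the same array always produces the same bytes in a content-addressed manifest.

```python
import json


def encode_wrapped(wrapped: list) -> str:
    """Serialize the wrapped-key array into a single annotation string."""
    return json.dumps(wrapped, sort_keys=True, separators=(",", ":"))


def decode_wrapped(annotations: dict, name: str = "enc.wrapped") -> list:
    """Parse the wrapped-key array back out of a string-valued annotation."""
    value = annotations[name]
    if not isinstance(value, str):
        raise TypeError("annotation values must be strings")
    return json.loads(value)


wrapped = [
    {"key_id": "0x11223344", "wrapped_key": "0x76923749238749286565..."},
    {"key_id": "0x44332211", "wrapped_key": "0x983r093275765"},
]
annotations = {"enc.keyid": "0x12345678", "enc.wrapped": encode_wrapped(wrapped)}
```

The separate-blob option avoids string-in-string encoding, at the cost of one more fetch before the layer can be decrypted.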

@wking
Contributor

wking commented Jul 13, 2018

Would we want to try an OpenPGP type of decryption of the layer first (assuming the layer is in OpenPGP format) and, if this fails, fall back to asking the server for the key? I am just wondering whether OpenPGP is suitable for this or whether there is some tool implementing PKCS#7-type messages that would be more suitable.

For both of OpenPGP and S/MIME, the off-the-shelf approach would be to leave the descriptor schema alone and use multipart/encrypted descriptors in the manifest's layers. Then the referenced blobs would have payloads like this for OpenPGP and this for S/MIME.

@stefanberger

@wking The enc.keyid_owner_account would at least reduce the possibility of a key_id collision among different users, though not completely eliminate it (per user), but the key server could refuse two distinct keys (per account) that map to the same key ID. It would of course depend on how long we make these key IDs for symmetric keys. RFC 4880 does seem to hint at a similar problem for their Key IDs for public keys here. Besides that, it's not clear whether there should be a centralized key server that holds much information about keys (and is a high-value target) or whether this server forwards requests for symmetric keys to servers that the owners are running themselves. Such requests could be forwarded to the owners' server not by key ID but by account name.

The above JSON was primarily meant as an example to show what information may be needed.

@wking
Contributor

wking commented Jul 13, 2018

The enc.keyid_owner_account would at least reduce the possibility of a key_id collision among different users, though not completely eliminate it (per user) but the key server could refuse two distinct keys (per account) that map to the same key id. It would depend of course how long we make these key IDs for symmetric keys.

Yeah. I think the solution to that is to use longer hashes for the IDs. I'm not surprised that RFC 4880 is warning about collisions for folks using only 8 bytes ;). But even with longer IDs, there could still be collisions. I don't think collisions are a problem though. If a collision between Alice and Bob's keys makes the target ambiguous, they can each just attempt decryption to see if the payload was really decrypted to them. There are some denial-of-service vulnerabilities in this area (asking Alice to attempt decryption of unrelated packets), but you don't avoid them with owner-string namespacing.
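How much longer IDs help can be estimated with the usual birthday approximation. The Python sketch below assumes IDs are uniformly random, which is an idealization for IDs derived from key fingerprints. The function name collision_probability is hypothetical.

```python
import math


def collision_probability(num_keys: int, id_bits: int) -> float:
    """Approximate chance that at least two of num_keys random IDs collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / 2^(b+1)),
    assuming IDs are uniformly distributed over 2^b values.
    """
    pairs = num_keys * (num_keys - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs / 2.0 ** id_bits)


# Compare 32-bit short IDs, 64-bit key IDs and 160-bit fingerprints
# for a population of one million keys.
estimates = {bits: collision_probability(1_000_000, bits) for bits in (32, 64, 160)}
```

With 32-bit IDs a collision becomes more likely than not at roughly 77,000 keys, which is why short IDs are ambiguous in practice. At 64 bits or more, accidental collisions are rare and the attempt-decryption fallback described above covers the remaining cases.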

Besides that, it's not clear whether there should be a centralized key server that holds much information about keys (and is a high-value target) or whether this server forwards requests for symmetric keys to servers that the owners are running themselves. Such requests could be forwarded to the owners' server not by key ID but by account name.

Why not use the key ID as the account name, wherever you're keeping the (key/user)-to-access-server mapping?

@stefanberger

@wking Ok, so we can get rid of the account name if the key ID is sufficiently long to be unique and the central server, which would presumably somehow notify the owner of the key, refuses to register duplicate key IDs. [Some sort of registration seems to be necessary.] I suppose that, for troubleshooting or just for being able to contact an owner, the central server should be able to tell who the key owner is.

@stefanberger

To avoid registering bogus key IDs in the central server, one could use the static account info to contact the final server.

@wking
Contributor

wking commented Jul 13, 2018

@stefanberger, central servers and alternatives seem out of scope here (maybe they would be in-scope for the distribution spec?). Once you have a set of recipient IDs and payloads encrypted to those IDs, you can have many independent ways of actually decrypting those payloads without impacting the image format.

@jejb

jejb commented Jul 13, 2018 via email

@wking
Contributor

wking commented Jul 13, 2018

... because I can see how you distribute the keys for the image being a significant cloud native function, and one that has to comport with all the current keystore ideas in CNCF, so we want to take an enabling but not prescriptive approach.

Are there CNCF decryption-API proposals? Then you could authorize decryption at request time "yes, Alice is authorized for key 123 decryptions now, so I'll pass back the decrypted session key". If instead you distribute a long-running key itself ("Alice is authorized for key 123 now, so I'll pass back its private key"), Alice will have non-revokable access to anything ever encrypted to that key.

@stefanberger

If we were to use OpenPGP for managing friends' public keys, then would we also want to use it for the layer encryption directly and take its encrypted output as the encrypted layer? Or only use it to manage public keys? I guess I am not clear what others' opinions are now. I don't think the format is ideal, but I am not sure whether designing our own is better. What I don't like about it is that it encodes Key IDs in the Public-Key Encrypted Session Key Packets that don't give a hint of who these keys are for. If keys are identifiable by their owners' email address, then this information should be preserved I think. The original identifiers of those keys may be useful if some day I were to build a new version of the image or add a new user to it. Those email addresses seem more user friendly than 4 byte key Ids.

The body of this packet consists of:

     - A one-octet number giving the version number of the packet type.
       The currently defined value for packet version is 3.

     - An eight-octet number that gives the Key ID of the public key to
       which the session key is encrypted.  If the session key is
       encrypted to a subkey, then the Key ID of this subkey is used
       here instead of the Key ID of the primary key.

     - A one-octet number giving the public-key algorithm used.

     - A string of octets that is the encrypted session key.  This
       string takes up the remainder of the packet, and its contents are
       dependent on the public-key algorithm used.

@wking
Contributor

wking commented Jul 13, 2018

What I don't like about it is that it encodes Key IDs in the Public-Key Encrypted Session Key Packets that don't give a hint of who these keys are for.

That information is associated with the public key, which you can retrieve by using the key ID. This is very similar to X.509, where you have a private and public key, as well as a separate certificate asserting the identity of the private-key-holder. What do you gain by embedding that metadata (as an unsigned assertion) in the recipient list? If you want to resolve that metadata, you should use the key ID to retrieve metadata which has been signed by parties you trust (e.g. in your web of trust, your trust-on-first-use database, a shared certificate authority, etc.). For example:

$ gpg --search-keys 0xBB729EC7
gpg: searching for "0xBB729EC7" from hkp server keys.gnupg.net
(1)	CoreOS Application Signing Key <[email protected]>
	  4096 bit RSA key FC8A365E, created: 2016-03-02, expires: 2021-03-01
Keys 1-1 of 1 for "0xBB729EC7".  Enter number(s), N)ext, or Q)uit > q

The analogous PKCS#7 object is similar in just recording the IssuerAndSerialNumber:

RecipientInfo ::= SEQUENCE {
     version Version,
     issuerAndSerialNumber IssuerAndSerialNumber,
     keyEncryptionAlgorithm
       KeyEncryptionAlgorithmIdentifier,
     encryptedKey EncryptedKey }

so you'd have to use that to look up the cert if you wanted to find the recipient's name. A difference between the PKCS#7 approach and the OpenPGP approach is that the former only supports one party (the issuer) for asserting metadata (like the recipient's name or domain name), while OpenPGP references the key itself, and allows multiple parties to assert metadata associated with that key.

@lumjjb
Author

lumjjb commented Jul 16, 2018

I like the OpenPGP spec. It seems to be an easy option for users to share keys without a trusted certificate hierarchy. Local users that want to use encrypted containers can use a combination of gpg and docker to run encrypted images (by utilizing the gpg keychain).

In addition, it is convenient that there exists a golang library that implements OpenPGP :) :).

The data layer would be the ciphertext packet and the wrapped keys packets would be the enc.keys array in the annotations.

I will start updating the original proposal again to note some of the ideas/comments in the discussions.

For the scenario raised where we need to pass keys to new users, @stefanberger and I have discussed a design with the "Fully Centralized Untrusted Key Server" which will work with the OpenPGP model. The wrapped keys associated with the image would belong to the trusted parties, and those trusted parties would be able to re-wrap the keys.

@lumjjb
Author

lumjjb commented Jul 17, 2018

Updated proposal. In addition, @stefanberger, @estesp and I are looking into implementation details with containerd and possibly buildkit.

@wking
Contributor

wking commented Jul 17, 2018

Updated proposal.

Looks like you have stale enc.algo and enc.keyid references now that the meat is all under org.opencontainers.image.enc.keys.

Also, as I mentioned earlier, annotation values must be strings, so the proposal graphic should either move the keys property out of annotations or convert the value to a string with something like:

"annotations": {
  "org.opencontainers.image.enc.keys": "[\"wrapping1\",\"wrapping2\"]"
}
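
A minimal Python sketch of one way to produce and read back such a string-valued annotation, using only the standard `json` module. The helper names are hypothetical and `wrapping1`/`wrapping2` are the placeholders from the example above.

```python
import json

KEYS_ANNOTATION = "org.opencontainers.image.enc.keys"


def set_wrapped_keys(annotations: dict, wrapped_keys: list) -> None:
    # Annotation values must be strings, so the list is serialized to JSON text.
    annotations[KEYS_ANNOTATION] = json.dumps(wrapped_keys, separators=(",", ":"))


def get_wrapped_keys(annotations: dict) -> list:
    # A missing annotation is treated as "no wrapped keys".
    return json.loads(annotations.get(KEYS_ANNOTATION, "[]"))


annotations = {}
set_wrapped_keys(annotations, ["wrapping1", "wrapping2"])
print(json.dumps({"annotations": annotations}, indent=2))
```

The outer `json.dumps` call escapes the inner quotes, which yields the `\"` form shown in the snippet above.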

And there seems to be a dangling "registered with a Key ID to the organization namespace" paragraph fragment. Maybe leftover from a partial edit?

@lumjjb
Author

lumjjb commented Jul 18, 2018

Thanks @wking ! I have made the changes!

Also, as I mentioned earlier, annotation values must be strings, so the proposal graphic should either move the keys property out of annotations or convert the value to a string with something like:

Made the keys comma-delimited base64 strings.
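
A small Python sketch of what a comma-delimited base64 encoding could look like. The function names and sample key bytes are assumptions for illustration, not taken from the implementation. The standard base64 alphabet contains no comma, so splitting on `,` is unambiguous.

```python
import base64
from typing import List


def encode_wrapped_keys(wrapped_keys: List[bytes]) -> str:
    # Each wrapped key becomes one base64 token; tokens are joined with commas.
    return ",".join(base64.b64encode(k).decode("ascii") for k in wrapped_keys)


def decode_wrapped_keys(value: str) -> List[bytes]:
    # Empty tokens are skipped so that an empty annotation yields an empty list.
    return [base64.b64decode(part, validate=True) for part in value.split(",") if part]


# Made-up binary blobs standing in for wrapped keys.
keys = [b"\x00\x01wrapped-key-1", b"\xff\xfewrapped-key-2"]
annotation_value = encode_wrapped_keys(keys)
print(annotation_value)
```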

@stefanberger

@wking Can one currently push any images where the JSON documents were (re-)written by containerd? My guess is that our failed attempts to push images should actually be quite common.

@wking
Contributor

wking commented Aug 2, 2018

Can one currently push any images where the JSON documents were (re-)written by containerd?

I don't know. The distribution-spec is currently pretty vague about pushing manifests, with two broken links for the types. But it looks like docker/distribution only landed support for OCI types two weeks ago. If your registry is using that codebase, maybe that PR has what you need?

@lumjjb
Author

lumjjb commented Aug 2, 2018

CC: @harche who is also contributing.

@lumjjb
Author

lumjjb commented Aug 3, 2018

@harche pointed out an issue with the current build of the registry not branching into the correct routines for the OCI image. He can probably elaborate in much more detail.

We will file an issue with the registry to fix that. In the meantime, we are using a mechanism similar to the one in buildkit, as pointed out by @estesp:

https://github.com/moby/buildkit/blob/master/exporter/containerimage/writer.go#L83-L100

With this, image push/pull works perfectly with the current registry.

@lumjjb
Author

lumjjb commented Aug 7, 2018

We have linked the current WIP implementation for you to play around with! There is a README in the referenced "WIP PR" describing the features we wrote.

containerd/containerd#2532

Do let us know if it works well for you, or if any issues come up.

Is there a reference implementation we can use to test this approach with?

@stevvooe here's something to play with :)

@stevvooe
Contributor

stevvooe commented Aug 7, 2018

@lumjjb Thanks for submitting the PoC! I'll take a closer look at that in detail. I have a few comments in line that should help.

Updated to use mediatype +pgp

Please undo this. ;)

  1. While this proposal started as a fairly generic way of providing encryption support in descriptors, it seems to have devolved into discussion specific to PGP. Let's focus on the modified mediatype that punts encryption parameters to labels. It is clean, simple, portable and extensible.
  2. PGP is a good test case, but we should also look at other encryption schemes so we can ensure our approach covers them as well.
  3. When you do encryption, you give up deduplication. @cyphar referenced this with BEAST, but it's more than that. We need to acknowledge that this is no longer the goal with encrypted images and pursuing that goal would likely compromise the resulting implementation. Every time you encrypt a layer, there is generally a nonce that should be used that will actually change the ciphertext each time it is encrypted.
  4. It looks like there may be a bug in the registry in requiring the mediaType in the top-level manifests field. That should not be required if using OCI mediatypes.
  5. In general, key distribution is outside the scope of these specifications -- there are a lot of ways to do it. The balance is providing enough information in the layer encryption configuration to allow one to decrypt the image without having to rebuild images when upstream keys/ids/services change.

Hopefully, I am not stomping on the discussion too much and we have enough to move forward.

@lumjjb
Author

lumjjb commented Aug 8, 2018

  1. While this proposal started as a fairly generic way of providing encryption support in descriptors, it seems to have devolved into discussion specific to PGP. Let's focus on the modified mediatype that punts encryption parameters to labels. It is clean, simple, portable and extensible.

Agreed on the scope. I have some questions about the nature of extensibility though. My worry is that key storage/retrieval would need to be generalized across different encryption schemes, which may use different key formats and key identifiers. For example, in OpenPGP we can specify a recipient by email.

There are a few questions that we would need to discuss: 1. "Should key retrieval be specific to the encryption scheme?" 2. "If not, how do we handle key retrieval and identification across different encryption schemes?"

  2. PGP is a good test case, but we should also look at other encryption schemes so we can ensure our approach covers them as well.

We've done a bit of discussion on the topic while working on the PoC, and at least for the usecases that we've thought about, we think that OpenPGP can handle them, and that any variants of supported encryption details would be bindings at the lower level of OpenPGP, e.g. using an HSM routine instead of calling the one in OpenPGP. But this is definitely a topic on which it would be nice to hear what others think.

  3. When you do encryption, you give up deduplication. @cyphar referenced this with BEAST, but it's more than that. We need to acknowledge that this is no longer the goal with encrypted images and pursuing that goal would likely compromise the resulting implementation. Every time you encrypt a layer, there is generally a nonce that should be used that will actually change the ciphertext each time it is encrypted.

Yup - that is true.

We've tried to provide some facilities (e.g. adding recipients without re-encrypting the blob) to help provide some type of deduplication where it would not compromise the security of the content.

The choice of performing encryption on specific layers (which would be supported) allows de-duplication of non-confidential layers (e.g. when building a secret application on top of an Ubuntu image, the Ubuntu part of the image is not secret).

Actually, I'm not too familiar with the other types of deduplication that are done. Is there deduplication happening on pages within layers?

  4. It looks like there may be a bug in the registry in requiring the mediaType in the top-level manifests field. That should not be required if using OCI mediatypes.

Yup - we realized that we had mistakenly specified the docker index type instead, so the registry was going down the wrong codepath. This issue has been resolved appropriately now! :)

  5. In general, key distribution is outside the scope of these specifications -- there are a lot of ways to do it. The balance is providing enough information in the layer encryption configuration to allow one to decrypt the image without having to rebuild images when upstream keys/ids/services change.

Agreed. I think a good candidate interface to key distribution systems is the key store, since most of them work around getting the keys into a particular key store. Given that, we would probably look at implementing an interface for using/handling keys that follows the generic key model of PKI (e.g. UnwrapKey, GetPublicKey, etc.). This should add pluggability with both services like Vault and more low-level interfaces like TPMs and HSMs (which technically could be used through a key store).
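
To make the idea a bit more concrete, here is a rough Python sketch of what such a pluggable key store interface might look like. Everything in it is hypothetical: the class and method names only mirror the `UnwrapKey`/`GetPublicKey` operations mentioned above, and `InMemoryKeyStore` is a lookup-table test double that performs no real cryptography. A Vault, TPM or HSM backend would implement the same two methods.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class KeyStore(ABC):
    """Hypothetical interface; a backend could be Vault, a TPM, an HSM, ..."""

    @abstractmethod
    def get_public_key(self, key_id: str) -> bytes:
        """Return the public key registered under key_id."""

    @abstractmethod
    def unwrap_key(self, wrapped_key: bytes) -> Optional[bytes]:
        """Return the layer key, or None if this store cannot unwrap it."""


class InMemoryKeyStore(KeyStore):
    """Test double: looks wrapped blobs up in a table. NOT real cryptography."""

    def __init__(self, public_keys: dict, unwrap_table: dict):
        self._public_keys = dict(public_keys)
        self._unwrap_table = dict(unwrap_table)

    def get_public_key(self, key_id: str) -> bytes:
        return self._public_keys[key_id]

    def unwrap_key(self, wrapped_key: bytes) -> Optional[bytes]:
        return self._unwrap_table.get(wrapped_key)


def find_layer_key(stores: Iterable[KeyStore], wrapped_keys: Iterable[bytes]) -> Optional[bytes]:
    """Try every wrapped key against every store; return the first hit."""
    wrapped = list(wrapped_keys)
    for store in stores:
        for blob in wrapped:
            key = store.unwrap_key(blob)
            if key is not None:
                return key
    return None


# Made-up data: only bob's store can unwrap the second wrapped key.
alice_store = InMemoryKeyStore({"alice": b"pub-a"}, {})
bob_store = InMemoryKeyStore({"bob": b"pub-b"}, {b"wrapped-for-bob": b"layer-key"})
print(find_layer_key([alice_store, bob_store], [b"wrapped-for-alice", b"wrapped-for-bob"]))
```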

@cyphar
Member

cyphar commented Sep 1, 2018

@lumjjb

Actually, I'm not too familiar with the other types of deduplication that are done. Is there deduplication happening on pages within layers?

No, currently the deduplication is only on the layer level. I have a proposal I'm working on (and giving a talk on next week) that will allow for content-defined-chunking to be used for the entire image tree (similar in concept to https://restic.net/). This would then provide deduplication at a much finer level (and give us an opportunity to fix all of the awful tar code).

We've done a bit of discussion on the topic while working on the PoC, and at least for the usecases that we've thought about, we think that OpenPGP can handle them

The main issue with OpenPGP is that the message format is not followed by all implementations. GPG does a lot of things that nothing else does, and many implementations that support the RFC have taken liberties with some of the important undefined behaviour. My impression of OpenPGP (aside from the fact that it still doesn't support authenticated encryption -- which should be a strong argument against its use) is that we should not be using it unless we actually want to interact with systems that already auto-consume stuff that has OpenPGP messages in it (such as emailing images).

But that's just my $0.02. I do get why you would use it (and I also felt the same way when working on some of my side projects) but I quickly discovered it has many many many issues.

@wking
Contributor

wking commented Sep 1, 2018

The main issue with OpenPGP is that the message format is not followed by all implementations. GPG does a lot of things that nothing else does, and many implementations that support the RFC have taken liberties with some of the important undefined behaviour.

Can you cite examples? Were they with or without the --openpgp compliance option? Searching for compat issues turned up stuff like this, which doesn't look like a big deal for our use cases.

My impression of OpenPGP (aside from the fact that it still doesn't support authenticated encryption -- which should be a strong argument against its use)...

Really? The proposed structure here has encryption at the layer/manifest level, with signatures over on the config (I think; there's no opencontainers signature spec, #22, #176, #400, etc.). And "encrypt the layer and then reference it via a cryptographic hash" (this proposal) is very similar to encrypt-then-MAC, with the difference being hashes vs. HMACs. Do we care about unforgeability at the encryption/blob level? I'd expect we only care about it at the signature level. And if we do care about unforgeability at the blob level, shouldn't we be using HMACs (alongside raw hashes?) in the Merkle DAG? I don't think this distinction is something we can narrowly address in an encryption proposal.
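
The hashes vs. HMACs distinction can be sketched in a few lines of Python using only `hashlib` and `hmac`. The key and blob contents are made up, and the "attacker" is simply someone who can rewrite both the blob and the manifest entry that references it.

```python
import hashlib
import hmac


def blob_digest(blob: bytes) -> str:
    # Unkeyed content address, as used to reference blobs from a manifest.
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def blob_mac(key: bytes, blob: bytes) -> str:
    # Keyed tag: can only be produced or checked by holders of `key`.
    return hmac.new(key, blob, hashlib.sha256).hexdigest()


author_key = b"made-up shared secret"
original = b"encrypted layer produced by the author"
forged = b"blob substituted by whoever controls the registry"

# Whoever can rewrite the manifest can also recompute the digest for a
# substituted blob, so the digest check alone still passes.
forged_manifest_digest = blob_digest(forged)
digest_check_passes = blob_digest(forged) == forged_manifest_digest

# Without author_key the attacker can only guess a tag; verification fails.
attacker_tag = blob_mac(b"attacker's own key", forged)
mac_check_passes = hmac.compare_digest(attacker_tag, blob_mac(author_key, forged))

print(digest_check_passes, mac_check_passes)
```

A digest therefore binds a blob to a manifest, but it says nothing about who produced the manifest; that part needs a key (a MAC) or a signature.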

... is that we should not be using it unless we actually want to interact with systems that already auto-consume stuff that has OpenPGP messages in it (such as emailing images).

Lots of devs already have OpenPGP keys, and they're a familiar tool in the software-distribution ecosystem. I think compat with that existing infrastructure is a plus. S/MIME / PKCS#7 is an established alternative in this space, as discussed previously in this issue. For our purposes, I think the meaningful distinction is whether the ecosystem allows one (S/MIME / PKCS#7) or many (OpenPGP) parties to make assertions about key metadata (see this earlier comment). Do you have an alternative system you'd like to put on the table?

@stefanberger

[...] aside from the fact that it still doesn't support authenticated encryption -- which should be a strong argument against its use) is that we

I don't think authenticated encryption like encrypt-then-MAC can address the problem of a malicious third party replacing a wrapped key and layer. You'd need an image author's signature over the top level manifest (Notary).

@lumjjb
Author

lumjjb commented Sep 11, 2018

Updating the thread with some offline discussions with @stevvooe @stefanberger @dmcgowan @estesp @harche @crosbymichael :

  • The OCI specification should support the use of other encryption protocols (i.e. the OCI spec should not lock a user into a specific encryption protocol).
  • Implementing encryption on the client side vs. the server side should be explored.
  • We will look into the JWK/JWE protocols to see if they are a good fit/option.

@lumjjb
Author

lumjjb commented Sep 11, 2018

We have been looking into JWE/JWK and it seems like a promising approach. It seems to support a rather generic interface. However, at the moment, it does not support all protocols; more specifically, it lacks PGP, TPMs, HSMs, etc. We are working with the authors of the RFC standard to try to work in PGP as a key option.

While we work towards getting JWE/JWK to support the majority of encryption protocol use cases, we have added a field:

org.opencontainers.image.enc.scheme = "pgp"/"jwe"

As support for other protocols in JWE/JWK matures, we will move integration of those protocols into JWE, and hopefully it can eventually encompass the majority of encryption protocols.

@lumjjb
Author

lumjjb commented Sep 13, 2018

Through the exploration of JWE, we ran into issues similar to those we encountered with OpenPGP, but due to the different interfaces exposed, we found it difficult to adopt it the same way we did with OpenPGP. More specifically, it was hard to separate the encrypted data from the rest of the metadata and keys to benefit from deduplication of layer data.

@stevvooe proposed a scheme where the layer is encrypted by a block cipher, and the key of that block cipher is the payload protected by the encryption scheme used for key delivery.

A rough idea of this would be:

Annotations:
org.opencontainers.image.enc.cipher: "AES_128_GCM"
org.opencontainers.image.enc.cipher.IV: "3r9irw3r3" (or other options)
org.opencontainers.image.enc.key.openpgp: "<base64 openpgp stream>, <base64 openpgp stream>"
org.opencontainers.image.enc.key.jwe: "<base64 JWE json>, <base64 JWE json>"
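
A consumer of annotations shaped like the ones above might group the wrapped keys by key-delivery scheme before trying to unwrap them. The Python sketch below assumes the annotation names from this rough idea (which may change), and the payloads are made-up placeholders rather than real OpenPGP or JWE data.

```python
import base64
from typing import Dict, List

KEY_PREFIX = "org.opencontainers.image.enc.key."


def wrapped_keys_by_scheme(annotations: Dict[str, str]) -> Dict[str, List[bytes]]:
    """Map each scheme suffix (e.g. "openpgp", "jwe") to its decoded wrapped keys."""
    result: Dict[str, List[bytes]] = {}
    for name, value in annotations.items():
        if not name.startswith(KEY_PREFIX):
            continue  # cipher, IV and unrelated annotations are skipped
        scheme = name[len(KEY_PREFIX):]
        parts = [p.strip() for p in value.split(",")]
        result[scheme] = [base64.b64decode(p, validate=True) for p in parts if p]
    return result


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


# Placeholder payloads standing in for OpenPGP streams and JWE JSON.
annotations = {
    "org.opencontainers.image.enc.cipher": "AES_128_GCM",
    "org.opencontainers.image.enc.key.openpgp": ", ".join(
        [b64(b"openpgp-stream-1"), b64(b"openpgp-stream-2")]
    ),
    "org.opencontainers.image.enc.key.jwe": b64(b'{"placeholder": "jwe"}'),
}
keys = wrapped_keys_by_scheme(annotations)
print(sorted(keys))
```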

 

@stefanberger

stefanberger commented Sep 13, 2018

We don't need so many annotations if we put the IV and cipher into the encryptor configuration/metadata which then gets encrypted by OpenPGP or JWE along with the key used for encrypting/decrypting the layer. At least the IV doesn't need to be visible to anyone.

{
    "cipher.iv": "3r9irw3r3",
    "cipher": "AES_128_GCM",
    "key": "beg62453abed"
}

It may even save space if we reduce the spaces in the JSON.
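
As a quick check of the space saving, this Python sketch serializes the example structure above both indented and compact. The values are the placeholders from the example, not real key material.

```python
import json

# Placeholder values copied from the example structure above.
config = {
    "cipher.iv": "3r9irw3r3",
    "cipher": "AES_128_GCM",
    "key": "beg62453abed",
}

pretty = json.dumps(config, indent=4)
# separators=(",", ":") drops the spaces json.dumps adds after "," and ":".
compact = json.dumps(config, separators=(",", ":"))

print(len(pretty), len(compact))
print(compact)
```

The compact form is what would be handed to OpenPGP or JWE for encryption; it parses back to the same structure.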

@lumjjb
Author

lumjjb commented Sep 13, 2018

I've created a google doc for easier discussion of details:

https://docs.google.com/document/d/146Eaj7_r1B0Q_2KylVHbXhxcuogsnlSbqjwGTORB8iw/edit?usp=sharing

We don't need so many annotations if we put the IV and cipher into the encryptor configuration/metadata which then gets encrypted by OpenPGP or JWE along with the key used for encrypting/decrypting the layer. At least the IV doesn't need to be visible to anyone.

Yeah, I think that's cleaner from an interface perspective - it is just a struct with all the information.

I like that the cipher information is available on the top level for viewing though. Keeping a non-functional redundant copy would be nice too.

@lumjjb
Author

lumjjb commented Sep 14, 2018

I have updated the document to require sign-in for commenting (or so I believe), so that we don't have multiple "Anonymous" users, which should make discussion easier.

@cyphar
Member

cyphar commented Sep 14, 2018

Right sorry, I forgot to log in. I was the "Anonymous" user.

@lumjjb
Author

lumjjb commented Nov 19, 2018

@mikedanese

mikedanese commented Jul 2, 2019

The proposed structure here has encryption at the layer/manifest level, with signatures over on the config (I think; there's no opencontainers signature spec, #22, #176, #400, etc.). And "encrypt the layer and then reference it via a cryptographic hash" (this proposal) is very similar to encrypt-then-MAC, with the difference being hashes vs. HMACs. Do we care about unforgeability at the encryption/blob level?

If you are trying to remove the registry as a trusted party, then you care about authentication. If the registry has control over the manifest and the encrypted blobs, then you need to pick a cipher that provides security under chosen-ciphertext attacks.

I don't think authenticated encryption like encrypt-then-MAC can address the problem of a malicious third party replacing a wrapped key and layer.

It's less about the ability to replace and more about what you can do with the ability to replace. E.g. without authentication, someone in control of the registry can use clients as a padding oracle if they can detect information about padding errors. With some knowledge of the plaintext, someone in control of the registry can selectively flip bits of the plaintext.
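
The bit-flipping point can be illustrated with a short Python sketch. It assumes an unauthenticated cipher that XORs the plaintext with a keystream (as stream ciphers and CTR-style modes do); a random keystream stands in for the real cipher here, and the plaintext strings are made up.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Victim side: the keystream is secret and never seen by the attacker.
plaintext = b"USER=nobody"
keystream = secrets.token_bytes(len(plaintext))  # stand-in for a stream cipher
ciphertext = xor(plaintext, keystream)

# Attacker side: knows (or guesses) the plaintext, wants a different one of
# the same length, and only touches the ciphertext.
known = b"USER=nobody"
wanted = b"USER=root  "
tampered = xor(ciphertext, xor(known, wanted))

# The client decrypts the tampered blob without noticing anything wrong.
decrypted = xor(tampered, keystream)
print(decrypted)
```

With an authenticated mode, the tag check on `tampered` would fail before any plaintext is released to the client.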

@lumjjb
Author

lumjjb commented Jul 2, 2019

Hey @mikedanese, hope you found this interesting! Btw, this issue is a bit stale; we've opened a new PR that you may want to check out, which may be clearer in articulating some of the newer changes: #775

Yup. The authentication that we suggest for encryption is targeted just at the encrypted blob. Integrity of the image still relies on a separate process of image signing and verification (e.g. with Docker Content Trust and Notary/TUF). We leave those integrity attacks to signing :).

@justincormack

It is not quite the same: the encryption is also authenticated, so we are not just relying on the hash as a MAC. There are some situations in which the registry might guess the plaintext (e.g. if you encrypt a known public base image like Debian, it is not hard to guess what it is from the length), and it is definitely worth doing a full security analysis of the design.

@PatrickLang

@jterry75 @jstarks - have you reviewed this to make sure it can also work with Windows images?
