Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test communications when root key pair is renewed and NodeID is changed #317

Closed
emmacasolin opened this issue Jan 20, 2022 · 14 comments · Fixed by #378
Closed

Test communications when root key pair is renewed and NodeID is changed #317

emmacasolin opened this issue Jan 20, 2022 · 14 comments · Fixed by #378
Assignees
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@emmacasolin
Copy link
Contributor

Specification

When the root keypair is renewed, we expect the old nodeId and client certificate to still be valid (in constrast to when it is reset).

From #312 (comment):

Changing the node id will impact communications between:

  • Agent to Agent - this is mTLS
  • CLI to Agent - normal TLS (client does not provide cert)
  • GUI to Agent - normal TLS (client does not provide cert)

Under Agent to Agent, this is going through the forward and reverse proxies, and also the GRPCClientAgent. The GRPC client does not use TLS but the TLS is being done between forward and reverse proxies. This means that when we perform a keyManager.renewRootKeyPair, the old NodeId is still valid until the certificate expiration date. This means nodes contacting the changing agent with the old NodeId should still work, but receive an automatic update to the NodeGraph when the node id changes. I believe the logic to check the cert chain is in network/utils.ts verifyServerCertificateChain, but the logic to update the node id in the nodegraph is not verified yet.

Under CLI to Agent and GUI to Agent, they are using direct TLS on the GRPCClient. Existing GRPCClient connections should continue to work and not be interrupted by the TLS change. I believe I had tests for this under tests/grpc/GRPCServer.test.ts as changing the private key and certificate on the fly. However new connections will need to be made with a new target Node ID. This can occur by reading the status each time a new connection is made. However I think a similar TLS verification logic could take place where the old node ID can still be valid.

So we need tests to ensure this is the case. One for tests/grpc/GRPCClient.test.ts to ensure that the GRPCClient node id can still work. Another for tests/network as well.

Then we would extend these tests into tests/nodes for NodeConnection, and also in tests/bin to test CLI situation.

Additional context

Tasks

  1. Check logic for updating the certchain and node id
  2. TLS verification to allow old node ids to still be used
  3. Tests for GRPCClient, network domain, nodes, and CLI
@CMCDragonkai
Copy link
Member

From #326 (comment)

@joshuakarp when the await this.nodeManager.refreshBuckets(); is executed, it says it must be done upon key renewal.

However how does it get access to the new key? Is it calling the KeyManager internally? For something like this, it'd make sense to pass the the new key if it has to use it. Also is there any downsides to this? Seems like something to incorporate into #317.

This should be incorporated to the testing here.

@CMCDragonkai
Copy link
Member

Changes from #326 will now bring in recovery code information when the keypair changes too.

But this revealed that pk keys commands should also report the new recovery code if the keypair changes. This is something that should be checked too.

@CMCDragonkai
Copy link
Member

Any existing connections proxy connections or node connections should continue to work without problems, however asking for the node ID may give a different answer than the expected original node ID during connection establishment.

Furthermore, if a node is a aware that there is a newer node ID, it should proceed to update its own NodeGraph with this information, as well as any pre-existing relationships that the old node ID had. So for example if old node ID was part of a gestalt that had certain permissions, then the new node ID is now part of that gestalt as well.

The user should be notified somehow about the fact that they are using an old node ID. And that they should switch to the new node ID. However if they continue to try to contact the old node ID on the CLI or GUI, then it should still work as long as the old node ID is still part of the valid certificate chain. This should mean that we don't delete the old node ID.

Any garbage collection of the old node ID should only occur lazily. Not sure exactly when, but perhaps as part of a natural forgetting process of the PK network.

@tegefaulkes
Copy link
Contributor

There are 3 kinds of key reset.

  • resetRootCert, This recreates the cert chain from scratch but keeps root keyPair
  • resetRootKeyPair, This recreates the cert chain from scratch with new root keyPair
  • renewRootKeyPair, this creates a new root keyPair and adds it to the cert chain. Old one should still be valid.

We expect there to be issues with chaning the root keyPair of the receiving side since that's the side that is authenticated. The conditions for a failure to authenticate would be the target having a new keyPair AND the old one not in the cer chain.By theses conditions only resetRootKeyPair will cause a node connecting with old details to fail.

I'd expect RenewRootKeyPair to still authenticate properly since it will keep the only cert on the cert chain. But currently it is failing so I need to look into it. I suggests that the authenticate process is not checking the old certs in the chain properly. Or the old cert is not properly maintained in the chain.

When trying to connect with bad authentication we get the following error out of the proxy in the logger ERROR:NodeConnection test:Failed CONNECT to 127.0.0.1:34670 - ErrorCertChainUnclaimed: Node ID is not claimed by any certificate. But this error doesn't propigate down to the NodeConnection so as far as the NodeConnection can tell it's just failing to connect and times out. Shouldn't we be throwing some kind of authentication error instead?

Also just a note. I can get the tests done today but that's just to check that things stil work as expected when reseting the root keypair. The aspect of this issue where a connecting node will notice that the nodeId/cerChain has been updated and reacts accordingly still needs to be speced out and done. I don't think that it is a strict requirenment for #378 so we can extract that part of the issue out and do it at a later date.

@CMCDragonkai
Copy link
Member

@emmacasolin this NAT testing should eventually incorporate this as well, so that when a keypair changes, any connection should continue to work even through NAT busted connections, or eventually through NAT relayed connections.

@tegefaulkes
Copy link
Contributor

tegefaulkes commented Jun 10, 2022

There is a problem with verifying the certchain when establishing a connection. There seems to only be 1 cert in the cert chain. Even when there are multiple certs in the certChain in the KeyManager. I'm digging into it.

It seems like the receiving side only provides one certificate for the connection. We should be providing the whole chain.

@tegefaulkes
Copy link
Contributor

So the keychange is meant to affect the sigchain and gestaltgraph in some what. lets start by asking what these are...

  • sigchain. This is used to store claims made to and from other nodes and identities. it is stored in a signed chain to maintain authenticity of the claims.
  • gestaltGraph. is used to track all of the nodes within a Gestalt.
  • ACL is used to track the permissions between nodes.

So how would a key change affect the sigchain? Let's see.... If the root KeyPair changes but is not a part of the cert chain then we essentially disavow the old NodeId/KeyPair. If the old nodeId is still in the certChain then the claims should still be valid.

The GestaltGraph and the ACL Needs to have old NodeIds converted over to the new NodeId with it is discovered that the ID has changed. We will need a way to discover that the NodeId has changed. We also need a way to update a NodeId of a node when we find that it has changed.

How does the NodeId changing affect discovery of the node? I'm sure that for a while the old NodeId can still be found but will be updated with the new NodeId when it gets connected to. Do we need some mecanism for ensuring it can be found by it's old ID searching for it on the network? It seems possible that nodes with permissions or part of it's gestalt can become 'estranged'.

The sigchain is not actively affected by the root keyPair change event. They however depend on the changed state in the KeyManager. This just means we need to make sure it checks f the nodeId is part of the certificate chain during the relevant checks.

General spec task outline

  • Need to make sure that the TLS certificate contains the full cert chain and the forward connecton verifys the whole chain. Currently the TLS cert chain only contains the 1 cert.
  • For the SigChain we just need to take in account the full certChain when verifying SigChain data. I think this is done by sigChainUtils.verifyChainData.
  • GestaltGraph and ACL need to know a Node by it's old ID and convert the old ID to the new one over time. Likewise if it needs to verify the ID it needs to use the nodes whole certChain to do so.

@CMCDragonkai
Copy link
Member

Yea key change is a big problem, but that's why I wanted to focus only on maintaining the node connection for now and tackle the key change problem for the other domains separately. Good to keep these in mind for a new issue.

@tegefaulkes
Copy link
Contributor

Ok then, all the connection tests are working except for one case. I'll fix that and then this should be done. The details in the comments above can be part of a new issue tacking this general key change problem.

tegefaulkes added a commit that referenced this issue Jun 10, 2022
Added some tests to check that a root keyPair change propagates properly. Also added tests for the change for existing and new node connections.

#317
tegefaulkes added a commit that referenced this issue Jun 10, 2022
Added some tests to check that a root keyPair change propagates properly. Also added tests for the change for existing and new node connections.

#317
emmacasolin pushed a commit that referenced this issue Jun 14, 2022
Added some tests to check that a root keyPair change propagates properly. Also added tests for the change for existing and new node connections.

#317
@CMCDragonkai
Copy link
Member

A portion of this work was done in #378, only addressing maintaining node connections while keypair is changed. @tegefaulkes please create a new issue for general keypair change testing.

emmacasolin pushed a commit that referenced this issue Jun 14, 2022
Added some tests to check that a root keyPair change propagates properly. Also added tests for the change for existing and new node connections.

#317
@tegefaulkes
Copy link
Contributor

Essentially what was addressed in this issue was handling the root cert changing and by extension the local Agent's NodeId. This means that when the root cert is changed we

  1. Update the status file with the new information.
  2. Update the tlsConfig for the grpcServerClient and Proxy
  3. Reset the NodeGraph buckets with NodeManager.resetBuckets().

What hasn't been addressed is handling remote nodes who's NodeId's have changed. We need to be able to

  1. detect when a node's NodeId has changed.
  2. treat the new NodeId as we would with it's old NodeId.
  3. update our internal information having anything related to the old NodeId be updated to use the new NodeId. This process needs to happen lazily. Domains such as ACL, Gestalt Graph and SigChain will need to handle this.

I will create a new issue specifying what still needs to be done.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 20, 2022 via email

@tegefaulkes
Copy link
Contributor

I worked on adding tests for this issue. the bulk of the work was done by Emma. As far as I know these changes were made.

  • src/keys/KeyManager.ts - a changeCallback was added to the constructor. it is called when the root certificate and NodeId changes.
  • src/events/EventBus.ts - was created. This provides an asynchronious way to handle events. It allows the root cert change to propagate to other domains.
  • src/PolykeyAgent.ts - passes the changeCallback to the KeyManager and creates the handler for the cert change event. This event updates the status file, NodeGraph buckets, and the grpcServerCLient and proxy TLS configs.
  • tests/PolykeyAgent.test.ts - intergration tests were added to see if the cert change event propigates properly.
  • tests/nodes/NodeConnection.test.ts - tests were added to check the expected behviour when changing the TLS config in various situations.

I'm writing up the new issue now.

@tegefaulkes
Copy link
Contributor

New issue has been created at #386.

@teebirdy teebirdy added the r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management label Jul 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

Successfully merging a pull request may close this issue.

4 participants