get root ca certificate script retries forever #170

Open
nkhine opened this issue May 9, 2017 · 6 comments

@nkhine
Contributor

nkhine commented May 9, 2017

Hello, I just tried to install a new cluster with the latest tack code, and everything goes smoothly until:

scripts/do-task "get root ca certificate" scripts/get-ca
❤ get root ca certificate
+ source scripts/retry
+ echo .cfssl
.cfssl
++ terraform output s3-bucket
+ PKI_S3_BUCKET=kz8s-pki-test-12345-eu-west-2
+ CA_PATH=s3://kz8s-pki-test-12345-eu-west-2/ca.pem
+ mkdir -p .cfssl
+ _retry '❤ Grabbing s3://kz8s-pki-test-12345-eu-west-2/ca.pem' aws s3 cp s3://kz8s-pki-test-12345-eu-west-2/ca.pem .cfssl
+ '[' -z aws ']'
+ echo -n ❤ Grabbing s3://kz8s-pki-test-12345-eu-west-2/ca.pem
❤ Grabbing s3://kz8s-pki-test-12345-eu-west-2/ca.pem+ printf .
.+ aws s3 cp s3://kz8s-pki-test-12345-eu-west-2/ca.pem .cfssl
+ sleep 5.2
...

Navigating to the S3 bucket, I can see that it is empty.

➜  dev git:(deploy) ✗ echo $DIR_SSL

➜  dev git:(deploy) ✗ 

It returns empty! What am I missing?
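The trace above explains why the task never finishes: as far as it shows, _retry prints the message, then loops printing a dot, running aws s3 cp, and sleeping 5.2 seconds until the copy succeeds, with no upper bound. If ca.pem never reaches the bucket, the loop spins forever. Below is a minimal sketch of a bounded alternative; bounded_retry and its MAX_ATTEMPTS and DELAY arguments are hypothetical and are not part of tack's scripts/retry.

```shell
#!/usr/bin/env bash
# Hypothetical bounded variant of the _retry helper seen in the trace.
# Usage: bounded_retry MAX_ATTEMPTS DELAY_SECONDS MESSAGE COMMAND [ARGS...]
bounded_retry() {
  local max_attempts=$1 delay=$2 message=$3
  shift 3
  # Like the original, refuse to run if no command was supplied.
  if [ -z "${1:-}" ]; then
    echo "bounded_retry: no command given" >&2
    return 2
  fi
  echo -n "$message"
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    printf .
    # Run the command in the current shell; stop as soon as it succeeds.
    if "$@"; then
      echo " ✓"
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  echo " giving up after $max_attempts attempts" >&2
  return 1
}
```

For example, bounded_retry 60 5.2 "❤ Grabbing $CA_PATH" aws s3 cp "$CA_PATH" .cfssl would give up after 60 attempts (a little over five minutes of sleeping) and return a non-zero status, so a failure on the PKI host surfaces as an error instead of an endless row of dots.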

@wellsie
Member

wellsie commented May 9, 2017 via email

@nkhine
Contributor Author

nkhine commented May 9, 2017 via email

@wellsie
Member

wellsie commented May 9, 2017

Can you ssh into the PKI host and run systemctl status and journalctl -fl to verify that everything ran OK there?

@frodeanonsen

I get exactly the same behavior. I have waited for a few hours, but to no avail. I do see some errors in the journalctl -fl output:

May 09 19:10:09 ip-10-0-10-10.eu-central-1.compute.internal systemd-machined[1064]: New machine rkt-cfafa0e6-875b-4a52-8203-0359db5df445.
May 09 19:10:10 ip-10-0-10-10.eu-central-1.compute.internal fetch-from-s3[1015]: [ 5020.196959] awscli[5]: fatal error: An error occurred (400) when calling the HeadObject operation: Bad Request
May 09 19:10:10 ip-10-0-10-10.eu-central-1.compute.internal systemd[1]: Stopped Container rkt-cfafa0e6-875b-4a52-8203-0359db5df445.
May 09 19:10:10 ip-10-0-10-10.eu-central-1.compute.internal fetch-from-s3[1015]: retrying
May 09 19:10:10 ip-10-0-10-10.eu-central-1.compute.internal systemd-machined[1064]: Machine rkt-cfafa0e6-875b-4a52-8203-0359db5df445 terminated

@frodeanonsen

In my previous comment I ran journalctl on the wrong host. After running it on the PKI host as you suggested, @wellsie, I got the actual errors, and the output also explains how to fix them. I've created a pull request for this issue. I hope this helps you as well, @nkhine.
Here is the actual log output from the PKI host:

May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal systemd[1]: Starting Generate rootca and save to s3...
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] generating a new CA key and certificate from CSR
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] generate received request
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] received CSR
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] generating key: rsa-2048
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] encoded CSR
May 09 17:48:24 ip-10-0-10-9.eu-central-1.compute.internal sh[1193]: 2017/05/09 17:48:24 [INFO] signed certificate with serial number 182607777373557159922755035748643937117304384766
May 09 17:48:26 ip-10-0-10-9.eu-central-1.compute.internal systemd-machined[1078]: New machine rkt-36dc9401-4a40-4a39-a810-c49e7b2e693b.
May 09 17:48:26 ip-10-0-10-9.eu-central-1.compute.internal rkt[1212]: [   97.035259] awscli[5]: upload failed: etc/cfssl/ca.pem to s3://kz8s-pki-test-702434813013-eu-central-1/ca.pem An error occurred (InvalidRequest) when calling the PutObject operation: You are attempting to operate on a bucket in a region that requires Signature Version 4.  You can fix this issue by explicitly providing the correct region location using the --region argument, the AWS_DEFAULT_REGION environment variable, or the region variable in the AWS CLI configuration file.  You can get the bucket's location by running "aws s3api get-bucket-location --bucket BUCKET".
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal systemd[1]: generate-rootca.service: Main process exited, code=exited, status=1/FAILURE
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal systemd[1]: Failed to start Generate rootca and save to s3.
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal systemd[1]: generate-rootca.service: Unit entered failed state.
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal coreos-cloudinit[893]: 2017/05/09 17:48:27 Result of "start" on "generate-rootca.service": failed
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal coreos-cloudinit[893]: 2017/05/09 17:48:27 Calling unit command "start" on "cfssl.service"
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal systemd[1]: generate-rootca.service: Failed with result 'exit-code'.
May 09 17:48:27 ip-10-0-10-9.eu-central-1.compute.internal systemd-machined[1078]: Machine rkt-36dc9401-4a40-4a39-a810-c49e7b2e693b terminated.
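The key line is the PutObject failure: the bucket is in a region that requires Signature Version 4, so awscli must be told the region explicitly, via --region or AWS_DEFAULT_REGION, as the error message itself suggests. Every bucket name in this thread (kz8s-pki-test-12345-eu-west-2, kz8s-pki-test-702434813013-eu-central-1, kz8s-pki-dhegdheer-378239092462-eu-west-2) ends in the region, so a minimal sketch, assuming that naming convention holds, can recover it with a bash regex. region_from_bucket_name is a hypothetical helper, not something tack ships, and not necessarily what the pull request mentioned above does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: infer the AWS region from a tack-style bucket name,
# ASSUMING the name ends in the region (e.g. ...-eu-central-1).
region_from_bucket_name() {
  # Two letters, optional "-gov", a word, then a number, anchored at the end.
  local re='([a-z]{2}(-gov)?-[a-z]+-[0-9]+)$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    echo "region_from_bucket_name: cannot infer region from '$1'" >&2
    return 1
  fi
}
```

Wherever awscli talks to the bucket, a caller could then run something like aws s3 cp --region "$(region_from_bucket_name "$PKI_S3_BUCKET")" "$CA_PATH" .cfssl, or export AWS_DEFAULT_REGION instead. If the naming assumption may not hold, aws s3api get-bucket-location --bucket BUCKET --query LocationConstraint --output text is the authoritative source, although it reports None for us-east-1 buckets.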

@nkhine
Contributor Author

nkhine commented May 10, 2017

@wellsie, systemctl status gives this output:

May 10 09:15:14 ip-10-0-10-9.eu-west-2.compute.internal locksmithd[1304]: Unlocking old locks failed: error setting up lock: Error initializing etcd client: client: etcd clus
May 10 09:19:35 ip-10-0-10-9.eu-west-2.compute.internal systemd[1]: Started OpenSSH per-connection server daemon (10.0.0.106:33398).
May 10 09:19:35 ip-10-0-10-9.eu-west-2.compute.internal sshd[1411]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.0.106  user=c
May 10 09:19:37 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: PAM: Authentication failure for core from 10.0.0.106
May 10 09:19:37 ip-10-0-10-9.eu-west-2.compute.internal sshd[1412]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.0.106  user=c
May 10 09:19:39 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: PAM: Authentication failure for core from 10.0.0.106
May 10 09:19:39 ip-10-0-10-9.eu-west-2.compute.internal sshd[1413]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.0.106  user=c
May 10 09:19:41 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: PAM: Authentication failure for core from 10.0.0.106
May 10 09:19:41 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: Failed none for core from 10.0.0.106 port 33398 ssh2
May 10 09:19:41 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: Failed password for core from 10.0.0.106 port 33398 ssh2
May 10 09:19:41 ip-10-0-10-9.eu-west-2.compute.internal sshd[1409]: Failed password for core from 10.0.0.106 port 33398 ssh2
                                         
write-files:
  - path: /etc/cfssl/ca-csr.json
    content: |
      {
        "CN": "CA",
        "key": { "algo": "rsa", "size": 2048 },
        "names": [{ "C": "US", "L": "San Francisco", "O": "Kubernetes", "ST": "California" }]
      }

  - path: /etc/cfssl/ca-config.json
    content: |
      {
        "signing": {
          "default": { "expiry": "43800h" },
          "profiles": {
            "server": {
              "expiry": "43800h",
              "usages": [ "signing", "key encipherment", "server auth" ]
            },
            "client": {
              "expiry": "43800h",
              "usages": [ "signing", "key encipherment", "client auth" ]
            },
            "client-server": {
              "expiry": "43800h",
              "usages": [ "signing", "key encipherment", "server auth", "client auth" ]
            }
          }
        }
      }

  - path: /etc/cfssl/s3-bucket
    content: kz8s-pki-dhegdheer-378239092462-eu-west-2
May 10 08:59:14 localhost ignition[429]: failed to fetch config: not a config (found coreos-cloudconfig)
May 10 08:59:14 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-files comm="systemd" exe="/usr/lib64/systemd/syste
May 10 08:59:14 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-files comm="systemd" exe="/usr/lib64/systemd/system
May 10 08:59:14 localhost ignition[429]: not a config (found coreos-cloudconfig): ignoring user-provided config
May 10 08:59:14 localhost systemd[1]: Started Ignition (files).
May 10 08:59:14 localhost ignition[429]: files: op(1): [started]  processing unit "[email protected]"
May 10 08:59:14 localhost systemd[1]: Starting Reload Configuration from the Real Root...
May 10 08:59:14 localhost ignition[429]: files: op(1): [finished] processing unit "[email protected]"
May 10 08:59:14 localhost systemd[1]: Reloading.

And journalctl -fl gives:

core@ip-10-0-10-9 ~ $ journalctl -fl 
-- Logs begin at Wed 2017-05-10 08:59:11 UTC. --
May 10 09:27:23 ip-10-0-10-9.eu-west-2.compute.internal su[1475]: Successful su for root by root
May 10 09:27:23 ip-10-0-10-9.eu-west-2.compute.internal su[1475]: + /dev/pts/0 root:root
May 10 09:27:23 ip-10-0-10-9.eu-west-2.compute.internal su[1475]: pam_unix(su:session): session opened for user root by core(uid=0)
May 10 09:27:23 ip-10-0-10-9.eu-west-2.compute.internal su[1475]: pam_systemd(su:session): Cannot create session: Already running in a session
May 10 09:29:24 ip-10-0-10-9.eu-west-2.compute.internal systemd-timesyncd[742]: Network configuration changed, trying to establish connection.
May 10 09:29:25 ip-10-0-10-9.eu-west-2.compute.internal systemd-timesyncd[742]: Synchronized to time server 80.82.244.120:123 (2.coreos.pool.ntp.org).
May 10 09:30:14 ip-10-0-10-9.eu-west-2.compute.internal locksmithd[1304]: Unlocking old locks failed: error setting up lock: Error initializing etcd client: client: etcd cluster is unavailable or misconfigured. Retrying in 5m0s.
May 10 09:35:14 ip-10-0-10-9.eu-west-2.compute.internal locksmithd[1304]: Unlocking old locks failed: error setting up lock: Error initializing etcd client: client: etcd cluster is unavailable or misconfigured. Retrying in 5m0s.
May 10 09:37:10 ip-10-0-10-9.eu-west-2.compute.internal su[1475]: pam_unix(su:session): session closed for user root
May 10 09:37:10 ip-10-0-10-9.eu-west-2.compute.internal sudo[1474]: pam_unix(sudo:session): session closed for user root
