You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When there's a single invalid certificate in the consul certificate store in use by fabio, all the other certificates in the store will not be refreshed.
Upon a fresh start of fabio (like a server reboot), this causes no single ssl endpoint to function anymore.
Steps to reproduce
We're running fabio using the fabiolb/fabio:latest image from dockerhub on a nomad cluster as system job (running on every worker node).
First off, here's the used config loaded into /etc/fabio/fabio.properties within the container from a nomad template:
The certificates are loaded into /certs within consul as bundles, so /certs/example.com.pem contains the full chain including the private key.
To replicate the issue, insert a single certificate where the private key does not match the certificate. This will result in fabio not loading any certificates.
Expected result
Fabio to load all valid certificates, and ignore/skips the invalid certificates, allowing domains where valid certificates are available for to continue to function.
Actual result
Fabio refuses to load or update any certificate, causing all services it's loadbalancing to fail upon ssl initialization because it fails to find a valid certificate for the servername indicated by SNI.
The text was updated successfully, but these errors were encountered:
func loadCertificates(...) tries to load all certs and if one fails, an error is returned.
The logs do reflect this and this is expected behaviour for the code as it stands now. should we change it is another question.
For proof-of-concept setups having a single invalid one fail the entire batch makes sense, but for production setups having one invalid certificate not take down the entire updating or ingesting of all certificates "feels" unstable.
Note that it does not take down already-running instances, but it does remove the ability to spin up new instances of fabio. When running a cluster on hardware, this is not a major issue, but when running a cluster on spot-instances in AWS where there's significant churn this will take down the entire ingress path of the cluster.
So even if both types of behaviors would be desirable depending on the context, having that be configurable would be a major improvement in my opinion.
Regardless of the approach taken, having a clearly defined log that states which certificate was invalid would be a great help already, allowing me to add a pattern matcher to our logs that notifies me whenever something goes wrong.
In the short term reporting all errors with certificates would be the solution. Similar to how addNeighbors(...) does. Currently any errors are reported with the following error messages:
Summary
When there's a single invalid certificate in the consul certificate store in use by fabio, all the other certificates in the store will not be refreshed.
Upon a fresh start of fabio (like a server reboot), this causes no single ssl endpoint to function anymore.
Steps to reproduce
We're running fabio using the
fabiolb/fabio:latest
image from dockerhub on a nomad cluster as system job (running on every worker node).First off, here's the used config loaded into /etc/fabio/fabio.properties within the container from a nomad template:
The certificates are loaded into
/certs
within consul as bundles, so/certs/example.com.pem
contains the full chain including the private key.To replicate the issue, insert a single certificate where the private key does not match the certificate. This will result in fabio not loading any certificates.
Expected result
Fabio to load all valid certificates, and ignore/skips the invalid certificates, allowing domains where valid certificates are available for to continue to function.
Actual result
Fabio refuses to load or update any certificate, causing all services it's loadbalancing to fail upon ssl initialization because it fails to find a valid certificate for the servername indicated by SNI.
The text was updated successfully, but these errors were encountered: