Istio Authservice - Envoy filters/denies traffic to Keycloak token endpoint - AWS EKS environment #1111
On a related note, since getting the UDS cluster running, I have had issues authenticating with Neuvector. I am sometimes able to log in via SSO; other times, I get an authentication error. When I am able to log in to Neuvector, I only ever see 2 nodes being scanned (out of 3). The enforcers show all 3 pods running, and I can see in the Neuvector dashboard the node names the enforcers are running on, so it definitely should not be showing only 2 nodes. I haven't dug into why that is happening yet, but it just occurred to me that the issues may be related.
I was able to replace the JWKS and token URLs in the `authservice-uds` secret. I am now, however, getting a different error. Any help would be appreciated, thanks!
I know we currently have environments running in EKS in GovCloud, so I don't anticipate any major hurdles there. In terms of your configuration, are you setting a valid domain + cert, with DNS set up for those? I noticed the mention of CoreDNS and the DNS script; I'm not overly familiar with the referenced script and am wondering if that could be causing some of the issues here. Would you be able to shell into one of your pods in the cluster and perform a curl request to your SSO URL? The two errors I'm seeing in the logs you provided are ...
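Something along these lines should work from an existing pod (the deployment name, namespace, and domain below are placeholders - `sso.uds.dev` is just the demo domain):

```sh
# Exec into an existing workload pod and curl the SSO endpoint directly.
# Assumes curl is available in the container image.
kubectl exec -it deploy/my-app -n my-app-namespace -- \
  curl -v --max-time 10 https://sso.uds.dev
```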
- Yes, I have a valid cert and DNS for the tenant and admin endpoints.
- This issue occurs with or without the updated CoreDNS config (link to the script below).
- Pods can NOT reach the SSO endpoint from in-cluster; the curl request times out.
- No logs appear from any Keycloak pod when the timeout happens.
Enabling Istio DNS proxying on cm/istio seems to resolve the issue, or at least gets me to the `token` endpoint:

```yaml
proxyMetadata:
  # Enable basic DNS proxying
  ISTIO_META_DNS_CAPTURE: "true"
```
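For context, in the istio ConfigMap's mesh config that snippet nests under `defaultConfig`; a minimal sketch, including the optional auto-allocation flag Istio also documents:

```yaml
# Sketch of where the snippet sits in the mesh config served from the
# istio ConfigMap in istio-system.
defaultConfig:
  proxyMetadata:
    # Enable basic DNS proxying in the istio-agent
    ISTIO_META_DNS_CAPTURE: "true"
    # Optionally auto-allocate VIPs for ServiceEntries without addresses
    # ISTIO_META_DNS_AUTO_ALLOCATE: "true"
```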
Interesting, that's part of why I asked about DNS - if the DNS address for your SSO provider resolves to localhost, you would hit an issue similar to this one. Not sure if that's part of what you're encountering?
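A quick way to check what the host resolves to from inside the cluster (again, `sso.uds.dev` is just the demo domain):

```sh
# One-off pod to confirm the SSO host resolves to something other than
# localhost via in-cluster DNS.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup sso.uds.dev
```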
Hmmm, I added a CoreDNS rewrite rule so that requests to the SSO domain are routed to the tenant-ingressgateway virtual service. I am still seeing the same error.
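The rule is roughly this, in the coredns ConfigMap's Corefile (the gateway service name here reflects my cluster's setup; treat it as an assumption):

```
# Corefile snippet: rewrite the SSO hostname to the tenant gateway's
# in-cluster service name before resolution.
rewrite name sso.uds.dev tenant-ingressgateway.istio-tenant-gateway.svc.cluster.local
```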
The 172.20.44.78:443 address is correct; that is the address of the ingress gateway. Why would Envoy be dropping traffic that is being routed locally? Istio proxy-config endpoints for authservice:
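(That endpoint listing comes from a command along these lines; the pod name is a placeholder:)

```sh
# Dump the endpoints Envoy knows about for the Authservice sidecar.
istioctl proxy-config endpoint authservice-pod-name -n authservice
```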
The DNS address for the UDS SSO provider URL resolves to a publicly available, external AWS Network Load Balancer. Also, in the last 3 days, I have not been able to get a single request to hit my application or sidecar after obtaining a token. I can't even test my application until that happens, since it relies on Authservice to provide a token. And yes, I reverted my changes to authservice-uds. I created a brand new cluster and the issue persists. It seems likely to be some infrastructure/EKS/AWS configuration that is incompatible with Authservice in UDS, although I'm not sure what that could be...
I know you mentioned that the CoreDNS change didn't seem to affect it one way or the other, but just to explain it - that should only be necessary if your SSO URL does not have a valid DNS entry set up for it. As long as DNS resolves for the URL (to something other than localhost) you shouldn't need the CoreDNS script. That's primarily used as a local dev workaround to allow using a domain that resolves everything to localhost. Nothing in particular stands out in the cluster/cloud setup from the details I have - we definitely have validated deployments on EKS in GovCloud in the past. Do you encounter issues with other services, or just Authservice? I haven't seen any errors to indicate this is the issue, but if your cert happens to be self-signed or signed by a private PKI root, you would need to add additional configuration for Authservice to properly trust it.
@mjnagel, I'm working this issue with Brandon. Any connection to Istio Proxy's DNS_CAPTURE not being enabled by default in a UDS Core deployment? Wouldn't this prevent the Keycloak package's ServiceEntry from being utilized, since all DNS lookups would go to kube-dns?
Grafana works perfectly fine without any issues. We are not using a private PKI. This whole thing is weird because the gateways are publicly available: I can route to my domain properly from outside the cluster. Routing inside the cluster correctly proxies to the NLB public IP addresses, so it should work; however, the traffic just seemingly gets dropped by Envoy with no explanation or logs. And yes, the security groups on the NLB allow ingress from the entire VPC (I wish it were that simple).
@mikeherrera so from what we've seen, the service entries actually do get used, with the caveat of ONLY when kube-dns resolves an address to something outside of the pod (i.e. it must resolve, and not to localhost). Normally (outside of dev scenarios) DNS for those addresses should meet that criteria, so it should be functional. Once DNS resolves outside of the pod, the Istio proxy handles actually directing the traffic to the service entry destination (rather than the original IP it was resolved to).

There are two values noted in this issue that could be used to switch to fully using Istio for DNS. We haven't done a ton of testing on that and haven't had time to dig into the effects, or why it isn't the default upstream, which is why we haven't made it the default in uds-core yet. If this does end up being the problem here we can definitely evaluate it, but from what we experienced it was only necessary in the scenario described, where kube-dns either doesn't resolve an address or resolves it to localhost. If this is a dev cluster, it might be worth seeing if network policies are causing any of the issues here by just deleting them across the namespace.
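Something like this (dev clusters only; the namespace is a placeholder):

```sh
# Sanity check: remove all NetworkPolicies in the app's namespace to rule
# them out, then retest the flow.
kubectl delete networkpolicies --all -n my-app-namespace
```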
We'll dig into the latter suggestions more and get back to you. In regards to the linked issue, that's exactly what I found and continue to experiment with. However, even with DNS_CAPTURE enabled, I'm not seeing a listener being added for the sidecar's DNS proxy. I understand what you're saying about the intention for the ServiceEntry, and the caveat, but unless I'm misunderstanding the docs here, that intention seems to contradict the functionality intended by Istio. My understanding: the Istio proxy should intercept the DNS request so that instead of resolving to the external AWS NLB name, it resolves to the matching internal cluster service name, as defined by the corresponding ServiceEntry. It makes no sense for the traffic to leave the network just to come back. No matter what we do, we see the SSO authservice name resolve to the external NLB DNS via kube-dns rather than to the ServiceEntry's value. I'm going to build a fresh cluster to remove some "variables" and test further.
You may have already done this, but just to make sure - for any changes to take effect you would need to cycle istiod, followed by the workload itself I believe, in order to make sure you have the latest proxy config for that workload. With regards to the service entry and the DNS docs you linked, DNS proxying is absolutely required for service entries that aren't normally resolvable by DNS. I think a key distinction and clarification is that service entries can still have a place and utility even if not using Istio's DNS proxying. Reading the DNS proxying docs:
The key part there is the last sentence.
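Concretely, the cycle described above looks something like this (the authservice deployment/namespace names are assumptions; pod name is a placeholder):

```sh
# Restart istiod so it serves the updated mesh config...
kubectl rollout restart deployment/istiod -n istio-system
# ...then restart the workload so its sidecar is re-bootstrapped.
kubectl rollout restart deployment/authservice -n authservice
# Spot-check that the DNS capture setting made it into the sidecar bootstrap.
istioctl proxy-config bootstrap authservice-pod-name -n authservice | grep -i dns_capture
```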
@mikeherrera I did some validation on the behavior described above, testing outside of uds-core to keep the behavior isolated and simple. This gist contains a walkthrough with two different tests of service entries using k3d and istioctl. The second walkthrough is very similar to our setup in uds-core, creating a service entry for each virtual service in our cluster. Note that this was all done without DNS proxying enabled. Hope that helps to display the behavior a bit better - I haven't found any great documentation on this, as most use cases of service entries either use external hosts (truly external to the cluster, without endpoints defined) or internal hosts.
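For reference, the per-virtual-service entries described look roughly like this (a minimal sketch; the host, names, and namespace are assumptions, not the exact uds-core resource):

```yaml
# Minimal sketch of a ServiceEntry mapping an exposed host back into the
# mesh, resolved via DNS.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: sso-entry
  namespace: keycloak
spec:
  hosts:
    - sso.uds.dev
  location: MESH_INTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: tls
      protocol: TLS
```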
Assuming DNS is working as expected, I'm still stuck at getting traffic to reach the `token` endpoint. All other traffic works as expected. I can access all other services behind my load balancer. From within my k8s network, I can reach external services. On a brand new cluster with UDS-Core deployed, I deployed the following Zarf package:
So this should be stock UDS and a basic, simple Authservice flow. The only thing that is different is the cluster running on EKS.
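(For shape reference, a minimal Zarf package wrapping a single app looks roughly like the sketch below - everything in it is a hypothetical illustration, not the actual package deployed here:)

```yaml
# Hypothetical minimal zarf.yaml for a single-app package.
kind: ZarfPackageConfig
metadata:
  name: demo-app
  version: 0.1.0
components:
  - name: demo-app
    required: true
    manifests:
      - name: demo-app
        namespace: demo-app
        files:
          - manifests/deployment.yaml
          - manifests/uds-package.yaml
```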
@brcourt to isolate where the issues might be popping up, could you test a few requests in cluster?
First, without a sidecar:

```sh
kubectl run curl-pod --rm -it --restart=Never --image=curlimages/curl -- curl -v https://sso.uds.dev
```

Then, with the Istio sidecar injected:

```sh
kubectl run curl-pod --rm -it \
  --restart=Never \
  --image=curlimages/curl \
  --overrides='{
    "metadata": {
      "labels": {
        "sidecar.istio.io/inject": "true",
        "batch.kubernetes.io/job-name": "job"
      }
    }
  }' -- curl -v https://sso.uds.dev
```

If both of those succeed, then we've probably isolated the issue to network policies - were you able to test hitting that endpoint from a pod after deleting all network policies in the namespace?
So far, I should be simply enabling Authservice for a UDS package I currently have running in my cluster. I've been struggling to understand why the connection to the `token` endpoint is dropped, but even with debug-level logs enabled for Authservice, istio-sidecar, and Keycloak, there isn't any indication of why the connection is failing. Using Authservice for the first time with UDS, the OIDC flow works correctly, getting a session token and hitting the `auth` endpoint; however, the last step of hitting the `token` endpoint fails and the connection is dropped. Below are the logs I've been able to find for this request.

Authservice / Istio-Sidecar debug logs
Keycloak logs
I should be simply enabling Authservice for a service that does run, although obviously with an error since I am not yet providing it with an id_token in the Authorization header. My package is configured like so:
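For shape reference, a UDS Package that opts a workload into Authservice generally looks along these lines (a hypothetical sketch - every name and field below is an assumption based on the uds-core docs, not my actual config):

```yaml
# Hypothetical sketch of a UDS Package enabling Authservice protection.
apiVersion: uds.dev/v1alpha1
kind: Package
metadata:
  name: demo-app
  namespace: demo-app
spec:
  sso:
    - name: Demo App
      clientId: uds-core-demo-app
      redirectUris:
        - "https://demo.uds.dev/login"
      # Selector that opts matching pods into the Authservice flow
      enableAuthserviceSelector:
        app: demo-app
  network:
    expose:
      - service: demo-app
        selector:
          app: demo-app
        gateway: tenant
        host: demo
        port: 80
```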
and as you can see, I've been enabling ALL the things:
I have the AuthorizationPolicies and RequestAuthentication policies that I am expecting, and the `authservice-uds` secret contains all the correct information I would expect for my client.

Other perhaps important information:
I am at a loss for what I can do next to troubleshoot the issue. As far as I can tell, I am simply turning on Authservice and pointing it at my running service. I have made no changes to the Keycloak clients that were created or the Authorization/Request Policies or exemptions. I've allowed all the traffic across the cluster...
I'm having a really frustrating experience trying to find out what is going on without any revealing information from the logs. I assume there is something I am missing, or some caveat when running on EKS or maybe Bottlerocket, but I didn't see any references in your documentation or any issues/PRs in this repo. Hence, I would like to see if there is something wrong, or perhaps an opportunity to improve the UDS-Core documentation.