Dynamic addresses #189
Well, I just noticed I was too quick to judge. It is true that it throws a bunch of warnings at the beginning because the old leader isn't around any more, but after a moment the nodes have elected a new leader and it's business as usual. Although, I am not sure whether it's problematic if some of the same IPs are reassigned. For example, node 1 was leader before with a specific IP, but after the restart that IP is assigned to node 2. This can be observed in these logs.
I am using a function like this to find the IPs to connect to, providing the DNS name of the headless service or, in this case, the shared Docker network alias:

```go
func clusterAddresses(ctx context.Context, dns, port string) (string, []string, error) {
	host, err := os.Hostname()
	if err != nil {
		return "", nil, err
	}
	r := net.Resolver{}
	ips, err := r.LookupHost(ctx, host)
	if err != nil {
		return "", nil, err
	}
	if len(ips) == 0 {
		return "", nil, fmt.Errorf("no IPs found for %s", host)
	}
	ownIP := ips[0]
	ips, err = r.LookupHost(ctx, dns)
	if err != nil {
		return "", nil, err
	}
	clusterIPs := make([]string, 0, len(ips)-1)
	for _, ip := range ips {
		if ip == ownIP {
			continue
		}
		clusterIPs = append(clusterIPs, net.JoinHostPort(ip, port))
	}
	log.Printf("own ip: %s, cluster members %v", ownIP, clusterIPs)
	return net.JoinHostPort(ownIP, port), clusterIPs, nil
}
```

And then I start the app like this:

```go
ownAddr, members, err := clusterAddresses(ctx, clusterDNS, sqlPort)
if err != nil {
	return err
}
dqlite, err := app.New(
	dataDir,
	app.WithAddress(ownAddr),
	app.WithCluster(members),
)
```
You can use either an IP or a resolvable DNS name as the argument of `app.WithAddress()`. I believe both Kubernetes and Docker have options to keep the network identity stable; see for example the k8s docs: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
Yes, I am referring to the provided link, but what is described as a stable network ID there is actually about DNS. It means that the same pod with the same state (volume etc.) will always be e.g. myapp-0 or myapp-1; the names never change and they have no random suffix like pods from a deployment. There is no guarantee that the same IP will be assigned to the pod, though, at least per my understanding. It's true that you can assign subnets, and IPs within subnets, to containers in Docker. I am not sure if that works in Kubernetes too; I know you can do it for k8s services, but I haven't seen it for pods. Usually, cluster-type applications have a shared headless service that returns all the IPs of the pods of the stateful set on a DNS lookup, and they find each other like that. My code above was written with this kind of thing in mind. From my experimentation, it seems like go-dqlite nodes are able to recover from changed IPs: it takes a couple of seconds and then they run OK. It's just that at first there is no leader and they have to hold an election. I will research whether it's possible to assign static IPs to pods, however this is error-prone as there may be no way to guarantee that the IP won't be taken at some point by something else.
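(For readers following along: a minimal sketch of the headless-service pattern described above. The names and port mirror the `dqlite-app` / port 9000 identifiers that appear later in the thread, but the manifest itself is illustrative, not taken from the linked repo.)

```yaml
# A headless Service: its DNS name resolves to the IPs of all pods of the
# StatefulSet it governs, and the pods get stable names like dqlite-app-0.
apiVersion: v1
kind: Service
metadata:
  name: dqlite-app-headless
spec:
  clusterIP: None            # headless: DNS returns the pod IPs directly
  selector:
    app: dqlite-app
  ports:
    - name: dqlite
      port: 9000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dqlite-app
spec:
  serviceName: dqlite-app-headless   # governing service for pod DNS records
  replicas: 3
  selector:
    matchLabels:
      app: dqlite-app
  template:
    metadata:
      labels:
        app: dqlite-app
    spec:
      containers:
        - name: app
          image: testapp             # placeholder image from the compose example
          ports:
            - containerPort: 9000
```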
Oh, I think that DNS was not working in your testing because it only works if you also enable TLS (e.g. with `app.WithTLS`). Do you have a chance of configuring your dqlite app with TLS?
See this test setup function for an example of how to configure your TLS certificates.
I will experiment with that, thanks for the hint. But I think it's not working because it wants to bind a network interface, and you cannot bind a network interface with a hostname or DNS name, afaik; you need the actual IP. Nameservers point to the IP of the network interface via DNS records. But again, I will check this out in depth and report back.
And there's some more info in the package documentation.
Right, but if you use the TLS-based setup, the node address can be a DNS name rather than something you bind directly; that name is what gets stored. Changing the value that you pass to `app.WithAddress` between restarts is what needs to be avoided, since that address is the node's identity.
It stores DNS names in the config with this.
But I have issues with the nodes accepting each other's certificate.
I used the openssl command from the comments there to generate a cert for each app's name. Since they are self-signed, I would think they all need to share the same CA, or at least have access to the CA, so they can validate the cert of the other apps. Or should this be the same cert for all apps, with all the DNS names inside, like DNS.1, DNS.2 in the subject alternative names?
The code looks like this:

```go
cert, err := tls.LoadX509KeyPair("cluster.crt", "cluster.key")
if err != nil {
	return err
}
data, err := ioutil.ReadFile("cluster.crt")
if err != nil {
	return err
}
pool := x509.NewCertPool()
pool.AppendCertsFromPEM(data)
nodeDNS, ok := os.LookupEnv("NODE_DNS")
if !ok {
	return fmt.Errorf("NODE_DNS not set")
}
var statefulZeroDNS []string
if !strings.HasSuffix(nodeDNS, "-0") {
	dns := regexp.MustCompile(`-\d+$`).ReplaceAllString(nodeDNS, "-0")
	statefulZeroDNS = []string{net.JoinHostPort(dns, sqlPort)}
}
dqlite, err := app.New(
	dataDir,
	app.WithTLS(app.SimpleTLSConfig(cert, pool)),
	app.WithAddress(net.JoinHostPort(nodeDNS, sqlPort)),
	app.WithCluster(statefulZeroDNS),
	app.WithLogFunc(func(l client.LogLevel, format string, a ...interface{}) {
		if l < 2 {
			return
		}
		log.Printf(fmt.Sprintf("%s: %s\n", l.String(), format), a...)
	}),
)
```

And each container creates its own certificate in the entrypoint:

```sh
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 \
  -nodes -keyout cluster.key -out cluster.crt -subj "/CN=$NODE_DNS" \
  -addext "subjectAltName=DNS:$NODE_DNS"
```

Will keep playing with it tomorrow. It's sleepy time now.
You should generate the certificate only once, then copy it to all nodes before starting them for the first time. If you want a separate certificate for each node, then a more complex setup (for example a shared CA that signs each node's certificate) would be needed.
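(For illustration, a sketch of generating that single shared certificate with SANs covering every node name, written in Go rather than openssl; the DNS names below are placeholders modelled on the demo setup and are not prescribed by go-dqlite.)

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"log"
	"math/big"
	"os"
	"time"
)

func main() {
	// DNS names of all nodes; placeholder names matching the demo setup.
	sans := []string{
		"dqlite-app-0.dqlite-app-headless",
		"dqlite-app-1.dqlite-app-headless",
		"dqlite-app-2.dqlite-app-headless",
	}

	key, err := rsa.GenerateKey(rand.Reader, 4096)
	if err != nil {
		log.Fatal(err)
	}

	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "dqlite cluster"},
		DNSNames:     sans, // every node name goes into the SANs of the one shared cert
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(10, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}

	// Self-signed: the template is also the parent certificate.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		log.Fatal(err)
	}

	writePEM("cluster.crt", "CERTIFICATE", der)
	writePEM("cluster.key", "RSA PRIVATE KEY", x509.MarshalPKCS1PrivateKey(key))
}

func writePEM(path, blockType string, data []byte) {
	f, err := os.Create(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pem.Encode(f, &pem.Block{Type: blockType, Bytes: data}); err != nil {
		log.Fatal(err)
	}
}
```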
It works when they all have the same certificate. They all start up OK and find each other via host names. There are still some issues, though. When they start up, they still have this moment of warnings where they all report that they don't have a leader, but eventually it works. It becomes really problematic when using health checks, though. Because in that case app-1 does not start until app-0 is healthy, but app-0 does not become healthy because it wants to connect to app-1 as its leader. The below output is after restarting the containers.
After a while, it's marked as unhealthy by the orchestrator and killed, halting the entire application rollout. Now, this may be better in Kubernetes, because I think it will also shut down the stateful set in reverse order so that the nodes hand off their leadership, but I am not sure about this, and it's probably not very solid to rely on it even if it is the case; I'd have to test it.
It works when not using the health checks as startup dependencies. Then you get logs like the following, but eventually it works.
I'm not entirely sure what your setup and code look like (e.g. what the health checks are). Would it be possible to see the code that you are using? I'd recommend starting with the same code as the demo application.
I started from that code. The code has the same issue, I would assume, because the net listener only starts to listen on the HTTP port after Ready unblocks. But Ready won't unblock until a connection to the leader has been made, which may be a container that is started after the current one in the startup order of the StatefulSet. My healthcheck here is not doing much apart from responding on an HTTP ping endpoint. The idea was to use it to know whether the server has actually started / app.Ready has unblocked. Also keep in mind, I am talking about the scenario where all the containers are stopped and then started again. Here is the code I am currently using: https://gist.github.com/bluebrown/f5abd384da488a8f356042662a8b929d
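(A sketch of one way to avoid the chicken-and-egg problem described above: start the HTTP listener before blocking on `app.Ready`, and report readiness only once it has unblocked. The endpoint paths and port are assumptions for illustration, not taken from the gist.)

```go
package main

import (
	"context"
	"log"
	"net/http"
	"sync/atomic"

	"github.com/canonical/go-dqlite/app"
)

// serveHealth starts the HTTP listener immediately and flips /ready to 200
// only after dqlite.Ready has unblocked. /ping can back a liveness probe and
// /ready a readiness probe, so no startup ordering between pods is needed.
// Call this from your existing main instead of blocking on Ready directly.
func serveHealth(ctx context.Context, dqlite *app.App) error {
	var ready atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up, even without a leader
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	// Listen before blocking on Ready, so probes can reach the pod right away.
	go func() {
		if err := http.ListenAndServe(":8080", mux); err != nil {
			log.Fatal(err)
		}
	}()

	// Blocks until this node has joined the cluster and a leader is known.
	if err := dqlite.Ready(ctx); err != nil {
		return err
	}
	ready.Store(true)
	return nil
}
```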
The compose file to mimic a StatefulSet's behaviour looks like this, in a nutshell. I have removed some fields from the services for brevity, but the important parts, healthcheck and depends_on, are there:

```yaml
services:
  app-0:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping
  app-1:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping
    depends_on:
      app-0: { condition: "service_healthy" }
  app-2:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping
    depends_on:
      app-0: { condition: "service_healthy" }
      app-1: { condition: "service_healthy" }

volumes:
  app-0:
  app-1:
  app-2:
```
I have created a repo here. You should be able to spin the code up with Docker Compose.
Ok, thank you very much! That should make debugging easier. I didn't try it out yet, but I'm making some notes below in case you want to experiment further. Perhaps @MathieuBordere could step in too and try to reproduce the problem using your repo?
As a side note regarding point 3, I'll also add that even for the very first run you should be able to start all nodes in parallel. Hope that helps. Please let me know if always starting nodes in parallel, regardless of the healthcheck, does the trick.
To sum up a bit, the orchestrator (k8s or docker) should:

1. Always assign the same hostname to the same node.
2. Always try to (re)start a node when it's down, with no particular ordering or dependency with respect to other nodes.
3. If it's the very first run, then app-0 MUST eventually show up in order for the application to be functional.
4. If it's not the first run, then as soon as 2 nodes out of 3 show up, the application will be functional.
Hi, thanks for the feedback. Adding some thoughts to the mentioned points: in that sense, I can remove the depends_on clause from my compose simulation, so that they all start at the same time, just like a parallel StatefulSet. That way, I can also block with `app.Ready` without the health checks dead-locking the startup. So it should be all sorted out this way. Thanks again for your help :) I still have 2 open questions, apart from the solved main problem:

1. Is there a noticeable performance cost to running everything over TLS?
2. How should I handle backup and recovery of the cluster data?
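(For reference, a sketch of the compose file from earlier with the depends_on conditions removed so that all services start in parallel; image and healthcheck are the same placeholders as above.)

```yaml
services:
  app-0:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping
  app-1:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping
  app-2:
    image: testapp
    healthcheck:
      test: httpcheck http://localhost:8080/ping

volumes:
  app-0:
  app-1:
  app-2:
```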
PS. Regarding the certificate: I removed the IP because the IP is not static, so I can't really provide it. It works without it, it seems. The provided OpenSSL command also uses only a single IP, but you will usually have more than one node, so even then the IP doesn't seem to make a lot of sense, since the cert is shared. That was the reason why I tried to give each app its own certificate at some point.
Good to know. I'll probably try to get rid of it as well and possibly change the docs. Thanks.
First, I believe the performance hit is very likely negligible, so even if you went with straight TCP your app probably wouldn't run any faster. I don't have hard data, but network and disk latency should largely dominate any overhead due to TLS. Having said that, the problem is that DNS names currently only work when TLS is enabled, so dropping TLS would also mean going back to plain IP addresses.

Regarding recovery: in that case all you need is a copy of the data directory from any node, and then call Reconfigure on it.
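(A rough sketch of what that recovery step could look like, assuming go-dqlite exposes a `ReconfigureMembership` helper roughly as used below; the exact function name and signature should be checked against the package documentation, and the path, IDs and addresses are placeholders.)

```go
package main

import (
	"log"

	"github.com/canonical/go-dqlite"
	"github.com/canonical/go-dqlite/client"
)

// Recovery sketch: run this offline against a copy of the data directory
// taken from a surviving (or backed-up) node, with all dqlite processes
// stopped. It rewrites the stored cluster membership so the node can come
// back up with a new set of addresses.
func main() {
	dataDir := "/data/dqlite" // placeholder path

	// The new membership; ID and address are illustrative placeholders.
	cluster := []dqlite.NodeInfo{
		{ID: 1, Address: "dqlite-app-0.dqlite-app-headless:9000", Role: client.Voter},
	}

	if err := dqlite.ReconfigureMembership(dataDir, cluster); err != nil {
		log.Fatalf("reconfigure membership: %v", err)
	}
	log.Println("membership rewritten; start the node again with app.New")
}
```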
OK, makes sense, thank you. I am not good with C, otherwise I could try to contribute here. The Reconfigure option is handy to have; that's good to know. I will create a working Kubernetes example, considering the discussed points, and report back. Maybe someone can benefit from it in the future. Thank you for taking your time to explain all these things.
Having a working Kubernetes example is definitely a valuable contribution, let us know, thanks!
I started working on a Kubernetes setup, but for some reason 1 pod out of 3 always fails. It works locally without any issues. The project is here: https://github.com/bluebrown/dqlite-kubernetes-demo. The failing pod shows different types of warnings:
The app-0 pod, which is the entrypoint for the cluster, also shows some warnings. But in this case app-2 was eventually able to connect, while app-1 was the one failing because it was still not ready after 5 minutes.

```
2022/07/30 06:56:45 starting server
2022/07/30 06:57:15 WARN: change dqlite-app-2.dqlite-app-headless.sandbox.svc.cluster.local:9000 from spare to voter: a configuration change is already in progress (5)
2022/07/30 06:57:15 WARN: adjust roles: could not assign role voter to any node
```
Hi, I have refactored the project but I am still not able to run it in Kubernetes. It is currently in this branch: https://github.com/bluebrown/dqlite-kubernetes-demo/tree/refactor. Below are the logs of the 3 pods. Any idea what is wrong?
I feel like app-2 is giving up before app-0 is ready. It's strange, because app-1 succeeds. After 5 minutes, app-2 was restarted; now I have the below logs for app-2. It's also strange that app-0, which is supposed to be the cluster node to connect to, does not show any logs indicating that app-2 is communicating with it.
Ok, I found the issue. The reason it's failing is that the nodes need to communicate with each other, but the readiness probe paired with the cluster DNS via the headless service is preventing that. This is because a service will not route traffic to a pod that is not ready, and in this case the pod won't become ready unless it can receive that traffic. A quick and dirty solution is to disable the health checks, but I think it's also possible to connect the pods directly without going over the service. It would be even better if dqlite would resolve hosts based on the search option in /etc/resolv.conf.
OK, using the search option in /etc/resolv.conf does actually work. The issue was that a StatefulSet requires using the governing headless service for DNS resolution; it's not possible to resolve the pods by name alone. I found a well-hidden option that I could use on the headless service: setting `publishNotReadyAddresses: true` publishes the DNS records even for pods that are not ready yet. I think with that, all the problems are solved.
I am planning to explain the setup in detail in the readme of the project. I have already merged the branch: https://github.com/bluebrown/dqlite-kubernetes-demo. If you are interested, you can have a look and perhaps provide feedback. If it's all good, maybe we can link it in your documentation so that someone else does not have to go through the same hassle.
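(For reference, a sketch of how that option sits on the headless Service manifest, reusing the placeholder names from the earlier sketch.)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dqlite-app-headless
spec:
  clusterIP: None
  # Publish DNS records for pods even before they pass their readiness probe,
  # so dqlite nodes can reach each other while still waiting for a leader.
  publishNotReadyAddresses: true
  selector:
    app: dqlite-app
  ports:
    - name: dqlite
      port: 9000
```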
Hi, I noticed that the addresses of the nodes are saved in a file. I am experimenting with container deployments, and those usually don't come with fixed, repeatable IPs (for example Docker Compose or a Kubernetes StatefulSet).
I was trying to use DNS as the address of the leader to connect to, but the nodes look up the IP of this hostname and remember it.
The problem is, if the IPs change after restart or due to other reasons, the nodes are not able to start any more.
I am unsure how this could be solved. As a workaround, I am thinking of deleting the files from the data volumes which contain this information before starting the node, but I don't know which file holds the actual data and which files can safely be deleted.
Is there some recommended way to deal with this scenario? I would assume it is not uncommon these days to deploy apps in a Kubernetes cluster, for example.