Option to define rack topology aware configuration #1801
@andrey-dubnik How many Infinispan replicas do you want to reside in each availability zone? If a single replica is sufficient, then it should be possible to use the anti-affinity configuration to achieve this:

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: infinispan-pod
              clusterName: <cluster_name>
              infinispan_cr: <cluster_name>
          topologyKey: "topology.kubernetes.io/zone"
```

For multiple replicas in each of the availability zones, you could replicate the rack-aware topology you describe by having multiple Infinispan clusters. Each cluster would have its affinity settings configured so that scheduled pods reside in their given AZ, and its site backups are the clusters in the other AZs.
@ryanemerson Indeed, if the total instance count is 3 then we are good with the affinity configuration; the problem surfaces when the cluster grows beyond 3 nodes in a distributed cache topology.

Having multiple clusters can be an option, but it introduces the overhead of managing cache configurations and proto schemas across all the clusters participating in replication, handling the client connections, etc. Another potential problem is that a replicated setup is exactly that: it does not distribute the data, and each replicated cluster holds a full data copy, which is far from optimal because a very large cluster would replicate the memory footprint three times over.

If a rack option were available for Infinispan it would be beneficial, as it simplifies cluster management, reduces infrastructure cost and increases availability when working within k8s. Regarding the data replica count, I was thinking of keeping 2-3 replicas to cover all AZs and adding more nodes when we need to increase cluster capacity.
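For reference, the number of copies per entry in a distributed cache is governed by the number of owners; a minimal sketch of an Infinispan cache definition with three owners (the cache name is illustrative):

```xml
<!-- Illustrative cache-container fragment: each entry is stored on 3 owner
     nodes, i.e. 3 copies spread across the cluster. -->
<distributed-cache name="example-cache" owners="3" mode="SYNC"/>
```

Adding nodes then increases capacity without changing the number of copies per entry.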
@andrey-dubnik Infinispan already provides Server Hinting to ensure that data is replicated appropriately amongst the replicas; however, this isn't currently utilised by the Operator as there's a performance concern (ISPN-12505) that needs to be addressed. If implemented and combined with the appropriate affinity configuration, would this satisfy your requirements, or do you have a further need to explicitly state where individual pods should reside?
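For context, Server Hinting is declared on each node's transport; a minimal sketch, assuming the standard Infinispan server XML (cluster, site and rack values are placeholders):

```xml
<!-- Server Hinting sketch: each node advertises its topology on the transport.
     With distinct rack values per AZ, the consistent hash tries to place the
     primary and backup copies of a segment on nodes in different racks. -->
<cache-container name="default">
  <transport cluster="example-cluster"
             site="example-site"
             rack="eu-west-1a"/>
</cache-container>
```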
@ryanemerson I was thinking of using TopologyAwareSyncConsistentHashFactory, according to my understanding of the documentation statement. When it comes to Server Hinting, my understanding is that hinting equals data pinning: if I specify a rack, it will stick the data to that specific rack ID and won't distribute it across multiple racks. If that is correct, this won't achieve the desired outcome of placing the data copies into different availability zones. Is my understanding correct? If it is, then hinting likely won't help if a cluster node is not aware of the rack.
It's the other way around. With server hinting, if the "rack" field is specified on the individual pods and multiple racks exist, then the hash ensures that the primary and backup replica(s) for a given segment are distributed across distinct racks. |
@ryanemerson Alright, so Server Hinting actually describes the node topology, and if each node has a different cache-container transport hint, Infinispan will account for that when distributing the data. Will there be an option to source the Server Hinting data from the k8s labels? I would like to achieve the following outcome:
I think we'll need to provide two things to satisfy your requirements:
@ryanemerson This looks like it. There may be one nice-to-have feature: each rack running in a dedicated StatefulSet (e.g. one StatefulSet per AZ). This may be needed because, with the current affinity scheduling, there is no 100% guarantee that the workload will be distributed across AZs, although nodes are almost always created in a cycling fashion across AZs by the cloud provider within a single AZ-aware node pool.
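As an aside that is not proposed in the thread, Kubernetes pod topology spread constraints are another way to force an even spread across zones; a minimal pod-spec-level sketch reusing the labels from the earlier anti-affinity example (whether the Infinispan CR exposes this field is a separate question):

```yaml
# Illustrative only: spread Infinispan pods evenly across zones.
# maxSkew: 1 with whenUnsatisfiable: DoNotSchedule rejects placements that
# would unbalance the zones by more than one pod.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: infinispan-pod
          clusterName: <cluster_name>
```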
Hi,
Maybe this is a newbie question and there is already a perfect way to configure a rack-aware topology...
Currently I can't see a way to specify a rack topology for the cluster node deployment, meaning that even with multiple data copies there is a risk of the copies being placed onto nodes within the same availability zone (AZ). The affinity block can make sure nodes are distributed across the zones, but there is no zonal topology awareness other than the node.name and the cluster.name.
It would be great if the Infinispan operator allowed for rack grouping of nodes. The use case would be running nodes across different failure domains within the same cluster, so that the data is guaranteed to be replicated into a different availability zone.
As an example, this is how it is implemented in the k8ssandra operator, where k8s labels are associated with the racks. That configuration results in the nodes being provisioned with the rack attributes configured, and the required node affinity block (generated for the actual deployment) makes sure the pods scheduled for a rack are only placed onto nodes with the corresponding labels.
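A rough sketch of that k8ssandra pattern, assuming the K8ssandraCluster CRD's racks / nodeAffinityLabels fields (all names and zone values are illustrative, not taken from this issue):

```yaml
# Illustrative K8ssandraCluster fragment: each rack is tied to a zone label,
# and the generated node affinity pins that rack's pods to matching nodes.
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        size: 3
        racks:
          - name: rack1
            nodeAffinityLabels:
              topology.kubernetes.io/zone: eu-west-1a
          - name: rack2
            nodeAffinityLabels:
              topology.kubernetes.io/zone: eu-west-1b
          - name: rack3
            nodeAffinityLabels:
              topology.kubernetes.io/zone: eu-west-1c
```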