
AZ-Affinity strategy blog #186

Open
wants to merge 1 commit into
base: main

Conversation

adarovadya

Description

GLIDE, the official open-source Valkey client library, recently added support for a key feature: AZ affinity routing, which enables Valkey-based applications to direct calls specifically to server nodes in the same AZ as the client.
In this blog, we dive into the mechanics of AZ affinity routing, showing how it optimizes an application's performance and cost using Valkey GLIDE.

Issues Resolved

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Adar Ovadia <[email protected]>
Member

@stockholmux stockholmux left a comment


LGTM aside from the few points in the review (all nits really)

@@ -0,0 +1,166 @@
+++
# `title` is how your post will be listed and what will appear at the top of the post

delete these comments in the front matter.

## AZ Affinity routing advantages

1. **Reduce Data Transfer Costs** Cross-zone data transfer often incurs additional charges in cloud environments. By ensuring operations are directed to nodes within the same AZ, you can minimize or eliminate these costs.
**Example:** Consider an application in AWS with a Valkey cluster of 2 shards, each with 1 primary and 2 replicas, running on m7g.xlarge instances. The cluster processes 250MB of data per second and, to simplify the example, 100% of the traffic is read operations. If 50% of this traffic crosses AZs at a cost of $0.01 per GB, the monthly cross-AZ data transfer cost is approximately $3,285. In addition, the cluster itself costs $0.252 per hour per node, a total of $1,088 per month. By implementing AZ affinity routing, you can reduce the total cost from $4,373 to $1,088 per month, as all traffic remains within the same AZ.
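The arithmetic behind these figures can be sketched in a few lines. Assumptions (not stated in the post): a 730-hour month for the transfer estimate, a 720-hour month for the instance estimate, and decimal GB, which together reproduce the quoted numbers.

```python
# Back-of-the-envelope reproduction of the cost figures above.
traffic_mb_per_s = 250          # total read traffic
cross_az_fraction = 0.5         # share of reads that crosses AZs
transfer_price_per_gb = 0.01    # assumed cross-AZ $/GB
node_price_per_hour = 0.252     # m7g.xlarge, per node
nodes = 2 * 3                   # 2 shards x (1 primary + 2 replicas)

# Monthly cross-AZ volume in decimal GB, assuming a 730-hour month
cross_az_gb = traffic_mb_per_s * cross_az_fraction * 730 * 3600 / 1000
transfer_cost = cross_az_gb * transfer_price_per_gb          # ~$3,285

# Instance cost, assuming a 720-hour month
instance_cost = nodes * node_price_per_hour * 720            # ~$1,088

total_cost = transfer_cost + instance_cost                   # ~$4,373
print(round(transfer_cost), int(instance_cost), int(total_cost))
```

With AZ affinity routing the transfer term drops to zero, leaving only the ~$1,088 instance cost.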
De-inlining the example looks better.
2. **Minimize Latency** The distance between AZs within the same region (in AWS, for example, typically up to 60 miles, or 100 kilometers) adds extra roundtrip latency, usually in the range of 500µs to 1000µs. By ensuring requests remain within the same AZ, you can reduce latency and improve the responsiveness of your application.
**Example:**
Consider a cluster with three nodes, one primary and two replicas. Each node is located in a different availability zone. The client is located in az-2 along with replica-1.
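To put the latency numbers in perspective, here is a rough calculation (the 500µs–1000µs range comes from the text above; the per-request read count is a hypothetical assumption):

```python
# Extra latency paid when reads leave the client's AZ.
extra_roundtrip_us = 750    # midpoint of the quoted 500-1000 microsecond range
reads_per_request = 20      # hypothetical: sequential cache reads per API request

# Milliseconds saved per API request when AZ affinity keeps
# all reads on the replica in the client's own zone
saved_ms = extra_roundtrip_us * reads_per_request / 1000
print(saved_ms)  # 15.0
```

For a request that fans out into many sequential reads, even sub-millisecond per-hop savings add up quickly.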
Also, wording is a little odd.

Suggested change
Consider a cluster with three nodes, primary and two replicas. Each node located in different availability zone. The client locate in az-2 as replica-1.
Consider a cluster with three nodes, one primary and two replicas. Each node is located in a different availability zone. The client is located in az-2 along with replica-1.

Member

@madolson madolson left a comment


Mostly looks good to me, just some minor nits. Please address Kyle's and my comments, then we can clean this up and merge it.

Comment on lines +76 to +90
For each node, run the following command and change the AZ and routing address as appropriate:

**Python:**
```python
client.config_set({"availability-zone": az},
route=ByAddressRoute(host="address.example.com", port=6379))
```
**Java:**
```java
client.configSet(Map.of("availability-zone", az), new ByAddressRoute("address.example.com", 6379));
```
**Node.js:**
```javascript
client.configSet({"availability-zone": az}, { route: {type: "routeByAddress", host:"address.example.com", port:6379}})
```

I don't think I made my comment here clear. I think this is the wrong way to set up configuration; it's almost always done when starting the server, via the server's configuration. I think it makes more sense to just link to https://valkey.io/topics/valkey.conf/ or skip this altogether. Either that, or have them use the CLI.
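For reference, the CLI route this comment suggests could look roughly like the following (hostnames and zone names are placeholders; `availability-zone` is the same config the snippets above set programmatically):

```shell
# Set the AZ on each node at runtime, one call per node, adjusting host and zone;
# equivalently, put `availability-zone us-east-1a` in valkey.conf before startup.
valkey-cli -h node-1.example.com -p 6379 CONFIG SET availability-zone us-east-1a
valkey-cli -h node-2.example.com -p 6379 CONFIG SET availability-zone us-east-1b
```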


**Java:**
```java
/// Initialize Valkey client with preference for the client's AZ

Suggested change
/// Initialize Valkey client with preference for the client's AZ
// Initialize Valkey client with preference for the client's AZ

nit: I don't think triple slashes is the recommended comment.

});
/// Write operation (route to the primary's slot owner)
await client.set("key1", "val1");
/// Get will read from one of the replicas in the same client's availability zone if exits.

Suggested change
/// Get will read from one of the replicas in the same client's availability zone if exits.
/// Get will read from one of the replicas in the same client's availability zone if one exists.

)

# Determine the client's AZ (this could be fetched from the cloud provider's metadata service)
client_az = 'AZ-a'
Suggested change
client_az = 'AZ-a'
client_az = 'us-east-1a'

You inconsistently use us-east-1a and AZ-a as examples, but the former is clearer IMO. I think we can be AWS-centric since we wrote the blog, but explain how it works generally.
