Overload Detection with RING_HASH load balancing and subset enabled #37965

Open
Kagamia-MS opened this issue Jan 10, 2025 · 0 comments
Labels
triage Issue requires triage


Description:

We have encountered an issue with an Envoy Proxy cluster that uses RING_HASH load balancing together with a subset configuration. When deciding whether an LbEndpoint is overloaded, the bounded-load check still uses the active requests of the entire cluster as the denominator, rather than only the active requests of the selected subset.

In this scenario, shouldn't overload detection be based only on the endpoints within the subset?

https://github.com/envoyproxy/envoy/blob/a0c96b389d2ef44ff207bb17678a5c5eabdbbadb/source/extensions/load_balancing_policies/common/thread_aware_lb_impl.cc#L199C1-L204C94

Steps to Reproduce:

  1. Configure a cluster with RING_HASH load balancing and 4 upstream endpoints (A, B, C, and D), where A and B are in the same subset.
  2. Send equal numbers of long-delay requests to endpoints A, C, and D.
  3. Send requests to the subset without a hash key.

Expected Behavior: Overload detection should consider only the active requests of the endpoints within the subset, so the load balancer always forwards requests to B.

Actual Behavior: Overload detection uses the active requests of the entire cluster, so requests are forwarded randomly to both A and B until A reaches the overload threshold.

clusters:
- name: fd_mesh_cluster
  connect_timeout: 0.25s
  type: STRICT_DNS
  lb_policy: RING_HASH
  upstream_connection_options:
    tcp_keepalive: {}
  common_lb_config:
    consistent_hashing_lb_config:
      hash_balance_factor: 200
  lb_subset_config:
    fallback_policy: ANY_ENDPOINT
    subset_selectors:
    - keys:
      - logical_group
  load_assignment:
    cluster_name: fd_mesh_endpoint
    endpoints:
    - lb_endpoints:
      # Endpoint A: in subset logical_group=blue
      - endpoint:
          address:
            socket_address:
              address: "127.0.0.1"
              port_value: 5001
        metadata:
          filter_metadata:
            envoy.lb:
              logical_group: blue
      # Endpoint B: in subset logical_group=blue
      - endpoint:
          address:
            socket_address:
              address: "127.0.0.1"
              port_value: 5002
        metadata:
          filter_metadata:
            envoy.lb:
              logical_group: blue
      # Endpoint C: no subset metadata
      - endpoint:
          address:
            socket_address:
              address: "127.0.0.1"
              port_value: 5003
      # Endpoint D: no subset metadata
      - endpoint:
          address:
            socket_address:
              address: "127.0.0.1"
              port_value: 5004