Configuration errors reset VSAddress to None for all VirtualServers #3723

Open
mikejoh opened this issue Jan 21, 2025 · 1 comment
Comments

@mikejoh
Contributor

mikejoh commented Jan 21, 2025

Setup Details

CIS Version: 2.18.1
Build: f5networks/k8s-bigip-ctlr:2.18.1
BIGIP Version: BIG-IP 17.1.1.3 Build 0.0.5 Point Release 3
AS3 Version: 3.51.0 build 5
Agent Mode: AS3
Orchestration: K8S
Orchestration Version: 1.28.13
Pool Mode: Auto
Additional Setup details: Cilium v1.16.0

Description

Before describing our issue, please note that we run the f5-ipam-controller at version f5networks/f5-ipam-controller:0.1.11, and all our VirtualServer objects are configured with a per-cluster ipamLabel. The IPAM controller is started with these flags:

--orchestration=kubernetes
--ipam-provider=f5-ip-provider
--ip-range={"CLUSTER01-IPAM":"10.0.1.10-10.0.1.254"}
--log-level=DEBUG

The issue we've observed lately, after upgrading the k8s-bigip-ctlr from 2.16.x to 2.18.x, is that when a configuration error in one VirtualServer yields a 422 from the F5 BIG-IP, all VirtualServer objects move from status OK -> ERROR and, at the same time, all VSAddress fields are set to None. After a few seconds the status changes back from ERROR -> OK, but VSAddress remains None. After restarting the k8s-bigip-ctlr we get all IP addresses back (the same as before, since they were never changed or removed in the IPAM controller).

At the moment this is causing a lot of other issues for us; please see the Observations section for relevant logs.

The k8s-bigip-ctlr is started with the following flags:

--ingress-class=f5
--credentials-directory /tmp/creds
--bigip-partition=cluster01
--bigip-url=https://lb01.example.com/
--custom-resource-mode=true
--http-client-metrics=true
--insecure=true
--ipam=true
--log-as3-response=false
--log-level=INFO
--manage-ingress=false
--node-label-selector=node-role.kubernetes.io/worker=true
--pool-member-type=auto

Steps To Reproduce

These steps should be enough to replicate this, given that the rest of the versions and components are started with the correct flags:

  1. Create at least two VirtualServers with IPAM label and make sure you get an IP address from the f5-ipam-controller.
  2. Reconfigure (edit) one of the VirtualServers in a way that is not allowed (from an F5 AS3 perspective, AFAICT). In our case we changed spec.virtualServerName from my-service-vs to my-service, which yielded the 422 error code from the F5. The error itself is fine, and it is expected that the edited VirtualServer object goes into an error state. What's not good is that the IP address of the other VirtualServer is set to None at the same time, which does not reflect the current state. As far as we can see, the IPAM-allocated address is still there, so the VSAddress field should not be changed.
  3. Observe how the status and VSAddress fields of the objects change.
  4. Rollback to a version before 2.18, in our case 2.16.1 and perform the same test again.

Expected Result

At most one VirtualServer, the one with the configuration error, should be affected; the controller should never update/touch any other VirtualServers.

Actual Result

See above.

Diagnostic Information

N/A

Observations (if any)

One observation we've made is that a lot of logic (code) related to the status field of the VirtualServer changed between 2.16 and 2.18. The changes most important to us as users of the k8s-bigip-ctlr are these:

  • 2.16:
    // VirtualServerStatus is the status of the VirtualServer resource.
    type VirtualServerStatus struct {
        VSAddress string `json:"vsAddress,omitempty"`
        StatusOk  string `json:"status,omitempty"`
    }
  • 2.18:
    // VirtualServerStatus is the status of the VirtualServer resource.
    type VirtualServerStatus struct {
        VSAddress   string      `json:"vsAddress,omitempty"`
        Status      string      `json:"status,omitempty"`
        LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
        Error       string      `json:"error,omitempty"`
    }

A field name changed and the struct was expanded with more fields. Could something deeper down have changed so that when an error occurs (e.g. a 422), the statuses of all objects are updated for some reason, not just the status of the specific object we're trying to change? Maybe something in the f5-ipam-controller integration messes things up here?

AFAICT no IPAM addresses were deallocated by the IPAM controller while we had the problem; they were still present (allocated), but the k8s-bigip-ctlr still changed the field to None, which is wrong.

f5-ipam-controller log during the time we reconfigured the VirtualServer:

2025/01/21 14:18:39 [DEBUG] Enqueueing on Update: kube-system/cluster01.ipam
2025/01/21 14:18:39 [DEBUG] Processing Key: &{0xc000042160 0xc00045fb80 Update}
2025/01/21 14:18:39 [DEBUG] Updated: kube-system/cluster01.ipam  with Status. Removed
Hostname: my-service.cluster01.k8s.example.com    Key: my-service/my-service.cluster01.k8s.example.com_host    IPAMLabel: CLUSTER01-IPAM    IPAddr: 10.0.1.11   Operation: Delete

2025/01/21 14:18:39 [DEBUG] Enqueueing on Update: kube-system/cluster01.ipam
2025/01/21 14:18:39 [DEBUG] Processing Key: &{0xc00019f340 0xc000042160 Update}
2025/01/21 14:18:40 [DEBUG] Enqueueing on Update: kube-system/cluster01.ipam
2025/01/21 14:18:40 [DEBUG] Processing Key: &{0xc00019f600 0xc00019f340 Update}
2025/01/21 14:18:40 [DEBUG] [CORE] Allocated IP: 10.0.1.11 for Request:
Hostname: my-service.cluster01.k8s.example.com    Key: my-service/my-service.cluster01.k8s.example.com_host    IPAMLabel: CLUSTER01-IPAM    IPAddr:     Operation: Create

2025/01/21 14:18:40 [DEBUG] Updated: kube-system/cluster01.ipam with Status. With IP: 10.0.1.11 for Request:
Hostname: my-service.cluster01.k8s.example.com    Key: my-service/my-service.cluster01.k8s.example.com_host    IPAMLabel: CLUSTER01-IPAM    IPAddr:     Operation: Create

2025/01/21 14:18:40 [DEBUG] Enqueueing on Update: kube-system/cluster01.ipam
2025/01/21 14:18:40 [DEBUG] Processing Key: &{0xc00037a000 0xc00019f600 Update}

No other IPAM entries were touched or updated at the same time period AFAICT.

k8s-bigip-ctlr log during the time we reconfigured the VirtualServer:

2025/01/21 14:18:39 [ERROR] unable to make IPAM Request, will be re-requested soon
E0121 14:18:39.931943       1 worker.go:329] [ERROR] Sync &{my-service VirtualServer my-service 0xc0018ae008 Create  false} failed with unable to make IPAM Request, will be re-requested soon
2025/01/21 14:18:39 [WARNING] Request from cluster local resulted in retry for  CREATE in VIRTUALSERVER my-service/my-service
2025/01/21 14:18:39 [INFO] [Request: 1] cluster local requested CREATE in VIRTUALSERVER my-service/my-service
2025/01/21 14:18:39 [INFO] [Request: 1][AS3] creating a new AS3 manifest
2025/01/21 14:18:40 [INFO] [Request: 1][AS3] posting request to https://lb01.example.com/ for cluster01 tenants
2025/01/21 14:18:40 [ERROR] IP address requested for service: my-service/my-service
2025/01/21 14:18:40 [ERROR] Error while updating VS status:Operation cannot be fulfilled on virtualservers.cis.f5.com "my-service": the object has been modified; please apply your changes to the latest version and try again
2025/01/21 14:18:40 [INFO] [Request: 2] cluster local requested UPDATE in IPAM kube-system/cluster01.ipam
2025/01/21 14:19:09 [ERROR] [Request: 1][AS3] Response from BIG-IP: code: 422 --- tenant:cluster01 --- message: declaration failed
@mikejoh mikejoh added bug untriaged no JIRA created labels Jan 21, 2025
@trinaths
Contributor

Created [CONTCNTR-5187] for internal tracking.

@trinaths trinaths added JIRA and removed untriaged no JIRA created labels Jan 23, 2025