SKS-1804: Improve reconcileNetwork() to speed up fetching VM IP #144

Merged
Levi080513 merged 2 commits into master from huangwei/speed-up-vm-network-ready
Sep 1, 2023

@Levi080513 Levi080513 commented Aug 31, 2023

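Problem

[SKS-1804] Reduce the time ELFMachine waits for Network Ready in the ECP CNI scenario - Jira http://jira.smartx.com/browse/SKS-1804

In the ECP CNI scenario the VM has two NICs: one with a static IP and one that needs DHCP. The existing handling does not cover this case.

Fix

  1. Remove the ControlPlaneInitializedCondition check from the getK8sNodeIP method. When the first KCP VM is slow to obtain an IP, ControlPlaneInitialized also takes a long time to become true (investigation suggests this is most likely a CAPI bug).
  2. Change the reconcileNetwork handling logic:
  • If there are DHCP NICs, the network is considered ready only once every DHCP NIC has obtained an IP.
  • If there are no DHCP NICs, the network is considered ready only once at least one IP has been obtained from the Tower API or the k8s node.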
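
The readiness rule in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in controllers/elfmachine_controller.go: the `Device` and `NIC` types, the `isNetworkReady` helper, and the assumption that reported NICs line up index-for-index with the spec devices are all hypothetical. Only the `IPV4` and `IPV4_DHCP` network type values come from the specs shown in this PR.

```go
package main

// NetworkType mirrors the networkType values that appear in the specs in this PR.
type NetworkType string

const (
	NetworkTypeIPV4     NetworkType = "IPV4"
	NetworkTypeIPV4DHCP NetworkType = "IPV4_DHCP"
)

// Device is a hypothetical, simplified NIC entry from the machine spec.
type Device struct {
	NetworkType NetworkType
}

// NIC is a hypothetical view of a NIC as reported for the VM.
// IP stays empty until an address is known.
type NIC struct {
	IP string
}

// isNetworkReady applies the rule described above.
// Assumption: nics[i] corresponds to devices[i].
// nodeIP is an IP obtained from the k8s node, or "" if none is known yet.
func isNetworkReady(devices []Device, nics []NIC, nodeIP string) bool {
	dhcpCount := 0
	for i, d := range devices {
		if d.NetworkType != NetworkTypeIPV4DHCP {
			continue
		}
		dhcpCount++
		// Every DHCP NIC must have an IP before the network counts as ready.
		if i >= len(nics) || nics[i].IP == "" {
			return false
		}
	}
	if dhcpCount > 0 {
		return true
	}
	// No DHCP NICs: one IP from either source is enough.
	if nodeIP != "" {
		return true
	}
	for _, n := range nics {
		if n.IP != "" {
			return true
		}
	}
	return false
}
```

Under these assumptions, an ECP CNI VM whose static NIC already reports 240.255.0.1 is still not ready until its DHCP NIC has an address, which is the case the previous handling did not cover.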

Testing

ECP CNI KSC cluster

kc get ksc -n default  hw-sks-test-1.24.17-04 -ojson | jq .spec.network.cni
{
  "ecpConfig": {
    "fakeIP": "100.64.254.254/32",
    "ippools": [
      {
        "cidr": "10.255.67.0/25",
        "gateway": "10.255.0.1",
        "name": "test",
        "subnet": "10.255.0.0/16"
      }
    ],
    "uplinkIP": "240.255.0.1/32"
  },
  "name": "ecp"
}

kc get ksc -n default  hw-sks-test-1.24.17-04 -ojson | jq .spec.topology.controlPlane.nodeConfig.network
{
  "devices": [
    {
      "networkType": "IPV4_DHCP",
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    },
    {
      "ipAddrs": [
        "240.255.0.1"
      ],
      "netmask": "255.255.255.255",
      "networkType": "IPV4",
      "tag": "ecp",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    }
  ]
}

k8s node creation time

kc get nodes  hw-sks-test-1.24.17-04-controlplane-6wtfn  -ojson | jq .metadata.creationTimestamp
"2023-08-31T05:31:33Z"

ELFMachine VMProvisioned time: 2023-08-31T05:31:42Z

[
  {
    "lastTransitionTime": "2023-08-31T05:31:42Z",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "2023-08-31T05:30:15Z",
    "status": "True",
    "type": "TowerAvailable"
  },
  {
    "lastTransitionTime": "2023-08-31T05:31:42Z",
    "status": "True",
    "type": "VMProvisioned"
  }
]

vmtools collected and reported its data at 13:32:22
image

The ELFMachine VMProvisioned time is earlier than the vmtools data report time, as expected.

Single DHCP NIC cluster

kc get ksc -n default  hw-sks-test-1.24.17-04-single -ojson | jq .spec.topology.controlPlane.nodeConfig.network
{
  "devices": [
    {
      "networkType": "IPV4_DHCP",
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    }
  ]
}

k8s node creation time

kc get node hw-sks-test-1.24.17-04-single-controlplane-pkppw  -ojson | jq .metadata.creationTimestamp
"2023-08-31T05:47:54Z"

ELFMachine VMProvisioned time: 2023-08-31T05:47:55Z

kc get elfmachine -n default hw-sks-test-1.24.17-04-single-controlplane-pkppw -ojson | jq .status.conditions
[
  {
    "lastTransitionTime": "2023-08-31T05:47:55Z",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "2023-08-31T05:46:40Z",
    "status": "True",
    "type": "TowerAvailable"
  },
  {
    "lastTransitionTime": "2023-08-31T05:47:55Z",
    "status": "True",
    "type": "VMProvisioned"
  }
]

vmtools collected and reported its data at 13:48:59
image

The ELFMachine VMProvisioned time is earlier than the vmtools data report time, as expected.

Dual DHCP NIC cluster

kc get ksc -n default  hw-sks-test-1.24.17-04-double -ojson | jq .spec.topology.controlPlane.nodeConfig.network
{
  "devices": [
    {
      "networkType": "IPV4_DHCP",
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    },
    {
      "networkType": "IPV4_DHCP",
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    }
  ]
}

k8s node creation time

kc get node hw-sks-test-1.24.17-04-double-controlplane-4bgsg -ojson | jq .metadata.creationTimestamp
"2023-08-31T05:49:19Z"

ELFMachine VMProvisioned time: 2023-08-31T05:50:35Z

kc get elfmachine -n default hw-sks-test-1.24.17-04-double-controlplane-4bgsg -ojson | jq .status.conditions
[
  {
    "lastTransitionTime": "2023-08-31T05:50:35Z",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "2023-08-31T05:47:56Z",
    "status": "True",
    "type": "TowerAvailable"
  },
  {
    "lastTransitionTime": "2023-08-31T05:50:35Z",
    "status": "True",
    "type": "VMProvisioned"
  }
]

vmtools data report time
image

The ELFMachine VMProvisioned time is later than the vmtools data report time, as expected.

Single static-IP NIC cluster

kc get ksc -n default  hw-sks-test-1.24.17-04-single-static -ojson | jq .spec.topology.controlPlane.nodeConfig.network
{
  "devices": [
    {
      "ipAddrs": [
        "10.255.233.186"
      ],
      "netmask": "255.255.0.0",
      "networkType": "IPV4",
      "routes": [
        {
          "gateway": "10.255.0.1"
        }
      ],
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    }
  ],
  "nameservers": [
    "10.255.0.2"
  ]
}

k8s node creation time

kc get nodes hw-sks-test-1.24.17-04-single-static-controlplane-679vf -ojson | jq .metadata.creationTimestamp
"2023-08-31T06:33:11Z"

ELFMachine VMProvisioned time: 2023-08-31T06:33:19Z

[
  {
    "lastTransitionTime": "2023-08-31T06:33:19Z",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "2023-08-31T06:31:19Z",
    "status": "True",
    "type": "TowerAvailable"
  },
  {
    "lastTransitionTime": "2023-08-31T06:33:19Z",
    "status": "True",
    "type": "VMProvisioned"
  }
]

vmtools data collection time

image

ELFMachine VMProvisioned became True after the k8s node was created, as expected.

Dual static-IP NIC cluster

kc get ksc hw-sks-test-1.24.17-04-two-static -n default -ojson | jq .spec.topology.controlPlane.nodeConfig.network
{
  "devices": [
    {
      "ipAddrs": [
        "10.255.233.186"
      ],
      "netmask": "255.255.0.0",
      "networkType": "IPV4",
      "routes": [
        {
          "gateway": "10.255.0.1"
        }
      ],
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    },
    {
      "ipAddrs": [
        "10.255.233.187"
      ],
      "netmask": "255.255.0.0",
      "networkType": "IPV4",
      "routes": [
        {
          "gateway": "10.255.0.1"
        }
      ],
      "tag": "default",
      "vlan": "dd1f408f-7715-48c1-a817-13c3568f1d93_4cd00407-63ca-440b-80b7-ceacfccb8d08"
    }
  ],
  "nameservers": [
    "10.255.0.2"
  ]
}

k8s node creation time

kc get node hw-sks-test-1.24.17-04-two-static-controlplane-qht7s -ojson | jq .metadata.creationTimestamp
"2023-08-31T06:54:51Z"

ELFMachine VMProvisioned time: 2023-08-31T06:54:52Z

[
  {
    "lastTransitionTime": "2023-08-31T06:54:52Z",
    "status": "True",
    "type": "Ready"
  },
  {
    "lastTransitionTime": "2023-08-31T06:53:30Z",
    "status": "True",
    "type": "TowerAvailable"
  },
  {
    "lastTransitionTime": "2023-08-31T06:54:52Z",
    "status": "True",
    "type": "VMProvisioned"
  }
]

vmtools data report time
image

ELFMachine VMProvisioned became True after the k8s node was created, as expected.

codecov bot commented Aug 31, 2023

Codecov Report

Merging #144 (df3a6ba) into master (7253a00) will increase coverage by 0.06%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   55.83%   55.89%   +0.06%     
==========================================
  Files          16       16              
  Lines        2735     2739       +4     
==========================================
+ Hits         1527     1531       +4     
  Misses       1073     1073              
  Partials      135      135              

Files Changed                          Coverage Δ
controllers/elfmachine_controller.go   74.78% <81.25%> (+0.11%) ⬆️

@jessehu jessehu changed the title from "SKS-1804: Optimize reconcileNetwork Func" to "SKS-1804: Improve reconcileNetwork() to speed up fetching VM IP process" Aug 31, 2023
@jessehu jessehu changed the title from "SKS-1804: Improve reconcileNetwork() to speed up fetching VM IP process" to "SKS-1804: Improve reconcileNetwork() to speed up fetching VM IP" Aug 31, 2023
@jessehu jessehu left a comment

Great!

@Levi080513 Levi080513 merged commit c4cc5fd into master Sep 1, 2023
3 checks passed
@Levi080513 Levi080513 deleted the huangwei/speed-up-vm-network-ready branch September 1, 2023 02:13