Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting vcluster takes a long time and API response is slow #1900

Open
hr-tsu opened this issue Jul 2, 2024 · 4 comments
Open

Starting vcluster takes a long time and API response is slow #1900

hr-tsu opened this issue Jul 2, 2024 · 4 comments
Labels

Comments

@hr-tsu
Copy link

hr-tsu commented Jul 2, 2024

What happened?

When I run the vcluster create command, the command keeps waiting until the timeout, and then an error occurs.

% vcluster create vlsuter-debug
17:31:04 info Create vcluster vlsuter-debug...
17:31:04 info execute command: helm upgrade vlsuter-debug /var/folders/2h/0dl61nlx07n4fylcxw67sngc0000gp/T/vcluster-0.20.0-beta.5.tgz-2989503029 --kubeconfig /var/folders/2h/0dl61nlx07n4fylcxw67sngc0000gp/T/1793836983 --namespace vcluster --install --repository-config='' --values /var/folders/2h/0dl61nlx07n4fylcxw67sngc0000gp/T/1752636891
17:31:06 done Successfully created virtual cluster vlsuter-debug in namespace vcluster
17:32:06 info Waiting for vcluster to come up...
17:32:06 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:32:17 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:32:27 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:32:38 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:32:49 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:33:00 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:0/3
17:33:11 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:1/3
17:33:22 warn vcluster is waiting, because vcluster pod vlsuter-debug-0 has status: Init:1/3
17:41:06 fatal failed to parse kube config: wait for vcluster: client rate limiter Wait returned an error: context deadline exceeded

However, the process seems to be continuing after the error, and after waiting for 2~3h, coredns are created on the host and the vcluster seems to have started (vcluster statefulsets are also running).

% vcluster list
           NAME        |     NAMESPACE      | STATUS  |    VERSION    | CONNECTED |    AGE
  ---------------------+--------------------+---------+---------------+-----------+-------------
    vcluster-debug     | vcluster-tsuchiya  | Running | 0.20.0-beta.5 |           | 19m25s
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:12.531404609 +0000 UTC m=+93.976543691) (total time: 59.425698212s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/health false]] {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957183227Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:41.706720112 +0000 UTC m=+123.151859217) (total time: 30.25053982s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/resourcequotas/kube-node-lease/% false]]   {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957363891Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:40.679653418 +0000 UTC m=+122.124792499) (total time: 31.277724555s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/controllers/% false]]   {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957477047Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:31.467582359 +0000 UTC m=+112.912721474) (total time: 40.489952153s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/clusterroles/% false]]  {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957598185Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:27.298280599 +0000 UTC m=+108.743419704) (total time: 44.659296202s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/roles/% false]] {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957661015Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:30.262912024 +0000 UTC m=+111.708051125) (total time: 41.694783275s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/namespaces/% false]]    {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957749559Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:42.577832235 +0000 UTC m=+124.022971337) (total time: 29.380034876s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/controllerrevisions/% false]]   {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957930108Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:14.531691988 +0000 UTC m=+95.976831073) (total time: 57.426216029s):  SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC  : [[/registry/health false]] {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.957967405Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:31.211477835 +0000 UTC m=+112.656616940) (total time: 40.746481705s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/priorityclasses/% false]]   {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.958040577Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:41.51519597 +0000 UTC m=+122.960335055) (total time: 30.442772809s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/clusterrolebindings/% false]]    {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.95805738Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:24.703721241 +0000 UTC m=+106.148860311) (total time: 47.254266965s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/horizontalpodautoscalers/% false]]  {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.958066288Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:40.315286761 +0000 UTC m=+121.760425838) (total time: 31.642926394s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/volumeattachments/% false]] {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.958263937Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:16.164139624 +0000 UTC m=+97.609278726) (total time: 55.794088971s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv JOIN ( SELECT MAX(mkv.id) AS id FROM kine AS mkv WHERE mkv.name LIKE ? GROUP BY mkv.name) AS maxkv ON maxkv.id = kv.id WHERE kv.deleted = 0 OR ? ) AS lkv ORDER BY lkv.theid ASC ) c : [[/registry/pods/% false]]   {"component": "vcluster", "component": "kine", "time": "2024-07-01T10:26:11.958304142Z", "level": "info"}
2024-07-01 10:26:11 INFO    commandwriter/commandwriter.go:126  Slow SQL (started: 2024-07-01 10:25:16.381819263 +0000 UTC m=+97.826958368) (total time: 55.576580865s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), COUNT(c.theid) FROM ( SELECT * FROM ( SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv...

When I try to connect to the created vCluster and create a pod, the API takes quite a long time to respond, which interferes with the creation of the pod.

% kubectl get all
% kubectl apply -f https://k8s.io/examples/application/deployment.yaml
deployment.apps/nginx-deployment created
% date; time kubectl get pod
2024年 7月 2日 火曜日 16時39分53秒 JST
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-cbdccf466-ccz6s   0/1     Pending   0          80s
nginx-deployment-cbdccf466-j4dzh   0/1     Pending   0          80s
kubectl get pod  0.06s user 0.04s system 0% cpu 20.149 total
% date; time kubectl get pod
2024年 7月 2日 火曜日 16時42分56秒 JST
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-cbdccf466-j4dzh   0/1     Pending   0          5m31s
nginx-deployment-cbdccf466-ccz6s   1/1     Running   0          5m31s
kubectl get pod  0.06s user 0.03s system 31% cpu 0.281 total

What did you expect to happen?

In another environment, the response is coming in at this rate, so I would like it to be that way.
The following log environment is the Minikube environment.

ubuntu@k8s-worker-2:~$ vcluster create my-vcluster --namespace team-xvcluster create my-vcluster --namespace team-x
06:41:36 info Downloading [command helm]
06:41:37 info Creating namespace team-x
06:41:37 info Detected local kubernetes cluster minikube. Will deploy vcluster with a NodePort & sync real nodes
06:41:37 info Create vcluster my-vcluster...
06:41:37 info execute command: helm upgrade my-vcluster /tmp/vcluster-0.20.0-beta.9.tgz-2881249151 --create-namespace --kubeconfig /tmp/3016070667 --namespace team-x --install --repository-config='' --values /tmp/596304531
06:41:39 done Successfully created virtual cluster my-vcluster in namespace team-x
06:41:41 info Waiting for vcluster to come up...
06:41:55 warn vcluster is waiting, because vcluster pod my-vcluster-0 has status: Init:0/3
06:42:07 warn vcluster is waiting, because vcluster pod my-vcluster-0 has status: Init:1/3
06:42:17 warn vcluster is waiting, because vcluster pod my-vcluster-0 has status: Init:2/3
06:42:41 info Starting proxy container...
06:42:46 done Switched active kube context to vcluster_my-vcluster_team-x_minikube
- Use `vcluster disconnect` to return to your previous kube context
- Use `kubectl get namespaces` to access the vcluster

How can we reproduce it (as minimally and precisely as possible)?

% vcluster create vlsuter-debug

% kubectl apply -f https://k8s.io/examples/application/deployment.yaml
deployment.apps/nginx-deployment created

Anything else we need to know?

No response

Host cluster Kubernetes version

$ kubectl version
# % kubectl version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.8

vcluster version

vcluster --version
vcluster version 0.20.0-beta.5

VCluster Config

No settings.

# My vcluster.yaml / values.yaml here
@hr-tsu hr-tsu added the kind/bug label Jul 2, 2024
@FabianKramm
Copy link
Member

@hr-tsu thanks for creating this issue! Are you using NFS or similar as a backing store for the vCluster? These slow SQL queries show that the underlying storage seems to be slow. To check that you can turn off persistence via:

controlPlane:
  statefulSet:
    persistence:
      volumeClaim:
        enabled: false

@hr-tsu
Copy link
Author

hr-tsu commented Jul 4, 2024

@FabianKramm
Thanks for the comment.
The storage product was in this environment, so I put in the settings you gave me and re-created it, and it was created immediately, just like the Minikube environment.

Shouldn't the vCluster environment use the storage product?
Will I keep having the same problem if I use the storage product?

@heiko-braun
Copy link
Contributor

@hr-tsu these are good questions. Using w/o persistence is not recommened in the majority of cases, except for maybe ephemeral workloads (i.e. CI/CD tests). But @FabianKramm merely wanted to validate if it's related to the underlying storage – which it seems to be.

What I would recommend is using a different backing store in your case. By default it's sqlite with 0.20.0, which has limitations on NFS volumes.

Take a look here if you want to enable persistence, but switch to a different backing store:
https://www.vcluster.com/docs/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/

But keep in mind that these are install time settings that cannot easily be changed after the fact.

@hr-tsu
Copy link
Author

hr-tsu commented Jul 9, 2024

@heiko-braun
Thank you for your comment.

Our current environment was using NetApp as the storage product.
We will try another backing-store based on the settings you provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants