Provisioning the control VM failed in the Kubernetes cluster #13056
Replies: 7 comments 7 replies
-
Facing the same issue, any help will be appreciated.
-
Any exceptions in the logs, @anirudh09041?
-
A few questions, @anirudh09041.
-
@anirudh09041 which Ubuntu template are you using? Are you using the templates from this link: https://download.cloudstack.org/testing/custom_templates/ubuntu/22.04/ ?
-
Hi @Pearl1594, below are the answers to the previously mentioned questions. Can you please have a look?

- Yes, the attachIso command is being sent, and it even succeeds. Here's the exact sequence from my management server log: 06:50:55.765 - AttachCommand sent for i-2-120-VM
- I cannot SSH into the control node because port forwarding doesn't appear to be set up. The VM is on an isolated network (192.168.100.x).
- The cloud-init script times out waiting for /mnt/k8sdisk/ after 100 attempts (25 minutes), which is expected since the ISO is detached.
- This is 100% consistent and happens every time.
- When I manually attach the ISO via virsh change-media, the ISO stays attached, but the VM is still stuck at the login prompt because the patch.sh deployment script is hung waiting for qemu-guest-agent (which is not installed in my Ubuntu cloud image).
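To make the 25-minute timeout above concrete, here is a minimal sketch of the kind of retry loop cloud-init is effectively stuck in while the ISO is detached. The `wait_for_mount` function name, the temp-dir demo, and the short retry counts are illustrative assumptions for this sketch; in the real guest the loop waits on /mnt/k8sdisk/ with 100 attempts at 15-second intervals.

```shell
# Sketch of the wait loop: poll a mount point until it exists and is
# non-empty, or give up after a fixed number of attempts.
wait_for_mount() {
  mount_point=$1; attempts=$2; interval=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Succeed as soon as the directory exists and contains anything.
    if [ -d "$mount_point" ] && [ -n "$(ls -A "$mount_point" 2>/dev/null)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Demo with a pre-populated temp dir and a short timeout instead of the
# real 100 x 15 s. An empty directory would exhaust the attempts instead.
dir=$(mktemp -d)
touch "$dir/kubectl"
wait_for_mount "$dir" 3 1 && echo "binaries found in $dir"
```

If the ISO is never attached (or is detached one second after attach, as in the RCA below in the thread), the loop simply exhausts its attempts and cloud-init gives up.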
-
Since you are using the public templates and provisioning didn't complete, can you try logging in via the console using username: cloud, password: cloud? If you can do that, attach the ISO again and run the
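Once logged in on the console, a quick first check is whether the guest agent that patch.sh depends on is even present in the guest. This is a hedged sketch (the `check_guest_agent` helper is a name made up here, not part of CKS); it only tests for the qemu-ga binary that the qemu-guest-agent package installs.

```shell
# Run inside the control VM: report whether qemu-guest-agent is installed.
# patch.sh on the host can only inject the cmdline metadata if the agent
# inside the guest answers.
check_guest_agent() {
  if command -v qemu-ga >/dev/null 2>&1; then
    echo "qemu-guest-agent binary present"
    return 0
  fi
  echo "qemu-guest-agent NOT installed: rebuild the template with it"
  return 1
}

check_guest_agent || true
```

If the binary is present but the service is not running, starting it (for example via systemctl on systemd images) and re-attaching the ISO is worth trying before rebuilding the template.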
-
Hi @anirudh09041 and @Pearl1594. I'm a teammate of @anirudh09041, and here is what I've found so far. @Pearl1594, can you please help us with a solution based on the RCA below?

KubernetesClusterStartWorker logs show a similar error. So the real failure seems to be:

- CKS finishes provisioning the VMs and the FW/PF/LB rules, then calls KubernetesClusterUtil.isKubernetesClusterServerRunning(), which fails because the API server never came up.
- The kube-apiserver never came up because each VM was stuck "Starting" for 50 minutes. So cluster provisioning sat for ~100 min (50 min x 2 VMs sequentially) just waiting for the VMs to be marked Started.
- The 50-minute stall is the KVM agent repeatedly retrying patch.sh. patch.sh times out because qemu-guest-agent isn't responding in the guest.
- Here is the custom template that we're using: That template doesn't seem to have qemu-guest-agent installed (or running at boot), so patch.sh has nothing to talk to, hence the timeout in the previous block. And patch.sh hangs trying to inject /var/lib/cloud/data/cmdline.
- Even after patch.sh finally gives up, cloud-init inside the guest can't make progress: it sits in its own retry loop waiting for the binaries ISO at /mnt/k8sdisk/ to appear. The retry length is configured globally as 100 x 15 s = 25 min.
- CKS only actually attaches the binaries ISO at the very end of the workflow (06:50:55 above in the thread); by then cloud-init has been dead for over an hour. And the ISO attach is immediately followed by a detach 1 s later.

Summary: qemu-guest-agent is missing or disabled in the Ubuntu-CKS-Fresh template, so the host's patch.sh has nothing to talk to and times out for 50 minutes per VM (~100 min total for both). Without patch.sh, the cmdline metadata is never injected, the cks-init module inside cloud-init never runs, and kubeadm init never executes. When CKS eventually calls isKubernetesClusterServerRunning(), the API server isn't up, the readiness check fails, and the cluster goes to Error.

@Pearl1594, please let us know how to remediate this issue. Any help would be greatly appreciated!
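The stall arithmetic in the RCA above can be spelled out directly. All numbers come from the thread: roughly 50 minutes of patch.sh retries per VM, two control VMs provisioned sequentially, and cloud-init's 100 attempts at 15-second intervals.

```shell
# Totals behind the ~100 min provisioning stall and the 25 min ISO wait.
per_vm_min=50   # patch.sh retry window per VM, from the agent logs
vm_count=2      # control VMs provisioned sequentially
echo "patch.sh stall total: $((per_vm_min * vm_count)) minutes"

attempts=100    # cloud-init retries waiting for /mnt/k8sdisk/
interval_s=15   # seconds between attempts
echo "cloud-init ISO wait: $((attempts * interval_s / 60)) minutes"
```

This is why the ISO attach at the very end of the workflow is useless: cloud-init's 25-minute window closed long before the ~100-minute stall ended.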
-
Getting the below error when provisioning the cluster.

Error: Provisioning the control VM failed in the Kubernetes cluster : test-cluster-2

In the error description I got this:

Description: Error while creating Kubernetes cluster. domain(s): [test-cluster-2-control-19db5014c45, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.default.svc.cluster.local] addresses: [10.96.33.163]

What could be the underlying issue? How can I resolve this?
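As discussed later in the thread, the check that ultimately fails here boils down to reaching the Kubernetes API server on the control VM. A minimal hand-probe is sketched below; the `probe` helper is made up for this sketch, 6443 is kubeadm's default API server port, and note that 10.96.33.163 from the error description is a cluster-internal service IP that won't be reachable from outside the cluster, so run this on (or with a tunnel to) the control node itself.

```shell
# Hand-probe the API server's /healthz endpoint the way a readiness
# check would: succeed only if an HTTPS response comes back in time.
probe() {
  host=$1; port=$2
  if curl -ks --connect-timeout 5 "https://$host:$port/healthz" >/dev/null; then
    echo "API server reachable at $host:$port"
  else
    echo "API server NOT reachable at $host:$port"
  fi
}

probe 127.0.0.1 6443
```

If this fails on the control node itself, the problem is that kubeadm init never ran, not the firewall or load balancer rules in front of it.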