Provisioning the control VM failed in the Kubernetes cluster #13056
Replies: 7 comments 7 replies
-
Facing the same issue, any help will be appreciated.
-
Any exceptions in the logs, @anirudh09041?
-
A few questions, @anirudh09041.
-
@anirudh09041 which Ubuntu template are you using? Are you using the templates from this link: https://download.cloudstack.org/testing/custom_templates/ubuntu/22.04/ ?
-
Hi @Pearl1594, below are the answers to the previously mentioned questions. Can you please have a look?

- Yes, the attachIso command is being sent, and it even succeeds. Here's the exact sequence from my management server log: 06:50:55.765 - AttachCommand sent for i-2-120-VM
- I cannot SSH into the control node because port forwarding doesn't appear to be set up. The VM is on an isolated network (192.168.100.x).
- The cloud-init script times out waiting for /mnt/k8sdisk/ after 100 attempts (25 minutes), which is expected since the ISO is detached.
- This is 100% consistent and happens every time.
- When I manually attach the ISO via virsh change-media, the ISO stays attached, but the VM is still stuck at the login prompt because the patch.sh deployment script is hung waiting for qemu-guest-agent (which is not installed in my Ubuntu cloud image).
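To make the 25-minute timeout above concrete, here is a minimal sketch of the kind of retry loop cloud-init is effectively stuck in while the ISO is detached. The `wait_for_mount` function name, the temp-dir demo, and the short retry counts are illustrative assumptions for this sketch; in the real guest the loop waits on /mnt/k8sdisk/ with 100 attempts at 15-second intervals.

```shell
# Sketch of the wait loop: poll a mount point until it exists and is
# non-empty, or give up after a fixed number of attempts.
wait_for_mount() {
  mount_point=$1; attempts=$2; interval=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Succeed as soon as the directory exists and contains anything.
    if [ -d "$mount_point" ] && [ -n "$(ls -A "$mount_point" 2>/dev/null)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Demo with a pre-populated temp dir and a short timeout instead of the
# real 100 x 15 s. An empty directory would exhaust the attempts instead.
dir=$(mktemp -d)
touch "$dir/kubectl"
wait_for_mount "$dir" 3 1 && echo "binaries found in $dir"
```

If the ISO is never attached (or is detached one second after attach, as in the RCA below in the thread), the loop simply exhausts its attempts and cloud-init gives up.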
-
Since you are using the public templates and provisioning didn't complete, can you try logging in via the console using username: cloud, password: cloud? If you can do that, attach the ISO again and run the
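Once logged in on the console, a quick first check is whether the guest agent that patch.sh depends on is even present in the guest. This is a hedged sketch (the `check_guest_agent` helper is a name made up here, not part of CKS); it only tests for the qemu-ga binary that the qemu-guest-agent package installs.

```shell
# Run inside the control VM: report whether qemu-guest-agent is installed.
# patch.sh on the host can only inject the cmdline metadata if the agent
# inside the guest answers.
check_guest_agent() {
  if command -v qemu-ga >/dev/null 2>&1; then
    echo "qemu-guest-agent binary present"
    return 0
  fi
  echo "qemu-guest-agent NOT installed: rebuild the template with it"
  return 1
}

check_guest_agent || true
```

If the binary is present but the service is not running, starting it (for example via systemctl on systemd images) and re-attaching the ISO is worth trying before rebuilding the template.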
-
Hi @anirudh09041 and @Pearl1594. I'm a teammate of @anirudh09041, and here is what I've found so far. @Pearl1594, can you please help us with a solution based on the RCA below?

KubernetesClusterStartWorker logs show a similar error. So the real failure seems to be:

- CKS finishes provisioning the VMs and the FW/PF/LB rules, then calls KubernetesClusterUtil.isKubernetesClusterServerRunning(), which fails because the API server never came up.
- The kube-apiserver never came up because each VM was stuck "Starting" for 50 minutes. So cluster provisioning sat for ~100 min (50 min x 2 VMs sequentially) just waiting for the VMs to be marked Started.
- The 50-minute stall is the KVM agent repeatedly retrying patch.sh. patch.sh times out because qemu-guest-agent isn't responding in the guest.
- Here is the custom template that we're using: That template doesn't seem to have qemu-guest-agent installed (or running at boot), so patch.sh has nothing to talk to, hence the timeout in the previous block. And patch.sh hangs trying to inject /var/lib/cloud/data/cmdline.
- Even after patch.sh finally gives up, cloud-init inside the guest can't make progress: it sits in its own retry loop waiting for the binaries ISO at /mnt/k8sdisk/ to appear. The retry length is configured globally as 100 x 15 s = 25 min.
- CKS only actually attaches the binaries ISO at the very end of the workflow (06:50:55 above in the thread); by then cloud-init has been dead for over an hour. And the ISO attach is immediately followed by a detach 1 s later.

Summary: qemu-guest-agent is missing or disabled in the Ubuntu-CKS-Fresh template, so the host's patch.sh has nothing to talk to and times out for 50 minutes per VM (~100 min total for both). Without patch.sh, the cmdline metadata is never injected, the cks-init module inside cloud-init never runs, and kubeadm init never executes. When CKS eventually calls isKubernetesClusterServerRunning(), the API server isn't up, the readiness check fails, and the cluster goes to Error.

@Pearl1594, please let us know how to remediate this issue. Any help would be greatly appreciated!
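The stall arithmetic in the RCA above can be spelled out directly. All numbers come from the thread: roughly 50 minutes of patch.sh retries per VM, two control VMs provisioned sequentially, and cloud-init's 100 attempts at 15-second intervals.

```shell
# Totals behind the ~100 min provisioning stall and the 25 min ISO wait.
per_vm_min=50   # patch.sh retry window per VM, from the agent logs
vm_count=2      # control VMs provisioned sequentially
echo "patch.sh stall total: $((per_vm_min * vm_count)) minutes"

attempts=100    # cloud-init retries waiting for /mnt/k8sdisk/
interval_s=15   # seconds between attempts
echo "cloud-init ISO wait: $((attempts * interval_s / 60)) minutes"
```

This is why the ISO attach at the very end of the workflow is useless: cloud-init's 25-minute window closed long before the ~100-minute stall ended.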
-
Getting the below error when provisioning the cluster.

Error: Provisioning the control VM failed in the Kubernetes cluster : test-cluster-2

In the error description I got this:

Description: Error while creating Kubernetes cluster. domain(s): [test-cluster-2-control-19db5014c45, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.default.svc.cluster.local] addresses: [10.96.33.163]

What could be the underlying issue? How can I resolve this?
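As discussed later in the thread, the check that ultimately fails here boils down to reaching the Kubernetes API server on the control VM. A minimal hand-probe is sketched below; the `probe` helper is made up for this sketch, 6443 is kubeadm's default API server port, and note that 10.96.33.163 from the error description is a cluster-internal service IP that won't be reachable from outside the cluster, so run this on (or with a tunnel to) the control node itself.

```shell
# Hand-probe the API server's /healthz endpoint the way a readiness
# check would: succeed only if an HTTPS response comes back in time.
probe() {
  host=$1; port=$2
  if curl -ks --connect-timeout 5 "https://$host:$port/healthz" >/dev/null; then
    echo "API server reachable at $host:$port"
  else
    echo "API server NOT reachable at $host:$port"
  fi
}

probe 127.0.0.1 6443
```

If this fails on the control node itself, the problem is that kubeadm init never ran, not the firewall or load balancer rules in front of it.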