Troubleshoot Huawei Cloud Stack Workload Clusters
This guide covers troubleshooting patterns that are specific to Huawei Cloud Stack (HCS) workload clusters managed through Cluster API.
If your cluster reaches Cluster.status.phase = Provisioned but the import flow does not complete and workload nodes stay NotReady because the CNI is missing, start with the provider-agnostic guide Troubleshoot a Workload Cluster Stuck in Provisioned. That guide covers the shared diagnostic flow (global cluster import controller, sentry ServiceAccount, clusters.platform.tkestack.io invariants) for every Immutable Infrastructure provider.
If you have already completed the generic flow and the symptoms below also appear, see the HCS-specific pattern on this page.
Pattern: kubeadm init Never Completes Because the Hostname Is Wrong
Symptoms — all true at the same time:
- Cluster.status.phase=Provisioned, HCSCluster.status.ready=true, the ELB is up.
- HCSMachine.status.instanceState=ACTIVE with an InternalIP populated, but Machine.status.nodeRef never becomes set.
- KubeadmControlPlane.status.initialized stays empty and status.readyReplicas=0 past the usual init window of about 5 to 10 minutes.
- The cluster-api-provider-hcs controller logs (in cpaas-system) repeatedly print connect: connection refused against the control plane ELB VIP on port 6443.
- The HCSMachineConfigPool.spec.configs[].hostname value chosen for the node contains a dot (FQDN style, for example master-1.example.org).
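The last symptom can be checked from the management cluster without logging in to any node. The following is a minimal sketch, not part of the provider tooling: dotted_hostnames is a hypothetical helper, and the kubectl resource name hcsmachineconfigpool, the namespace, and the pool name in the usage comment are assumptions that you need to adapt to your installation.

```shell
#!/bin/sh
# Hypothetical helper: read hostnames on stdin, one per line, and print only
# those that contain a dot (FQDN style). Prints nothing if none match.
dotted_hostnames() {
  # -F treats the pattern as a literal dot; "|| true" keeps the exit status
  # at 0 when no hostname matches.
  grep -F '.' || true
}

# Assumed usage against the management cluster (placeholders in <...>):
#   kubectl -n <namespace> get hcsmachineconfigpool <pool-name> \
#     -o jsonpath='{range .spec.configs[*]}{.hostname}{"\n"}{end}' \
#     | dotted_hostnames
```

Any hostname that this prints is a candidate for the pattern when the other symptoms are also present.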
In cluster-api-provider-hcs releases earlier than v1.0.1, a dotted HCSMachineConfigPool.spec.configs[].hostname is rendered into cloud-init as the full FQDN string in the hostname field, with prefer_fqdn_over_hostname: true. The resulting node has a POSIX hostname that contains dots, which kubeadm init does not handle, so kube-apiserver never starts.
Diagnose
If you can reach the node (HCS console, jump host, or an in-cluster debug pod), check its hostname, its name resolution, /etc/hosts, and the cloud-init user-data.
Indicators that you are hitting this pattern:
- hostname returns the full dotted string instead of just the short label.
- hostname -f returns Name or service not known or Temporary failure in name resolution.
- /etc/hosts does not contain a line of the form <node-ip> <fqdn> <short>.
- The cloud-init user-data on the node shows prefer_fqdn_over_hostname: true and does not set manage_etc_hosts.
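The file-based indicators above can be scripted on the node. The following is a minimal sketch under stated assumptions: has_hosts_entry and userdata_matches_pattern are hypothetical helper names, and the default user-data path /var/lib/cloud/instance/user-data.txt is the usual cloud-init location, which may differ on your image or hold the data in another encoding.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the hosts file has a line of the form
# "<node-ip> <fqdn> <short>". $1 = node IP, $2 = FQDN,
# $3 = hosts file (optional, defaults to /etc/hosts).
has_hosts_entry() {
  ip=$1
  fqdn=$2
  hosts=${3:-/etc/hosts}
  short=${fqdn%%.*}   # short label: everything before the first dot
  awk -v ip="$ip" -v fqdn="$fqdn" -v short="$short" \
    '$1 == ip && $2 == fqdn && $3 == short { found = 1 }
     END { exit !found }' "$hosts"
}

# Hypothetical helper: succeed if the user-data sets
# prefer_fqdn_over_hostname: true and never sets manage_etc_hosts.
# $1 = user-data file (optional; the default path is an assumption).
userdata_matches_pattern() {
  file=${1:-/var/lib/cloud/instance/user-data.txt}
  grep -Eq '^[[:space:]]*prefer_fqdn_over_hostname:[[:space:]]*true' "$file" &&
    ! grep -Eq '^[[:space:]]*manage_etc_hosts:' "$file"
}

# Assumed usage on the node (placeholders in <...>):
#   hostname; hostname -f
#   has_hosts_entry <node-ip> <fqdn> || echo "no <node-ip> <fqdn> <short> line"
#   userdata_matches_pattern && echo "user-data matches the pattern"
```

Reading the user-data file typically requires root. The helpers only read files, so they are safe to run on a node that is still stuck in kubeadm init.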
Mitigate
Upgrade the cluster-api-provider-hcs plugin to v1.0.1 or later, then trigger a rolling replacement of the affected control plane and worker machines so the new cloud-init runs on freshly booted VMs. The step-by-step procedure is documented in Configure FQDN Hostname on Existing HCS Clusters.
Manual edits to /etc/hostname and /etc/hosts on an existing MicroOS / SLE Micro node do not persist: cloud-init re-renders these files on every boot, including reboots triggered by transactional-update, OS patching, or systemctl reboot. Rolling replacement is the supported migration path.
Prevent
Creating the cluster with cluster-api-provider-hcs v1.0.1 or later already installed avoids the pattern entirely. When choosing between dotted and short hostnames in HCSMachineConfigPool for new manifests, follow the Hostname behavior on the node section of the cluster create page.
Related
- Troubleshoot a Workload Cluster Stuck in Provisioned: provider-agnostic import-controller diagnostic flow that applies before any HCS-specific pattern on this page.
- Configure FQDN Hostname on Existing HCS Clusters: rolling-replacement migration procedure for clusters created before cluster-api-provider-hcs v1.0.1.
- Create Cluster on Huawei Cloud Stack: manifest reference for new clusters, including Hostname behavior on the node.