Configure FQDN Hostname on Existing HCS Clusters
Use this guide when an existing Huawei Cloud Stack (HCS) cluster needs nodes whose POSIX hostname returns the short label and whose hostname -f returns the full FQDN, and the cluster was originally created before cluster-api-provider-hcs v1.0.1.
Applications that depend on a working hostname / hostname -f split (TLS server certificate SAN matching, Kerberos SPN composition, monitoring labels keyed on the short name, and any code that calls getfqdn()) require the node to be initialized by cluster-api-provider-hcs v1.0.1 or later. A node keeps the cloud-init user-data it received on first boot, so existing nodes do not pick up the new behavior automatically; each node has to be replaced through the standard Cluster API rolling-replacement flow with the new controller in place.
Version
Use this procedure when the management cluster runs cluster-api-provider-hcs v1.0.1 or later. Earlier controller versions render dotted HCSMachineConfigPool.spec.configs[].hostname values as the full FQDN string in the POSIX hostname, which breaks hostname -f and can also block kubeadm init.
TOC
- Overview
- Prerequisites
- Step-by-step procedure
  1. Confirm the controller is upgraded
  2. Decide which pools and templates are in scope
  3. Extend the control plane pool with a surge slot
  4. Create a new HCSMachineTemplate for the control plane
  5. Patch KubeadmControlPlane to reference the new template
  6. Repeat for each MachineDeployment (workers)
  7. Verify hostname behavior on each new node
- Storage and data caveats
- Rollback
- Related
Overview
Two conditions must both be true for a node to expose the supported hostname / FQDN behavior:
- The node's HCSMachineConfigPool.spec.configs[].hostname entry contains a dot (FQDN form, for example master-1.example.org).
- The node was first booted by cluster-api-provider-hcs v1.0.1 or later.
Existing nodes booted under the older controller continue to use the cloud-init user-data they received on first boot, so editing manifests or re-applying user-data does not retroactively change their hostname behavior. The supported migration path is to replace those nodes through Cluster API rolling replacement, with the upgraded controller already installed and Ready on the management cluster.
Manual on-node edits to /etc/hostname and /etc/hosts do not persist on MicroOS / SLE Micro. Cloud-init re-renders these files on every boot, including reboots triggered by transactional-update, OS patching, or systemctl reboot.
Prerequisites
- The cluster-api-provider-hcs plugin on the management cluster is at v1.0.1 or later, and the controller Deployment in cpaas-system is Ready.
- The existing workload cluster is healthy: all current Machine objects report phase=Running, all Node objects report Ready, and no rolling replacement is in flight.
- Storage caveats reviewed (see Storage and data caveats).
- The target VM image used by the existing HCSMachineTemplate is still present in the HCS environment. (No image change is needed; the migration only renames the template to trigger replacement.)
- The control plane HCSMachineConfigPool has at least one unused (hostname, IP) slot available so Cluster API can surge a new control plane machine before draining the old one. If the existing pool is sized exactly to the current replica count, extend it with one more entry before starting.
Step-by-step procedure
1. Confirm the controller is upgraded
The Deployment must show a v1.0.1-or-later image tag and report all replicas Ready. Replacing nodes before the controller is upgraded re-renders the old user-data on the new VMs, leaving the cluster in the same state.
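One way to run this check is sketched below. The cpaas-system namespace comes from the prerequisites; the controller's Deployment name is not stated in this guide, so the listing is left unfiltered. version_ge is a hypothetical helper, and it assumes a sort that supports -V (GNU coreutils).

```shell
#!/bin/sh
# List every Deployment in cpaas-system with its container images and replica
# counts. The Deployment name is not given in this guide, so nothing is
# filtered here; locate the cluster-api-provider-hcs row in the output.
list_controller_images() {
  kubectl -n cpaas-system get deployments \
    -o custom-columns='NAME:.metadata.name,IMAGES:.spec.template.spec.containers[*].image,READY:.status.readyReplicas,WANT:.spec.replicas'
}

# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; a leading "v" is stripped before comparing with sort -V.
version_ge() {
  a=${1#v}
  b=${2#v}
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# Usage:
#   list_controller_images
#   version_ge v1.0.3 v1.0.1 && echo "controller is new enough"
```

The image tag still has to be read from the listing by hand; the helper only compares two version strings once the tag is known.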
2. Decide which pools and templates are in scope
The migration is only required for pools whose entries contain a dot. List the existing HCSMachineConfigPool resources and check each spec.configs[].hostname:
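A minimal sketch of that listing follows. It assumes kubectl can resolve the kind name hcsmachineconfigpool for the provider's CRD; dotted_pools is a hypothetical filter that keeps only pools with at least one dotted hostname.

```shell
#!/bin/sh
# Print one line per pool: "namespace/name<TAB>host1 host2 ...".
# Assumes the CRD is addressable as "hcsmachineconfigpool".
list_pool_hostnames() {
  kubectl get hcsmachineconfigpool -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.configs[*].hostname}{"\n"}{end}'
}

# Keep only the lines whose hostname column contains a dot (FQDN form).
# Hypothetical helper; reads the listing on stdin.
dotted_pools() {
  awk -F '\t' '$2 ~ /\./'
}

# Usage:
#   list_pool_hostnames | dotted_pools
```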
Pools whose entries are all short labels (no dot) are not affected by the migration and do not need to be touched. Continue only for pools that contain dotted entries.
3. Extend the control plane pool with a surge slot
Cluster API replaces control plane machines one at a time. The default rollout strategy needs a free (hostname, IP) slot in the pool to bring up the replacement machine before terminating the old one. If the existing control plane pool has exactly as many entries as KubeadmControlPlane.spec.replicas, add one more entry now:
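The comparison can be scripted roughly as follows. NS, POOL, and KCP are placeholders for your own namespace and resource names, and needs_surge_slot is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# needs_surge_slot ENTRIES REPLICAS: succeeds when the pool has no spare
# (hostname, IP) entry beyond the current replica count.
needs_surge_slot() {
  [ "$1" -le "$2" ]
}

# Number of entries in the pool (counts the hostnames under spec.configs).
pool_entries() {
  kubectl -n "$NS" get hcsmachineconfigpool "$POOL" \
    -o jsonpath='{.spec.configs[*].hostname}' | wc -w | tr -d ' '
}

# Desired control plane replica count.
kcp_replicas() {
  kubectl -n "$NS" get kubeadmcontrolplane "$KCP" -o jsonpath='{.spec.replicas}'
}

# Usage:
#   NS=... POOL=... KCP=...
#   if needs_surge_slot "$(pool_entries)" "$(kcp_replicas)"; then
#     kubectl -n "$NS" edit hcsmachineconfigpool "$POOL"
#   fi
```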
Add an extra entry under spec.configs with a new hostname and a free IP from the same subnet, then save. The new entry is held in reserve and consumed only when the first machine is replaced.
4. Create a new HCSMachineTemplate for the control plane
The new template can be a byte-for-byte copy of the existing one with a new metadata.name. The rename alone is what triggers Cluster API to roll machines and pick up the new user-data; you do not need to change imageName, flavorName, or any other field.
Export the existing template to new-cp-template.yaml, then edit the file:
- Set metadata.name to a new value (for example, append -fqdn or a date suffix).
- Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, the kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field.
- Leave runtime identity fields unset, including spec.template.spec.providerID and spec.template.spec.serverId. The HCS provider assigns these values when it creates the new VMs.
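As an alternative to editing by hand, the export and the three edits above could be scripted. The sketch below assumes jq is installed and that the template is exported as JSON; strip_template, OLD, NEW, and NS are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# strip_template NEW_NAME: read an HCSMachineTemplate as JSON on stdin, drop
# the server-generated metadata, the status field, and the runtime identity
# fields listed above, set the new name, and write the result to stdout.
strip_template() {
  jq --arg name "$1" '
    del(
      .metadata.resourceVersion,
      .metadata.uid,
      .metadata.generation,
      .metadata.creationTimestamp,
      .metadata.managedFields,
      .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"],
      .status,
      .spec.template.spec.providerID,
      .spec.template.spec.serverId
    )
    | .metadata.name = $name
  '
}

# Usage (OLD, NEW and NS are placeholders):
#   kubectl -n "$NS" get hcsmachinetemplate "$OLD" -o json \
#     | strip_template "$NEW" > new-cp-template.yaml
# The output is JSON, which is also valid YAML, so the file name can stay.
```

Review the generated file before applying it; any provider-specific field not named in the list above is left untouched.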
Apply the new template, for example with kubectl apply -f new-cp-template.yaml.
Keep the previous template until the rollout is healthy; rollback uses it.
5. Patch KubeadmControlPlane to reference the new template
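The patch can be sketched as a JSON merge patch. The field path below follows the Cluster API v1beta1 KubeadmControlPlane schema, which is an assumption about the API version in use; check it against your installed CRD. kcp_patch, NS, KCP, and NEW are hypothetical names.

```shell
#!/bin/sh
# kcp_patch TEMPLATE: print a JSON merge patch that points the control plane
# at the given infrastructure template (v1beta1 field path, assumed).
kcp_patch() {
  printf '{"spec":{"machineTemplate":{"infrastructureRef":{"name":"%s"}}}}' "$1"
}

# Usage (NS, KCP and NEW are placeholders):
#   kubectl -n "$NS" patch kubeadmcontrolplane "$KCP" \
#     --type merge -p "$(kcp_patch "$NEW")"
```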
Cluster API now starts a rolling replacement of the control plane. Monitor progress:
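A rough way to follow the rollout is shown below. rollout_done and kcp_field are hypothetical helpers, and NS and KCP are placeholders. The replica comparison alone is not sufficient, because the counts also match before the rollout begins, so keep watching the Machine list as well.

```shell
#!/bin/sh
# rollout_done READY WANT: succeeds when the ready count equals the desired
# count. An empty READY (status field not set yet) is treated as 0.
rollout_done() {
  [ "${1:-0}" -eq "$2" ]
}

# kcp_field PATH: read one field of the KubeadmControlPlane via JSONPath.
kcp_field() {
  kubectl -n "$NS" get kubeadmcontrolplane "$KCP" -o jsonpath="{$1}"
}

# Usage:
#   kubectl -n "$NS" get machines -w     # watch the old Machine names go away
#   until rollout_done "$(kcp_field .status.readyReplicas)" \
#                      "$(kcp_field .spec.replicas)"; do sleep 30; done
```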
Wait until every control plane Machine has been replaced (the old Machine names disappear) and KubeadmControlPlane.status.readyReplicas equals spec.replicas.
6. Repeat for each MachineDeployment (workers)
For each MachineDeployment whose nodes need the new hostname behavior, repeat steps 4 and 5 with the worker template:
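For workers, the template reference sits one level deeper than on the control plane. The sketch below assumes the Cluster API v1beta1 MachineDeployment schema; md_patch, NS, MD, and NEW are hypothetical names.

```shell
#!/bin/sh
# md_patch TEMPLATE: print a JSON merge patch that points a MachineDeployment
# at the given infrastructure template (v1beta1 field path, assumed).
md_patch() {
  printf '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"%s"}}}}}' "$1"
}

# Usage (NS, MD and NEW are placeholders):
#   kubectl -n "$NS" apply -f new-worker-template.yaml   # hypothetical file name
#   kubectl -n "$NS" patch machinedeployment "$MD" \
#     --type merge -p "$(md_patch "$NEW")"
```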
Wait for each MachineDeployment to report status.updatedReplicas equal to spec.replicas and all worker Machine objects on the new template before moving on.
7. Verify hostname behavior on each new node
On each new node:
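The checks could look like the following sketch. check_split is a hypothetical helper that compares what the node reports with what a dotted pool entry should produce; it takes the observed values as arguments so it can be run against captured output as well.

```shell
#!/bin/sh
# check_split FQDN SHORT_SEEN FQDN_SEEN: succeeds when SHORT_SEEN is the label
# before the first dot of FQDN and FQDN_SEEN equals FQDN.
check_split() {
  want_short=${1%%.*}   # everything before the first dot
  [ "$2" = "$want_short" ] && [ "$3" = "$1" ]
}

# On the node (master-1.example.org is the pool entry for this node):
#   hostname
#   hostname -f
#   grep -F "$(hostname -f)" /etc/hosts
#   check_split master-1.example.org "$(hostname)" "$(hostname -f)" && echo OK
```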
For a node whose pool config was hostname: master-1.example.org, expect:
- hostname returns master-1 (short label, dot and everything after it stripped).
- hostname -f returns master-1.example.org.
- /etc/hosts contains an entry of the form <loopback-or-node-ip> master-1.example.org master-1.
If hostname on a new node returns the full FQDN, or hostname -f fails with Name or service not known, first inspect the cloud-init log (/var/log/cloud-init.log) and the rendered user-data (/var/lib/cloud/instance/user-data.txt) on the node, then confirm that the controller is actually at v1.0.1 or later.
Storage and data caveats
Rolling replacement destroys each old VM and builds a fresh replacement from the new template. On HCS:
- The system disk (root volume) is rebuilt from the VM image on every replacement.
- Data volumes declared in HCSMachineTemplate.spec.template.spec.dataVolumes are not detached and reattached; volumes on the old VM may be removed with it.
- Workload data must live on external persistent storage (HCS EVS CSI or equivalent) to survive the migration.
Complete backup and migration of any node-local state before starting.
Rollback
If the new rollout is unhealthy — new VMs fail to boot, nodes do not become Ready, or applications break in unexpected ways — patch each affected resource back to the previous template name. Cluster API treats the reversion as a new spec drift and rolls the new machines back to the previous template:
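A sketch of the reverting patches, covering both resource kinds, is given below. The field paths assume the Cluster API v1beta1 schemas; ref_patch, NS, KCP, MD, and PREV are hypothetical names, with PREV standing for the previous template name.

```shell
#!/bin/sh
# ref_patch KIND TEMPLATE: print the JSON merge patch that sets the
# infrastructure template reference. KIND is "kcp" (KubeadmControlPlane) or
# "md" (MachineDeployment); any other value is an error.
ref_patch() {
  case "$1" in
    kcp) printf '{"spec":{"machineTemplate":{"infrastructureRef":{"name":"%s"}}}}' "$2" ;;
    md)  printf '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"%s"}}}}}' "$2" ;;
    *)   echo "unknown kind: $1" >&2; return 1 ;;
  esac
}

# Usage (NS, KCP, MD and PREV are placeholders):
#   kubectl -n "$NS" patch kubeadmcontrolplane "$KCP" \
#     --type merge -p "$(ref_patch kcp "$PREV")"
#   kubectl -n "$NS" patch machinedeployment "$MD" \
#     --type merge -p "$(ref_patch md "$PREV")"
```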
Do not delete the previous HCSMachineTemplate until the new rollout is healthy. If it has already been deleted, recreate it from version control or backup before rolling back.
Related
- Upgrading Clusters on Huawei Cloud Stack — Kubernetes-version upgrades use the same rolling-replacement mechanism.
- Troubleshoot Huawei Cloud Stack Workload Clusters — diagnose the kubeadm-init-stalled symptom that appears when an unfixed controller boots a dotted-hostname node.
- Troubleshoot a Workload Cluster Stuck in Provisioned — provider-agnostic import-controller diagnostic flow.
- Hostname behavior on the node: what hostname / hostname -f / /etc/hosts look like for short versus dotted pool entries on a freshly created cluster.