Configure FQDN Hostname on Existing HCS Clusters

Use this guide when an existing Huawei Cloud Stack (HCS) cluster was created before cluster-api-provider-hcs v1.0.1 and its nodes need hostname to return the short label and hostname -f to return the FQDN.

Applications that depend on a working hostname / hostname -f split (TLS server certificate SAN matching, Kerberos SPN composition, monitoring labels keyed on the short name, and any code that calls getfqdn()) require the node to be initialized by cluster-api-provider-hcs v1.0.1 or later. A node's cloud-init user-data is fixed at first boot, so existing nodes do not pick up the new behavior automatically; each node has to be replaced through the standard Cluster API rolling-replacement flow with the new controller in place.

INFO

Version

Use this procedure when the management cluster runs cluster-api-provider-hcs v1.0.1 or later. Earlier controller versions render dotted HCSMachineConfigPool.spec.configs[].hostname values as the full FQDN string in the POSIX hostname, which breaks hostname -f and can also block kubeadm init.

Overview

Two conditions must both be true for a node to expose the supported hostname / FQDN behavior:

  1. The node's HCSMachineConfigPool.spec.configs[].hostname entry contains a dot (FQDN form, for example master-1.example.org).
  2. The node was first booted by cluster-api-provider-hcs v1.0.1 or later.

Existing nodes booted under the older controller continue to use the cloud-init user-data they received on first boot, so editing manifests or re-applying user-data does not retroactively change their hostname behavior. The supported migration path is to replace those nodes through Cluster API rolling replacement, with the upgraded controller already installed and Ready on the management cluster.

Manual on-node edits to /etc/hostname and /etc/hosts do not persist on MicroOS / SLE Micro. Cloud-init re-renders these files on every boot, including reboots triggered by transactional-update, OS patching, or systemctl reboot.

Prerequisites

  • The cluster-api-provider-hcs plugin on the management cluster is at v1.0.1 or later, and the controller Deployment in cpaas-system is Ready.
  • The existing workload cluster is healthy: all current Machine objects report phase=Running, all Node objects report Ready, and no rolling replacement is in flight.
  • Storage caveats reviewed (see Storage and data caveats).
  • The target VM image used by the existing HCSMachineTemplate is still present in the HCS environment. (No image change is needed; the migration only creates a copy of the template under a new name to trigger replacement.)
  • The control plane HCSMachineConfigPool has at least one unused (hostname, IP) slot available so Cluster API can surge a new control plane machine before draining the old one. If the existing pool is sized exactly to the current replica count, extend it with one more entry before starting.

Step-by-step procedure

1. Confirm the controller is upgraded

kubectl get deployment -n cpaas-system -l app=cluster-api-provider-hcs-manager -o wide
kubectl rollout status deployment -n cpaas-system cluster-api-provider-hcs-manager

The Deployment must show a v1.0.1-or-later image tag and report all replicas Ready. Replacing nodes before the controller is upgraded re-renders the old user-data on the new VMs, leaving the cluster in the same state.
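
The image-tag check can also be scripted. The following is a minimal sketch, not part of the provider: image_tag and version_at_least are hypothetical helper names, the image reference is assumed to end in a :vX.Y.Z tag rather than a digest, and the ordering relies on sort -V from GNU coreutils. Pre-release suffixes such as -rc1 may not sort the way you expect, so treat the result as a hint and confirm by eye.

```shell
#!/bin/sh
# Hypothetical helpers; assume image references end in a ":vX.Y.Z" tag.

# Print the tag portion of an image reference (text after the last colon).
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Succeed when version $1 is greater than or equal to version $2.
# A leading "v" is ignored; ordering comes from GNU sort -V.
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  lowest="$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# Example (assumes the controller is the first container in the pod spec):
#   image="$(kubectl get deployment cluster-api-provider-hcs-manager -n cpaas-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
#   version_at_least "$(image_tag "$image")" v1.0.1 && echo "controller is new enough"
```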

2. Decide which pools and templates are in scope

The migration is only required for pools whose entries contain a dot. List the existing HCSMachineConfigPool resources and check each spec.configs[].hostname:

kubectl get hcsmachineconfigpool -n cpaas-system
kubectl get hcsmachineconfigpool <pool-name> -n cpaas-system \
  -o jsonpath='{range .spec.configs[*]}{.hostname}{"\n"}{end}'

Pools whose entries are all short labels (no dot) are not affected by the migration and do not need to be touched. Continue only for pools that contain dotted entries.
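
To make the scoping decision less error-prone on large pools, the hostname list can be filtered so that only dotted entries remain. This is a small sketch; dotted_hostnames is a hypothetical helper name, and it simply treats any hostname containing a dot as FQDN form.

```shell
#!/bin/sh
# Hypothetical helper: read hostnames on stdin (one per line) and print
# only those that contain a dot.
dotted_hostnames() {
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # aborting callers that run under "set -e".
  grep -F '.' || true
}

# Example: an empty result means the pool is out of scope.
#   kubectl get hcsmachineconfigpool <pool-name> -n cpaas-system \
#     -o jsonpath='{range .spec.configs[*]}{.hostname}{"\n"}{end}' | dotted_hostnames
```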

3. Extend the control plane pool with a surge slot

Cluster API replaces control plane machines one at a time. The default rollout strategy needs a free (hostname, IP) slot in the pool to bring up the replacement machine before terminating the old one. If the existing control plane pool has exactly as many entries as KubeadmControlPlane.spec.replicas, add one more entry now:

kubectl edit hcsmachineconfigpool <cp-pool-name> -n cpaas-system

Add an extra entry under spec.configs with a new hostname and a free IP from the same subnet, then save. The new entry is held in reserve and consumed only when the first machine is replaced.
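
Whether the pool needs an extra entry can be checked by comparing the number of pool entries with the control plane replica count. The sketch below assumes every entry in the pool is usable by the control plane; needs_surge_slot is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the pool has no spare slot,
# that is, when the number of entries is less than or equal to the replicas.
needs_surge_slot() {
  entries="$1"
  replicas="$2"
  [ "$entries" -le "$replicas" ]
}

# Example:
#   entries="$(kubectl get hcsmachineconfigpool <cp-pool-name> -n cpaas-system \
#     -o jsonpath='{range .spec.configs[*]}{.hostname}{"\n"}{end}' | grep -c .)"
#   replicas="$(kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system \
#     -o jsonpath='{.spec.replicas}')"
#   needs_surge_slot "$entries" "$replicas" && echo "add one more entry to the pool"
```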

4. Create a new HCSMachineTemplate for the control plane

The new template can be a byte-for-byte copy of the existing one with a new metadata.name. The rename alone is what triggers Cluster API to roll machines and pick up the new user-data; you do not need to change imageName, flavorName, or any other field.

kubectl get hcsmachinetemplate <current-cp-template> -n cpaas-system -o yaml > new-cp-template.yaml

Edit new-cp-template.yaml:

  • Set metadata.name to a new value (for example, append -fqdn or a date suffix).
  • Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, the kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field.
  • Leave runtime identity fields unset, including spec.template.spec.providerID and spec.template.spec.serverId. The HCS provider assigns these values when it creates the new VMs.
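
If jq is available, the edits listed above can be applied in one pass instead of by hand. This is a sketch under stated assumptions: strip_template is a hypothetical helper name, the template is exported as JSON rather than YAML (kubectl apply accepts either), and the field paths are the ones named in the list above. Review the output before applying it.

```shell
#!/bin/sh
# Hypothetical helper: read a template as JSON on stdin and write the
# cleaned copy to stdout. $1 is the new metadata.name. Requires jq.
strip_template() {
  jq --arg name "$1" '
    del(
      .metadata.resourceVersion,
      .metadata.uid,
      .metadata.generation,
      .metadata.creationTimestamp,
      .metadata.managedFields,
      .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"],
      .status,
      .spec.template.spec.providerID,
      .spec.template.spec.serverId
    )
    | .metadata.name = $name
  '
}

# Example:
#   kubectl get hcsmachinetemplate <current-cp-template> -n cpaas-system -o json \
#     | strip_template <new-cp-template> > new-cp-template.json
```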

Apply the new template:

kubectl apply -f new-cp-template.yaml -n cpaas-system

Keep the previous template until the rollout is healthy; rollback uses it.

5. Patch KubeadmControlPlane to reference the new template

kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type='merge' \
  -p='{"spec":{"machineTemplate":{"infrastructureRef":{"name":"<new-cp-template>"}}}}'

Cluster API now starts a rolling replacement of the control plane. Monitor progress:

kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane

Wait until every control plane Machine has been replaced (the old Machine names no longer appear in the list) and KubeadmControlPlane.status.readyReplicas equals spec.replicas.
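
For unattended runs, the replica comparison can be polled from a script. The sketch below is illustrative only: replicas_match, wait_until, and kcp_ready are hypothetical helper names, and the timeout values in the example are arbitrary. Note that readyReplicas can already equal spec.replicas before the rollout has started, so this check complements the Machine name check and does not replace it.

```shell
#!/bin/sh
# Succeed when both counts are non-empty and equal.
replicas_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Retry a command until it succeeds or the timeout expires.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Compare status.readyReplicas with spec.replicas for one KubeadmControlPlane.
kcp_ready() {
  ready="$(kubectl get kubeadmcontrolplane "$1" -n cpaas-system \
    -o jsonpath='{.status.readyReplicas}')"
  desired="$(kubectl get kubeadmcontrolplane "$1" -n cpaas-system \
    -o jsonpath='{.spec.replicas}')"
  replicas_match "$ready" "$desired"
}

# Example: poll every 30 seconds for up to one hour.
#   wait_until 3600 30 kcp_ready <kcp-name>
```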

6. Repeat for each MachineDeployment (workers)

For each MachineDeployment whose nodes need the new hostname behavior, repeat steps 4 and 5 with the worker template:

kubectl get hcsmachinetemplate <current-worker-template> -n cpaas-system -o yaml > new-worker-template.yaml
# Edit metadata.name, strip status/server fields, apply
kubectl apply -f new-worker-template.yaml -n cpaas-system

kubectl patch machinedeployment <md-name> -n cpaas-system --type='merge' \
  -p='{"spec":{"template":{"spec":{"infrastructureRef":{"name":"<new-worker-template>"}}}}}'

Wait for each MachineDeployment to report status.updatedReplicas equal to spec.replicas and all worker Machine objects on the new template before moving on.

7. Verify hostname behavior on each new node

On each new node:

hostname
hostname -f
cat /etc/hosts

For a node whose pool config was hostname: master-1.example.org, expect:

  • hostname returns master-1 (short label, dot and everything after it stripped).
  • hostname -f returns master-1.example.org.
  • /etc/hosts contains an entry of the form <loopback-or-node-ip> master-1.example.org master-1.
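
The three expectations above can be turned into a quick check to run on each node. This is a minimal sketch; short_label, check_hostname_split, and hosts_has_entry are hypothetical helper names, and the short label is taken to be everything before the first dot of the pool hostname.

```shell
#!/bin/sh
# Print the short label of an FQDN (everything before the first dot).
short_label() {
  printf '%s\n' "${1%%.*}"
}

# Succeed when the observed values match the expected FQDN.
# $1 = output of "hostname", $2 = output of "hostname -f", $3 = expected FQDN
check_hostname_split() {
  [ "$1" = "$(short_label "$3")" ] && [ "$2" = "$3" ]
}

# Succeed when hosts file $1 has a non-comment line that lists FQDN $2
# immediately followed by its short label.
hosts_has_entry() {
  awk -v fqdn="$2" -v short="${2%%.*}" '
    /^[[:space:]]*#/ { next }
    { for (i = 2; i < NF; i++) if ($i == fqdn && $(i + 1) == short) found = 1 }
    END { exit !found }
  ' "$1"
}

# Example, run on the node:
#   check_hostname_split "$(hostname)" "$(hostname -f)" master-1.example.org \
#     && hosts_has_entry /etc/hosts master-1.example.org \
#     && echo "hostname split OK"
```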

If hostname on a new node returns the full FQDN string, or hostname -f fails with Name or service not known, first inspect the cloud-init log (/var/log/cloud-init.log) and the user-data the node received (/var/lib/cloud/instance/user-data.txt), then confirm the controller is actually at v1.0.1 or later.

Storage and data caveats

Rolling replacement destroys each old VM and builds a fresh replacement from the new template. On HCS:

  • The system disk (root volume) is rebuilt from the VM image on every replacement.
  • Data volumes declared in HCSMachineTemplate.spec.template.spec.dataVolumes are not detached from the old VM and reattached to the new one; volumes on the old VM may be removed with it.
  • Workload data must live on external persistent storage (HCS EVS CSI or equivalent) to survive the migration.

Finish backing up and migrating any node-local state before starting.

Rollback

If the new rollout is unhealthy (new VMs fail to boot, nodes do not become Ready, or applications break in unexpected ways), patch each affected resource back to the previous template name. Cluster API treats the reversion as another spec change and replaces the new machines with machines built from the previous template:

kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type='merge' \
  -p='{"spec":{"machineTemplate":{"infrastructureRef":{"name":"<previous-cp-template>"}}}}'

kubectl patch machinedeployment <md-name> -n cpaas-system --type='merge' \
  -p='{"spec":{"template":{"spec":{"infrastructureRef":{"name":"<previous-worker-template>"}}}}}'

Do not delete the previous HCSMachineTemplate until the new rollout is healthy. If it has already been deleted, recreate it from version control or backup before rolling back.
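
The forward and rollback patches differ only in the template name, so a small helper can build either one. This is a sketch, not a provider tool: infra_ref_patch is a hypothetical helper name, and it assumes template names are plain Kubernetes object names that need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print the merge patch that points a resource at
# template $2. $1 = kcp (KubeadmControlPlane) or md (MachineDeployment).
infra_ref_patch() {
  case "$1" in
    kcp) printf '{"spec":{"machineTemplate":{"infrastructureRef":{"name":"%s"}}}}\n' "$2" ;;
    md)  printf '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"%s"}}}}}\n' "$2" ;;
    *)   echo "unknown kind: $1" >&2; return 1 ;;
  esac
}

# Example: read the template a KubeadmControlPlane currently references,
# then roll back to a previous one.
#   kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system \
#     -o jsonpath='{.spec.machineTemplate.infrastructureRef.name}'
#   kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type=merge \
#     -p "$(infra_ref_patch kcp <previous-cp-template>)"
```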