Managing Nodes on Huawei Cloud Stack
This document explains how to manage worker nodes using Cluster API Machine resources on the Huawei Cloud Stack platform.
Prerequisites
Important Prerequisites
- The control plane must be deployed before performing node operations. See Create Cluster for setup instructions.
- Ensure you have proper access to the HCS platform and required permissions.
When using the YAML examples in this document, replace only values enclosed in <> with environment-specific values. Preserve the remaining fields unless your cluster policy requires a different value.
Overview
Worker nodes are managed through Cluster API Machine resources, providing declarative and automated node lifecycle management. The deployment process involves:
- Machine Configuration Pool - Network settings and pool-managed persistent disks for worker nodes
- Machine Template - VM specifications
- Bootstrap Configuration - Node initialization settings
- Machine Deployment - Orchestration of node creation and management
Worker Node Deployment
Before you prepare worker YAML, complete the HCS input checklist in Infrastructure Resources for Huawei Cloud Stack. In particular, list every worker subnet in HCSCluster.spec.network.subnets, allocate worker IPs from planned free IP ranges, and collect the provider-recognized flavorName and availabilityZone API values. If you add a new worker subnet to an existing Ready cluster, patch HCSCluster.spec.network.subnets with the full subnet object instead of adding only the subnet name.
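The fragment below is a sketch of a complete HCSCluster.spec.network.subnets list after a new worker subnet is added. The entry field names name and id are assumptions about the shape of the subnet object and do not come from this page. Copy the exact fields from an existing entry in your HCSCluster.

```yaml
# Fragment of an HCSCluster manifest (sketch).
# Assumption: each subnet entry uses the fields "name" and "id";
# mirror the fields of the entries that already exist in your cluster.
spec:
  network:
    subnets:
      # Existing entries stay exactly as they are.
      - name: <existing-subnet-name>
        id: <existing-subnet-id>
      # New worker subnet: the full object, including the subnet ID.
      - name: <worker-subnet-name>
        id: <worker-subnet-id>
```

If you apply this change with kubectl patch --type merge, keep in mind that a JSON merge patch replaces arrays as a whole. The patch must therefore include every existing subnet entry as well as the new one.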
Step 1: Configure Machine Configuration Pool
The HCSMachineConfigPool defines the network configuration and any pool-managed persistent disks for worker node VMs. You must plan and configure the IP addresses, hostnames, persistent disk slots, and other parameters before deployment.
Pool Size Requirement
The pool must include at least as many entries as the number of worker nodes you plan to deploy. Insufficient entries will prevent node deployment.
Use one subnet selector per networks[] entry. For new manifests, set either subnetName or subnetId, but not both. Existing manifests may keep the deprecated subenetName field; if you also add subnetName while updating that manifest, its value must exactly match subenetName. Do not supply conflicting values across subenetName, subnetName, and subnetId.
If you use subnetName for worker nodes, include the same subnet name in the parent HCSCluster.spec.network.subnets list before you create or scale the worker pool. For an existing Ready cluster, append the full subnet object, including the subnet ID, instead of adding only the subnet name.
Persistent disk fields are required when persistentDisks is specified.
Use persistentDisks[] for node-local state that must survive worker replacement. Do not declare the same mount path in HCSMachineTemplate.spec.template.spec.dataVolumes[].
Note: The CRD schema lists subnetName, subenetName, and subnetId as optional fields and does not express their allowed combinations. Follow the provider-level rules above when writing manifests.
Note: networks[] can contain more than one entry when a worker node needs multiple NICs. The current provider only uses each entry to attach a NIC with a subnet selector and static IP. It does not support per-NIC role declarations, default gateway selection, static routes, route metrics, or per-NIC DNS settings.
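The following is a minimal HCSMachineConfigPool sketch for two workers. Each worker has one NIC and a pool-managed /var/cpaas disk. The apiVersion is a placeholder, and the field names hostname and ip (for the static IP) are assumptions because this page does not give them. Confirm both against the installed CRD. The other field names follow this section: configs, networks, subnetName, subnetId, persistentDisks, slot, size, type, format, and mountPath.

```yaml
apiVersion: <hcs-infrastructure-api-version>  # assumption: copy group/version from the provider CRD
kind: HCSMachineConfigPool
metadata:
  name: <worker-pool>
  namespace: <namespace>
spec:
  configs:
    # One entry per worker; keep at least as many entries as planned replicas.
    - hostname: <worker-hostname-1>          # assumed field name
      networks:
        # One subnet selector per entry: subnetName or subnetId, never both.
        - subnetName: <worker-subnet-name>   # must also be listed in HCSCluster.spec.network.subnets
          ip: <worker-ip-1>                  # assumed field name for the static IP
      persistentDisks:
        # (hostname, slot) identifies the disk; slots start at 0 and stay contiguous.
        - slot: 0
          size: <disk-size>
          type: <evs-volume-type>
          format: <filesystem-format>
          mountPath: /var/cpaas
    - hostname: <worker-hostname-2>
      networks:
        - subnetId: <worker-subnet-id>       # alternative selector: subnet ID instead of name
          ip: <worker-ip-2>                  # assumed field name for the static IP
      persistentDisks:
        - slot: 0
          size: <disk-size>
          type: <evs-volume-type>
          format: <filesystem-format>
          mountPath: /var/cpaas
```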
Pool-Managed Persistent Disks for Workers
Declare worker-node disks that must survive replacement in the matching HCSMachineConfigPool.spec.configs[].persistentDisks[] entry. Use this model for /var/cpaas and for any other node-local state that must be retained during rolling replacement.
- Keep HCSMachineTemplate.spec.template.spec.dataVolumes[] for temporary disks that may be recreated with each ECS.
- Keep slots unique and contiguous from 0 for each hostname. The provider uses (hostname, slot) as the persistent-disk identity.
- Treat slot, size, type, format, and mountPath as immutable after the provider accepts the entry.
- You can update mountOptions. The change takes effect after the worker is replaced.
- You can append new persistentDisks[] entries. The provider creates or claims the EVS disk, but it does not hot-mount the disk into the running ECS. Trigger a rolling replacement with MachineDeployment.spec.strategy.rollingUpdate.maxSurge: 0 before you expect the new disk to be formatted and mounted inside the guest OS.
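The rules above are sketched below for appending a second disk to an existing worker and then preparing the rollout. The hostname field name and the placeholder values are assumptions. Only the persistent-disk field names and the strategy fields come from this page.

```yaml
# Fragment of HCSMachineConfigPool.spec.configs[] for one existing worker (sketch).
- hostname: <worker-hostname-1>   # assumed field name
  # networks[] left unchanged
  persistentDisks:
    # Accepted entry: slot, size, type, format, and mountPath are now immutable.
    - slot: 0
      size: <disk-size>
      type: <evs-volume-type>
      format: <filesystem-format>
      mountPath: /var/cpaas
    # Appended entry: next contiguous slot for this hostname.
    - slot: 1
      size: <disk-size>
      type: <evs-volume-type>
      format: <filesystem-format>
      mountPath: <new-mount-path>
---
# Fragment of the MachineDeployment that owns these workers.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # delete an old worker before creating its replacement
      maxUnavailable: 1  # replace one worker at a time
```

The new disk is mounted in the guest OS only after the rolling replacement has recreated the worker.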
To inspect persistent-disk runtime state during worker operations, check the status of the worker HCSMachineConfigPool.
Step 2: Configure Machine Template
The HCSMachineTemplate defines the VM specifications for worker nodes.
Configure worker nodes with a system volume and temporary data volumes for paths that may be recreated with each ECS, such as /var/lib/kubelet and /var/lib/containerd. Put /var/cpaas in HCSMachineConfigPool.spec.configs[].persistentDisks[] when platform state must survive worker replacement.
Use the provider-recognized flavorName and availabilityZone API values when you prepare the worker template. These values are not the tenant UI display names.
Data volume fields are required when dataVolumes is specified.
dataVolumes[] are recreated with the ECS. Do not use them for /var/cpaas or any other path that must survive rolling replacement.
Note: Do not set runtime identity fields such as providerID or serverId in HCSMachineTemplate manifests. The provider assigns these values when it creates HCS instances.
Note: Tenant administrators cannot retrieve the provider-recognized flavorName and availabilityZone values from the HCS UI. Get the exact values from the HCS administrator before you apply the manifest.
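The following is a minimal HCSMachineTemplate sketch for workers. The apiVersion is a placeholder. The dataVolumes entry fields (size, type, mountPath) are assumptions because this page does not list them. imageName, flavorName, availabilityZone, and rootVolume.size are the field names this page uses.

```yaml
apiVersion: <hcs-infrastructure-api-version>   # assumption: copy group/version from the provider CRD
kind: HCSMachineTemplate
metadata:
  name: <worker-template>
  namespace: <namespace>
spec:
  template:
    spec:
      # Provider-recognized API values from the HCS administrator, not UI display names.
      imageName: <image-name>
      flavorName: <flavor-name>
      availabilityZone: <availability-zone>
      rootVolume:
        size: <system-disk-size>
      # Temporary disks, recreated with each ECS. Never list /var/cpaas here.
      # Entry field names (size, type, mountPath) are assumptions.
      dataVolumes:
        - size: <kubelet-disk-size>
          type: <evs-volume-type>
          mountPath: /var/lib/kubelet
        - size: <containerd-disk-size>
          type: <evs-volume-type>
          mountPath: /var/lib/containerd
      # providerID and serverId are left unset; the provider assigns them.
```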
Step 3: Configure Bootstrap Template
The KubeadmConfigTemplate defines the bootstrap configuration for worker nodes.
The HCS controller injects /etc/kubernetes/pki/kubelet.crt and /etc/kubernetes/pki/kubelet.key while resolving worker cloud-init data. The bootstrap template must configure kubelet to use these controller-provided certificate files.
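The following is a minimal KubeadmConfigTemplate sketch that points kubelet at the injected files. It passes the paths as kubelet flags through joinConfiguration.nodeRegistration.kubeletExtraArgs. That mechanism is an assumption chosen for illustration. If your platform ships its own kubelet patch, use that patch instead. The schema shown is Cluster API bootstrap.cluster.x-k8s.io/v1beta1, where kubeletExtraArgs is a string map.

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <bootstrap-template>
  namespace: <namespace>
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          # Assumption: kubelet flags (without the leading --) as the delivery mechanism.
          kubeletExtraArgs:
            # Files injected by the HCS controller into worker cloud-init data.
            tls-cert-file: /etc/kubernetes/pki/kubelet.crt
            tls-private-key-file: /etc/kubernetes/pki/kubelet.key
```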
Step 4: Configure Machine Deployment
The MachineDeployment orchestrates the creation and management of worker nodes.
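The following MachineDeployment sketch uses the Cluster API cluster.x-k8s.io/v1beta1 schema. It ties the bootstrap template and the worker machine template together. Only the HCS infrastructure apiVersion is a placeholder assumption. The rolling-update settings follow this page's guidance for pool-managed persistent disks.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <machine-deployment-name>
  namespace: <namespace>
spec:
  clusterName: <cluster-name>
  replicas: <replicas>               # must not exceed the entries in the HCSMachineConfigPool
  selector:
    matchLabels: {}                  # Cluster API defaults the cluster-name label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                    # keep 0 when workers use pool-managed persistent disks
      maxUnavailable: 1
  template:
    spec:
      clusterName: <cluster-name>
      version: <kubernetes-version>  # must match the Kubernetes version in the VM image
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <bootstrap-template>
      infrastructureRef:
        apiVersion: <hcs-infrastructure-api-version>  # assumption: copy from the provider CRD
        kind: HCSMachineTemplate
        name: <worker-template>
```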
Node Management Operations
This section covers common operational tasks for managing worker nodes.
Scaling Worker Nodes
Worker node scaling allows you to adjust cluster capacity based on workload demands.
Adding Worker Nodes
Increase the number of worker nodes to handle increased workload.
Procedure:
- Check Current Node Status
- Extend Configuration Pool
  Add new machine configurations to the pool for the additional nodes. If the new workers need preserved node-local state such as /var/cpaas, include the matching persistentDisks[] entries in each new configuration. Modify the pool to include the new IP entries, then apply it.
  When you edit the pool, keep all existing configs[] entries and their accepted persistentDisks[] entries unchanged unless you are intentionally appending a new disk slot.
- Scale Up the MachineDeployment
  Update the replicas field to the desired number of nodes.
- Monitor the Scaling Progress
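The pool and replica changes in this procedure are sketched below as two fragments. The hostname field name and the static-IP field name ip are assumptions. The comment under the existing entry stands for that entry's current content, which must be kept as it is.

```yaml
# HCSMachineConfigPool fragment: existing entries first and unchanged, then new ones.
spec:
  configs:
    - hostname: <existing-worker-hostname>   # assumed field name
      # existing networks[] and accepted persistentDisks[] kept exactly as they are
    - hostname: <new-worker-hostname>
      networks:
        - subnetName: <worker-subnet-name>
          ip: <new-worker-ip>                # assumed field name; use an IP from a planned free range
      persistentDisks:
        - slot: 0
          size: <disk-size>
          type: <evs-volume-type>
          format: <filesystem-format>
          mountPath: /var/cpaas
---
# MachineDeployment fragment: raise replicas, never above the number of pool entries.
spec:
  replicas: <new-replica-count>
```

For the replica change alone, kubectl scale machinedeployment <machine-deployment-name> --replicas=<new-replica-count> -n <namespace> has the same effect, because MachineDeployment exposes the scale subresource.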
Removing Worker Nodes
Decrease the number of worker nodes to reduce cluster capacity.
Data Loss Warning
Scaling down removes worker nodes and their ECS instances. Template-owned dataVolumes[] are not preserved. Pool-managed persistent disks declared in HCSMachineConfigPool.spec.configs[].persistentDisks[] remain tracked by the pool and can be reused while the corresponding hostname entry stays in the pool. Ensure:
- Workloads can tolerate node loss through proper replication
- No critical data is stored only on the nodes being removed
- Applications are designed for horizontal scaling
Procedure:
- Scale Down the MachineDeployment
- Monitor the Removal Progress
  The Cluster API controller will:
  - Drain the selected nodes (evict pods if possible)
  - Delete the underlying VMs from the HCS platform
  - Remove the machine resources
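To control which workers are removed, you can annotate their Machine objects before you lower replicas. The sketch below uses the standard Cluster API annotation cluster.x-k8s.io/delete-machine. Cluster API deletes Machines that carry this annotation first during a scale-down. The value yes is only a convention. Whether this interacts with HCS pool entries in any special way is not covered on this page.

```yaml
# Optional fragment for each Machine you want removed first.
metadata:
  annotations:
    cluster.x-k8s.io/delete-machine: "yes"
---
# MachineDeployment fragment: lower replicas to the target count.
spec:
  replicas: <reduced-replica-count>
```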
Upgrading Machine Infrastructure
To upgrade worker machine specifications (CPU, memory, disk, VM image), follow these steps:
Note: Worker infrastructure upgrades rely on Cluster API rolling replacement. HCS dataVolumes[] are not preserved during replacement. To preserve node-local state such as /var/cpaas, declare it in HCSMachineConfigPool.spec.configs[].persistentDisks[] before the rollout and keep MachineDeployment.spec.strategy.rollingUpdate.maxSurge: 0.
- Create New Machine Template
  Copy the existing HCSMachineTemplate to a file such as new-template.yaml and modify the required values:
  - imageName - VM image
  - flavorName - Instance type
  - rootVolume.size - System disk size
  - dataVolumes - Temporary data disk configurations
  If you need to add a new pool-managed persistent disk, append it to the worker HCSMachineConfigPool first. The provider creates or claims the EVS disk, but the running ECS does not mount it until this rolling replacement creates a replacement worker.
  Then edit new-template.yaml before applying it:
  - Change metadata.name to <new-template>
  - Leave runtime identity fields unset, including spec.template.spec.providerID and spec.template.spec.serverId
  - Remove server-generated fields such as metadata.resourceVersion, metadata.uid, metadata.creationTimestamp, metadata.managedFields, and status
- Deploy New Template
- Update Machine Deployment
  Modify the MachineDeployment to reference the new template.
- Monitor Rolling Update
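After cleanup, new-template.yaml and the matching MachineDeployment change could look like the sketch below. The HCS apiVersion is a placeholder assumption. Any dataVolumes from the old template are omitted here for brevity.

```yaml
# new-template.yaml after cleanup (sketch).
apiVersion: <hcs-infrastructure-api-version>   # assumption: copy from the old template
kind: HCSMachineTemplate
metadata:
  name: <new-template>
  namespace: <namespace>
  # resourceVersion, uid, creationTimestamp, and managedFields removed
spec:
  template:
    spec:
      imageName: <new-image-name>
      flavorName: <new-flavor-name>
      availabilityZone: <availability-zone>
      rootVolume:
        size: <new-system-disk-size>
      # dataVolumes, if any, copied from the old template and adjusted
      # providerID and serverId intentionally absent
---
# MachineDeployment fragment: point the workers at the new template.
spec:
  strategy:
    rollingUpdate:
      maxSurge: 0          # keep 0 to preserve pool-managed node-local state during replacement
      maxUnavailable: 1
  template:
    spec:
      infrastructureRef:
        name: <new-template>
```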
Upgrading Kubernetes Version
Kubernetes version upgrades require coordinated updates to both the MachineDeployment and the underlying VM template.
Note: Ensure the VM template's Kubernetes version matches the version specified in the MachineDeployment. Mismatched versions will cause node join failures.
Procedure:
- Update Machine Template
  Create a new HCSMachineTemplate with an updated imageName that supports the target Kubernetes version.
- Update MachineDeployment
  Modify the following fields:
  - spec.template.spec.version - Target Kubernetes version
  - spec.template.spec.infrastructureRef.name - New machine template name
- Monitor Upgrade
  Verify that new nodes join the cluster with the correct Kubernetes version. The VERSION column of kubectl get nodes shows the kubelet version that each node reports.
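The two fields in this procedure change together, as the fragments below sketch. The image placeholder stands for an image that ships the target Kubernetes version. The MachineDeployment field paths are the ones listed in the procedure.

```yaml
# HCSMachineTemplate <new-template> fragment: an image built for the target version.
spec:
  template:
    spec:
      imageName: <image-for-target-version>
---
# MachineDeployment fragment: version and template reference updated in one change.
spec:
  template:
    spec:
      version: <target-kubernetes-version>  # must match the Kubernetes version in the image
      infrastructureRef:
        name: <new-template>
```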
Verification
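After deploying worker nodes, verify that the MachineDeployment reports the expected number of ready replicas, that each Machine reaches the Running phase, and that the new nodes are Ready in the cluster.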
Troubleshooting
Viewing Controller Logs
Common Issues
Node fails to join cluster
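- Verify that the VM image in the machine template matches the Kubernetes version in the MachineDeployment
- Check network connectivity between nodes
- Ensure the configuration pool has available entries and that the worker subnet is listed in HCSCluster.spec.network.subnets
Machine stuck in provisioning
- Check the HCS platform for resource availability
- Verify credentials and permissions
- Verify that flavorName and availabilityZone use the provider-recognized API values
- Review controller logs for error messages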