Upgrading Clusters on Huawei Cloud Stack

This guide explains how to upgrade Kubernetes clusters on Huawei Cloud Stack with minimal downtime, while preserving stability and data integrity.

INFO

Where this page fits in the full ACP upgrade flow

This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow (upgrade artifact synchronization, ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace) is documented in the ACP product documentation. Complete those steps before you start the Kubernetes step on this page.

Use this page when the cluster being upgraded runs on an immutable operating system. On immutable OS, the Kubernetes step replaces nodes with VMs built from a new VM template rather than upgrading binaries in place.

INFO

Version

HCS provider v1.0.1 is the first release that supports pool-managed persistent disks.

INFO

Existing Cluster Migration

If your cluster runs ACP v4.3.1 or later and you are moving to HCS provider v1.0.1 or later, complete the migration procedure in Migrate Existing Huawei Cloud Stack Clusters to Pool-Managed Persistent Disks before you rely on upgrade-time disk preservation.

Overview

A cluster upgrade on HCS touches the following areas:

  • Control Plane Upgrades: Update Kubernetes control plane components and underlying infrastructure
  • Worker Node Upgrades: Upgrade worker nodes with new machine images and Kubernetes versions
  • Infrastructure Updates: Modify virtual machine specifications, storage, and network configurations

Upgrade Sequence

Upgrade HCS clusters in the following order:

  1. (Prerequisite) Upgrade the ACP platform on the management cluster first. This brings the cluster-api-provider-hcs controller and the related CAPI components to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
  2. Upgrade the Distribution Version on the workload cluster. See Upgrading Clusters.
  3. Upgrade the control plane Kubernetes version.
  4. Upgrade worker nodes to the target Kubernetes version.

Cluster API carries out steps 3 and 4 as rolling replacements, replacing machines incrementally rather than all at once to reduce service disruption.

WARNING

Skipping step 1 risks two failure modes: the old controller silently ignores new schema fields written to HCSMachineConfigPool or HCSMachineTemplate, or a controller image swap in the middle of a rollout interrupts the persistent-disk state machine. Always let the management-side upgrade settle before you start a workload rollout.
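One way to confirm that the management-side controllers have settled is to wait on their Deployments with kubectl rollout status, which exits non-zero if a rollout does not complete within the timeout. The sketch below assumes you supply the controller namespace and Deployment names yourself; this page does not name the Deployments behind cluster-api-provider-hcs or the CAPI components, and the function name is hypothetical.

```shell
# Sketch: wait for management-cluster controller Deployments to finish rolling out.
# Assumption: the namespace and Deployment names are placeholders you supply;
# this page does not name the Deployments behind cluster-api-provider-hcs or CAPI.
wait_for_management_controllers() {
  local namespace="$1"
  local deploy
  shift
  for deploy in "$@"; do
    # rollout status blocks until the Deployment is fully rolled out or the timeout expires.
    if ! kubectl rollout status "deployment/${deploy}" -n "${namespace}" --timeout=10m; then
      echo "deployment/${deploy} in ${namespace} is not rolled out" >&2
      return 1
    fi
  done
  echo "management-side controllers are rolled out"
}

# Usage (against the management cluster):
#   wait_for_management_controllers <controller-namespace> <deployment-name> [<deployment-name> ...]
```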

Prerequisites

Before you start, ensure all of the following prerequisites are met:

  • The Distribution Version upgrade is complete.
  • The control plane is reachable.
  • All nodes are healthy and in Ready state.
  • The target VM image is present in the HCS environment under the same name as the MicroOS Image Version value in the OS Support Matrix row. The upgrade fails if the image is not present when the new HCSMachineTemplate is applied.
  • The target Kubernetes version is compatible with your workloads and add-ons.
  • You have reviewed the Kubernetes upgrade path and version skew policy.
  • Any node-local state that must survive replacement is declared in HCSMachineConfigPool.spec.configs[].persistentDisks[], not in HCSMachineTemplate.spec.template.spec.dataVolumes[].
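The "all nodes are healthy and in Ready state" prerequisite can be checked from the command line by reading each node's Ready condition with a kubectl JSONPath query. A minimal sketch, run against the workload cluster; the function name is hypothetical:

```shell
# Sketch: fail if any node's Ready condition is not "True". Run against the workload cluster.
all_nodes_ready() {
  local rows not_ready
  # Capture kubectl output first so a failed API call is not mistaken for "all Ready".
  rows=$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}') || {
    echo "could not list nodes" >&2
    return 1
  }
  if [ -z "${rows}" ]; then
    echo "no nodes returned" >&2
    return 1
  fi
  # A missing Ready condition leaves the second column empty, which also counts as not Ready.
  not_ready=$(printf '%s\n' "${rows}" | awk '$2 != "True" { print $1 }')
  if [ -n "${not_ready}" ]; then
    echo "nodes not Ready: ${not_ready}" >&2
    return 1
  fi
  echo "all nodes Ready"
}
```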

For initial deployment, see the Create Cluster guide.

WARNING

Single-Control-Plane Clusters

The upgrade workflow in this document applies to HCS clusters with a highly available control plane. Single-control-plane HCS clusters are supported for creation, but they are not supported for upgrade through this workflow.

WARNING

Disk Preservation Model

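Upgrades rely on Cluster API's rolling replacement mechanism, so node VMs are replaced rather than updated in place. Whether node data survives depends on which of the four HCS provider disk classes it is stored on: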

| Disk class | Declared in | Survives upgrade? | Use for |
| --- | --- | --- | --- |
| System disk (root volume) | HCSMachineTemplate.spec.template.spec.rootVolume | ❌ Never | OS + kubelet/kubeadm/containerd. Rebuilt from the new VM image on every replacement. |
| Data volumes | HCSMachineTemplate.spec.template.spec.dataVolumes | ❌ Never | Temporary node-local paths that can be recreated with each ECS. Volumes attached to the old VM may be deleted together with it. |
| Pool-managed persistent disks | HCSMachineConfigPool.spec.configs[].persistentDisks[] | ✅ Yes | Node-local state that must survive delete-recreate replacement, such as /var/cpaas. |
| External CSI volumes (HCS EVS CSI, etc.) | Workload PVCs / CSI driver | ✅ Independent of the node lifecycle | Application data. Use this path for application state that must survive upgrades. |

Do not treat node-local data on HCS dataVolumes[] as preserved state. Move /var/cpaas and any other retained node-local paths to pool-managed persistent disks before the rolling replacement.

WARNING

Templates Cannot Be Modified In Place

HCSMachineTemplate is a Cluster API infrastructure template. A template change only triggers a rolling replacement when KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name or MachineDeployment.spec.template.spec.infrastructureRef.name points at a different template name. Editing the existing template in place changes the stored manifest but does not start a rollout: existing machines keep the infrastructure objects that were cloned from the previous template when they were created.

Every upgrade step on this page therefore creates a new HCSMachineTemplate with a new metadata.name, applies it, and then patches the controlling resource's infrastructureRef.name to the new template. Keep the previous template until the new rollout is healthy in case rollback is required.
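To illustrate why the template name must change, the sketch below reads the template a KubeadmControlPlane currently references and refuses to patch when the "new" name is the same one, because that patch would not start a rollout. It also prints the previous name so you can keep that template for rollback. The function name is hypothetical, and cpaas-system is assumed as the default namespace to match the commands on this page.

```shell
# Sketch: switch a KubeadmControlPlane to a new HCSMachineTemplate, refusing a no-op
# switch to the name it already references (which would not start a rollout).
switch_kcp_template() {
  local kcp="$1" new_template="$2" ns="${3:-cpaas-system}" current
  current=$(kubectl get kubeadmcontrolplane "${kcp}" -n "${ns}" \
    -o jsonpath='{.spec.machineTemplate.infrastructureRef.name}') || return 1
  if [ "${current}" = "${new_template}" ]; then
    echo "${kcp} already references ${new_template}; create a template with a new metadata.name" >&2
    return 1
  fi
  # The target template must exist before KubeadmControlPlane can build machines from it.
  kubectl get hcsmachinetemplate "${new_template}" -n "${ns}" >/dev/null || return 1
  kubectl patch kubeadmcontrolplane "${kcp}" -n "${ns}" --type=merge \
    -p "{\"spec\":{\"machineTemplate\":{\"infrastructureRef\":{\"name\":\"${new_template}\"}}}}" >/dev/null || return 1
  # Keep this template until the new rollout is healthy, in case rollback is required.
  echo "previous template: ${current}"
}
```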

INFO

Fleet Essentials UI does not support ACP 4.3 cluster upgrades

The Fleet Essentials UI workflow has not been adapted to the Cluster Version Operator (CVO) mechanism introduced in ACP 4.3. HCS clusters do not currently expose a Fleet Essentials UI upgrade path; use the YAML procedure documented below, or the two-step upgrade flow built into the ACP Core platform — see Request the upgrade for workload clusters.

Control Plane Upgrades

Control plane upgrades update the Kubernetes API server, etcd, scheduler, and controller manager, along with the underlying VM infrastructure.

For HCS control planes backed by an HCSMachineConfigPool that uses pool-managed persistent disks, keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge: 0 during upgrades. Persistent disks are bound to fixed (hostname, slot) identities, so the rollout must remove the old machine before the replacement machine can reuse the same disk.
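A minimal sketch for checking, and if needed setting, maxSurge to 0 before the upgrade. It assumes the upstream Cluster API KubeadmControlPlane field layout (spec.rolloutStrategy.type and spec.rolloutStrategy.rollingUpdate.maxSurge); the function name is hypothetical. An empty result means maxSurge is not set explicitly, so the sketch sets it in that case too.

```shell
# Sketch: make sure a KubeadmControlPlane uses maxSurge 0 before an upgrade that reuses persistent disks.
ensure_kcp_max_surge_zero() {
  local kcp="$1" ns="${2:-cpaas-system}" surge
  surge=$(kubectl get kubeadmcontrolplane "${kcp}" -n "${ns}" \
    -o jsonpath='{.spec.rolloutStrategy.rollingUpdate.maxSurge}') || return 1
  if [ "${surge}" = "0" ]; then
    echo "maxSurge already 0"
    return 0
  fi
  # Empty output means maxSurge is not set explicitly; set it to 0 either way.
  echo "maxSurge is '${surge:-unset}', setting it to 0"
  kubectl patch kubeadmcontrolplane "${kcp}" -n "${ns}" --type=merge \
    -p '{"spec":{"rolloutStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":0}}}}' >/dev/null
}
```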

Infrastructure Image Updates

Upgrading the underlying machine images for control plane nodes provides security patches, performance improvements, and updated system components.

Procedure

  1. Create Updated Machine Template

    Copy the existing HCSMachineTemplate referenced by KubeadmControlPlane and modify the required specifications:

    kubectl get hcsmachinetemplate <current-template-name> -n cpaas-system -o yaml > new-cp-template.yaml
  2. Modify Template Specifications

    Modify the new template:

    • Set metadata.name to <new-template-name>
    • Remove server-generated metadata and status fields from the copied manifest.
    • Leave runtime identity fields unset, including spec.template.spec.providerID and spec.template.spec.serverId. The HCS provider assigns these values when it creates instances.
    • Keep preserved paths such as /var/cpaas out of spec.template.spec.dataVolumes[]. Declare those paths in the referenced HCSMachineConfigPool.spec.configs[].persistentDisks[].
    • Update as needed:
      • spec.template.spec.imageName
      • spec.template.spec.flavorName
      • spec.template.spec.rootVolume.size
      • spec.template.spec.dataVolumes for temporary disks only
  3. Deploy Updated Template

    Apply the new machine template:

    kubectl apply -f new-cp-template.yaml -n cpaas-system
  4. Update Control Plane Reference

    Modify the KubeadmControlPlane resource to reference the new template:

    kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type='merge' -p='{"spec":{"machineTemplate":{"infrastructureRef":{"name":"<new-template-name>"}}}}'
  5. Monitor Rolling Update

    The control plane will automatically perform a rolling update:

    kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system -w
    kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane
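As a scripted alternative to watching with -w, the sketch below polls the KubeadmControlPlane until the rollout has converged. It assumes the upstream Cluster API status fields replicas, updatedReplicas, and readyReplicas, and treats the rollout as finished when all three equal spec.replicas. The function name, attempt count, and interval are hypothetical defaults.

```shell
# Sketch: poll a KubeadmControlPlane until its replicas are all updated and ready.
# Assumption: upstream Cluster API status fields (replicas, updatedReplicas, readyReplicas).
wait_for_kcp_rollout() {
  local kcp="$1" ns="${2:-cpaas-system}" attempts="${3:-60}" interval="${4:-30}"
  local i=1 counts rest desired total updated ready
  while [ "${i}" -le "${attempts}" ]; do
    # Comma-separated so that an omitted field (printed as empty) keeps its position.
    counts=$(kubectl get kubeadmcontrolplane "${kcp}" -n "${ns}" \
      -o jsonpath='{.spec.replicas},{.status.replicas},{.status.updatedReplicas},{.status.readyReplicas}') || return 1
    desired=${counts%%,*}; rest=${counts#*,}
    total=${rest%%,*}; rest=${rest#*,}
    updated=${rest%%,*}; ready=${rest#*,}
    # Done when no extra or missing machine remains and every replica is updated and ready.
    if [ -n "${desired}" ] && [ "${total}" = "${desired}" ] \
      && [ "${updated}" = "${desired}" ] && [ "${ready}" = "${desired}" ]; then
      echo "${kcp}: ${ready}/${desired} replicas updated and ready"
      return 0
    fi
    echo "${kcp}: desired=${desired} total=${total:-0} updated=${updated:-0} ready=${ready:-0}" >&2
    i=$((i + 1))
    sleep "${interval}"
  done
  echo "${kcp}: rollout not finished after ${attempts} checks" >&2
  return 1
}
```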

Kubernetes Version Upgrades

Upgrading the Kubernetes version involves updating both the control plane software and the supporting virtual machine images.

Required Values From the OS Support Matrix

The OS Support Matrix is the authoritative mapping between an ACP release and its MicroOS image, Kubernetes version, and matching CoreDNS, etcd, and Kube-OVN versions. Locate the row for the target ACP version before you start; that row supplies every value the procedure below needs.

The cells you read from that row map to the upgrade manifests as follows:

| OS Support Matrix column | Used to set | Where it lands |
| --- | --- | --- |
| MicroOS Image Version | HCSMachineTemplate.spec.template.spec.imageName | Control plane and worker HCSMachineTemplate |
| Kubernetes Version | KubeadmControlPlane.spec.version and MachineDeployment.spec.template.spec.version | Both control plane and worker |
| coredns | KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag | Control plane only |
| etcd | KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag | Control plane only |
| kube-ovn (chart) | Cluster.metadata.annotations["cpaas.io/kube-ovn-version"] and the cni-kube-ovn AppRelease spec.source.charts[0].targetRevision on the workload cluster | The annotation records the intended chart version; on an existing cluster the chart revision must be patched separately (see step 3 below). This is the acp/chart-cpaas-kube-ovn chart version (for example v4.3.3), not the Kube-OVN component version. |

The CoreDNS and etcd image tags are control-plane-only because clusterConfiguration is a KubeadmControlPlane field. Worker nodes inherit container image versions from the new VM template; the MachineDeployment does not carry its own dns/etcd tags. The Kube-OVN annotation lives on the Cluster resource, not on KubeadmControlPlane, because the HCS provider watches it independently of the Kubernetes control plane rollout.
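Because step 2 of the procedure below moves spec.version, the two image tags, and the template reference together, the values from one matrix row can also be applied as a single merge patch instead of through kubectl edit. A minimal sketch with a hypothetical helper that only prints the patch body; its arguments are placeholders for the cells of the target row and the new template name.

```shell
# Sketch: print a JSON merge patch that sets the Kubernetes version, the CoreDNS and
# etcd image tags, and the infrastructure template reference of a KubeadmControlPlane
# in one change. Arguments are values from a single OS Support Matrix row.
kcp_upgrade_patch() {
  local k8s_version="$1" coredns_tag="$2" etcd_tag="$3" template="$4"
  printf '{"spec":{"version":"%s","machineTemplate":{"infrastructureRef":{"name":"%s"}},"kubeadmConfigSpec":{"clusterConfiguration":{"dns":{"imageTag":"%s"},"etcd":{"local":{"imageTag":"%s"}}}}}}\n' \
    "${k8s_version}" "${template}" "${coredns_tag}" "${etcd_tag}"
}

# Usage:
#   kubectl patch kubeadmcontrolplane <kcp-name> -n cpaas-system --type=merge \
#     -p "$(kcp_upgrade_patch <kubernetes-version> <coredns-tag> <etcd-tag> <new-template-name>)"
```

A merge patch only touches the keys it names, so other fields under machineTemplate.infrastructureRef and clusterConfiguration are left as they are.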

Procedure

  1. Create a new HCSMachineTemplate for the target Kubernetes version

    Copy the existing control-plane template and apply it under a new metadata.name with the target imageName:

    kubectl get hcsmachinetemplate <current-cp-template-name> -n cpaas-system -o yaml > new-cp-template.yaml

    In new-cp-template.yaml:

    • Set metadata.name to <new-template-name>.

    • Set spec.template.spec.imageName to the MicroOS Image Version value from the target row in the OS Support Matrix.

    • Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field.

    • Leave runtime identity fields unset, including spec.template.spec.providerID and spec.template.spec.serverId. The HCS provider sets providerID to hcs://<cluster-name>/<machine-name> and serverId to the HCS ECS instance ID after the VM is created; pre-filling them in the template breaks the controller's identity binding.

      kubectl apply -f new-cp-template.yaml -n cpaas-system
  2. Patch the KubeadmControlPlane with the target Kubernetes values

    Update the KubeadmControlPlane resource in a single edit to keep spec.version, the CoreDNS image tag, the etcd image tag, and the infrastructure template reference consistent with the same MicroOS release:

    • spec.version ← Kubernetes Version from the OS Support Matrix row

    • spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← coredns column from the same row

    • spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← etcd column from the same row

    • spec.machineTemplate.infrastructureRef.name ← the new HCSMachineTemplate name created in step 1

      kubectl edit kubeadmcontrolplane <kcp-name> -n cpaas-system

    Updating only spec.version is not sufficient. The CoreDNS and etcd image tags must move together with the Kubernetes version because they are built from the same MicroOS release; leaving them at the previous values can result in CoreDNS and etcd pods that do not match the new Kubernetes minor version.

    Keep spec.rolloutStrategy.rollingUpdate.maxSurge: 0 when the referenced control plane pool uses persistent disks. The replacement machine must reuse the same fixed hostname and disk slot after the old machine is removed.

  3. Upgrade the Kube-OVN chart on the workload cluster

    Kube-OVN is a Core lifecycle component, but on immutable OS the HCS provider does not pin its chart version to the cluster's Kubernetes version. The chart version is carried by a separate AppRelease named cni-kube-ovn in the cpaas-system namespace of the workload cluster, and you move it forward in two steps: update the annotation on the Cluster resource for bookkeeping and future re-creation, then patch the existing AppRelease directly to bump the chart revision.

    WARNING

    Why two steps are required on HCS

    The HCS provider creates the cni-kube-ovn AppRelease the first time the cluster is built, and from then on it reconciles only the spec.values block (cluster name, CIDRs, registry, control-plane node list). It does not write to spec.source.charts[0].targetRevision on an AppRelease that already exists. As a result, changing cpaas.io/kube-ovn-version on the Cluster resource alone does not move the chart version on the workload cluster. The annotation must still be updated so the recorded target matches the OS Support Matrix row, but the chart upgrade itself is driven by a direct AppRelease patch.

    3.1. Update the cpaas.io/kube-ovn-version annotation on the Cluster resource

    kubectl annotate cluster <cluster-name> -n cpaas-system \
      cpaas.io/kube-ovn-version=<kube-ovn-version-from-matrix> --overwrite

    The annotation does not update automatically when spec.version changes; keep it in step with the kube-ovn (chart) column of the target row.

    3.2. Patch the AppRelease chart revision on the workload cluster

    Run the patch against the workload cluster's API server (not the bootstrap KIND or the global cluster):

    kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
      -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<kube-ovn-version-from-matrix>"}]'

    Use the same value you set in the annotation. The releaseName (cpaas-kube-ovn) and name (acp/chart-cpaas-kube-ovn) are managed by the provider; do not change them.

    3.3. Wait for reconciliation to complete

    Watch the chart phase and the installed revision:

    # Overall AppRelease state — Sync and Health columns must reach a Success-equivalent reason
    kubectl get apprelease cni-kube-ovn -n cpaas-system
    
    # Installed revision and chart phase
    kubectl get apprelease cni-kube-ovn -n cpaas-system \
      -o jsonpath='Installed: {.status.charts.*.installedRevision}{"\n"}Phase: {.status.charts.*.phase}{"\n"}'

    The normal sequence is Upgrading → HealthChecking → Success. On small clusters the full transition typically completes within about one minute. Read the phases as follows:

    | Phase | Meaning | installedRevision |
    | --- | --- | --- |
    | Upgrading | Helm release upgrade in progress. Sync condition is Unknown (Syncing). | Still the previous version |
    | HealthChecking | Helm release applied; controller is verifying Kube-OVN pods. Sync condition is True (Synced). | Already the target version |
    | Success | All three conditions (Validate, Sync, Health) are True. | Target version |
    WARNING

    Do not declare the upgrade complete on installedRevision alone. The field flips to the target value during HealthChecking, before pods have been verified Ready. The chart is only considered upgraded when phase is Success and installedRevision matches the target.

    The AppRelease API also defines Downloading, Installing, Syncing, DownloadFailed, DeployFailed, and NotReady. The first three are transient and the upgrade should converge on its own. The last three indicate a failure that needs manual investigation; start with kubectl describe apprelease cni-kube-ovn -n cpaas-system to read the per-condition message field.

  4. Verify Upgrade Progress

    Monitor the rolling upgrade process:

    # Check control plane status
    kubectl get kubeadmcontrolplane <kcp-name> -n cpaas-system
    
    # Monitor individual machines
    kubectl get machines -n cpaas-system -l cluster.x-k8s.io/control-plane
    
    # Verify cluster health
    kubectl get nodes
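The wait in step 3.3 can also be scripted. A minimal polling sketch that mirrors the phase table in that step: it succeeds only when phase is Success and installedRevision equals the target, keeps polling through transient phases, and stops on DownloadFailed, DeployFailed, or NotReady. The function name, attempt count, and interval are hypothetical; run it against the workload cluster.

```shell
# Sketch: wait for the cni-kube-ovn AppRelease to finish upgrading to a target chart revision.
wait_for_kube_ovn_chart() {
  local target="$1" attempts="${2:-20}" interval="${3:-15}"
  local i=1 out installed phase
  while [ "${i}" -le "${attempts}" ]; do
    out=$(kubectl get apprelease cni-kube-ovn -n cpaas-system \
      -o jsonpath='{.status.charts.*.installedRevision},{.status.charts.*.phase}') || return 1
    installed=${out%%,*}
    phase=${out#*,}
    case "${phase}" in
      Success)
        # installedRevision alone is not enough; require Success and the target revision together.
        if [ "${installed}" = "${target}" ]; then
          echo "cni-kube-ovn: ${installed} Success"
          return 0
        fi
        ;;
      DownloadFailed|DeployFailed|NotReady)
        echo "cni-kube-ovn: phase ${phase}; run kubectl describe apprelease cni-kube-ovn -n cpaas-system" >&2
        return 1
        ;;
    esac
    # Any other phase (for example Upgrading or HealthChecking), or Success with the
    # previous revision before the patch is picked up: keep polling.
    i=$((i + 1))
    sleep "${interval}"
  done
  echo "cni-kube-ovn: did not reach Success with ${target}" >&2
  return 1
}
```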

Worker Node Upgrades

Worker node upgrades are managed via MachineDeployment resources.

INFO

For detailed worker node procedures, see the Managing Nodes section.

Rolling Back a Failed Upgrade

If the rolling update fails (new VMs fail to boot, nodes do not become Ready, or the new Kubernetes minor version surfaces an incompatibility), revert the template reference and Kubernetes version fields to their previous values. Cluster API treats the reverted spec as a new change and replaces the machines built from the new template with machines built from the previous template, one at a time.

Keep three facts in mind before you roll back:

  • The old VMs are gone. They were destroyed during the upgrade. Rollback uses the old template to build a fresh set of replacement machines; it does not restore the original VMs.
  • The old HCSMachineTemplate resource must still exist. Do not delete the previous template until the new rollout is healthy. If you already deleted it, recreate it from version control or backup before rolling back.
  • Only pool-managed persistent disks preserve node-local state. Data written to HCSMachineTemplate.spec.template.spec.dataVolumes[] during the upgrade window is lost when that VM is replaced. Data written to disks declared in HCSMachineConfigPool.spec.configs[].persistentDisks[] is retained and reattached to the replacement VM. Application data should still use external persistent storage such as HCS EVS CSI unless your operational design explicitly depends on node-local state.
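Before you patch anything back, it is worth confirming that the previous template still exists. A minimal sketch; the function name is hypothetical and cpaas-system is assumed as the namespace.

```shell
# Sketch: confirm the previous HCSMachineTemplate still exists before rolling back.
previous_template_exists() {
  local template="$1" ns="${2:-cpaas-system}"
  if kubectl get hcsmachinetemplate "${template}" -n "${ns}" >/dev/null 2>&1; then
    echo "${template} found in ${ns}"
    return 0
  fi
  echo "${template} not found; recreate it from version control or backup before rolling back" >&2
  return 1
}
```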

Procedure:

  • Control plane: patch KubeadmControlPlane to restore the previous spec.machineTemplate.infrastructureRef.name, spec.version, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag, and spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag.

  • Workers: patch each MachineDeployment to restore the previous spec.template.spec.infrastructureRef.name and spec.template.spec.version.

  • Kube-OVN: if the Kube-OVN chart was upgraded, revert it in the same two steps used for the upgrade. First restore the annotation on the Cluster resource, then patch the AppRelease chart revision back on the workload cluster. Confirm the rollback with the same check as in step 3.3: phase is Success and installedRevision matches the previous version.

    kubectl annotate cluster <cluster-name> -n cpaas-system \
      cpaas.io/kube-ovn-version=<previous-kube-ovn-version> --overwrite
    
    kubectl patch apprelease cni-kube-ovn -n cpaas-system --type='json' \
      -p='[{"op":"replace","path":"/spec/source/charts/0/targetRevision","value":"<previous-kube-ovn-version>"}]'
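For the worker step, a minimal sketch that reverts one MachineDeployment with a single merge patch on the two fields listed above. The function name is hypothetical; the template name and version are the values recorded before the upgrade.

```shell
# Sketch: point a MachineDeployment back at its previous template and Kubernetes version.
rollback_machine_deployment() {
  local md="$1" previous_template="$2" previous_version="$3" ns="${4:-cpaas-system}"
  kubectl patch machinedeployment "${md}" -n "${ns}" --type=merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"version\":\"${previous_version}\",\"infrastructureRef\":{\"name\":\"${previous_template}\"}}}}}"
}

# Usage: rollback_machine_deployment <md-name> <previous-template-name> <previous-kubernetes-version>
```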

If the new control plane never reached etcd quorum, the KubeadmControlPlane controller may refuse to roll back any machine, because its preflight checks block while etcd is unhealthy. Recover etcd quorum manually before you retry the rollback.
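A minimal sketch for checking etcd health before a retry. It assumes the KubeadmControlPlane reports the upstream Cluster API EtcdClusterHealthy condition in status.conditions; verify the condition name and location against the Cluster API version in your environment. The function name is hypothetical.

```shell
# Sketch: read the EtcdClusterHealthy condition before retrying a rollback.
# Assumption: the KubeadmControlPlane exposes the upstream Cluster API
# EtcdClusterHealthy condition in status.conditions.
kcp_etcd_healthy() {
  local kcp="$1" ns="${2:-cpaas-system}" status
  status=$(kubectl get kubeadmcontrolplane "${kcp}" -n "${ns}" \
    -o jsonpath='{.status.conditions[?(@.type=="EtcdClusterHealthy")].status}') || return 1
  if [ "${status}" = "True" ]; then
    echo "etcd healthy"
    return 0
  fi
  echo "EtcdClusterHealthy is '${status:-missing}'; recover etcd quorum before retrying the rollback" >&2
  return 1
}
```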

Additional Resources