Upgrading Kubernetes on Huawei DCS
This guide explains how to complete Phase 2 of the upgrade workflow for clusters on Huawei DCS. Before you upgrade Kubernetes, complete the Distribution Version upgrade described in Upgrading Clusters.
Where this page fits in the full ACP upgrade flow
This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow — including upgrade artifact synchronization, ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace — is documented in the ACP product documentation. Complete those steps before you start the Kubernetes step on this page:
- Upgrade Overview (scope and sequencing)
- Pre-Upgrade Preparation
- Upgrade the global cluster (Core, Aligned, Agnostic)
- Upgrade workload clusters (Core, Aligned, Agnostic)
Use this page when the cluster runs on an immutable operating system. On an immutable OS, the Kubernetes step replaces nodes from a new MicroOS-based VM template instead of upgrading binaries in place.
Version
DCS provider v1.0.16 is the first release that supports pool-managed persistent disks.
Existing Cluster Migration
If your cluster runs ACP v4.2.1 or later and you are moving to DCS provider v1.0.16 or later, complete the migration procedure in Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks before you rely on upgrade-time disk preservation.
Upgrade Sequence
Upgrade DCS clusters in the following order:
- (Prerequisite) Upgrade the ACP platform on the management cluster first. This brings the cluster-api-provider-dcs controller and the related CAPI components (core, KubeadmControlPlane provider, bootstrap provider) to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
- Upgrade the Distribution Version (Aligned Extensions) on the workload cluster. See Upgrading Distribution Version.
- Upgrade the control plane Kubernetes version.
- Upgrade worker nodes to the target Kubernetes version.
Cluster API orchestrates rolling updates with built-in safety mechanisms to reduce service disruption.
Skipping step 1 risks two failure modes. The old controller can silently ignore new schema fields written to DCSIpHostnamePool or DCSMachineTemplate. A controller image swap mid-rollout can also interrupt the persistent-disk state machine. Always let the management-side upgrade settle before you start the workload-cluster rollout.
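Step 1 says to trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready. The sketch below shows that gate under some assumptions. It expects that you have already fetched each controller's Deployment as JSON (for example with kubectl get deployment <name> -n <namespace> -o json) and parsed it into a dict. This guide does not name those Deployments or their namespaces, so choosing which objects to pass in is up to you. The check only approximates what kubectl rollout status waits for, and the function names are hypothetical.

```python
from typing import Any, Mapping, Sequence


def deployment_rolled_out(deployment: Mapping[str, Any]) -> bool:
    """Approximate the checks `kubectl rollout status` waits for on a Deployment."""
    meta = deployment.get("metadata", {})
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})
    desired = spec.get("replicas", 1)  # Deployment replicas default to 1
    # The controller must have observed the latest spec, or the status is stale.
    if status.get("observedGeneration", 0) < meta.get("generation", 0):
        return False
    return (
        status.get("updatedReplicas", 0) == desired  # every pod runs the new pod template
        and status.get("replicas", 0) == desired  # no old pods are still terminating
        and status.get("availableReplicas", 0) == desired  # and all of them are available
    )


def management_side_settled(deployments: Sequence[Mapping[str, Any]]) -> bool:
    """True only when every listed management-side controller Deployment has rolled out."""
    return all(deployment_rolled_out(d) for d in deployments)
```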
Prerequisites
Before you start, ensure all of the following prerequisites are met:
- The Distribution Version upgrade is complete
- The control plane is reachable
- All nodes are healthy and in Ready state
- The IP Pool has sufficient capacity for rolling updates
- The VM template supports the target Kubernetes version. See OS Support Matrix for version mapping
- The target Kubernetes version is compatible with your workloads and add-ons
- DCS VM templates are 4.2.1 or later if you use pool-managed persistent disks, because safe shutdown and disk detach depend on guest tools
- If you rely on pool-managed persistent disks, keep KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0 and each MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0
Disk Preservation Model
Upgrades rely on Cluster API's rolling update mechanism. Each cluster has four disk classes; only the pool-managed class survives a delete-recreate.
"Preserved" means that the same disk identity is reattached. It does not mean that the disk's contents are restored to an earlier state. Anything written to a pool-managed disk during the upgrade window remains after the upgrade and after a rollback.
Pool-managed preservation requires one-by-one replacement, so keep maxSurge = 0 on both KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate and MachineDeployment.spec.strategy.rollingUpdate.
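The maxSurge requirement can be checked before you start. The sketch below is a minimal pre-flight check that works on the KubeadmControlPlane and MachineDeployment objects as parsed JSON dicts. The helper names are illustrative; only the field paths come from this guide. An unset field is reported as a violation, because Cluster API then applies its default surge of 1.

```python
from typing import Any, Mapping, Optional, Sequence

# Field paths named in this guide.
KCP_MAX_SURGE = "spec.rolloutStrategy.rollingUpdate.maxSurge"
MD_MAX_SURGE = "spec.strategy.rollingUpdate.maxSurge"


def _dig(obj: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    cur: Any = obj
    for key in path.split("."):
        if not isinstance(cur, Mapping) or key not in cur:
            return None
        cur = cur[key]
    return cur


def _is_zero(value: Any) -> bool:
    # maxSurge is an int-or-percentage field, so both 0 and "0%" mean "no surge".
    return value == 0 or value == "0%"


def surge_violations(kcp: Mapping[str, Any],
                     machine_deployments: Sequence[Mapping[str, Any]]) -> list[str]:
    """Name every resource whose maxSurge is not explicitly zero (unset counts as a violation)."""
    problems = []
    if not _is_zero(_dig(kcp, KCP_MAX_SURGE)):
        problems.append(f"KubeadmControlPlane/{_dig(kcp, 'metadata.name')}")
    for md in machine_deployments:
        if not _is_zero(_dig(md, MD_MAX_SURGE)):
            problems.append(f"MachineDeployment/{_dig(md, 'metadata.name')}")
    return problems
```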
If your existing cluster still keeps preserved data in the old template-disk layout, migrate it first by following Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks.
Templates Cannot Be Modified In Place
DCSMachineTemplate is a Cluster API infrastructure template. Cluster API triggers a rolling replacement only when KubeadmControlPlane.spec.machineTemplate.infrastructureRef.name or MachineDeployment.spec.template.spec.infrastructureRef.name points at a different template name. Editing the existing template in place changes the manifest but does not start a rollout, so the running VMs keep the configuration they were created with.
Every upgrade step on this page therefore creates a new DCSMachineTemplate with a new metadata.name, applies it, and then patches the controlling resource's infrastructureRef.name to the new template. The previous template should be kept until the new rollout is healthy in case rollback is required.
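You can enforce the new-name rule before patching. The sketch below assumes the controlling resource is available as a parsed dict with its kind field set. It refuses to produce an updated object when the new template name equals the current one, because that change would not start a rollout. The function name is hypothetical.

```python
import copy
from typing import Any

# Where each controlling resource points at its DCSMachineTemplate (paths from this guide).
INFRA_REF_PATH = {
    "KubeadmControlPlane": ["spec", "machineTemplate", "infrastructureRef", "name"],
    "MachineDeployment": ["spec", "template", "spec", "infrastructureRef", "name"],
}


def repoint_infrastructure_ref(owner: dict[str, Any], new_template_name: str) -> dict[str, Any]:
    """Return a copy of `owner` that references `new_template_name`.

    Raises ValueError if the name is unchanged: Cluster API only rolls machines
    when infrastructureRef.name changes, so an in-place edit would not roll anything.
    """
    path = INFRA_REF_PATH[owner["kind"]]
    updated = copy.deepcopy(owner)  # leave the caller's object untouched
    node = updated
    for key in path[:-1]:
        node = node[key]
    if node[path[-1]] == new_template_name:
        raise ValueError(
            f"{owner['kind']} already references {new_template_name!r}; "
            "create a DCSMachineTemplate with a new metadata.name instead of editing in place"
        )
    node[path[-1]] = new_template_name
    return updated
```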
Using YAML
YAML-based upgrades do not depend on Fleet Essentials.
Required Values From the OS Support Matrix
The authoritative mapping between an ACP release and its MicroOS image, Kubernetes version, and matching CoreDNS, etcd, and Kube-OVN versions lives in the OS Support Matrix. Locate the row that corresponds to the target ACP version before you start; that row supplies every value the YAML steps below need.
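The cells you read from that row map to the upgrade manifests as follows:
- MicroOS Image Version → spec.template.spec.vmTemplateName in each new DCSMachineTemplate
- Kubernetes Version → KubeadmControlPlane spec.version and MachineDeployment spec.template.spec.version
- coredns → KubeadmControlPlane spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag
- etcd → KubeadmControlPlane spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag
- kube-ovn (chart) → the cpaas.io/kube-ovn-version annotation on the Cluster resource and the chart revision of the cni-kube-ovn AppRelease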
The CoreDNS and etcd image tags are control-plane-only because clusterConfiguration is a KubeadmControlPlane field. Worker nodes inherit container image versions from the new VM template; the MachineDeployment does not carry its own dns/etcd tags. The Kube-OVN annotation lives on the Cluster resource, not on KubeadmControlPlane, because the DCS provider watches it independently of the Kubernetes control plane rollout.
Confirm with the cluster's platform owner that the target MicroOS image has already been uploaded to the DCS platform under the same name as the MicroOS Image Version value in the matrix row. The upgrade fails if that VM template is not present on DCS when the DCSMachineTemplate is applied.
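You can combine collecting the matrix row and confirming the VM template into one pre-flight check. In this sketch, MatrixRow and its attribute names are hypothetical stand-ins for the matrix columns. templates_on_dcs is a set of VM template names that you obtain from the DCS platform by whatever means your platform owner provides; this guide does not describe that lookup.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MatrixRow:
    """Hypothetical model of one OS Support Matrix row (one attribute per column used here)."""
    microos_image_version: str
    kubernetes_version: str
    coredns: str
    etcd: str
    kube_ovn_chart: str


def preflight(row: MatrixRow, templates_on_dcs: set[str]) -> list[str]:
    """Return the blocking problems found before any DCSMachineTemplate is applied."""
    problems = [
        f"matrix cell {f.name} is empty"
        for f in fields(row)
        if not getattr(row, f.name).strip()
    ]
    image = row.microos_image_version.strip()
    # Applying the DCSMachineTemplate fails if no DCS VM template carries this exact name.
    if image and image not in templates_on_dcs:
        problems.append(f"VM template {image!r} has not been uploaded to DCS")
    return problems
```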
Upgrade Control Plane Infrastructure
Upgrading the control plane machine template lets you roll out updated VM specifications, system patches, and infrastructure settings.
Procedure
1. Create an updated machine template
   Copy the existing DCSMachineTemplate referenced by KubeadmControlPlane and save it as a new file.
2. Modify the template specifications
   Update the new template as needed:
   - Set metadata.name to <new-template-name>
   - Update spec.template.spec.vmTemplateName
   - Update spec.template.spec.vmConfig.dcsMachineCpuSpec.quantity
   - Update spec.template.spec.vmConfig.dcsMachineMemorySpec.quantity
   - Update spec.template.spec.vmConfig.dcsMachineDiskSpec for system and template-local disks only
   - Strip server-generated metadata (resourceVersion, uid, generation, creationTimestamp, managedFields, and the kubectl.kubernetes.io/last-applied-configuration annotation) and the entire status field from the copied manifest.
   - Leave spec.template.spec.providerID unset. The DCS provider sets providerID to dcs://<machine-name> once the VM is created; pre-filling it in the template breaks the controller's identity binding.
   Keep pool-managed persistent disks, including /var/cpaas, in DCSIpHostnamePool.spec.pool[].persistentDisk.
3. Apply the updated template
4. Update the control plane reference
   Modify the KubeadmControlPlane resource to reference the new template.
5. Monitor the rolling update
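The copy-and-clean part of steps 1 and 2 can be scripted. The sketch below assumes the existing DCSMachineTemplate has been exported as JSON and parsed into a dict. It strips the server-generated fields and status listed above, sets the new name and vmTemplateName, and drops any providerID. CPU, memory, and disk changes are left to the caller, and the function name is hypothetical.

```python
import copy
from typing import Any

# Server-generated metadata that the procedure says to strip from a copied manifest.
SERVER_METADATA = ("resourceVersion", "uid", "generation", "creationTimestamp", "managedFields")
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def prepare_template_copy(existing: dict[str, Any], new_name: str,
                          vm_template_name: str) -> dict[str, Any]:
    """Turn an exported DCSMachineTemplate into a manifest for a new template."""
    tmpl = copy.deepcopy(existing)
    tmpl.pop("status", None)
    meta = tmpl.setdefault("metadata", {})
    for key in SERVER_METADATA:
        meta.pop(key, None)
    annotations = meta.get("annotations", {})
    annotations.pop(LAST_APPLIED, None)
    if not annotations:
        meta.pop("annotations", None)  # drop an annotations map left empty
    meta["name"] = new_name  # a new name is what makes Cluster API roll the machines
    spec = tmpl["spec"]["template"]["spec"]
    spec["vmTemplateName"] = vm_template_name
    # The provider assigns providerID (dcs://<machine-name>) per VM; never carry it in a template.
    spec.pop("providerID", None)
    return tmpl
```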
Upgrade Control Plane Kubernetes Version
Upgrading the control plane Kubernetes version on immutable OS is a delete-recreate workflow. The control plane VMs are replaced one by one from a new DCSMachineTemplate that points at the target MicroOS VM template, and the KubeadmControlPlane resource is patched to carry the matching Kubernetes version, CoreDNS image tag, and etcd image tag.
Before you start, collect every required value from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.
Procedure
1. Create a new DCSMachineTemplate for the target Kubernetes version
   Copy the existing control-plane template and update metadata.name to a new name and spec.template.spec.vmTemplateName to the MicroOS Image Version value read from the target row in the OS Support Matrix. Keep pool-managed persistent disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks.
2. Patch the KubeadmControlPlane with the target Kubernetes values
   Update the KubeadmControlPlane resource in a single edit to keep spec.version, the CoreDNS image tag, the etcd image tag, and the infrastructure template reference consistent with the same MicroOS release:
   - spec.version ← Kubernetes Version from the OS Support Matrix row
   - spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← coredns column from the same row
   - spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← etcd column from the same row
   - spec.machineTemplate.infrastructureRef.name ← the new DCSMachineTemplate name created in step 1
   Updating only spec.version is not sufficient. The CoreDNS and etcd image tags must move together with the Kubernetes version because they are built from the same MicroOS release; leaving them at the previous values can result in CoreDNS and etcd pods that do not match the new Kubernetes minor version.
3. Upgrade the Kube-OVN chart on the workload cluster
   Kube-OVN is a Core lifecycle component, but on immutable OS the DCS provider does not pin its chart version to the cluster's Kubernetes version. The chart version is carried by a separate AppRelease named cni-kube-ovn in the cpaas-system namespace of the workload cluster. You move it forward in two steps: update the annotation on the Cluster resource for bookkeeping and future re-creation, then patch the existing AppRelease directly to bump the chart revision.
   WARNING: Why two steps are required on DCS
   The DCS provider creates the cni-kube-ovn AppRelease the first time the cluster is built. From then on it reconciles only the spec.values block (cluster name, CIDRs, registry, control-plane node list). It does not write to spec.source.charts[0].targetRevision on an AppRelease that already exists. As a result, changing cpaas.io/kube-ovn-version on the Cluster resource alone does not move the chart version on the workload cluster. The annotation must still be updated so that the recorded target matches the OS Support Matrix row, but the chart upgrade itself is driven by a direct AppRelease patch.
   3.1. Update the cpaas.io/kube-ovn-version annotation on the Cluster resource
   The annotation does not update automatically when spec.version changes; keep it in step with the kube-ovn (chart) column of the target row.
   3.2. Patch the AppRelease chart revision on the workload cluster
   Run the patch against the workload cluster's API server, not against the bootstrap KIND cluster or the global cluster. Set spec.source.charts[0].targetRevision to the same value you set in the annotation. The releaseName (cpaas-kube-ovn) and name (acp/chart-cpaas-kube-ovn) are managed by the provider; do not change them.
   3.3. Wait for reconciliation to complete
   Watch the chart phase and the installed revision. The normal sequence is Upgrading → HealthChecking → Success. On small clusters the full transition typically completes within about one minute.
   WARNING: Do not declare the upgrade complete on installedRevision alone. The field flips to the target value during HealthChecking, before pods have been verified Ready. The chart is considered upgraded only when phase is Success and installedRevision matches the target.
   The AppRelease API also defines Downloading, Installing, Syncing, DownloadFailed, DeployFailed, and NotReady. The first three are transient, and the upgrade should converge on its own. The last three indicate a failure that needs manual investigation; start with kubectl describe apprelease cni-kube-ovn -n cpaas-system to read the per-condition message field.
4. Monitor the rolling update
   KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so that the control plane VMs are replaced one at a time.
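The patches for steps 2, 3.1, and 3.2 can be generated from a single matrix row so that the values cannot drift apart. The sketch below assumes you pass the row values in yourself. The function names are hypothetical, and the AppRelease path comes from the spec.source.charts[0].targetRevision field named in step 3.

```python
import json
from typing import Any


def kcp_upgrade_patch(k8s_version: str, coredns_tag: str, etcd_tag: str,
                      template_name: str) -> dict[str, Any]:
    """JSON merge patch that moves all four KubeadmControlPlane fields in one edit (step 2)."""
    return {
        "spec": {
            "version": k8s_version,
            "kubeadmConfigSpec": {
                "clusterConfiguration": {
                    "dns": {"imageTag": coredns_tag},
                    "etcd": {"local": {"imageTag": etcd_tag}},
                }
            },
            "machineTemplate": {"infrastructureRef": {"name": template_name}},
        }
    }


def cluster_annotation_patch(kube_ovn_chart: str) -> dict[str, Any]:
    """JSON merge patch for the bookkeeping annotation on the Cluster resource (step 3.1)."""
    return {"metadata": {"annotations": {"cpaas.io/kube-ovn-version": kube_ovn_chart}}}


def apprelease_revision_patch(kube_ovn_chart: str) -> list[dict[str, Any]]:
    """RFC 6902 JSON patch for the cni-kube-ovn AppRelease (step 3.2).

    Only charts[0].targetRevision is replaced; releaseName and name stay untouched.
    """
    return [{"op": "replace", "path": "/spec/source/charts/0/targetRevision",
             "value": kube_ovn_chart}]


def render(patch: Any) -> str:
    """Compact JSON text suitable for kubectl patch -p."""
    return json.dumps(patch, separators=(",", ":"))
```

kubectl patch accepts the merge patches with --type merge and the list from apprelease_revision_patch with --type json, each passed through -p. The AppRelease uses a JSON patch because charts is a list. A merge patch replaces lists wholesale, which would discard the provider-managed releaseName and name.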
Upgrade Worker Nodes
Worker node Kubernetes upgrades are managed through MachineDeployment resources. Worker upgrades carry fewer fields than the control plane: the CoreDNS and etcd image tags are part of KubeadmControlPlane.spec.kubeadmConfigSpec.clusterConfiguration, which MachineDeployment does not have. Worker nodes inherit Kubernetes component versions from the new VM template; the MachineDeployment only needs the target Kubernetes version and the new template reference.
Before you start, read the MicroOS Image Version and Kubernetes Version cells from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.
Procedure
1. Create a new DCSMachineTemplate for worker nodes
   - Create a new DCSMachineTemplate with vmTemplateName set to the MicroOS Image Version value from the target row in the OS Support Matrix
   - Keep /var/cpaas and any other upgrade-preserved disks in DCSIpHostnamePool.spec.pool[].persistentDisk rather than reintroducing them as template disks
2. Update the MachineDeployment
   - Set spec.template.spec.version to the Kubernetes Version value from the same OS Support Matrix row
   - Set spec.template.spec.infrastructureRef.name to the new DCSMachineTemplate name created in step 1
   - Optionally update spec.template.spec.bootstrap.configRef.name if bootstrap configuration changes are required for this release
3. Monitor the rolling update
   - Verify that the rolling update completes successfully
   - Verify that the new worker nodes join the cluster with the target Kubernetes version
MachineDeployment.spec.strategy.rollingUpdate.maxSurge must remain 0 when the cluster relies on pool-managed persistent disks, so that the worker nodes are replaced one at a time.
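Steps 2 and 3 of the worker procedure can be sketched as a patch builder and a completion check. The check assumes the Cluster API v1beta1 MachineDeployment status fields replicas, updatedReplicas, readyReplicas, and observedGeneration. Both functions are hypothetical helpers that work on parsed JSON dicts.

```python
from typing import Any, Mapping, Optional


def machine_deployment_patch(k8s_version: str, template_name: str,
                             bootstrap_config_name: Optional[str] = None) -> dict[str, Any]:
    """JSON merge patch for step 2: only the version and the template reference move."""
    template_spec: dict[str, Any] = {
        "version": k8s_version,
        "infrastructureRef": {"name": template_name},
    }
    if bootstrap_config_name is not None:
        # Optional: only when this release needs bootstrap configuration changes.
        template_spec["bootstrap"] = {"configRef": {"name": bootstrap_config_name}}
    return {"spec": {"template": {"spec": template_spec}}}


def worker_rollout_done(md: Mapping[str, Any]) -> bool:
    """Step 3: status reflects the latest spec and every replica is updated and ready."""
    status = md.get("status", {})
    desired = md.get("spec", {}).get("replicas", 1)
    return (
        status.get("observedGeneration", 0) >= md.get("metadata", {}).get("generation", 0)
        and status.get("updatedReplicas", 0) == desired
        and status.get("readyReplicas", 0) == desired
        and status.get("replicas", 0) == desired  # no leftover machines from the old template
    )
```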
Rolling Back a Failed Upgrade
If the rolling update fails (new VMs fail to boot, nodes do not become Ready, or the new Kubernetes minor version surfaces an incompatibility), revert the template reference and Kubernetes-version fields to their previous values. Cluster API treats the reversion as a new spec change. It replaces the machines built from the new template with machines built from the previous template, one at a time.
Keep three facts in mind before rolling back:
- The old VMs are gone. They were destroyed during the upgrade. Rollback uses the old template to build a fresh set of replacement machines; it does not restore the original VMs.
- The old DCSMachineTemplate resource must still exist. Do not delete the previous template until the new rollout is healthy. If you already deleted it, recreate it from version control or backup before rolling back.
- Pool-managed disk identity is preserved, but data written during the upgrade is not reverted. Disks declared in DCSIpHostnamePool.spec.pool[].persistentDisk reattach to the rolled-back machines at the same IP slot. Any data written to those disks during the upgrade window stays, for example etcd entries written in the new Kubernetes minor version's format. If the older Kubernetes minor version cannot read that format, the rollback may still fail and require manual etcd restoration.
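Rollback builds a fresh rollout from the old template, so you need the previous field values on hand. The sketch below assumes the KubeadmControlPlane and MachineDeployment objects are available as parsed dicts before you start the upgrade. snapshot records the fields that the rollback procedure restores, and revert_patch turns that record into a JSON merge patch. The function names are hypothetical; the paths are the ones listed in this guide. The sketch does not recreate a deleted DCSMachineTemplate.

```python
from typing import Any, Iterable, Mapping

Path = tuple[str, ...]

# Fields the rollback procedure restores (paths taken from this guide).
KCP_ROLLBACK_FIELDS: tuple[Path, ...] = (
    ("spec", "machineTemplate", "infrastructureRef", "name"),
    ("spec", "version"),
    ("spec", "kubeadmConfigSpec", "clusterConfiguration", "dns", "imageTag"),
    ("spec", "kubeadmConfigSpec", "clusterConfiguration", "etcd", "local", "imageTag"),
)
MD_ROLLBACK_FIELDS: tuple[Path, ...] = (
    ("spec", "template", "spec", "infrastructureRef", "name"),
    ("spec", "template", "spec", "version"),
)


def snapshot(obj: Mapping[str, Any], paths: Iterable[Path]) -> dict[Path, Any]:
    """Record each field's pre-upgrade value; a missing field is recorded as None."""
    saved: dict[Path, Any] = {}
    for path in paths:
        cur: Any = obj
        for key in path:
            cur = cur.get(key) if isinstance(cur, Mapping) else None
        saved[path] = cur
    return saved


def revert_patch(saved: Mapping[Path, Any]) -> dict[str, Any]:
    """Build a JSON merge patch that restores the snapshot.

    None serializes to JSON null, which a merge patch treats as "delete this field",
    so a field that was unset before the upgrade becomes unset again.
    """
    patch: dict[str, Any] = {}
    for path, value in saved.items():
        node = patch
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return patch
```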
Procedure:
1. Control plane: patch KubeadmControlPlane to restore the previous spec.machineTemplate.infrastructureRef.name, spec.version, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag, and spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag.
2. Workers: patch each MachineDeployment to restore the previous spec.template.spec.infrastructureRef.name and spec.template.spec.version.
3. Kube-OVN: if the Kube-OVN chart was upgraded, revert it the same way the upgrade was applied. First restore the annotation, then patch the AppRelease chart revision back. Verify with the same installedRevision and phase=Success check used in step 3 of Upgrade Control Plane Kubernetes Version.
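The Kube-OVN completion rule, used in the control plane upgrade and again in the rollback procedure, can be expressed as a small polling loop. The sketch assumes fetch is a caller-supplied function that reads the AppRelease's current phase and installedRevision. This guide does not give the exact status field paths, so that lookup is deliberately left out. The 300-second default timeout is an arbitrary choice, not a documented limit, and the function names are hypothetical.

```python
import time
from typing import Callable, Tuple

# Phases this guide lists as failures that need manual investigation.
FAILED_PHASES = {"DownloadFailed", "DeployFailed", "NotReady"}


def kube_ovn_chart_state(phase: str, installed_revision: str, target_revision: str) -> str:
    """Return 'done', 'failed', or 'waiting' for the cni-kube-ovn AppRelease."""
    if phase in FAILED_PHASES:
        return "failed"
    # installedRevision flips early (during HealthChecking), so Success is required too.
    if phase == "Success" and installed_revision == target_revision:
        return "done"
    # Upgrading, HealthChecking, Downloading, Installing, Syncing, or a stale Success.
    return "waiting"


def wait_for_kube_ovn(fetch: Callable[[], Tuple[str, str]], target_revision: str,
                      timeout_s: float = 300.0, interval_s: float = 5.0) -> str:
    """Poll fetch() -> (phase, installedRevision) until done, failed, or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase, installed = fetch()
        state = kube_ovn_chart_state(phase, installed, target_revision)
        if state != "waiting":
            return state
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(interval_s)
```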
If the new control plane never reached etcd quorum, the KubeadmControlPlane controller may refuse to roll back any machine because its preflight checks block on an unhealthy etcd. Recover etcd quorum first (operator intervention) before retrying the rollback.
Using the Web UI
Fleet Essentials UI does not support ACP 4.3 cluster upgrades
The Fleet Essentials UI workflow has not been adapted to the Cluster Version Operator (CVO) mechanism introduced in ACP 4.3. Do not use the Fleet Essentials UI to upgrade DCS clusters on ACP 4.3.
Two supported alternatives:
- YAML path — Follow the YAML-based upgrade procedure documented earlier on this page.
- ACP Core cluster management UI — Use the two-step upgrade flow built into the ACP Core platform; see Request the upgrade for the global cluster or Request the upgrade for workload clusters.
Cluster creation and node-pool management through the Fleet Essentials UI are unaffected by this limitation.
Use this workflow to upgrade Kubernetes from the web UI after Phase 1 is complete.
Version requirement: This workflow requires Fleet Essentials and Alauda Container Platform DCS Infrastructure Provider 1.0.13 or later. If the provider version is earlier than 1.0.13, use YAML manifests. If the upgrade relies on pool-managed persistent disks, use DCS provider v1.0.16 or later. In v1.0.16, the persistentDisk declaration on DCSIpHostnamePool remains YAML-only and is not exposed in the web UI.
Prerequisites
- The Distribution Version upgrade is complete. See Upgrading Distribution Version
- The Control Plane Node Pool is in Running state
- The IP Pool has sufficient capacity for rolling updates
- If the upgrade relies on pool-managed persistent disks, ensure that the required DCSIpHostnamePool.spec.pool[].persistentDisk entries have already been created or updated through YAML
Upgrade Workflow
Kubernetes upgrades follow this sequence after the Distribution Version upgrade:
- Upgrade the Control Plane Node Pool.
- Wait for the Control Plane Node Pool upgrade to complete.
- Upgrade Worker Node Pools in any order.
Checking Available Upgrades
Navigation: Clusters → Clusters → Select cluster → Node Pools Tab
Node Pools with available upgrades show an Upgrade available indicator. Click the Node Pool card to view the current and target versions.
Upgrade the Control Plane Node Pool
Steps:
- In the Node Pools Tab, locate the Control Plane Node Pool
- Click Upgrade
- Review upgrade information:
- Current Version: Current Kubernetes version
- Target Version: Latest minor version supported (automatically selected)
- Click Confirm to start
Monitoring:
- Watch the Node Pool status
- Nodes are replaced one at a time (maxSurge=0). This one-by-one replacement is also required when the cluster relies on pool-managed persistent disks.
- Upgrade time depends on node count and resources
Upgrade Worker Node Pools
Worker Node Pools cannot be upgraded until the Control Plane Node Pool upgrade completes.
When Worker Pool Upgrade is Available:
- Control Plane Kubernetes Version matches global cluster version
- Control Plane is in Running state
Upgrade Steps:
- For each Worker Node Pool:
- Click Upgrade button
- Review and confirm
- Pools can be upgraded in parallel after Control Plane completes
Upgrade Constraints:
Cross-Version Upgrades
When upgrading across multiple minor versions (for example, v1.32 → v1.34):
- Upgrade Control Plane to v1.33
- Wait for completion
- Upgrade Control Plane to v1.34
- Repeat for Worker Pools
Why: Kubernetes supports upgrading only one minor version at a time.
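You can compute the hop list for a cross-version upgrade mechanically. The sketch below uses a hypothetical helper that returns minor versions only. The exact patch release and image values for each hop still come from the matching OS Support Matrix row.

```python
import re

# Accepts "v1.32", "1.32", or "v1.32.4"; the patch component is ignored.
_VERSION = re.compile(r"^v?(\d+)\.(\d+)(?:\.\d+)?$")


def minor_upgrade_path(current: str, target: str) -> list[str]:
    """List the minor versions to pass through, one hop at a time, ending at target."""
    cur, tgt = _VERSION.match(current), _VERSION.match(target)
    if cur is None or tgt is None:
        raise ValueError(f"unrecognized version: {current!r} or {target!r}")
    cur_major, cur_minor = (int(x) for x in cur.groups())
    tgt_major, tgt_minor = (int(x) for x in tgt.groups())
    if cur_major != tgt_major:
        raise ValueError("major-version changes are out of scope")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than current; that is a rollback, not an upgrade")
    return [f"v{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]
```

For the example above, minor_upgrade_path("v1.32", "v1.34") returns ["v1.33", "v1.34"], one entry per single-minor hop.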