Upgrading Clusters on VMware vSphere
This document explains how to upgrade Kubernetes clusters on VMware vSphere after the platform-side distribution upgrade is complete. The documented workflow focuses on updating the control plane and worker nodes through Cluster API resources.
Where this page fits in the full ACP upgrade flow
This page covers only the Kubernetes step of the upgrade. The full ACP upgrade flow — including upgrade artifact synchronization, ACP Core upgrade through CVO, Aligned plugin upgrades, and Agnostic plugin upgrades from Marketplace — is documented in the ACP product documentation. Complete those steps before you start the Kubernetes step on this page:
- Upgrade Overview (scope and sequencing)
- Pre-Upgrade Preparation
- Upgrade the global cluster (Core, Aligned, Agnostic)
- Upgrade workload clusters (Core, Aligned, Agnostic)
Use this page when the cluster runs on an immutable operating system. On an immutable OS, the Kubernetes step replaces nodes with new VMs built from a new VM template instead of upgrading binaries in place.
Upgrade Sequence
Upgrade VMware vSphere clusters in the following order:
- (Prerequisite) Upgrade the ACP platform on the management cluster first. This brings the cluster-api-provider-vsphere controller and the related CAPI components to versions that understand the new schema. Trigger workload-cluster upgrades only after the management-side controllers have rolled out and become Ready.
- Complete the distribution-version upgrade described in Upgrading Clusters.
- Verify that the control plane is healthy and the current cluster is stable.
- Upgrade the control plane Kubernetes version.
- Upgrade worker nodes to the target Kubernetes version.
Prerequisites
Before you begin, ensure the following conditions are met:
- The distribution-version upgrade is complete.
- The control plane is healthy and reachable.
- All nodes are in the Ready state.
- The target VM template is present in the vSphere environment under the same name as the MicroOS Image Version value in the OS Support Matrix row. The upgrade fails if the template is not present when the new VSphereMachineTemplate is applied.
- The target Kubernetes version is compatible with your workloads and add-ons.
- The machine config pools have enough capacity for rolling updates.
- Review the Kubernetes upgrade path and version skew policy.
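Before picking a target, it can help to check the version pair against the upstream kubeadm rule that a control plane upgrade must not skip a minor version or go backwards. The Python sketch below assumes plain vX.Y.Z strings like those in the Kubernetes Version column and encodes only that upstream rule; the supported ACP upgrade path is whatever the ACP documentation states.

```python
import re

# Accepts "v1.31.2" or "1.31.2"; pre-release and build suffixes are not handled.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple[int, int, int]:
    """Split a Kubernetes version string into (major, minor, patch)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_upgrade_path(current: str, target: str) -> list[str]:
    """Return a list of problems; an empty list means the step looks allowed.

    Assumption: mirrors the upstream kubeadm rule (at most one minor
    version per control plane upgrade, no downgrades).
    """
    cur = parse_version(current)
    tgt = parse_version(target)
    problems = []
    if tgt < cur:
        problems.append(f"{target} is older than {current}")
    if tgt[0] != cur[0]:
        problems.append("major version change is not supported")
    elif tgt[1] - cur[1] > 1:
        problems.append(
            f"skips minor versions ({cur[1]} -> {tgt[1]}); upgrade one minor at a time"
        )
    return problems
```

For example, `check_upgrade_path("v1.30.4", "v1.31.2")` returns an empty list, while a jump across two minor versions returns one problem.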
Disk Preservation Model
Upgrades rely on Cluster API's rolling replacement mechanism. Each cluster has four disk classes; only the pool-managed class survives a delete-recreate.
"Preserved" means the same disk identity is reattached — it does not mean the disk's contents are time-traveled. Anything written to a pool-managed disk during the upgrade window stays after the upgrade and stays after a rollback.
Templates Cannot Be Modified In Place
VSphereMachineTemplate.spec.template.spec is immutable. The vSphere admission webhook rejects any update with the message "VSphereMachineTemplate spec.template.spec field is immutable. Please create a new resource instead." Every upgrade step on this page therefore creates a new VSphereMachineTemplate with a new metadata.name, applies it, and then patches the controlling resource's infrastructureRef.name to the new template. Keep the previous template until the new rollout is healthy in case rollback is required.
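Because every upgrade needs a fresh metadata.name, a predictable naming scheme helps. The Python sketch below assumes the "-v<N>" suffix convention used in the examples on this page (such as <cluster_name>-control-plane-v2); it is only an illustration and does not check whether the name is already taken in the cluster.

```python
import re

# Optional trailing "-v<N>" revision suffix, as in "<cluster_name>-worker-v2".
_REVISION = re.compile(r"^(?P<base>.+?)(?:-v(?P<rev>\d+))?$")


def next_template_name(current: str) -> str:
    """Derive the next revisioned VSphereMachineTemplate name.

    A name without a suffix is treated as revision 1. Uniqueness in the
    cluster is not checked here.
    """
    match = _REVISION.match(current)
    if match is None:
        raise ValueError("template name must not be empty")
    revision = int(match.group("rev") or 1)
    return f"{match.group('base')}-v{revision + 1}"
```

Keeping the old name untouched and always moving to a new one also leaves the previous template available for rollback.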
Fleet Essentials UI does not support ACP 4.3 cluster upgrades
vSphere clusters do not currently expose a Fleet Essentials UI upgrade path. Use the YAML procedure documented below, or the two-step upgrade flow built into the ACP Core platform — see Request the upgrade for workload clusters.
Required Values From the OS Support Matrix
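The authoritative mapping between an ACP release, its VM template, the Kubernetes version, and the matching CoreDNS, etcd, and Kube-OVN versions lives in the OS Support Matrix. Locate the row that corresponds to the target ACP version before you start; that row supplies every value the steps below need.
The cells you read from that row map to the upgrade manifests as follows:
- MicroOS Image Version: the VM template name in spec.template.spec.template of each new VSphereMachineTemplate
- Kubernetes Version: KubeadmControlPlane spec.version and MachineDeployment spec.template.spec.version
- coredns: KubeadmControlPlane spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag
- etcd: KubeadmControlPlane spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag
- kube-ovn (chart): the Kube-OVN version annotation on the Cluster resource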
The CoreDNS and etcd image tags are control-plane-only because clusterConfiguration is a KubeadmControlPlane field. Worker nodes inherit container image versions from the new VM template; the MachineDeployment does not carry its own dns/etcd tags. The Kube-OVN annotation lives on the Cluster resource, not on KubeadmControlPlane, because the vSphere provider watches it independently of the Kubernetes control plane rollout.
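Comparing the current row with the target row shows which manifests actually need to change; for instance, the Kube-OVN annotation step can be skipped when the kube-ovn (chart) value is unchanged. The Python sketch below is illustrative: the `MatrixRow` class, its field names, and the destination labels are hypothetical, and the sample values in any usage are placeholders rather than real matrix entries.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MatrixRow:
    """Values read from one OS Support Matrix row (hypothetical field names)."""
    vm_template: str  # MicroOS Image Version
    kubernetes: str   # Kubernetes Version
    coredns: str      # coredns column
    etcd: str         # etcd column
    kube_ovn: str     # kube-ovn (chart) column


# Where each value lands, following the steps on this page.
TARGETS = {
    "vm_template": "VSphereMachineTemplate spec.template.spec.template",
    "kubernetes": "KubeadmControlPlane / MachineDeployment version",
    "coredns": "KubeadmControlPlane dns.imageTag",
    "etcd": "KubeadmControlPlane etcd.local.imageTag",
    "kube_ovn": "Cluster Kube-OVN version annotation",
}


def plan_changes(current: MatrixRow, target: MatrixRow) -> list[tuple[str, str, str]]:
    """List (destination, old, new) for every value that differs between rows."""
    plan = []
    for field in fields(MatrixRow):
        old, new = getattr(current, field.name), getattr(target, field.name)
        if old != new:
            plan.append((TARGETS[field.name], old, new))
    return plan
```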
Steps
Create the target machine templates
Before you start the rolling upgrade, create new VSphereMachineTemplate resources for the control plane and workers.
Export the existing control plane template
Save the VSphereMachineTemplate named in the KubeadmControlPlane's spec.machineTemplate.infrastructureRef.name to new-cp-template.yaml, for example with kubectl get vspheremachinetemplate <name> -n <namespace> -o yaml > new-cp-template.yaml.
Modify the control plane template
Edit new-cp-template.yaml:
- Set metadata.name to a new unique name (for example, <cluster_name>-control-plane-v2).
- Update spec.template.spec.template to the target VM template name (the MicroOS Image Version value from the OS Support Matrix row).
- Update CPU, memory, or disk settings if needed.
- Remove server-generated fields: metadata.resourceVersion, metadata.uid, metadata.generation, metadata.creationTimestamp, metadata.managedFields, metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"], and status.
- Leave spec.template.spec.providerID unset. The vSphere provider sets providerID to the VM's BIOS UUID once the VM is created; pre-filling it in the template breaks the controller's identity binding.
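The edits listed above can be scripted against the exported object. The Python sketch below assumes the template was exported as JSON (kubectl get ... -o json) and parsed into a dict; it copies the object, strips the server-generated fields, renames it, points it at the target VM template, and drops providerID. It is a sketch, not a supported tool; CPU, memory, and disk changes are left to you.

```python
import copy

# Server-generated metadata fields that must not be carried into a new object.
SERVER_FIELDS = ("resourceVersion", "uid", "generation", "creationTimestamp", "managedFields")
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def clone_template(exported: dict, new_name: str, vm_template: str) -> dict:
    """Return a new VSphereMachineTemplate dict derived from an exported one.

    The exported object is not modified.
    """
    tmpl = copy.deepcopy(exported)
    meta = tmpl.setdefault("metadata", {})
    for key in SERVER_FIELDS:
        meta.pop(key, None)
    annotations = meta.get("annotations")
    if annotations is not None:
        annotations.pop(LAST_APPLIED, None)
        if not annotations:  # drop an empty annotations map entirely
            del meta["annotations"]
    tmpl.pop("status", None)

    meta["name"] = new_name
    machine_spec = tmpl["spec"]["template"]["spec"]
    machine_spec["template"] = vm_template  # target VM template name
    machine_spec.pop("providerID", None)    # the provider sets this per VM
    return tmpl
```

The result can be written out with json.dump and applied with kubectl apply -f, which accepts JSON as well as YAML. The same function works for the worker template.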
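Export and modify the worker template
Export the VSphereMachineTemplate named in the MachineDeployment's spec.template.spec.infrastructureRef.name to new-worker-template.yaml in the same way, then edit new-worker-template.yaml:
- Set metadata.name to a new unique name (for example, <cluster_name>-worker-v2).
- Update spec.template.spec.template to the target VM template name.
- Update CPU, memory, or disk settings if needed.
- Remove the same server-generated fields listed above, and leave spec.template.spec.providerID unset.
Apply both new templates
Apply new-cp-template.yaml and new-worker-template.yaml, for example with kubectl apply -f. Creating the templates does not replace any machines; the rollout starts only when the KubeadmControlPlane and MachineDeployment are patched to reference the new names.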
Upgrade the control plane
Before you start, collect every required value from the target ACP row in the OS Support Matrix as described in Required Values From the OS Support Matrix.
Patch the KubeadmControlPlane with the target Kubernetes values
Update the KubeadmControlPlane resource in a single edit so that spec.version, the CoreDNS image tag, the etcd image tag, and the infrastructure template reference stay consistent with the same VM template:
- spec.version ← Kubernetes Version from the OS Support Matrix row
- spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag ← coredns column from the same row
- spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag ← etcd column from the same row
- spec.machineTemplate.infrastructureRef.name ← the new VSphereMachineTemplate name created above
Updating only spec.version is not sufficient. The CoreDNS and etcd image tags must move together with the Kubernetes version because they are built from the same release; leaving them at the previous values can result in CoreDNS and etcd pods that do not match the new Kubernetes minor version.
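One way to keep the four fields together is to build them into a single JSON merge patch. The Python sketch below refuses to produce a patch if any value is missing, so spec.version cannot move on its own. The values in the `__main__` block are placeholders, not real matrix entries.

```python
import json


def kcp_upgrade_patch(k8s_version: str, coredns_tag: str, etcd_tag: str,
                      infra_template: str) -> dict:
    """Return a JSON merge patch that updates all four KCP fields at once."""
    values = {
        "Kubernetes Version": k8s_version,
        "coredns": coredns_tag,
        "etcd": etcd_tag,
        "VSphereMachineTemplate name": infra_template,
    }
    missing = [label for label, value in values.items() if not value]
    if missing:
        # Refuse partial patches: the version and image tags move together.
        raise ValueError(f"missing values: {', '.join(missing)}")
    return {
        "spec": {
            "version": k8s_version,
            "kubeadmConfigSpec": {
                "clusterConfiguration": {
                    "dns": {"imageTag": coredns_tag},
                    "etcd": {"local": {"imageTag": etcd_tag}},
                },
            },
            "machineTemplate": {"infrastructureRef": {"name": infra_template}},
        },
    }


if __name__ == "__main__":
    # Placeholder values only; copy the real ones from the OS Support Matrix row.
    patch = kcp_upgrade_patch("<k8s-version>", "<coredns-tag>", "<etcd-tag>",
                              "<cluster_name>-control-plane-v2")
    print(json.dumps(patch))
```

The printed JSON can be passed to kubectl patch kubeadmcontrolplane <name> -n <namespace> --type merge -p '<json>'. A merge patch is used because custom resources do not support strategic merge patches; nested objects are merged, so sibling fields in clusterConfiguration are kept.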
Update the Kube-OVN version annotation on the Cluster resource
If the target ACP row in the OS Support Matrix shows a different kube-ovn (chart) value than the current cluster, patch the annotation on the Cluster resource so the vSphere provider reconciles the new Kube-OVN AppRelease.
INFO: Kube-OVN reconcile gating annotation (vSphere-only prerequisite)
On vSphere, the provider reconciles the Kube-OVN AppRelease only when the Cluster resource carries the annotation cpaas.io/network-type: kube-ovn. This annotation is normally set at cluster creation. If it is missing, writing the version annotation still succeeds, but the Kube-OVN AppRelease is not reconciled. Verify that the annotation is present before proceeding.
This precondition is vSphere-specific. The DCS and Huawei Cloud Stack providers use the DCSCluster / HCSCluster spec.networkType field instead and do not require this annotation.
Kube-OVN is a Core lifecycle component, but on an immutable OS the vSphere provider drives its delivery from this annotation; the annotation does not update automatically when spec.version changes.
The vSphere provider reconciles a single Kube-OVN AppRelease named cni-kube-ovn in the cpaas-system namespace of the workload cluster. Follow the reconciliation from the workload cluster (not the bootstrap KIND cluster or the global cluster), for example with kubectl get apprelease cni-kube-ovn -n cpaas-system -w.
The normal phase sequence is Upgrading → HealthChecking → Success. On small clusters the full transition typically completes within about one minute.
WARNING: Do not declare the upgrade complete on installedRevision alone. The field flips to the target value during HealthChecking, before pods have been verified Ready. The chart is only considered upgraded when phase is Success and installedRevision matches the target.
The AppRelease API also defines Downloading, Installing, Syncing, DownloadFailed, DeployFailed, and NotReady. The first three are transient, and the upgrade should converge on its own. The last three indicate a failure that needs manual investigation; start with kubectl describe apprelease cni-kube-ovn -n cpaas-system to read the per-condition message field.
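The two checks in this step, the gating annotation and the completion rule, can be sketched in Python as follows. The sketch assumes the objects come from kubectl get ... -o json and that phase and installedRevision sit under .status of the AppRelease; that location is an assumption, so confirm it against kubectl describe output before relying on it.

```python
NETWORK_TYPE_KEY = "cpaas.io/network-type"

# Phase groups as described for the cni-kube-ovn AppRelease.
TRANSIENT = {"Upgrading", "HealthChecking", "Downloading", "Installing", "Syncing"}
FAILED = {"DownloadFailed", "DeployFailed", "NotReady"}


def kube_ovn_gate_open(cluster: dict) -> bool:
    """True if the Cluster carries cpaas.io/network-type: kube-ovn."""
    annotations = (cluster.get("metadata") or {}).get("annotations") or {}
    return annotations.get(NETWORK_TYPE_KEY) == "kube-ovn"


def apprelease_state(apprelease: dict, target_revision: str) -> str:
    """Classify the AppRelease as "done", "waiting", "failed", or "unknown".

    Assumption: phase and installedRevision are read from .status.
    """
    status = apprelease.get("status") or {}
    phase = status.get("phase", "")
    if phase in FAILED:
        return "failed"
    # installedRevision alone is not enough; the phase must also be Success.
    if phase == "Success" and status.get("installedRevision") == target_revision:
        return "done"
    if phase in TRANSIENT or phase == "Success":
        return "waiting"
    return "unknown"
```

Read the Cluster from the management cluster and the AppRelease from the workload cluster before calling these functions.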
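Monitor the control plane rollout
Wait until the KubeadmControlPlane reports the target version and the desired number of updated, Ready replicas before you upgrade the worker nodes.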
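A completion check for the control plane rollout can be written against the KubeadmControlPlane status. The Python sketch below assumes Cluster API v1beta1 status field names (observedGeneration, version, replicas, updatedReplicas, readyReplicas, unavailableReplicas); newer API versions may name these differently.

```python
def kcp_rollout_done(kcp: dict, target_version: str) -> bool:
    """Return True once the KubeadmControlPlane has fully rolled to target_version.

    Assumes Cluster API v1beta1 status field names.
    """
    metadata = kcp.get("metadata") or {}
    spec = kcp.get("spec") or {}
    status = kcp.get("status") or {}
    desired = spec.get("replicas", 1)
    return (
        # The controller has observed the latest spec edit.
        status.get("observedGeneration", 0) >= metadata.get("generation", 0)
        and status.get("version") == target_version
        # No surge machines left; every machine runs the new spec and is Ready.
        and status.get("replicas") == desired
        and status.get("updatedReplicas") == desired
        and status.get("readyReplicas") == desired
        and status.get("unavailableReplicas", 0) == 0
    )
```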
Upgrade the worker nodes
After the control plane upgrade completes, update the MachineDeployment to reference the new worker template and the target Kubernetes version.
Typical changes include:
- spec.template.spec.version: the target Kubernetes version
- spec.template.spec.infrastructureRef.name: the new VSphereMachineTemplate name
- spec.template.spec.bootstrap.configRef.name: the new KubeadmConfigTemplate name, if bootstrap settings must change (see Updating Bootstrap Templates)
Apply the changes to the MachineDeployment.
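The MachineDeployment changes can also be expressed as one JSON merge patch on spec.template.spec. In the Python sketch below, the bootstrap reference is included only when a new KubeadmConfigTemplate name is passed; the names in the `__main__` block are placeholders.

```python
import json
from typing import Optional


def md_upgrade_patch(k8s_version: str, infra_template: str,
                     bootstrap_template: Optional[str] = None) -> dict:
    """JSON merge patch for a MachineDeployment's machine template."""
    machine_spec: dict = {
        "version": k8s_version,
        "infrastructureRef": {"name": infra_template},
    }
    if bootstrap_template:  # only when bootstrap settings must change
        machine_spec["bootstrap"] = {"configRef": {"name": bootstrap_template}}
    return {"spec": {"template": {"spec": machine_spec}}}


if __name__ == "__main__":
    # Placeholder values only.
    print(json.dumps(md_upgrade_patch("<k8s-version>", "<cluster_name>-worker-v2")))
```

Sending all fields in one patch lets Cluster API roll the workers once to the combined new spec; the JSON can be passed to kubectl patch machinedeployment <name> -n <namespace> --type merge -p '<json>'.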
Monitor the worker rollout until the MachineDeployment reports all replicas updated and available.
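A matching check for the worker rollout can read the MachineDeployment status. The Python sketch below assumes Cluster API v1beta1 status field names; unlike the control plane status, the MachineDeployment status has no version field, so confirm the kubelet version on the nodes themselves.

```python
def md_rollout_done(md: dict) -> bool:
    """Return True once every worker replica is updated and available.

    Assumes Cluster API v1beta1 MachineDeployment status field names and
    that spec.replicas is set (it defaults to 1 here if absent).
    """
    metadata = md.get("metadata") or {}
    spec = md.get("spec") or {}
    status = md.get("status") or {}
    desired = spec.get("replicas", 1)
    return (
        status.get("observedGeneration", 0) >= metadata.get("generation", 0)
        and status.get("replicas") == desired
        and status.get("updatedReplicas") == desired
        and status.get("availableReplicas") == desired
        and status.get("unavailableReplicas", 0) == 0
    )
```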
Rolling Back a Failed Upgrade
If the rolling update fails — new VMs fail to boot, nodes do not become Ready, or the new Kubernetes minor version surfaces an incompatibility — revert the template reference and Kubernetes-version fields back to the previous values. Cluster API treats the reversion as a new spec drift and rolls the v2 machines back to the previous template, one at a time.
Three facts to internalize before rolling back:
- The old VMs are gone. They were destroyed during the upgrade. Rollback uses the old template to build a fresh set of replacement machines; it does not restore the original VMs.
- The old VSphereMachineTemplate resource must still exist. Do not delete the previous template until the new rollout is healthy. If you already deleted it, recreate it from version control or backup before rolling back.
- Pool-managed disk identity is preserved, but data state is not. Disks declared in VSphereMachineConfigPool.spec.slot[].persistentDisks reattach to the rolled-back machines at the same slot, but any data written to those disks during the upgrade window (for example, etcd entries in the new Kubernetes minor format) stays. If the new format is unreadable by the older Kubernetes minor version, the rollback may still fail and require manual etcd restoration.
Procedure:
- Control plane: patch KubeadmControlPlane to restore the previous spec.machineTemplate.infrastructureRef.name, spec.version, spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag, and spec.kubeadmConfigSpec.clusterConfiguration.etcd.local.imageTag.
- Workers: patch each MachineDeployment to restore the previous spec.template.spec.infrastructureRef.name and spec.template.spec.version.
- Kube-OVN: if the kube-ovn annotation was changed, restore the previous value on the Cluster resource.
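If you saved copies of the KubeadmControlPlane and MachineDeployment objects before the upgrade (for example, kubectl get ... -o json output kept in version control), the rollback patches can be derived from them rather than typed by hand. The Python sketch below restores exactly the fields named in the procedure. A field that was absent in the saved copy is sent as null, which JSON merge patch (RFC 7386) treats as "remove the field".

```python
# Field paths restored on rollback, as named in the procedure.
KCP_ROLLBACK_PATHS = (
    ("spec", "version"),
    ("spec", "machineTemplate", "infrastructureRef", "name"),
    ("spec", "kubeadmConfigSpec", "clusterConfiguration", "dns", "imageTag"),
    ("spec", "kubeadmConfigSpec", "clusterConfiguration", "etcd", "local", "imageTag"),
)
MD_ROLLBACK_PATHS = (
    ("spec", "template", "spec", "version"),
    ("spec", "template", "spec", "infrastructureRef", "name"),
)


def _get(obj, path):
    """Follow path into obj; None means the field was absent."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None  # null in a merge patch removes the field
        obj = obj[key]
    return obj


def rollback_patch(saved: dict, paths) -> dict:
    """Build a merge patch that restores each path to its value in `saved`."""
    patch: dict = {}
    for path in paths:
        node = patch
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = _get(saved, path)
    return patch
```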
If the new control plane never reached etcd quorum, the KubeadmControlPlane controller may refuse to roll back any machine because its preflight checks block on an unhealthy etcd. Recover etcd quorum first (operator intervention) before retrying the rollback.
Verification
Confirm the following results after the upgrade:
- KubeadmControlPlane reaches the target version and desired replica count.
- MachineDeployment reaches the target version and desired replica count.
- Control plane and worker nodes return to the Ready state.
- The vSphere CPI daemonset remains available in the workload cluster.
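The node and daemonset checks can be automated against workload-cluster objects. The Python sketch below takes the items list from kubectl get nodes -o json and a DaemonSet object; the CPI daemonset's name and namespace are not given on this page, so locate it yourself. The kubelet version is compared as an exact string, which assumes it is reported in the same vX.Y.Z form as the target.

```python
def node_problems(nodes: list[dict], target_version: str) -> list[str]:
    """List nodes that are not Ready or not on the target kubelet version."""
    problems = []
    for node in nodes:
        name = node["metadata"]["name"]
        status = node.get("status") or {}
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if not ready:
            problems.append(f"{name}: not Ready")
        kubelet = (status.get("nodeInfo") or {}).get("kubeletVersion")
        if kubelet != target_version:
            problems.append(f"{name}: kubelet {kubelet}, expected {target_version}")
    return problems


def daemonset_available(ds: dict) -> bool:
    """True if every scheduled pod of the DaemonSet is available."""
    status = ds.get("status") or {}
    desired = status.get("desiredNumberScheduled", 0)
    return desired > 0 and status.get("numberAvailable", 0) == desired
```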
Next Steps
After the Kubernetes upgrade is complete, continue with routine node operations in Managing Nodes on VMware vSphere.