Managing Nodes on Bare Metal

This document explains how to deploy worker nodes, scale them up and down, replace specific machines, and recover failed nodes on the bare-metal provider. Node management uses Cluster API Machine resources orchestrated through MachineDeployment; the provider binds each Machine to one MachineInventory from a pool and drives the host through clean / reprovision plans.

Prerequisites

WARNING

Important Prerequisites

  • The control plane must already be running. See Create Cluster.
  • The worker MachineInventoryPool must contain at least as many Available inventories as the target replica count, plus the headroom required by the rollout strategy.
  • The target Machine.spec.version must be a key in the elemental-image-catalog ConfigMap.

INFO

Configuration Guidelines

When working with the configurations in this document:

  • Only modify values enclosed in <> brackets.
  • Replace placeholder values with your environment-specific settings.
  • Preserve all other default configurations unless explicitly required.

Overview

A worker node group is composed of four resources:

  1. MachineInventoryPool (<cluster-name>-worker-pool) — the allowed set of MachineInventory names. Already created in Create Cluster → Step 2.
  2. BaremetalMachineTemplate (<cluster-name>-worker-template) — points at the worker pool. CAPI requires that this template be replaced (new metadata.name) every time the underlying pool reference or allocation policy changes.
  3. KubeadmConfigTemplate (<cluster-name>-worker-bootstrap) — cloud-init user-data for kubeadm join. The bare-metal provider normalizes this user-data at reprovision time (hostname, provider-id, criSocket); operators should not pre-fill those fields.
  4. MachineDeployment — controls replica count, version, rollout strategy.

Cluster API treats templates as immutable: the objects for each Machine are cloned from the template when the Machine is created, and a rollout starts only when the MachineDeployment's template reference changes, so editing a template in place does not trigger a rollout. To roll out a worker infrastructure change, create a new BaremetalMachineTemplate and patch MachineDeployment.spec.template.spec.infrastructureRef.name to point at it.

Worker Node Deployment

Step 1: Confirm the Worker Pool Capacity

kubectl -n cpaas-system get machineinventorypools.infrastructure.cluster.x-k8s.io <cluster-name>-worker-pool

status.available must be at least the target replica count. If you need additional capacity, register more hosts (boot the SeedImage ISO on them — see Create Cluster → Step 1) and add their MachineInventory names to spec.machineInventories.
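
The capacity check in this step can be scripted. The following is a minimal sketch, not part of the provider: the helper names pool_available and has_capacity are hypothetical, and it assumes the pool reports status.available as described above and that kubectl points at the management cluster.

```shell
# Hypothetical helpers for the capacity check; not shipped with the provider.

# Print status.available for the given pool (needs access to the management cluster).
pool_available() {
  local pool="$1"
  kubectl -n cpaas-system get machineinventorypools.infrastructure.cluster.x-k8s.io "$pool" \
    -o jsonpath='{.status.available}'
}

# Succeed only when the available count covers the target replica count.
has_capacity() {
  local available="$1" target="$2"
  [ "$available" -ge "$target" ]
}

# Usage (commented out so the block only defines functions):
# available="$(pool_available "my-cluster-worker-pool")"
# has_capacity "$available" 3 || echo "register more hosts before deploying"
```

Keeping the comparison separate from the kubectl call makes it easy to reuse the same check with a value read from another source.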

Step 2: Configure the Worker BaremetalMachineTemplate

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: BaremetalMachineTemplate
metadata:
  name: <cluster-name>-worker-template
  namespace: cpaas-system
spec:
  template:
    spec:
      machineInventoryPoolRef:
        name: <cluster-name>-worker-pool

Key parameters:

| Parameter | Type | Description | Required |
|---|---|---|---|
| .spec.template.spec.machineInventoryPoolRef.name | string | Worker pool name (same namespace). Immutable once the template is created. | Yes |

allocationPolicy is reserved for future expansion. The provider currently treats every pool as Ordered — it picks the first Available inventory in declaration order.

Step 3: Configure the Bootstrap Template

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <cluster-name>-worker-bootstrap
  namespace: cpaas-system
spec:
  template:
    spec:
      users:
        - name: boot
          sudo: ALL=(ALL) NOPASSWD:ALL
          shell: /bin/bash
          sshAuthorizedKeys:
            - "<ssh-authorized-keys>"
      files:
        - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
          owner: "root:root"
          permissions: "0644"
          content: |
            {
              "apiVersion": "kubelet.config.k8s.io/v1beta1",
              "kind": "KubeletConfiguration",
              "protectKernelDefaults": true,
              "staticPodPath": null,
              "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
              "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key",
              "streamingConnectionIdleTimeout": "5m",
              "clientCAFile": "/etc/kubernetes/pki/ca.crt"
            }
      joinConfiguration:
        patches:
          directory: /etc/kubernetes/patches

The provider applies a minimal normalization to bootstrap user-data before writing it into the reprovision plan, so leave the following fields out of the template:

  • Hostname / FQDN — set automatically from MachineInventoryPool.spec.machineInventories[].hostname (or the inventory name when omitted).
  • kubeletExtraArgs.provider-id — set automatically to baremetal:///<inventory-name>.
  • nodeRegistration.criSocket — set automatically to unix:///var/run/containerd/containerd.sock when unset.

The normalized user-data is also written back into the bootstrap secret under data["resolved-value"] for debugging; the original data["value"] is left untouched.
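
To see what normalization changed, the two keys of the bootstrap secret can be decoded and compared. This is a sketch under assumptions: the helper names decode_b64 and secret_key are hypothetical, the secret name is taken from Machine.spec.bootstrap.dataSecretName, and the user-data is assumed to be stored as plain base64-encoded text.

```shell
# Hypothetical helpers for inspecting the bootstrap secret.

# Decode base64 from stdin.
decode_b64() {
  base64 --decode
}

# Print one key of a Secret, decoded.
# Assumes the key contains no dots (true for "value" and "resolved-value").
secret_key() {
  local secret="$1" key="$2"
  kubectl -n cpaas-system get secret "$secret" -o jsonpath="{.data.${key}}" | decode_b64
}

# Usage (commented out; needs a live management cluster):
# s="$(kubectl -n cpaas-system get machine "$machine" -o jsonpath='{.spec.bootstrap.dataSecretName}')"
# diff <(secret_key "$s" value) <(secret_key "$s" resolved-value)
```

The diff should show only the hostname, provider-id, and criSocket fields if the template left them unset.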

Step 4: Configure the MachineDeployment

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <cluster-name>-workers
  namespace: cpaas-system
spec:
  clusterName: <cluster-name>
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # Bare-metal pools cannot over-provision.
      maxUnavailable: 1
  selector:
    matchLabels: {}
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: <cluster-name>
        pool.name: <cluster-name>-workers
    spec:
      clusterName: <cluster-name>
      version: <kubernetes-version>
      nodeDrainTimeout: 5m
      nodeVolumeDetachTimeout: 5m
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <cluster-name>-worker-bootstrap
          namespace: cpaas-system
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: BaremetalMachineTemplate
        name: <cluster-name>-worker-template
        namespace: cpaas-system

Key parameters:

| Parameter | Type | Description | Required |
|---|---|---|---|
| .spec.clusterName | string | Target cluster name. | Yes |
| .spec.replicas | int | Number of worker nodes. Must satisfy replicas ≤ MachineInventoryPool.status.available + status.allocated (for the worker pool). | Yes |
| .spec.template.spec.version | string | Worker Kubernetes version. Must be a key in elemental-image-catalog. May differ from the control-plane version while respecting the standard kubelet skew policy. | Yes |
| .spec.strategy.rollingUpdate.maxSurge | int | Bare metal does not over-provision; keep 0. The provider has no spare physical host to bring up an extra node first. | Yes |
| .spec.strategy.rollingUpdate.maxUnavailable | int | Must be > 0 when maxSurge=0. Workers are recycled one-by-one within this budget. | Yes |
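
The constraints on replicas, maxSurge, and maxUnavailable can be checked before applying the manifest. The function below is an illustrative sketch only: check_worker_spec is a hypothetical name, and it assumes maxSurge and maxUnavailable are given as plain integers rather than percentages.

```shell
# Hypothetical pre-flight check for the MachineDeployment values.
# Prints the first violated rule and returns non-zero, or prints "ok".
check_worker_spec() {
  local replicas="$1" max_surge="$2" max_unavailable="$3" available="$4" allocated="$5"
  if [ "$max_surge" -ne 0 ]; then
    echo "maxSurge must be 0 on bare metal"
    return 1
  fi
  if [ "$max_unavailable" -le 0 ]; then
    echo "maxUnavailable must be > 0 when maxSurge=0"
    return 1
  fi
  # replicas must fit within the pool: available + allocated
  if [ "$replicas" -gt $((available + allocated)) ]; then
    echo "replicas exceeds available + allocated"
    return 1
  fi
  echo "ok"
}

# Usage: check_worker_spec 3 0 1 3 0
```

The available and allocated arguments are the worker pool's status.available and status.allocated values.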

Apply and watch:

kubectl apply -f workers.yaml
kubectl -n cpaas-system get machinedeployments.cluster.x-k8s.io
kubectl -n cpaas-system get baremetalmachines.infrastructure.cluster.x-k8s.io -w
kubectl get nodes -o wide                     # workload cluster

Node Management Operations

Scaling Worker Nodes

Adding Worker Nodes

Use Case: Increase cluster capacity.

Prerequisites:

  • The worker pool has enough Available inventories. If not, register the additional hosts (boot the SeedImage ISO and confirm the new MachineInventory objects) and append them to MachineInventoryPool.spec.machineInventories[].

Procedure:

  1. Check current state

    kubectl -n cpaas-system get machines.cluster.x-k8s.io \
      -l cluster.x-k8s.io/deployment-name=<cluster-name>-workers
    kubectl -n cpaas-system get machineinventorypools.infrastructure.cluster.x-k8s.io \
      <cluster-name>-worker-pool

    Confirm that MachineInventoryPool.status.available is high enough for the planned increment.

  2. Extend the worker pool when more inventories are needed

    kubectl -n cpaas-system edit machineinventorypool <cluster-name>-worker-pool

    Append the new MachineInventory names — they must already be registered and Ready. Save and exit; the pool reconciler picks up the change and increases status.total and status.available.

  3. Scale up the MachineDeployment

    kubectl -n cpaas-system patch machinedeployment <cluster-name>-workers \
      --type='json' -p='[{"op":"replace","path":"/spec/replicas","value":<new-replica-count>}]'
  4. Monitor

    kubectl -n cpaas-system get baremetalmachines.infrastructure.cluster.x-k8s.io -w
    kubectl get nodes                # workload cluster

    New BaremetalMachine objects advance Pending → Allocated → Reprovisioning → Running; the pool's available counter decreases as new nodes are bound.
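
Instead of watching interactively, the scale-up can be polled from a script. The sketch below counts Cluster API Machines whose status.phase is Running, which is the upstream Machine phase rather than the BaremetalMachine phase listed above. The helper names and the 10-second interval are assumptions for illustration.

```shell
# Hypothetical helpers for waiting on a scale-up.

# Count how many whitespace-separated phases equal "Running".
count_running() {
  local n=0 phase
  for phase in $1; do
    if [ "$phase" = "Running" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Print the phase of every Machine owned by the MachineDeployment.
machine_phases() {
  local deployment="$1"
  kubectl -n cpaas-system get machines.cluster.x-k8s.io \
    -l "cluster.x-k8s.io/deployment-name=${deployment}" \
    -o jsonpath='{.items[*].status.phase}'
}

# Poll every 10 seconds until `want` Machines are Running; give up after `tries` attempts.
wait_for_running() {
  local deployment="$1" want="$2" tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    if [ "$(count_running "$(machine_phases "$deployment")")" -ge "$want" ]; then
      return 0
    fi
    sleep 10
    tries=$((tries - 1))
  done
  return 1
}

# Usage: wait_for_running "my-cluster-workers" 5
```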

Removing Worker Nodes

Two strategies are supported, identical in shape to the upstream Cluster API contract:

| Strategy | When to use |
|---|---|
| Random removal | Any node can be removed, for example during a temporary capacity reduction. |
| Targeted removal | A specific physical host should be released (hardware maintenance, replacement, IP recovery). |

INFO

Inventory recycling

A clean plan stops kubelet, removes CRI workloads, and stops containerd. It does not wipe Kubernetes persistent directories or reset the OS. That work happens later, when the inventory is next picked for a new BaremetalMachine and runs the reprovision plan. Until then, the inventory returns to Available and may be allocated to another cluster.

Random Removal

kubectl -n cpaas-system patch machinedeployment <cluster-name>-workers \
  --type='json' -p='[{"op":"replace","path":"/spec/replicas","value":<new-replica-count>}]'

CAPI selects machines for deletion according to the MachineSet delete policy (Random by default). Each selected BaremetalMachine moves to Preparing, the provider writes a clean plan, and once the plan reports Applied, the inventory returns to Available.

Targeted Removal

  1. Identify the machines

    kubectl -n cpaas-system get machines.cluster.x-k8s.io \
      -l cluster.x-k8s.io/deployment-name=<cluster-name>-workers
  2. Annotate the target machines

    kubectl -n cpaas-system patch machine <machine-name> \
      --type='merge' -p='{"metadata":{"annotations":{"cluster.x-k8s.io/delete-machine":"true"}}}'

    Repeat for every machine you want to remove.

  3. Scale down by exactly the number of annotated machines

    kubectl -n cpaas-system patch machinedeployment <cluster-name>-workers \
      --type='json' -p='[{"op":"replace","path":"/spec/replicas","value":<new-replica-count>}]'

    Reducing by fewer leaves annotated machines in place; reducing by more sends random machines through the clean plan as well.

  4. Verify cleanup

    kubectl -n cpaas-system get baremetalmachines.infrastructure.cluster.x-k8s.io
    kubectl -n cpaas-system get machineinventories.elemental.cattle.io <inventory-name> -o yaml

    The released inventory must show baremetal.alauda.io/allocation-state=Available, the baremetal.alauda.io/owner-* annotations must be cleared, and the pool annotation must remain. MachineInventory itself is not deleted.
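
Because the replica count must drop by exactly the number of annotated machines, it can help to derive the new count rather than type it by hand. The following is a minimal sketch: replicas_after_removal and remove_machines are hypothetical helpers, and the kubectl calls are the same patches shown in the steps above.

```shell
# Hypothetical helpers for targeted removal.

# Print the replica count after removing `removing` machines from `current`.
replicas_after_removal() {
  local current="$1" removing="$2"
  if [ "$removing" -gt "$current" ]; then
    echo "cannot remove more machines than exist" >&2
    return 1
  fi
  echo $((current - removing))
}

# Annotate every named machine, then scale down by exactly that many replicas.
remove_machines() {
  local deployment="$1"
  shift
  local current new m
  current="$(kubectl -n cpaas-system get machinedeployment "$deployment" \
    -o jsonpath='{.spec.replicas}')"
  new="$(replicas_after_removal "$current" "$#")" || return 1
  for m in "$@"; do
    kubectl -n cpaas-system patch machine "$m" --type='merge' \
      -p='{"metadata":{"annotations":{"cluster.x-k8s.io/delete-machine":"true"}}}'
  done
  kubectl -n cpaas-system patch machinedeployment "$deployment" --type='json' \
    -p="[{\"op\":\"replace\",\"path\":\"/spec/replicas\",\"value\":${new}}]"
}

# Usage: remove_machines "my-cluster-workers" machine-a machine-b
```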

Replacing a Single Failed Node

If a BaremetalMachine lands in Failed, the safest recovery is to delete the failed Machine. CAPI immediately creates a replacement Machine (because replicas did not change), and the bare-metal provider picks an Available inventory from the same pool. Most often the replacement is a different MachineInventory; the provider does not guarantee that the freshly released inventory will be re-picked.

kubectl -n cpaas-system delete machine <machine-name>
kubectl -n cpaas-system get baremetalmachines.infrastructure.cluster.x-k8s.io -w

Investigate the original failure separately: read BaremetalMachine.status.conditions, MachineInventory.status.plan.state, and the failed-output key of the plan secret for the host that ran the plan. Common root causes are documented in the Common Failure Modes section below.
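
The three sources named above can be collected with a few kubectl calls. This is a sketch under assumptions: the helper names are hypothetical, the plan secret name is read from MachineInventory.status.plan.secretRef.name, and the secret is assumed to live in the cpaas-system namespace.

```shell
# Hypothetical helpers for inspecting a failed node.

# Print "Type Status Reason" for every condition on a BaremetalMachine.
bmm_conditions() {
  kubectl -n cpaas-system get baremetalmachines.infrastructure.cluster.x-k8s.io "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.reason}{"\n"}{end}'
}

# Read condition lines on stdin; print "Type Reason" for those with status False.
failing_conditions() {
  awk '$2 == "False" { print $1 " " $3 }'
}

# Print the plan state, then the decoded failed-output of the plan secret.
plan_failure() {
  local inventory="$1" secret
  kubectl -n cpaas-system get machineinventories.elemental.cattle.io "$inventory" \
    -o jsonpath='{.status.plan.state}{"\n"}'
  secret="$(kubectl -n cpaas-system get machineinventories.elemental.cattle.io "$inventory" \
    -o jsonpath='{.status.plan.secretRef.name}')"
  kubectl -n cpaas-system get secret "$secret" \
    -o jsonpath='{.data.failed-output}' | base64 --decode
}

# Usage:
# bmm_conditions "my-baremetalmachine" | failing_conditions
# plan_failure "my-inventory"
```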

Upgrading Machine Infrastructure

BaremetalMachineTemplate carries only the pool reference and allocation policy — there is no CPU / memory / disk spec to revisit. Infrastructure-side changes that require a template swap are limited to:

  • Moving a MachineDeployment to a different pool.
  • Adjusting allocation policy (when more policies are added).

To swap templates safely:

  1. Create a new BaremetalMachineTemplate with a new metadata.name referencing the new pool.

  2. Apply the new template.

  3. Patch the MachineDeployment:

    kubectl -n cpaas-system patch machinedeployment <cluster-name>-workers \
      --type='merge' \
      -p='{"spec":{"template":{"spec":{"infrastructureRef":{"name":"<new-template-name>"}}}}}'
  4. Watch the rolling replacement complete (maxSurge=0, one node at a time).

Updating Bootstrap Templates

KubeadmConfigTemplate is an immutable template in the same sense as BaremetalMachineTemplate. Modifying an existing template in place does not roll out existing machines; only newly created machines pick up the changes.

To roll out a bootstrap change:

  1. Export the existing template:

    kubectl -n cpaas-system get kubeadmconfigtemplate <cluster-name>-worker-bootstrap -o yaml \
      > new-worker-bootstrap.yaml
  2. Change metadata.name, remove server-generated fields (resourceVersion, uid, creationTimestamp, managedFields, kubectl.kubernetes.io/last-applied-configuration) and the entire status, and edit the desired fields.

  3. Apply the new template:

    kubectl apply -f new-worker-bootstrap.yaml
  4. Patch the MachineDeployment to reference the new template:

    kubectl -n cpaas-system patch machinedeployment <cluster-name>-workers \
      --type='merge' \
      -p='{"spec":{"template":{"spec":{"bootstrap":{"configRef":{"name":"<new-template-name>"}}}}}}'

    This triggers a rolling replacement.

Upgrading Kubernetes Version

For Kubernetes upgrades on bare-metal, see Upgrading Clusters on Bare Metal. The upgrade path always replaces nodes — there is no in-place kubeadm upgrade step.


Pool and Inventory Observability

The annotations the provider maintains on each MachineInventory are the authoritative source for "what is this host being used for right now":

| Annotation | Values | Meaning |
|---|---|---|
| baremetal.alauda.io/pool | Pool name | Which pool owns this inventory. |
| baremetal.alauda.io/allocation-state | Available / Allocated / Preparing / Reprovisioning / Unavailable | Lifecycle phase from the provider's point of view. |
| baremetal.alauda.io/owner-cluster | Cluster name | Cluster currently using this inventory (cleared on release). |
| baremetal.alauda.io/owner-machine | Machine name | Owning Cluster API Machine. |
| baremetal.alauda.io/owner-baremetalmachine | BaremetalMachine name | Owning provider machine. |

The plan secret referenced by each BaremetalMachine.status.planSecretRef also carries baremetal.alauda.io/plan.type=clean|reprovision, so you can tell which plan is currently being driven.
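
The allocation annotations can be listed for all inventories in one call. The sketch below is illustrative: inventory_states and in_state are hypothetical helpers, and the output format (name, allocation state, owner cluster separated by spaces) is a choice made for this example.

```shell
# Hypothetical helpers for pool observability.

# Print "name allocation-state owner-cluster" for every MachineInventory.
# Dots inside annotation keys are escaped for kubectl's JSONPath parser.
inventory_states() {
  kubectl -n cpaas-system get machineinventories.elemental.cattle.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.baremetal\.alauda\.io/allocation-state}{" "}{.metadata.annotations.baremetal\.alauda\.io/owner-cluster}{"\n"}{end}'
}

# Read those lines on stdin; print the names of inventories in the given state.
in_state() {
  awk -v want="$1" '$2 == want { print $1 }'
}

# Usage: inventory_states | in_state Available
```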


Common Failure Modes

| Scenario | Expected condition | What to check |
|---|---|---|
| Pool absent | BaremetalMachine stays Pending, InventoryAllocated=False / Reason=PoolMissing | BaremetalMachineTemplate.spec.template.spec.machineInventoryPoolRef.name and namespace |
| Pool exhausted | BaremetalMachine stays Pending, InventoryAllocated=False / Reason=PoolExhausted | MachineInventoryPool.status.available; inventory ownership annotations |
| Member missing | MachineInventoryPool.MembersValid=False / Ready=False | Pool's status.unavailable, message on the failing condition |
| Inventory not Ready | Counted toward status.unavailable | MachineInventory.status.conditions[Ready], plan secret existence |
| Plan secret missing | Inventory ineligible for allocation | MachineInventory.status.plan.secretRef, presence of the referenced Secret |
| Bootstrap secret missing | BootstrapReady=False / Reason=BootstrapWaiting | Machine.spec.bootstrap.dataSecretName |
| Catalog miss | BaremetalMachine Failed, ImageResolved=False / Reason=ImageCatalogMiss, no plan written | elemental-image-catalog keys |
| Registry annotation missing | ImageResolved=False / Reason=ImageRegistryMissing | Cluster.metadata.annotations["cpaas.io/registry-address"] |
| Reprovision plan failed | BaremetalMachine Failed, inventory marked Unavailable | Plan secret failed-output; host serial console; reachable platform registry |
| Clean plan failed | Deletion blocked by finalizer; inventory marked Unavailable | Same as above, focused on clean plan output |

For the full operator-side state machine reference, see Provider Overview → BaremetalMachine.


Next Steps