Bare Metal Provider

Overview

The Bare Metal Infrastructure Provider enables Immutable Infrastructure on physical servers, with no virtualization layer in between. It composes two long-running components on the global cluster:

  • elemental-operator — registers physical hosts, builds installation ISOs (SeedImage), and maintains the long-lived MachineInventory object for every host. elemental-system-agent runs on each host and executes MachineInventory plan secrets.
  • cluster-api-provider-baremetal — the Cluster API infrastructure provider. It groups available MachineInventory objects into pools, binds each Machine to a MachineInventory, and writes clean / reprovision plans that drive the host through the Kubernetes node lifecycle.

Unlike VM providers (DCS, vSphere), the bare-metal provider does not create or destroy machines. A host is installed once (live ISO → on-disk OS via elemental install), registers itself as a MachineInventory, and stays in the inventory across the cluster lifecycle. Node "creation" and "deletion" are realized through elemental upgrade driven by plans, followed by reboot and cloud-init re-execution.

Status

The provider currently follows a YAML-only workflow. There is no Fleet Essentials UI for bare-metal clusters yet — every step on this page is driven by kubectl apply.

Key Features

  • MachineInventoryPool allocation model — operators pre-declare which MachineInventory objects may back a KubeadmControlPlane or MachineDeployment. The provider picks an Available inventory from that pool when a Machine is created; nothing is provisioned outside the declared set.
  • Plan-driven node lifecycle — node attach uses a reprovision plan (write cloud-init, cloud-init clean, elemental upgrade, reboot, cloud-init re-execute, kubeadm init/join). Node detach uses a clean plan (stop kubelet, clear CRI workload, stop containerd). MachineInventory is never deleted by the provider during scaling, upgrade, or cluster deletion.
  • Cluster API native object tree — uses upstream Cluster, KubeadmControlPlane, MachineDeployment, Machine. The provider only owns the infrastructure tree (BaremetalCluster, BaremetalMachine, BaremetalMachineTemplate, MachineInventoryPool). There is no custom control-plane CRD.
  • Image-catalog driven Kubernetes versions — the provider keeps a cluster-scoped elemental-image-catalog ConfigMap that maps Machine.spec.version to an elemental upgrade image. Upgrades become "patch the version, controller resolves the matching image, reprovisions the node."
  • Control-plane HA via alive: BaremetalCluster.spec.controlPlaneLoadBalancer declares a control-plane VIP and VRID; the provider deploys the alive chart (keepalived + IPVS + kube-lock Lease arbitration) onto the control-plane nodes once the workload cluster is reachable.
  • Network identity preservation: elemental-register reports the network observed on the live ISO as MachineInventory.spec.observedNetwork. The first elemental install and every later reprovision plan replay that snapshot as cloud-init network-config v2, so a host keeps its address, default route, and DNS across the entire lifecycle.

Differences from VM-Based Providers

Aspect | VM providers (DCS, vSphere) | Bare-metal provider
Node creation | Clone a VM template from a credential-scoped platform | Reprovision an already-installed physical host via elemental upgrade
Node deletion | Power off and delete the VM | Run the clean plan; the host stays in the inventory and returns to the pool
Pool model | IP pool / hostname pool sized per replica | MachineInventoryPool listing concrete MachineInventory names
Data on the node | System and template disks recreated on every replacement; declared persistent disks survive (DCS) | elemental upgrade snapshot model: initramfs clears Kubernetes persistent state; pool-managed persistent disks are not part of this provider
Recovery time | Minutes per replacement | Minutes per elemental upgrade + reboot
First install path | One-off VM template upload by the platform admin | Live ISO built by SeedImage, booted on the host, elemental install to disk
Control-plane LB | External LB supplied by the operator | Internal LB managed by alive static pods on the control-plane nodes (external LB supported through type: External)
UI support | YAML + Fleet Essentials UI (DCS 1.0.13 and later) | YAML only

Concepts and Terminology

Object hierarchy

elemental.cattle.io                              infrastructure.cluster.x-k8s.io
─────────────────────                            ────────────────────────────────
MachineRegistration  ──┐                         BaremetalCluster
                       │                         BaremetalMachineTemplate
SeedImage            ──┤                         BaremetalMachine
                       ▼                         MachineInventoryPool
MachineInventory  ◄──────────────────────────►  (referenced by name)
   │
   │ (default plan secret, status.plan)
   ▼
elemental-system-agent on the host

elemental-operator owns the left column (host registration and the long-lived MachineInventory). cluster-api-provider-baremetal owns the right column (the Cluster API infrastructure tree) and only references MachineInventory by name.

Bare-metal concepts

MachineRegistration

Declares the registration endpoint and first-install cloud-config that elemental-register consumes on the live ISO. Operators set machineName, machineInventoryLabels, and machineInventoryAnnotations (with ${SMBIOS/...} templating) plus the elemental.install and elemental.registration blocks. MachineRegistration is queried but not modified by the bare-metal provider — it belongs to the elemental layer.
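
The ${SMBIOS/...} templating in machineName and the inventory labels and annotations can be pictured as plain placeholder substitution. The sketch below assumes that model only; the render_template helper and the placeholder key layout are hypothetical, and elemental-register's real key names and name-sanitization rules may differ.

```python
import re

# Matches ${...} placeholders such as ${SMBIOS/System Information/Serial Number}.
# The key layout is an assumption made for this illustration.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_template(template: str, facts: dict[str, str]) -> str:
    """Hypothetical helper: replace every ${KEY} in template with facts[KEY].

    Unknown keys raise KeyError instead of rendering an empty string, so a
    typo cannot silently give every host the same machine name.
    """

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in facts:
            raise KeyError(f"no value for placeholder ${{{key}}}")
        return facts[key]

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical hardware facts as they might be collected on the live ISO.
example_facts = {
    "SMBIOS/System Information/Manufacturer": "examplevendor",
    "SMBIOS/System Information/Serial Number": "abc123",
}
print(render_template("bm-${SMBIOS/System Information/Serial Number}", example_facts))
```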

SeedImage

Triggers elemental-operator to build a bootable ISO with the registration URL and the operator's TLS material baked in. spec.baseImage must reference the ISO variant of the OS image that matches the target Kubernetes version (the repository name carries the -iso suffix; the tag/digest matches the elemental-image-catalog entry for the version you intend to install). The ISO is booted once per physical host: on boot, the host runs elemental-register followed by elemental install, and a MachineInventory is created for it.

MachineInventory

The long-lived host identity object. The provider only relies on the following parts of its contract:

  • status.plan.secretRef — the single default plan secret owned by elemental-operator.
  • status.plan.state, whose Applied / Failed transitions drive the BaremetalMachine state machine.
  • status.conditions — host-side readiness signals.
  • spec.observedNetwork — fork-only field populated by elemental-register from the live-ISO NICs; replayed during install and during every reprovision plan.

The provider never deletes a MachineInventory, never uses MachineInventorySelector, and does not run a separate MachineInventoryLifecycleController — that lifecycle remains with elemental-operator.

MachineInventoryPool

Operator-authored set of MachineInventory names that a given BaremetalMachineTemplate is allowed to draw from. Pools are scoped to a single clusterName, and each MachineInventory belongs to at most one active pool at a time. The pool reconciler aggregates the pool-wide capacity counters referenced throughout this documentation:

  • available — free for allocation
  • allocated — bound to an active BaremetalMachine
  • preparing — running a clean plan
  • reprovisioning — running a reprovision plan
  • unavailable — Ready=False, plan failed, missing plan secret, or not present in the cluster
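
To make the five counters concrete, the sketch below classifies each inventory into exactly one bucket and tallies them. The InventoryView fields and the precedence between buckets (unavailable first, then in-flight plans, then ownership) are assumptions for illustration; the list above names the buckets but not how ties are resolved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

BUCKETS = ("available", "allocated", "preparing", "reprovisioning", "unavailable")


@dataclass
class InventoryView:
    """Hypothetical, simplified view of one MachineInventory as a pool reconciler sees it."""

    name: str
    exists: bool = True               # object present in the cluster
    ready: bool = True                # Ready condition is True
    plan_secret_present: bool = True  # default plan secret found
    plan_type: Optional[str] = None   # "clean", "reprovision", or None
    plan_state: Optional[str] = None  # "Applied", "Failed", or None while running
    owner: Optional[str] = None       # bound BaremetalMachine, if any


def classify(inv: InventoryView) -> str:
    # Bucket precedence is an assumption made for this sketch.
    if (not inv.exists or not inv.ready or not inv.plan_secret_present
            or inv.plan_state == "Failed"):
        return "unavailable"
    if inv.plan_type == "clean" and inv.plan_state != "Applied":
        return "preparing"
    if inv.plan_type == "reprovision" and inv.plan_state != "Applied":
        return "reprovisioning"
    if inv.owner is not None:
        return "allocated"
    return "available"


def pool_counters(inventories: Iterable[InventoryView]) -> dict[str, int]:
    """Tally every inventory into exactly one bucket; missing buckets report 0."""
    counts = Counter(classify(inv) for inv in inventories)
    return {bucket: counts[bucket] for bucket in BUCKETS}
```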

BaremetalCluster

The Cluster API infrastructure cluster resource. Owns controlPlaneLoadBalancer (the VIP and vrid consumed by the alive chart) and controlPlaneEndpoint (backfilled from controlPlaneLoadBalancer when only the VIP is set). The reconciler defers cluster-addon deployment (alive, kube-ovn) until the workload control plane is reachable.
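
The endpoint backfill rule can be sketched as "fill controlPlaneEndpoint from the VIP only when it is still empty". The field names on ControlPlaneLoadBalancer and the 6443 default port are assumptions for illustration (6443 is the typical port mentioned under Requirements), not the CRD schema.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the API server port defaults to 6443 when the load balancer spec omits it.
DEFAULT_API_PORT = 6443


@dataclass(frozen=True)
class Endpoint:
    host: str = ""
    port: int = 0

    def is_set(self) -> bool:
        return bool(self.host) and self.port > 0


@dataclass(frozen=True)
class ControlPlaneLoadBalancer:
    # Hypothetical field names; check the BaremetalCluster CRD for the real schema.
    vip: str = ""
    port: Optional[int] = None
    vrid: Optional[int] = None


def backfill_endpoint(endpoint: Endpoint, lb: ControlPlaneLoadBalancer) -> Endpoint:
    """Return controlPlaneEndpoint, filling it from the VIP only when it is empty."""
    if endpoint.is_set() or not lb.vip:
        return endpoint  # never overwrite an explicit endpoint
    return Endpoint(host=lb.vip, port=lb.port or DEFAULT_API_PORT)
```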

BaremetalMachineTemplate

The Cluster API infrastructure template referenced by KubeadmControlPlane.spec.machineTemplate.infrastructureRef and by MachineDeployment.spec.template.spec.infrastructureRef. Templates only carry machineInventoryPoolRef (which pool this machine group draws from) and allocationPolicy (Ordered is currently the only supported value — picks the first Available inventory in declaration order).

There is deliberately no version, role, or upgradeImage on the template. Role comes from the owning Cluster API resource; the version comes from Machine.spec.version; the upgrade image is resolved at reprovision time from the global image catalog.
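
allocationPolicy: Ordered amounts to a linear scan of the pool in declaration order. In the sketch below, the buckets map (inventory name to capacity bucket) is a hypothetical input standing in for whatever state the reconciler actually reads.

```python
from typing import Mapping, Optional, Sequence


def pick_ordered(members: Sequence[str], buckets: Mapping[str, str]) -> Optional[str]:
    """Sketch of allocationPolicy: Ordered.

    members -- MachineInventory names in the order the pool declares them
    buckets -- hypothetical name -> capacity bucket map ("available", "allocated", ...)
    Returns the first available name, or None if nothing is free
    (in this sketch the Machine would then stay Pending).
    """
    for name in members:
        if buckets.get(name) == "available":
            return name
    return None
```

Because the order is fixed, the same hosts are always preferred, which keeps allocation predictable when an operator reads the pool definition.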

BaremetalMachine

The Cluster API infrastructure machine. Reconciles a single Machine against a single MachineInventory:

  1. Picks an Available inventory from the pool referenced by the owning template.
  2. Reads the owning Machine.spec.bootstrap.dataSecretName and resolves the elemental upgrade image for Machine.spec.version from the image catalog.
  3. Normalizes the bootstrap user-data (hostname, kubelet provider-id, criSocket) and writes the reprovision plan into the MachineInventory plan secret.
  4. Watches MachineInventory.status.plan.state until the plan reports Applied, then sets BaremetalMachine.status.providerID = baremetal:///<inventory-name>.
  5. On deletion, writes a clean plan and clears the owner annotations once the plan applies, returning the inventory to the pool.
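
Steps 3 and 4 can be sketched as a providerID helper plus a pure function that pins hostname, kubelet provider-id, and criSocket. The nodeRegistration dict mirrors kubeadm's v1beta3 shape (kubeletExtraArgs as a plain map) as a simplification; the containerd socket path and the helper names are assumptions, not the provider's code.

```python
import copy

PROVIDER_ID_PREFIX = "baremetal:///"
# Assumption: containerd's conventional socket path; the provider's real value may differ.
DEFAULT_CRI_SOCKET = "unix:///run/containerd/containerd.sock"


def provider_id(inventory_name: str) -> str:
    return PROVIDER_ID_PREFIX + inventory_name


def inventory_from_provider_id(pid: str) -> str:
    if not pid.startswith(PROVIDER_ID_PREFIX) or len(pid) == len(PROVIDER_ID_PREFIX):
        raise ValueError(f"not a bare-metal providerID: {pid!r}")
    return pid[len(PROVIDER_ID_PREFIX):]


def normalize_node_registration(reg: dict, hostname: str, inventory_name: str,
                                cri_socket: str = DEFAULT_CRI_SOCKET) -> dict:
    """Hypothetical step 3: pin hostname, kubelet provider-id and criSocket.

    reg models a kubeadm nodeRegistration block (v1beta3 style, with
    kubeletExtraArgs as a map). The input dict is not mutated.
    """
    out = copy.deepcopy(reg)
    out["name"] = hostname
    out["criSocket"] = cri_socket
    extra = dict(out.get("kubeletExtraArgs") or {})
    extra["provider-id"] = provider_id(inventory_name)
    out["kubeletExtraArgs"] = extra
    return out
```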

Phase transitions: Pending → Allocated → Reprovisioning → Running; deletion: Running → Preparing → Deleted; failure: * → Failed.
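
The phase transitions listed above translate directly into a transition table. This is a sketch of the documented rules only; treating "* → Failed" as "Failed is reachable from every phase" is a literal reading, and the helper names are hypothetical.

```python
# Transitions taken from the phase list above; "Failed" is reachable from any phase.
TRANSITIONS = {
    "Pending": {"Allocated"},
    "Allocated": {"Reprovisioning"},
    "Reprovisioning": {"Running"},
    "Running": {"Preparing"},
    "Preparing": {"Deleted"},
    "Deleted": set(),
    "Failed": set(),
}


def can_transition(current: str, target: str) -> bool:
    if current not in TRANSITIONS or target not in TRANSITIONS:
        raise ValueError(f"unknown phase: {current!r} -> {target!r}")
    return target == "Failed" or target in TRANSITIONS[current]


def walk(phases: list[str]) -> str:
    """Apply a sequence of phases, rejecting any illegal step; return the final phase."""
    for current, target in zip(phases, phases[1:]):
        if not can_transition(current, target):
            raise ValueError(f"illegal transition {current} -> {target}")
    return phases[-1]
```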

Image catalog

A cluster-scoped ConfigMap (default name elemental-image-catalog in cpaas-system) that maps Machine.spec.version to an elemental upgrade image. The bare-metal provider chart renders this ConfigMap from provider.imageCatalog.images (global.registry.address is prepended to each repository) and from provider.imageCatalog.data (for fully-qualified overrides such as digest-pinned images). The reconciler hot-reloads the ConfigMap; a missing key is a terminal Failed state, not a fallback to a default image.
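
The rendering and lookup rules above can be sketched as two small functions: build version-to-image data with the registry prefix, let fully-qualified overrides win, and raise on a miss instead of falling back. The entry shape ({"version", "repository", "tag"}) and all names are hypothetical; the real chart value layout may differ.

```python
from typing import Iterable, Mapping


class CatalogMiss(LookupError):
    """Raised when Machine.spec.version has no catalog entry (terminal Failed in the provider)."""


def render_catalog(registry: str, images: Iterable[Mapping[str, str]],
                   overrides: Mapping[str, str]) -> dict[str, str]:
    """Sketch of the chart rendering step.

    images    -- hypothetical entries shaped like {"version", "repository", "tag"}
    overrides -- version -> fully-qualified image (e.g. digest-pinned); wins on conflict
    """
    data = {
        entry["version"]: f"{registry}/{entry['repository']}:{entry['tag']}"
        for entry in images
    }
    data.update(overrides)
    return data


def resolve_upgrade_image(catalog: Mapping[str, str], version: str) -> str:
    # No fallback to a default image: a miss is surfaced as an error.
    try:
        return catalog[version]
    except KeyError:
        raise CatalogMiss(f"no elemental upgrade image for version {version!r}") from None
```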

The image catalog also drives SeedImage.spec.baseImage: the ISO variant is the same repository with -iso appended and the same tag/digest as the catalog entry.
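
Deriving the SeedImage base image from a catalog entry is a string operation: append -iso to the repository and keep the tag and/or digest unchanged. The sketch below is a hypothetical helper; it handles registry ports (a colon before the last slash is not a tag) and references that carry both a tag and a digest.

```python
def _split_tag(name: str) -> tuple[str, str]:
    """Split "repo:tag" into ("repo", ":tag"); a colon before the last '/' is a port."""
    slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > slash:
        return name[:colon], name[colon:]
    return name, ""


def iso_variant(image_ref: str) -> str:
    """Hypothetical helper: catalog image reference -> matching -iso reference."""
    name, at, digest = image_ref.partition("@")
    repository, tag = _split_tag(name)
    return f"{repository}-iso{tag}{at}{digest}"
```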

clean and reprovision plans

These are the only two plan types the provider writes into MachineInventory.spec.plan; each is annotated with baremetal.alauda.io/plan.type=clean|reprovision:

  • reprovision — runs on node attach. Writes NoCloud user-data / meta-data (and, when MachineInventory.spec.observedNetwork is non-empty, network-config v2), writes a cleanup marker, runs cloud-init clean --logs --seed, runs elemental upgrade --reboot=false --system <image>, and triggers a delayed reboot. After reboot, initramfs clears /var/lib/kubelet, /var/lib/containerd, /var/lib/etcd, /etc/kubernetes; cloud-init re-runs and performs kubeadm init / kubeadm join.
  • clean — runs on node detach. Stops kubelet, clears CRI workload, stops containerd. It explicitly does not run kubeadm reset, cloud-init clean, or elemental upgrade, and it does not reboot. Real cleanup is deferred to the reprovision plan that runs when the host is re-allocated.
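
The contrast between the two plans is easiest to see as ordered step lists. This is a hypothetical in-memory model, not the elemental plan secret format: the step names are invented, and the only command strings used are the ones quoted above. The clean plan deliberately has no reboot, no cloud-init clean, and no elemental upgrade.

```python
from typing import Optional

PLAN_TYPE_ANNOTATION = "baremetal.alauda.io/plan.type"


def reprovision_plan(image: str, user_data: str, meta_data: str,
                     network_config: Optional[str] = None) -> dict:
    """Hypothetical model of the reprovision plan, steps in execution order."""
    seed = {"user-data": user_data, "meta-data": meta_data}
    if network_config:  # only when a network config was derived for this host
        seed["network-config"] = network_config
    steps = [
        ("write-nocloud-seed", ", ".join(seed)),
        ("write-cleanup-marker", None),
        ("run", "cloud-init clean --logs --seed"),
        ("run", f"elemental upgrade --reboot=false --system {image}"),
        ("delayed-reboot", None),
        # After reboot, initramfs wipes Kubernetes state and cloud-init re-runs
        # kubeadm init or join; the plan itself ends with the reboot.
    ]
    return {"annotations": {PLAN_TYPE_ANNOTATION: "reprovision"}, "seed": seed, "steps": steps}


def clean_plan() -> dict:
    """Detach only: no kubeadm reset, no cloud-init clean, no elemental upgrade, no reboot."""
    steps = [
        ("stop-service", "kubelet"),
        ("clear-cri-workload", None),
        ("stop-service", "containerd"),
    ]
    return {"annotations": {PLAN_TYPE_ANNOTATION: "clean"}, "seed": {}, "steps": steps}
```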

The provider applies the plans through the upstream "single default plan secret" semantics; it does not use MachineInventorySelector or FleetBundle.

alive (control-plane HA)

alive is a set of static pods (keepalived + IPVS) plus a kube-lock Lease arbitrator that maintain the control-plane VIP described in BaremetalCluster.spec.controlPlaneLoadBalancer. The provider deploys alive as an AppRelease on the workload cluster once the first control-plane Node is Ready, and re-renders it whenever the control-plane membership changes. During the very first kubeadm init, the provider prepends a one-off ip addr add <vip>/32 dev eth0 command to the bootstrap so that the first node holds the VIP until alive takes over.
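
The one-off VIP bootstrap step can be sketched as prepending a single command to the bootstrap cloud-config. The assumption here is that the bootstrap is a cloud-config dict whose runcmd list later runs kubeadm init; the real provider may inject the command differently. The function is idempotent so repeated reconciles do not add duplicates.

```python
import copy
import ipaddress


def prepend_vip_bootstrap(cloud_config: dict, vip: str, interface: str = "eth0") -> dict:
    """Hypothetical: make the first control-plane node hold the VIP before kubeadm init.

    The command is prepended at most once; the input dict is not mutated.
    """
    ipaddress.IPv4Address(vip)  # raises ValueError for malformed or non-IPv4 input
    out = copy.deepcopy(cloud_config)
    command = f"ip addr add {vip}/32 dev {interface}"
    runcmd = list(out.get("runcmd", []))
    if command not in runcmd:
        runcmd.insert(0, command)
    out["runcmd"] = runcmd
    return out
```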

The VIP must live in the control-plane nodes' Layer-2 broadcast domain; vrid must be unique within that domain.
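
A pre-flight check for these two constraints might look like the sketch below. Subnet membership is only a proxy for "same Layer-2 broadcast domain", and the set of VRIDs already in use has to come from whoever owns the network; the function name and inputs are hypothetical. The 1-255 bound is the VRRP virtual router ID range.

```python
import ipaddress
from typing import Iterable


def validate_vip(vip: str, node_cidr: str, vrid: int,
                 vrids_in_use: Iterable[int] = ()) -> list[str]:
    """Return a list of problems; an empty list means the VIP/VRID pair looks usable."""
    problems = []
    addr = ipaddress.ip_address(vip)
    net = ipaddress.ip_network(node_cidr, strict=False)
    if addr not in net:
        problems.append(f"VIP {vip} is outside the control-plane subnet {net}")
    elif addr in (net.network_address, net.broadcast_address):
        problems.append(f"VIP {vip} is the network or broadcast address of {net}")
    if not 1 <= vrid <= 255:  # VRRP virtual router IDs are 1-255
        problems.append(f"vrid {vrid} is outside the VRRP range 1-255")
    if vrid in set(vrids_in_use):
        problems.append(f"vrid {vrid} is already used in this broadcast domain")
    return problems
```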

MachineInventory.spec.observedNetwork

Fork-only field populated by elemental-register from the live-ISO NIC state and reported back via the MsgObservedNetworkConfig registration message. It is consumed in two places:

  • During the first elemental install, when MachineInventory.spec.network is empty, the registration server falls back to translating observedNetwork into an nmconnections NetworkConfig so the on-disk OS keeps the same address that the live ISO had.
  • During every reprovision, the provider translates observedNetwork into cloud-init network-config v2 (a netplan subset) and writes it as the third NoCloud seed file. Explicit spec.network takes precedence in both cases.
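
The reprovision-time translation and the precedence rule can be sketched as follows. The observedNetwork input shape used here is an assumption (the real field layout is defined by the fork), and the output covers only the network-config v2 path, not the nmconnections path used during the first install. Interfaces are matched by MAC address so the on-disk NIC name does not matter.

```python
import ipaddress
from typing import Any, Optional


def to_network_config_v2(observed: dict[str, Any]) -> dict[str, Any]:
    """Translate a hypothetical observedNetwork shape into cloud-init network-config v2.

    Assumed input: {"interfaces": [{"name", "mac", "addresses", "gateway"}], "dns": [...]}.
    """
    ethernets: dict[str, Any] = {}
    for iface in observed.get("interfaces", []):
        entry: dict[str, Any] = {
            "match": {"macaddress": iface["mac"].lower()},
            "set-name": iface["name"],
            "addresses": list(iface.get("addresses", [])),
        }
        gateway = iface.get("gateway")
        if gateway:
            default = "0.0.0.0/0" if ipaddress.ip_address(gateway).version == 4 else "::/0"
            entry["routes"] = [{"to": default, "via": gateway}]
        if observed.get("dns"):
            entry["nameservers"] = {"addresses": list(observed["dns"])}
        ethernets[iface["name"]] = entry
    return {"version": 2, "ethernets": ethernets}


def select_network(spec_network: Optional[dict], observed: Optional[dict]) -> Optional[dict]:
    """Explicit spec.network wins (returned unchanged); otherwise use the observed snapshot."""
    if spec_network:
        return spec_network
    if observed and observed.get("interfaces"):
        return to_network_config_v2(observed)
    return None
```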

API Group

All bare-metal infrastructure resources belong to infrastructure.cluster.x-k8s.io/v1beta1. Elemental resources belong to elemental.cattle.io/v1beta1.

Resource | Description | Documentation
BaremetalCluster | Cluster-level infrastructure (control-plane VIP, endpoint, network type) | BaremetalCluster
BaremetalMachine | Single infrastructure machine bound to one MachineInventory | BaremetalMachine
BaremetalMachineTemplate | Template that binds a pool to a KubeadmControlPlane or MachineDeployment | BaremetalMachineTemplate
MachineInventoryPool | Allowed set of MachineInventory names for a cluster | MachineInventoryPool

Supported Kubernetes Versions

The bare-metal provider supports the Kubernetes versions listed in its elemental-image-catalog. The default chart values ship two entries (the v1.33.7-2 and v1.34.5 baremetal-base-image releases); additional versions are introduced by appending entries to provider.imageCatalog.images (renders to global.registry.address/<repository>:<tag>) or by overriding with a full image reference under provider.imageCatalog.data. See OS Support Matrix for the matching component versions.

Requirements

  • Physical hosts (or PXE-bootable VMs for lab use) with BIOS or UEFI access to mount the SeedImage ISO.
  • A platform registry reachable from each host (used by both elemental install and elemental upgrade). Set global.registry.address and, when the registry is self-signed, leave global.registry.tlsVerify=false (chart default).
  • A control-plane VIP, free port (typically 6443), and a vrid unique within the control-plane Layer-2 broadcast domain.
  • One MachineInventoryPool per role (control plane, worker) sized to at least the target replica count plus headroom for upgrades.
  • TPM available on production hosts. Lab and PoC hosts can keep MachineRegistration.spec.config.elemental.registration.emulate-tpm: true to bypass real TPM hardware.
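
Before applying the cluster YAML, the pool requirements can be checked offline. The sketch below flags pools smaller than replicas plus headroom and inventories listed in more than one pool. The headroom default of 1 is an assumption for illustration; choose it to match your rollout surge settings.

```python
from collections import Counter


def check_pools(pools: dict[str, list[str]], replicas: dict[str, int],
                headroom: int = 1) -> list[str]:
    """Hypothetical pre-flight check; returns human-readable problems (empty means OK).

    pools    -- role -> MachineInventory names declared in that role's pool
    replicas -- role -> target replica count
    """
    problems = []
    # Count each name once per pool so duplicates inside one pool are not misreported.
    seen = Counter(name for members in pools.values() for name in set(members))
    for name, count in sorted(seen.items()):
        if count > 1:
            problems.append(f"{name} is listed in {count} pools")
    for role, want in sorted(replicas.items()):
        have = len(set(pools.get(role, [])))
        need = want + headroom
        if have < need:
            problems.append(f"{role}: pool has {have} hosts, needs at least {need}")
    return problems
```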

Documentation

For detailed instructions on using the bare-metal provider, see: