Automated Bare Metal Node Provisioning at Scale

November 12, 2024

Manual server provisioning doesn't scale. Learn how to automate bare metal deployment from power-on to production-ready.

Why Automation Matters

Manual provisioning problems:

  • Time-consuming (hours per server)
  • Error-prone
  • Inconsistent configurations
  • Doesn't scale beyond 10-20 servers

Automated provisioning benefits:

  • Minutes per server
  • Consistent and repeatable
  • Scales to thousands of nodes
  • Version-controlled configurations

Provisioning Architecture

Complete Workflow

Power On → PXE Boot → DHCP → TFTP → Boot Image → Kickstart/Preseed → OS Install → Post-Install → Configuration Management → Production Ready
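
One way to make the workflow above trackable is to model it as an ordered list of stages. The sketch below is illustrative only: the stage labels mirror the arrows above, while `Stage` and `next_stage` are hypothetical names and not part of any provisioning tool.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Provisioning stages, in the order shown in the workflow above."""
    POWER_ON = "Power On"
    PXE_BOOT = "PXE Boot"
    DHCP = "DHCP"
    TFTP = "TFTP"
    BOOT_IMAGE = "Boot Image"
    KICKSTART = "Kickstart/Preseed"
    OS_INSTALL = "OS Install"
    POST_INSTALL = "Post-Install"
    CONFIG_MGMT = "Configuration Management"
    PRODUCTION_READY = "Production Ready"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None at the end."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.POWER_ON
    while stage is not None:
        print(stage.value)
        stage = next_stage(stage)
```

Recording the current stage per server makes it easier to see where a failed provision stopped.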

Required Infrastructure

Components:

  • DHCP server (IP assignment, PXE options)
  • TFTP server (boot files)
  • HTTP/FTP server (OS repositories)
  • Configuration management (Ansible/Puppet/Chef)
  • IPMI/BMC access (remote power control)

PXE Boot Setup

DHCP Configuration

    # /etc/dhcp/dhcpd.conf
    subnet 10.1.1.0 netmask 255.255.255.0 {
        range 10.1.1.100 10.1.1.200;
        option routers 10.1.1.1;
        option domain-name-servers 10.1.1.10;

        # PXE boot options
        next-server 10.1.1.10;    # TFTP server

        # iPXE clients fetch a boot script over HTTP;
        # all other clients load pxelinux.0 over TFTP
        if exists user-class and option user-class = "iPXE" {
            filename "http://10.1.1.10/boot.ipxe";
        } else {
            filename "pxelinux.0";
        }
    }

    # Static assignments for known servers
    # (in production, keep fixed addresses outside the dynamic range above)
    host server01 {
        hardware ethernet aa:bb:cc:dd:ee:ff;
        fixed-address 10.1.1.101;
        filename "pxelinux.0";
    }
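
With more than a handful of servers, the static host entries are easier to generate than to write by hand. The following is a minimal sketch that renders entries in the same shape as the `server01` entry above. The `INVENTORY` list and the function names are hypothetical; in practice the data would come from an asset database or CMDB.

```python
from typing import Dict, List


def host_stanza(name: str, mac: str, ip: str, bootfile: str = "pxelinux.0") -> str:
    """Render one dhcpd.conf host block."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac.lower()};\n"
        f"    fixed-address {ip};\n"
        f'    filename "{bootfile}";\n'
        f"}}\n"
    )


def render_hosts(inventory: List[Dict[str, str]]) -> str:
    """Render all host blocks, separated by blank lines."""
    return "\n".join(host_stanza(h["name"], h["mac"], h["ip"]) for h in inventory)


# Hypothetical inventory; the second entry is made up for illustration.
INVENTORY = [
    {"name": "server01", "mac": "AA:BB:CC:DD:EE:FF", "ip": "10.1.1.101"},
    {"name": "server02", "mac": "aa:bb:cc:dd:ee:00", "ip": "10.1.1.102"},
]

if __name__ == "__main__":
    print(render_hosts(INVENTORY))
```

The output can be written to a separate file and pulled into dhcpd.conf with an `include` statement, so the generated part stays apart from the hand-maintained part.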

TFTP Server Setup

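    # Install the TFTP server and syslinux (provides /usr/share/syslinux)
    yum install -y tftp-server syslinux

    # Copy PXE boot files
    cp /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot/
    cp /usr/share/syslinux/menu.c32 /var/lib/tftpboot/
    cp /usr/share/syslinux/memdisk /var/lib/tftpboot/
    cp /usr/share/syslinux/mboot.c32 /var/lib/tftpboot/

    # syslinux 5 and later: menu.c32 also needs these two modules
    cp /usr/share/syslinux/{ldlinux.c32,libutil.c32} /var/lib/tftpboot/

    # Create menu structure
    mkdir -p /var/lib/tftpboot/pxelinux.cfg
    mkdir -p /var/lib/tftpboot/images/{rhel8,rhel9,ubuntu22}

    # Start the TFTP service
    systemctl enable --now tftp.socket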
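
A missing boot file usually shows up only as a client that hangs at PXE boot, so it can help to check the TFTP root before powering on any servers. This is a small sketch, assuming the file list from the setup above; `REQUIRED` and `missing_boot_files` are hypothetical names.

```python
from pathlib import Path
from typing import List

# Files and directories the setup above places under the TFTP root.
REQUIRED = [
    "pxelinux.0",
    "menu.c32",
    "memdisk",
    "mboot.c32",
    "pxelinux.cfg",
]


def missing_boot_files(tftp_root: str) -> List[str]:
    """Return the required entries that do not exist under tftp_root."""
    root = Path(tftp_root)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_boot_files("/var/lib/tftpboot")
    print("missing:", ", ".join(missing) if missing else "none")
```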

PXE Menu Configuration

    # /var/lib/tftpboot/pxelinux.cfg/default
    DEFAULT menu.c32
    PROMPT 0
    # TIMEOUT is in tenths of a second (300 = 30 seconds)
    TIMEOUT 300
    MENU TITLE Datacenter PXE Boot Menu

    LABEL rhel8
        MENU LABEL RHEL 8 Automated Install
        KERNEL images/rhel8/vmlinuz
        APPEND initrd=images/rhel8/initrd.img inst.ks=http://10.1.1.10/kickstart/rhel8.cfg

    LABEL rhel9
        MENU LABEL RHEL 9 Automated Install
        KERNEL images/rhel9/vmlinuz
        APPEND initrd=images/rhel9/initrd.img inst.ks=http://10.1.1.10/kickstart/rhel9.cfg

    # Note: Ubuntu 22.04 server images use Subiquity autoinstall; the preseed
    # URL below only applies to legacy debian-installer netboot images
    LABEL ubuntu22
        MENU LABEL Ubuntu 22.04 Automated Install
        KERNEL images/ubuntu22/vmlinuz
        APPEND initrd=images/ubuntu22/initrd.img url=http://10.1.1.10/preseed/ubuntu22.cfg

    LABEL local
        MENU LABEL Boot from local disk
        LOCALBOOT 0
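
The `default` file is only the last fallback. PXELINUX first looks in `pxelinux.cfg/` for a file named after the client's MAC address (prefixed with the ARP hardware type, `01` for Ethernet), then for the client's IPv4 address in uppercase hex, dropping one trailing digit at a time. The sketch below computes those names so that per-server configs can be generated. The function names are hypothetical, and the client-UUID lookup that PXELINUX tries before the MAC is left out.

```python
import ipaddress
from typing import List


def mac_config_name(mac: str) -> str:
    """Config file name for a MAC address, e.g. 01-aa-bb-cc-dd-ee-ff."""
    return "01-" + mac.lower().replace(":", "-")


def ip_config_names(ip: str) -> List[str]:
    """Hex IP name followed by progressively shorter prefixes."""
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    return [hex_ip[:length] for length in range(8, 0, -1)]


def search_order(mac: str, ip: str) -> List[str]:
    """File names PXELINUX tries, in order, after the UUID lookup."""
    return [mac_config_name(mac)] + ip_config_names(ip) + ["default"]


if __name__ == "__main__":
    for name in search_order("aa:bb:cc:dd:ee:ff", "10.1.1.101"):
        print(f"pxelinux.cfg/{name}")
```

Writing a file such as `pxelinux.cfg/01-aa-bb-cc-dd-ee-ff` with a single install entry lets one server install unattended while every other client still sees the menu.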

Kickstart Configuration (RHEL/CentOS)

Complete Kickstart File

    # /var/www/html/kickstart/rhel8.cfg
    #version=RHEL8

    # System authorization
    auth --enableshadow --passalgo=sha512

    # Use network installation
    url --url="http://10.1.1.10/repos/rhel8/"

    # Keyboard and language
    keyboard --vckeymap=us --xlayouts='us'
    lang en_US.UTF-8

    # Network configuration
    network --bootproto=dhcp --device=eth0 --onboot=yes --ipv6=auto
    network --hostname=server.datacenter.local

    # Root password (encrypted)
    rootpw --iscrypted $6$rounds=656000$encrypted_hash_here

    # System timezone
    timezone America/New_York --isUtc

    # Disk partitioning
    ignoredisk --only-use=sda
    clearpart --all --initlabel --drives=sda

    # LVM partitioning
    part /boot --fstype="xfs" --ondisk=sda --size=1024
    part /boot/efi --fstype="efi" --ondisk=sda --size=512
    part pv.01 --fstype="lvmpv" --ondisk=sda --size=1 --grow
    volgroup vg_root pv.01
    logvol / --fstype="xfs" --size=51200 --name=lv_root --vgname=vg_root
    logvol /var --fstype="xfs" --size=20480 --name=lv_var --vgname=vg_root
    logvol /tmp --fstype="xfs" --size=10240 --name=lv_tmp --vgname=vg_root
    logvol /home --fstype="xfs" --size=10240 --name=lv_home --vgname=vg_root
    logvol swap --fstype="swap" --size=16384 --name=lv_swap --vgname=vg_root

    # Bootloader
    bootloader --location=mbr --boot-drive=sda

    # Firewall and SELinux
    firewall --enabled --ssh
    selinux --enforcing

    # Reboot after installation
    reboot

    # Package selection
    %packages
    @^minimal-environment
    @standard
    vim
    wget
    curl
    net-tools
    bind-utils
    tcpdump
    htop
    python3
    ansible
    %end

    # Post-installation script
    %post --log=/root/kickstart-post.log

    # Update system
    yum update -y

    # Configure SSH: root may log in with a key only, never with a password
    sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
    sed -i 's/^#\?PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config

    # Add SSH keys (the key must stay on a single line)
    mkdir -p /root/.ssh
    chmod 700 /root/.ssh
    cat << 'EOF' > /root/.ssh/authorized_keys
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ... admin@provisioning
    EOF
    chmod 600 /root/.ssh/authorized_keys

    # Configure NTP
    cat << 'EOF' > /etc/chrony.conf
    server ntp1.datacenter.local iburst
    server ntp2.datacenter.local iburst
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    EOF
    systemctl enable chronyd

    # Register with configuration management
    curl -o /tmp/bootstrap.sh http://10.1.1.10/scripts/ansible-bootstrap.sh
    bash /tmp/bootstrap.sh

    # Send completion notification
    curl -X POST http://10.1.1.10/api/provision/complete \
        -d "hostname=$(hostname)" \
        -d "ip=$(ip addr show eth0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1)"
    %end
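
The partition layout above fixes a minimum disk size: the two plain partitions plus all logical volumes must fit on `sda`. Below is a rough sketch that adds up the `--size` values (in MiB) from a Kickstart file so that undersized hardware can be caught before an install fails. It ignores LVM metadata overhead, and the function name and the sample text are illustrative.

```python
import re

SIZE_RE = re.compile(r"--size=(\d+)")


def minimum_disk_mib(kickstart_text: str) -> int:
    """Sum the sizes of plain partitions and logical volumes, in MiB."""
    total = 0
    for raw in kickstart_text.splitlines():
        fields = raw.split()
        if len(fields) < 2 or fields[0] not in ("part", "partition", "logvol"):
            continue
        # Skip LVM physical volumes: they grow to hold the logical volumes.
        if fields[0] != "logvol" and fields[1].startswith("pv."):
            continue
        match = SIZE_RE.search(raw)
        if match:
            total += int(match.group(1))
    return total


# Storage lines copied from the Kickstart file above.
SAMPLE_KS = """
part /boot --fstype="xfs" --ondisk=sda --size=1024
part /boot/efi --fstype="efi" --ondisk=sda --size=512
part pv.01 --fstype="lvmpv" --ondisk=sda --size=1 --grow
volgroup vg_root pv.01
logvol / --fstype="xfs" --size=51200 --name=lv_root --vgname=vg_root
logvol /var --fstype="xfs" --size=20480 --name=lv_var --vgname=vg_root
logvol /tmp --fstype="xfs" --size=10240 --name=lv_tmp --vgname=vg_root
logvol /home --fstype="xfs" --size=10240 --name=lv_home --vgname=vg_root
logvol swap --fstype="swap" --size=16384 --name=lv_swap --vgname=vg_root
"""

if __name__ == "__main__":
    mib = minimum_disk_mib(SAMPLE_KS)
    print(f"Minimum disk size: {mib} MiB ({mib / 1024:.1f} GiB)")
```

For this layout the total is 110080 MiB, about 107.5 GiB, so a server with a smaller first disk needs a different profile.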

Modern Provisioning Tools

1. Foreman + Katello

Features:

  • Web-based management
  • Lifecycle management
  • Content management
  • Puppet integration

    # Install Foreman
    yum install foreman-installer
    foreman-installer \
        --enable-foreman-plugin-katello \
        --enable-foreman-plugin-discovery

    # Add compute resource
    hammer compute-resource create \
        --name "Datacenter" \
        --provider "Libvirt" \
        --url "qemu+ssh://root@hypervisor.local/system"

2. MAAS (Metal as a Service)

Ubuntu-focused:

    # Install MAAS
    snap install maas

    # Initialize
    maas init region+rack --database-uri postgres://user:pass@localhost/maas

    # Commission a node (replace <system_id> with the machine's system ID)
    maas admin machine commission <system_id>

3. Cobbler

Lightweight option:

    # Install Cobbler
    yum install cobbler cobbler-web

    # Add distro
    cobbler import --name=rhel8 --path=/mnt/rhel8-dvd

    # Add profile (Cobbler 3.x renames --kickstart to --autoinstall)
    cobbler profile add \
        --name=rhel8-datacenter \
        --distro=rhel8 \
        --kickstart=/var/lib/cobbler/kickstarts/rhel8.ks

    # Add system
    cobbler system add \
        --name=server01 \
        --profile=rhel8-datacenter \
        --mac=aa:bb:cc:dd:ee:ff \
        --ip-address=10.1.1.101

IPMI/BMC Automation

Remote Power Management

    #!/usr/bin/env python3
    import subprocess
    import time


    def ipmi_command(host, user, password, *args):
        """Run an ipmitool command against a BMC and return its output."""
        cmd = [
            'ipmitool', '-I', 'lanplus',
            '-H', host, '-U', user, '-P', password,
            *args,
        ]
        # check=True raises CalledProcessError if ipmitool reports a failure
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()


    def provision_server(ipmi_host, ipmi_user, ipmi_pass):
        """Automated server provisioning"""
        # Set boot device to PXE
        print(f"Setting {ipmi_host} to PXE boot...")
        ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'chassis', 'bootdev', 'pxe')

        # Power cycle
        print("Power cycling server...")
        ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'power', 'cycle')

        # Wait for installation
        print("Waiting for OS installation (15 minutes)...")
        time.sleep(900)

        # Verify power state (this does not confirm that the OS is reachable)
        print("Verifying server status...")
        status = ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'power', 'status')
        print(f"Power status: {status}")
        return status.lower().endswith('on')


    # Servers to provision (placeholder credentials)
    servers = [
        {'host': '10.0.1.101', 'user': 'admin', 'pass': 'password'},
        {'host': '10.0.1.102', 'user': 'admin', 'pass': 'password'},
    ]

    if __name__ == '__main__':
        for server in servers:
            provision_server(server['host'], server['user'], server['pass'])
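
The fixed 15-minute sleep in the script above is either too long or too short, depending on the hardware. A common alternative is to poll until the freshly installed OS accepts TCP connections on the SSH port. This is a minimal sketch of that idea; `wait_for_port` and its default values are assumptions, not part of the original script.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 1800.0,
                  interval: float = 15.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and try again.
            time.sleep(interval)
    return False
```

In `provision_server`, a call such as `wait_for_port(os_ip)` could replace `time.sleep(900)`, where `os_ip` is the server's OS address (for example its fixed DHCP address), not the BMC address. An open port only shows that sshd is listening, so the validation script is still worth running afterwards.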

Post-Provisioning Configuration

Ansible Bootstrap

    # ansible-bootstrap.yml
    ---
    - name: Post-provision configuration
      hosts: new_servers
      become: yes

      tasks:
        - name: Update all packages
          yum:
            name: '*'
            state: latest

        - name: Install monitoring agent
          yum:
            name: telegraf
            state: present

        - name: Configure monitoring
          template:
            src: telegraf.conf.j2
            dest: /etc/telegraf/telegraf.conf
          notify: restart telegraf

        - name: Install security updates
          yum:
            name: '*'
            state: latest
            security: yes

        - name: Configure firewall
          firewalld:
            service: "{{ item }}"
            permanent: yes
            state: enabled
          loop:
            - ssh
            - http
            - https

        - name: Join to domain
          command: realm join -U admin datacenter.local
          args:
            creates: /etc/krb5.keytab

      handlers:
        - name: restart telegraf
          service:
            name: telegraf
            state: restarted
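
The playbook targets a `new_servers` group, so freshly installed hosts have to appear in an inventory under that name. The sketch below renders a static INI inventory from a hostname-to-IP mapping. The mapping and the file name `new_servers.ini` are hypothetical; a dynamic inventory plugin would be an equally valid approach.

```python
from typing import Dict


def render_inventory(hosts: Dict[str, str], group: str = "new_servers") -> str:
    """Render an Ansible INI inventory with one group."""
    lines = [f"[{group}]"]
    for name in sorted(hosts):
        lines.append(f"{name} ansible_host={hosts[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts that have just finished installing.
    provisioned = {"server01": "10.1.1.101", "server02": "10.1.1.102"}
    with open("new_servers.ini", "w") as handle:
        handle.write(render_inventory(provisioned))
```

The playbook can then be run with `ansible-playbook -i new_servers.ini ansible-bootstrap.yml`.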

Validation and Testing

Automated Testing

    #!/bin/bash
    # validate-provisioning.sh

    # Abort with a usage message if no server was given
    SERVER="${1:?Usage: $0 <server>}"

    echo "Validating $SERVER provisioning..."

    # Test SSH connectivity
    if ssh -o ConnectTimeout=5 "root@$SERVER" "echo OK" &>/dev/null; then
        echo "✓ SSH connectivity"
    else
        echo "✗ SSH connectivity FAILED"
        exit 1
    fi

    # Check OS version
    OS_VERSION=$(ssh "root@$SERVER" "cat /etc/redhat-release")
    echo "✓ OS Version: $OS_VERSION"

    # Check disk layout
    DISK_LAYOUT=$(ssh "root@$SERVER" "lsblk -o NAME,SIZE,TYPE,MOUNTPOINT")
    echo "✓ Disk Layout:"
    echo "$DISK_LAYOUT"

    # Check services
    SERVICES=("sshd" "chronyd" "firewalld")
    for service in "${SERVICES[@]}"; do
        if ssh "root@$SERVER" "systemctl is-active $service" &>/dev/null; then
            echo "✓ Service $service is running"
        else
            echo "✗ Service $service is NOT running"
        fi
    done

    # Check network configuration
    IP_ADDR=$(ssh "root@$SERVER" "ip addr show eth0 | grep 'inet ' | awk '{print \$2}'")
    echo "✓ IP Address: $IP_ADDR"

    echo "Provisioning validation complete!"
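
The script above prints the disk layout but does not check it. A small step further is to compare the mount points in the `lsblk` output with the ones the Kickstart file is supposed to create. The sketch below does that on captured text; `EXPECTED_MOUNTS` is derived from the partition layout earlier in this post, and the sample output is illustrative, not taken from a real server.

```python
from typing import Set

# Mount points defined by the Kickstart partitioning section.
EXPECTED_MOUNTS = {"/", "/boot", "/boot/efi", "/var", "/tmp", "/home", "[SWAP]"}


def missing_mounts(lsblk_output: str, expected: Set[str]) -> Set[str]:
    """Return expected mount points absent from `lsblk -o NAME,SIZE,TYPE,MOUNTPOINT`."""
    found = set()
    for line in lsblk_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:  # rows without a mount point have three fields
            found.add(fields[3])
    return expected - found


# Illustrative output for a server installed with the layout above.
SAMPLE_LSBLK = """
NAME                SIZE TYPE MOUNTPOINT
sda                 200G disk
├─sda1                1G part /boot
├─sda2              512M part /boot/efi
└─sda3            198.5G part
  ├─vg_root-lv_root  50G lvm  /
  ├─vg_root-lv_var   20G lvm  /var
  ├─vg_root-lv_tmp   10G lvm  /tmp
  ├─vg_root-lv_home  10G lvm  /home
  └─vg_root-lv_swap  16G lvm  [SWAP]
"""

if __name__ == "__main__":
    missing = missing_mounts(SAMPLE_LSBLK, EXPECTED_MOUNTS)
    print("missing mount points:", sorted(missing) if missing else "none")
```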

Monitoring Provisioning Status

Status Dashboard

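    # provision-status.py
    from datetime import datetime
    import sqlite3

    from flask import Flask, jsonify

    app = Flask(__name__)


    def calculate_duration(start_time, end_time):
        """Return elapsed seconds, or None while a provision is still running.

        Assumes timestamps are stored as ISO 8601 text.
        """
        if not start_time or not end_time:
            return None
        start = datetime.fromisoformat(start_time)
        end = datetime.fromisoformat(end_time)
        return (end - start).total_seconds()


    @app.route('/api/provision/status')
    def provision_status():
        conn = sqlite3.connect('provisioning.db')
        try:
            cursor = conn.cursor()
            cursor.execute("""
                SELECT hostname, ip, status, start_time, end_time
                FROM provisions
                WHERE DATE(start_time) = DATE('now')
                ORDER BY start_time DESC
            """)
            rows = cursor.fetchall()
        finally:
            conn.close()

        provisions = []
        for row in rows:
            provisions.append({
                'hostname': row[0],
                'ip': row[1],
                'status': row[2],
                'start_time': row[3],
                'end_time': row[4],
                'duration': calculate_duration(row[3], row[4])
            })

        return jsonify(provisions)


    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)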
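
The status endpoint reads from a `provisions` table, and the Kickstart `%post` section posts to `/api/provision/complete`, but neither the table definition nor the write path is shown. The following sketch fills that gap under stated assumptions: the column names come from the SELECT above, while the column types, the primary key, the status values, and the helper names are guesses.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Assumed schema; only the column names are taken from the status query.
SCHEMA = """
CREATE TABLE IF NOT EXISTS provisions (
    hostname   TEXT PRIMARY KEY,
    ip         TEXT,
    status     TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time   TEXT
)
"""


def utc_now() -> str:
    """UTC timestamp as 'YYYY-MM-DD HH:MM:SS', which SQLite's DATE() can parse."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)
    conn.commit()


def start_provision(conn: sqlite3.Connection, hostname: str, ip: str,
                    now: Optional[str] = None) -> None:
    """Record that an install has started (replaces any earlier row)."""
    conn.execute(
        "INSERT OR REPLACE INTO provisions "
        "(hostname, ip, status, start_time, end_time) "
        "VALUES (?, ?, 'installing', ?, NULL)",
        (hostname, ip, now or utc_now()),
    )
    conn.commit()


def mark_complete(conn: sqlite3.Connection, hostname: str, ip: str,
                  now: Optional[str] = None) -> bool:
    """Mark a provision as finished; returns False for an unknown hostname."""
    cursor = conn.execute(
        "UPDATE provisions SET status = 'complete', ip = ?, end_time = ? "
        "WHERE hostname = ?",
        (ip, now or utc_now(), hostname),
    )
    conn.commit()
    return cursor.rowcount == 1
```

A POST handler for `/api/provision/complete` could call `mark_complete` with the submitted `hostname` and `ip` form fields. Timestamps are stored in UTC because SQLite evaluates `DATE('now')` in UTC.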

Best Practices

Configuration Management

  1. Version control everything

    • Kickstart files in Git
    • Track changes
    • Peer review
  2. Test in staging

    • Validate changes before production
    • Use virtual machines for testing
  3. Document standards

    • Naming conventions
    • IP addressing schemes
    • Partition layouts

Security

Security Checklist:

  • Encrypted root passwords in kickstart
  • SSH key-based authentication only
  • Firewall enabled by default
  • SELinux enforcing
  • Automatic security updates
  • Minimal package installation
  • Disable unnecessary services
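
Several items on this checklist can be checked mechanically before a Kickstart file is merged. The sketch below tests a file's text against a few regular expressions. The `CHECKS` table is an illustrative subset, not a complete audit, and the names are hypothetical.

```python
import re
from typing import Dict

# Checklist item -> pattern that must match some line of the Kickstart file.
CHECKS = {
    "encrypted root password": r"^\s*rootpw\s+.*--iscrypted",
    "firewall enabled": r"^\s*firewall\s+.*--enabled",
    "SELinux enforcing": r"^\s*selinux\s+.*--enforcing",
    "minimal package set": r"^\s*@\^minimal-environment",
}


def audit_kickstart(text: str) -> Dict[str, bool]:
    """Return a pass/fail result for each checklist item."""
    return {
        name: re.search(pattern, text, re.MULTILINE) is not None
        for name, pattern in CHECKS.items()
    }


if __name__ == "__main__":
    sample = (
        "rootpw --iscrypted $6$rounds=656000$encrypted_hash_here\n"
        "firewall --enabled --ssh\n"
        "selinux --enforcing\n"
        "%packages\n@^minimal-environment\n%end\n"
    )
    for item, passed in audit_kickstart(sample).items():
        print(f"{'PASS' if passed else 'FAIL'}  {item}")
```

Run as a CI step on the repository that holds the Kickstart files, a check like this ties the checklist to the version-control practice described above.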

Conclusion

Automated bare metal provisioning is essential for datacenter operations at scale. With proper tooling and processes, you can provision hundreds of servers per day with consistency and reliability.

Key Takeaways:

  • Automate everything from power-on to production
  • Use PXE boot for network-based installation
  • Implement configuration management for post-install
  • Test and validate every deployment
  • Monitor provisioning status
  • Version control all configurations
