Automated Bare Metal Node Provisioning at Scale
November 12, 2024
Manual server provisioning doesn't scale. Learn how to automate bare metal deployment from power-on to production-ready.
Why Automation Matters
Manual provisioning problems:
- Time consuming (hours per server)
- Error prone
- Inconsistent configurations
- Doesn't scale beyond 10-20 servers
Automated provisioning benefits:
- Minutes per server
- Consistent and repeatable
- Scales to thousands of nodes
- Version controlled configurations
Provisioning Architecture
Complete Workflow
Power On → PXE Boot → DHCP → TFTP → Boot Image →
Kickstart/Preseed → OS Install → Post-Install →
Configuration Management → Production Ready
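The workflow above is a strict sequence, which makes it easy to track where a given node is. The following is a minimal sketch of that idea, not part of any specific tool: the `Stage` enum and the helper names are illustrative, and the stage names simply mirror the arrows in the diagram.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Provisioning stages in the order shown in the workflow diagram."""
    POWER_ON = 0
    PXE_BOOT = 1
    DHCP = 2
    TFTP = 3
    BOOT_IMAGE = 4
    KICKSTART = 5
    OS_INSTALL = 6
    POST_INSTALL = 7
    CONFIG_MANAGEMENT = 8
    PRODUCTION_READY = 9


def next_stage(current):
    """Return the stage after `current`; the final stage is terminal."""
    if current is Stage.PRODUCTION_READY:
        return current
    return Stage(current + 1)


def remaining(current):
    """List the names of the stages a node still has to pass through."""
    return [stage.name for stage in Stage if stage > current]


if __name__ == "__main__":
    print(remaining(Stage.OS_INSTALL))
```

Recording the current stage per node gives the status tracking later in the article something concrete to report.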
Required Infrastructure
Components:
- DHCP Server (IP assignment, PXE options)
- TFTP Server (Boot files)
- HTTP/FTP Server (OS repositories)
- Configuration Management (Ansible/Puppet/Chef)
- IPMI/BMC Access (Remote power control)
PXE Boot Setup
DHCP Configuration
# /etc/dhcp/dhcpd.conf
subnet 10.1.1.0 netmask 255.255.255.0 {
    range 10.1.1.100 10.1.1.200;
    option routers 10.1.1.1;
    option domain-name-servers 10.1.1.10;
    # PXE Boot options
    next-server 10.1.1.10; # TFTP server
    filename "pxelinux.0";
    # iPXE clients: hand over an HTTP boot script instead of pxelinux.0
    if exists user-class and option user-class = "iPXE" {
        filename "http://10.1.1.10/boot.ipxe";
    }
}
# Static assignments for known servers
host server01 {
    hardware ethernet aa:bb:cc:dd:ee:ff;
    fixed-address 10.1.1.101;
    filename "pxelinux.0";
}
TFTP Server Setup
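# Install TFTP server and Syslinux (the syslinux package provides /usr/share/syslinux)
yum install tftp-server syslinux
# Copy PXE boot files
cp /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot/
cp /usr/share/syslinux/menu.c32 /var/lib/tftpboot/
cp /usr/share/syslinux/memdisk /var/lib/tftpboot/
cp /usr/share/syslinux/mboot.c32 /var/lib/tftpboot/
# Syslinux 5 and later also need these library modules next to pxelinux.0
cp /usr/share/syslinux/{ldlinux,libutil,libcom32}.c32 /var/lib/tftpboot/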
# Create menu structure
mkdir -p /var/lib/tftpboot/pxelinux.cfg
mkdir -p /var/lib/tftpboot/images/{rhel8,rhel9,ubuntu22}
PXE Menu Configuration
# /var/lib/tftpboot/pxelinux.cfg/default
DEFAULT menu.c32
PROMPT 0
TIMEOUT 300
MENU TITLE Datacenter PXE Boot Menu
LABEL rhel8
MENU LABEL RHEL 8 Automated Install
KERNEL images/rhel8/vmlinuz
APPEND initrd=images/rhel8/initrd.img inst.ks=http://10.1.1.10/kickstart/rhel8.cfg
LABEL rhel9
MENU LABEL RHEL 9 Automated Install
KERNEL images/rhel9/vmlinuz
APPEND initrd=images/rhel9/initrd.img inst.ks=http://10.1.1.10/kickstart/rhel9.cfg
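# Note: Ubuntu Server 20.04 and later ship the Subiquity installer, which reads
# autoinstall files rather than debian-installer preseed files. Adjust this entry
# to match the installer image you actually serve.
LABEL ubuntu22
MENU LABEL Ubuntu 22.04 Automated Install
KERNEL images/ubuntu22/vmlinuz
APPEND initrd=images/ubuntu22/initrd.img url=http://10.1.1.10/preseed/ubuntu22.cfg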
LABEL local
MENU LABEL Boot from local disk
LOCALBOOT 0
Kickstart Configuration (RHEL/CentOS)
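Which kickstart a machine receives is decided by the PXE entry it boots. Besides the shared `default` menu above, PXELINUX also looks in `pxelinux.cfg/` for a file named after the client's MAC address (lowercase, dash-separated, prefixed with `01-` for Ethernet), so a known server can skip the menu and go straight to its kickstart. A minimal sketch of generating such a file follows; the function names are hypothetical, and the paths and URL are the ones used in this article.

```python
import string


def pxelinux_host_filename(mac):
    """Map 'aa:bb:cc:dd:ee:ff' to the per-host name '01-aa-bb-cc-dd-ee-ff'."""
    octets = mac.strip().lower().replace("-", ":").split(":")
    valid = len(octets) == 6 and all(
        len(octet) == 2 and all(ch in string.hexdigits for ch in octet)
        for octet in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def pxelinux_host_config(label, kernel, initrd, ks_url):
    """Build a single-entry PXELINUX config that boots without a menu."""
    return "\n".join([
        f"DEFAULT {label}",
        "PROMPT 0",
        f"LABEL {label}",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} inst.ks={ks_url}",
        "",
    ])


if __name__ == "__main__":
    name = pxelinux_host_filename("aa:bb:cc:dd:ee:ff")
    body = pxelinux_host_config(
        "rhel8",
        "images/rhel8/vmlinuz",
        "images/rhel8/initrd.img",
        "http://10.1.1.10/kickstart/rhel8.cfg",
    )
    print(f"# would be written to /var/lib/tftpboot/pxelinux.cfg/{name}")
    print(body)
```

Removing the per-host file after a successful install is one way to keep a later reboot from reinstalling the machine.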
Complete Kickstart File
# /var/www/html/kickstart/rhel8.cfg
#version=RHEL8
# System authorization
auth --enableshadow --passalgo=sha512
# Use network installation
url --url="http://10.1.1.10/repos/rhel8/"
# Keyboard and language
keyboard --vckeymap=us --xlayouts='us'
lang en_US.UTF-8
# Network configuration
network --bootproto=dhcp --device=eth0 --onboot=yes --ipv6=auto
network --hostname=server.datacenter.local
# Root password (encrypted)
rootpw --iscrypted $6$rounds=656000$encrypted_hash_here
# System timezone
timezone America/New_York --isUtc
# Disk partitioning
ignoredisk --only-use=sda
clearpart --all --initlabel --drives=sda
# LVM partitioning
part /boot --fstype="xfs" --ondisk=sda --size=1024
part /boot/efi --fstype="efi" --ondisk=sda --size=512
part pv.01 --fstype="lvmpv" --ondisk=sda --size=1 --grow
volgroup vg_root pv.01
logvol / --fstype="xfs" --size=51200 --name=lv_root --vgname=vg_root
logvol /var --fstype="xfs" --size=20480 --name=lv_var --vgname=vg_root
logvol /tmp --fstype="xfs" --size=10240 --name=lv_tmp --vgname=vg_root
logvol /home --fstype="xfs" --size=10240 --name=lv_home --vgname=vg_root
logvol swap --fstype="swap" --size=16384 --name=lv_swap --vgname=vg_root
# Bootloader
bootloader --location=mbr --boot-drive=sda
# Firewall and SELinux
firewall --enabled --ssh
selinux --enforcing
# Package selection
%packages
@^minimal-environment
@standard
vim
wget
curl
net-tools
bind-utils
tcpdump
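python3
# htop and ansible are not in the base RHEL repositories. They need EPEL or another
# extra repository defined with a repo command, otherwise the install stops on
# missing packages.
htop
ansible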
%end
# Post-installation script
%post --log=/root/kickstart-post.log
# Update system
yum update -y
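# Configure SSH: allow root login with keys only, so the provisioning key added below keeps working
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i -E 's/^#?PubkeyAuthentication .*/PubkeyAuthentication yes/' /etc/ssh/sshd_config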
# Add SSH keys
mkdir -p /root/.ssh
chmod 700 /root/.ssh
cat << 'EOF' > /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ... admin@provisioning
EOF
chmod 600 /root/.ssh/authorized_keys
# Configure NTP
cat << 'EOF' > /etc/chrony.conf
server ntp1.datacenter.local iburst
server ntp2.datacenter.local iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
EOF
systemctl enable chronyd
# Register with configuration management
curl -o /tmp/bootstrap.sh http://10.1.1.10/scripts/ansible-bootstrap.sh
bash /tmp/bootstrap.sh
# Send completion notification
curl -X POST http://10.1.1.10/api/provision/complete \
-d "hostname=$(hostname)" \
-d "ip=$(ip addr show eth0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1)"
%end
# Reboot after installation
reboot
Modern Provisioning Tools
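Much of what these tools add on top of a hand-written kickstart is per-host rendering: the file above hard-codes one hostname and one network device, which does not work for more than one server. The sketch below shows that idea in its simplest form. The `__NAME__` placeholder syntax and `render_kickstart` are assumptions made for illustration, not the template format of any of the tools listed here.

```python
import re

# Fragment of the kickstart above with the per-host values turned into placeholders.
KS_FRAGMENT = """\
network --bootproto=dhcp --device=__DEVICE__ --onboot=yes --ipv6=auto
network --hostname=__HOSTNAME__
"""

_PLACEHOLDER = re.compile(r"__([A-Z]+)__")


def render_kickstart(template, values):
    """Replace every __NAME__ placeholder with values['name'].

    Raises KeyError for a placeholder without a value, so an incomplete
    file is never served to an installer.
    """
    def substitute(match):
        key = match.group(1).lower()
        if key not in values:
            raise KeyError(f"no value for placeholder {match.group(0)}")
        return str(values[key])

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render_kickstart(
        KS_FRAGMENT,
        {"device": "eth0", "hostname": "server01.datacenter.local"},
    ))
```

Plain token replacement is used here instead of `string.Template` or `%`-formatting because kickstart files already contain `$` (password hashes, shell variables), `%` (section markers) and `@` (package groups).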
1. Foreman + Katello
Features:
- Web-based management
- Lifecycle management
- Content management
- Puppet integration
# Install Foreman
yum install foreman-installer
foreman-installer \
--enable-foreman-plugin-katello \
--enable-foreman-plugin-discovery
# Add compute resource
hammer compute-resource create \
--name "Datacenter" \
--provider "Libvirt" \
--url "qemu+ssh://root@hypervisor.local/system"
2. MAAS (Metal as a Service)
Ubuntu-focused:
# Install MAAS
snap install maas
# Initialize
maas init region+rack --database-uri postgres://user:pass@localhost/maas
# Commission a node (replace $SYSTEM_ID with the machine's system ID)
maas admin machine commission $SYSTEM_ID
3. Cobbler
Lightweight option:
# Install Cobbler
yum install cobbler cobbler-web
# Add distro
cobbler import --name=rhel8 --path=/mnt/rhel8-dvd
# Add profile
cobbler profile add \
--name=rhel8-datacenter \
--distro=rhel8 \
--kickstart=/var/lib/cobbler/kickstarts/rhel8.ks
# Add system
cobbler system add \
--name=server01 \
--profile=rhel8-datacenter \
--mac=aa:bb:cc:dd:ee:ff \
--ip-address=10.1.1.101
IPMI/BMC Automation
Remote Power Management
#!/usr/bin/env python3
import subprocess
import time


def ipmi_command(host, user, password, *args):
    """Run an ipmitool subcommand against a BMC and return its output."""
    cmd = [
        'ipmitool',
        '-I', 'lanplus',
        '-H', host,
        '-U', user,
        '-P', password,
        *args,
    ]
    # check=True raises CalledProcessError if ipmitool exits non-zero
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def provision_server(ipmi_host, ipmi_user, ipmi_pass):
    """Automated server provisioning"""
    # Set boot device to PXE
    print(f"Setting {ipmi_host} to PXE boot...")
    ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'chassis', 'bootdev', 'pxe')

    # Power cycle
    print("Power cycling server...")
    ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'power', 'cycle')

    # Wait for installation
    print("Waiting for OS installation (15 minutes)...")
    time.sleep(900)

    # Check power state (this shows the chassis is on, not that the OS installed)
    print("Verifying server status...")
    status = ipmi_command(ipmi_host, ipmi_user, ipmi_pass, 'power', 'status')
    print(f"Power status: {status}")
    return True


# Provision multiple servers (placeholder credentials; load real ones from a secret store)
servers = [
    {'host': '10.0.1.101', 'user': 'admin', 'pass': 'password'},
    {'host': '10.0.1.102', 'user': 'admin', 'pass': 'password'},
]

for server in servers:
    provision_server(server['host'], server['user'], server['pass'])
Post-Provisioning Configuration
Ansible Bootstrap
# ansible-bootstrap.yml
---
- name: Post-provision configuration
  hosts: new_servers
  become: yes
  tasks:
    - name: Update all packages
      yum:
        name: '*'
        state: latest
    - name: Install monitoring agent
      yum:
        name: telegraf
        state: present
    - name: Configure monitoring
      template:
        src: telegraf.conf.j2
        dest: /etc/telegraf/telegraf.conf
      notify: restart telegraf
    - name: Install security updates
      yum:
        name: '*'
        state: latest
        security: yes
    - name: Configure firewall
      firewalld:
        service: "{{ item }}"
        permanent: yes
        state: enabled
      loop:
        - ssh
        - http
        - https
    - name: Join to domain
      command: realm join -U admin datacenter.local
      args:
        creates: /etc/krb5.keytab
  handlers:
    - name: restart telegraf
      service:
        name: telegraf
        state: restarted
Validation and Testing
Automated Testing
#!/bin/bash
# validate-provisioning.sh
SERVER=$1
if [ -z "$SERVER" ]; then
    echo "Usage: $0 <server>"
    exit 1
fi
FAILED=0
echo "Validating $SERVER provisioning..."
# Test SSH connectivity
if ssh -o ConnectTimeout=5 "root@$SERVER" "echo OK" &>/dev/null; then
    echo "✓ SSH connectivity"
else
    echo "✗ SSH connectivity FAILED"
    exit 1
fi
# Check OS version
OS_VERSION=$(ssh "root@$SERVER" "cat /etc/redhat-release")
echo "✓ OS Version: $OS_VERSION"
# Check disk layout
DISK_LAYOUT=$(ssh "root@$SERVER" "lsblk -o NAME,SIZE,TYPE,MOUNTPOINT")
echo "✓ Disk Layout:"
echo "$DISK_LAYOUT"
# Check services
SERVICES=("sshd" "chronyd" "firewalld")
for service in "${SERVICES[@]}"; do
    if ssh "root@$SERVER" "systemctl is-active $service" &>/dev/null; then
        echo "✓ Service $service is running"
    else
        echo "✗ Service $service is NOT running"
        FAILED=1
    fi
done
# Check network configuration
IP_ADDR=$(ssh "root@$SERVER" "ip addr show eth0 | grep 'inet ' | awk '{print \$2}'")
echo "✓ IP Address: $IP_ADDR"
# Report failure through the exit code so callers can act on it
if [ "$FAILED" -ne 0 ]; then
    echo "Provisioning validation FAILED"
    exit 1
fi
echo "Provisioning validation complete!"
Monitoring Provisioning Status
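A status view needs per-host results to display. One way to produce them, sketched here under assumptions, is to run the validation script above against a batch of hosts in parallel and record whatever each run reports through its exit code. The script path, `run_validation` and `validate_all` are hypothetical names chosen for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_validation(host, script="./validate-provisioning.sh"):
    """Return True when the validation script exits 0 for `host`."""
    result = subprocess.run([script, host], capture_output=True, text=True)
    return result.returncode == 0


def validate_all(hosts, check=run_validation, max_workers=8):
    """Run `check` for every host concurrently and map host -> pass/fail."""
    hosts = list(hosts)
    # Threads are sufficient here: each check mostly waits on ssh
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, hosts))
    return dict(zip(hosts, results))
```

Passing the check as a parameter keeps the batching logic testable without real servers, for example `validate_all(["a", "b"], check=lambda host: True)`.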
Status Dashboard
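# provision-status.py
from contextlib import closing
from datetime import datetime
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)


def calculate_duration(start_time, end_time):
    """Return elapsed seconds, or None while a provision is still running.

    Assumes timestamps are stored as ISO 8601 strings, the format
    SQLite's DATE() function in the query below also expects.
    """
    if not start_time or not end_time:
        return None
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    return (end - start).total_seconds()


@app.route('/api/provision/status')
def provision_status():
    # closing() releases the connection after each request
    with closing(sqlite3.connect('provisioning.db')) as conn:
        cursor = conn.cursor()
        cursor.execute("""
            SELECT hostname, ip, status, start_time, end_time
            FROM provisions
            WHERE DATE(start_time) = DATE('now')
            ORDER BY start_time DESC
        """)
        rows = cursor.fetchall()
    provisions = []
    for row in rows:
        provisions.append({
            'hostname': row[0],
            'ip': row[1],
            'status': row[2],
            'start_time': row[3],
            'end_time': row[4],
            'duration': calculate_duration(row[3], row[4])
        })
    return jsonify(provisions)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Best Practices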
Configuration Management
- Version control everything
  - Kickstart files in Git
  - Track changes
  - Peer review
- Test in staging
  - Validate changes before production
  - Use virtual machines for testing
- Document standards
  - Naming conventions
  - IP addressing schemes
  - Partition layouts
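Documented standards are easier to keep when they are also checked automatically. As an illustration, the sketch below validates a host entry against a naming convention and an addressing scheme. The `serverNN` pattern is an assumed convention inferred from the `server01` example in this article, and `10.1.1.0/24` is the provisioning subnet used in the DHCP configuration; both would be replaced by your own standards.

```python
import ipaddress
import re

# Assumed convention: "server" followed by two digits, e.g. server01
HOSTNAME_PATTERN = re.compile(r"^server\d{2}$")
# Provisioning subnet from the DHCP configuration in this article
PROVISIONING_NET = ipaddress.ip_network("10.1.1.0/24")


def check_host(name, ip):
    """Return a list of standards violations for one inventory entry."""
    problems = []
    if not HOSTNAME_PATTERN.match(name):
        problems.append(f"{name}: does not match the naming convention")
    if ipaddress.ip_address(ip) not in PROVISIONING_NET:
        problems.append(f"{name}: {ip} is outside {PROVISIONING_NET}")
    return problems


if __name__ == "__main__":
    for entry in [("server01", "10.1.1.101"), ("db-1", "10.2.0.5")]:
        print(entry, check_host(*entry))
```

Running a check like this in the review step for inventory changes catches convention drift before a node is provisioned.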
Security
Security Checklist:
- Encrypted root passwords in kickstart
- SSH key-based authentication only
- Firewall enabled by default
- SELinux enforcing
- Automatic security updates
- Minimal package installation
- Disable unnecessary services
Conclusion
Automated bare metal provisioning is essential for datacenter operations at scale. With proper tooling and processes, you can provision hundreds of servers per day with consistency and reliability.
Key Takeaways:
- Automate everything from power-on to production
- Use PXE boot for network-based installation
- Implement configuration management for post-install
- Test and validate every deployment
- Monitor provisioning status
- Version control all configurations