
When setting up a new server, it's tempting to install the OS and jump straight into deploying services. But if auto-boot after power loss isn't configured, you'll be driving to the office at 3 AM. If a kernel panic strikes, you'll need physical monitor access. If disk failure goes undetected, you'll lose data. This guide shares the 8-step checklist verified across 16 production servers.

At a glance: 8 setup steps · 16 servers deployed · 10 sec panic auto-reboot · 1 GB log retention cap

Why You Need a Checklist

Setting up one server from memory is fine. But when you manage multiple machines, questions like "Did I register SSH keys on this one?" or "Was journald configured?" start creeping in. We've actually had incidents on servers where a step was skipped.

What Happens When You Skip a Step

Skipped Step          Consequence
BIOS Auto-Boot        Server stays off after power outage — midnight office visit required
SSH Key Auth          Password auth exposed — brute-force attack target
kernel.panic          Kernel panic freezes the system — physical monitor access needed
journald Persistence  Logs lost on reboot — impossible to trace failure causes
smartmontools         Disk degradation undetected — risk of data loss

With a checklist, every server gets the same quality setup regardless of who installs it. Create it once, and it scales from 1 server to 100.

1. BIOS — Auto-Boot After Power Loss

Servers typically run headless, with no monitor or keyboard attached. When a power outage occurs, the machine must boot automatically once power is restored. Without this setting, someone has to physically press the power button after every outage.

BIOS Configuration

BIOS → Power Management (or Advanced → ACPI)

  Restore on AC Power Loss → [Power On]
  ─────────────────────────────────────
  Power Off  : Stay off after outage (default)
  Power On   : Auto-boot after outage ✅
  Last State : Restore pre-outage state

The menu name varies by motherboard manufacturer: "AC Power Recovery", "After Power Failure", or "Restore on AC Power Loss". Always set it to Power On.

Mini PCs (like N100-based systems) can have tricky BIOS access, so it's best to verify this setting during initial installation. This single configuration enables remote recovery after overnight power outages.
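If the machine has a BMC with IPMI support (typical for rack servers, absent on most mini PCs), the same policy can often be set from the running OS without entering the BIOS. A sketch using ipmitool:

```shell
# Assumes a BMC reachable via the local IPMI interface (not mini PCs).
# Set the power-restore policy to always power on after AC loss:
sudo ipmitool chassis policy always-on

# Confirm the active policy:
sudo ipmitool chassis status | grep -i "power restore"
```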

2. Install SSH Server

Ubuntu Desktop doesn't ship with SSH server pre-installed. Even the Server edition may miss it if you skip the checkbox during installation.

Install and Enable SSH

# Install
sudo apt update && sudo apt install -y openssh-server

# Start and enable on boot
sudo systemctl enable --now ssh

# Verify status
sudo systemctl status ssh
# ● ssh.service - OpenBSD Secure Shell server
#   Active: active (running)

Once SSH is running, you can connect from another machine with ssh user@server-ip. In the next step, we'll switch from password to key-based authentication.

3. SSH Key Authentication

Password authentication is vulnerable to brute-force attacks. Switching to ed25519 key authentication lets you connect securely without passwords. For the full walkthrough, see our SSH Key Multi-Server Management guide. Here are the essential commands.

Run on Your Admin Machine

# Generate key pair (skip if you already have one)
ssh-keygen -t ed25519 -C "admin@office"

# Copy public key to the new server
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@new-server-ip

Disable Password Authentication on the Server

# Edit /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' \
  /etc/ssh/sshd_config

# Restart SSH
sudo systemctl restart ssh

Always verify key-based login works before disabling password authentication. If you disable passwords without a registered key, you'll lock yourself out.
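A quick pre-flight check from the admin machine (reusing the example address from above) makes the lockout scenario hard to hit:

```shell
# BatchMode=yes forbids interactive prompts, so this succeeds only if
# the key alone is sufficient. Run BEFORE disabling password auth:
ssh -o BatchMode=yes -o PasswordAuthentication=no user@new-server-ip true \
  && echo "key auth OK - safe to disable passwords" \
  || echo "key auth FAILED - keep password auth for now"
```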

4. SSH Config Aliases

Memorizing IP addresses for multiple servers gets old fast. Add aliases to ~/.ssh/config and connect with ssh web-server instead.

~/.ssh/config Example

# Admin machine ~/.ssh/config

Host web-server
    HostName 10.0.10.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host gpu-server
    HostName 10.0.10.20
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host backup-server
    HostName 10.0.10.30
    User admin
    IdentityFile ~/.ssh/id_ed25519

This works identically on Linux, macOS, and Windows (OpenSSH). On Windows, the config path is C:\Users\YourName\.ssh\config with the same format.

OS               Config Path                 Notes
Linux / macOS    ~/.ssh/config               Built-in
Windows 10+      %USERPROFILE%\.ssh\config   OpenSSH built-in
Windows (PuTTY)  (n/a)                       Use saved sessions instead
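The aliases pay off beyond plain ssh, since scp and rsync read the same file (the paths below are illustrative):

```shell
# All of these resolve host, user, and key from ~/.ssh/config:
ssh web-server                        # instead of ssh admin@10.0.10.10
scp backup.tar.gz backup-server:~/    # scp honors the aliases too
rsync -av data/ gpu-server:/srv/data  # as does rsync over SSH
```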

5. kernel.panic Auto-Reboot

By default, a kernel panic leaves the server frozen. You'd need to connect a monitor and manually reboot — a serious problem for remote servers. Setting kernel.panic triggers an automatic reboot after a panic event.

Configuration

# Add to /etc/sysctl.conf
echo "kernel.panic = 10" | sudo tee -a /etc/sysctl.conf

# Apply immediately
sudo sysctl -p

# Verify
sysctl kernel.panic
# kernel.panic = 10

Value             Behavior                      Best For
0 (default)       No reboot (stays frozen)      Dev environments (debugging needed)
10 (recommended)  Auto-reboot after 10 seconds  Production servers (uptime priority)
30                Auto-reboot after 30 seconds  When crash dump collection is needed

kernel.panic = 10 means "reboot 10 seconds after a kernel panic." 10 seconds is enough for logs to flush to disk while still recovering quickly.
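A common companion setting (an addition beyond this article's baseline, so adjust to your debugging needs) is kernel.panic_on_oops, which escalates partial kernel crashes (oopses) into panics so the timer above also recovers from hung drivers:

```shell
# Escalate kernel oopses into full panics, so the kernel.panic
# timer also triggers on driver crashes instead of limping along:
echo "kernel.panic_on_oops = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Verify
sysctl kernel.panic_on_oops
```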

6. journald Persistent Logging

Ubuntu's systemd-journald ships with Storage=auto, which persists logs only if /var/log/journal exists; otherwise it runs in volatile mode and all logs vanish on reboot. After a crash and reboot, you can't answer "What happened right before the reboot?"

Enable Persistent Storage

# Create log directory
sudo mkdir -p /var/log/journal

# Edit /etc/systemd/journald.conf
sudo tee /etc/systemd/journald.conf > /dev/null << 'JEOF'
[Journal]
Storage=persistent
SystemMaxUse=1G
SystemMaxFileSize=100M
MaxRetentionSec=3month
JEOF

# Restart journald
sudo systemctl restart systemd-journald

# Verify: list previous boot logs
journalctl --list-boots

Setting            Value       Description
Storage            persistent  Save logs to disk permanently
SystemMaxUse       1G          Maximum total log size
SystemMaxFileSize  100M        Maximum single log file size
MaxRetentionSec    3month      Auto-delete logs older than 3 months

Capping at SystemMaxUse=1G prevents excessive disk usage while safely retaining the last 3 months of logs for troubleshooting.
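With persistence enabled, the previous boot becomes queryable. A few journalctl invocations that earn their keep after a crash:

```shell
# Everything from the previous boot:
journalctl -b -1

# Only errors and worse from the previous boot:
journalctl -b -1 -p err

# Last 50 kernel messages before the previous shutdown:
journalctl -b -1 -k -n 50

# How much disk the journal currently uses:
journalctl --disk-usage
```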

7. smartmontools + Email Alerts

Disk failures strike without warning. But by monitoring SMART (Self-Monitoring, Analysis and Reporting Technology), you can catch early signs of degradation. smartmontools automates SMART checks and sends email alerts on anomalies.

Installation and Setup

# Install
sudo apt install -y smartmontools

# Check SMART support
sudo smartctl -i /dev/sda
# SMART support is: Available
# SMART support is: Enabled

# Current health status
sudo smartctl -H /dev/sda
# SMART overall-health self-assessment test result: PASSED
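PASSED alone can hide creeping degradation, so the raw attributes deserve a look too. For SATA disks, these three are the classic early-warning counters (nonzero raw values warrant a backup and a replacement plan):

```shell
# Dump vendor attributes and filter for the early-warning counters:
sudo smartctl -A /dev/sda | grep -E \
  'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```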

/etc/smartd.conf Configuration

# /etc/smartd.conf example
# DEVICESCAN auto-detects all disks

DEVICESCAN \
  -d removable \
  -n standby \
  -s (S/../../1/02|L/../../5/03) \
  -W 0,45,50 \
  -m admin@example.com \
  -M exec /usr/share/smartmontools/smartd_warning.sh

# -s : Short test every Monday 2AM, Long test every Friday 3AM
# -W 0,45,50 : No per-degree change logging; info log at 45°C, critical alert at 50°C
# -m : Alert recipient email
# -M exec : Alert notification script

Enable the Service

# Start smartd service
sudo systemctl enable --now smartd

# Verify status
sudo systemctl status smartd
# ● smartd.service - Self Monitoring and Reporting Technology
#   Active: active (running)

Test Type        Schedule      Duration                       Scope
Short self-test  Every Monday  ~2 min                         Basic read/electrical checks
Long self-test   Every Friday  ~2 hours (varies by capacity)  Full surface scan

For NVMe SSDs, use smartctl -i /dev/nvme0. NVMe SMART attributes differ from SATA, so thresholds need separate configuration. Email alerts require mailutils + SMTP relay setup.
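Before trusting the alert path, verify delivery end to end. smartd's -M test directive sends one test email per monitored device at startup:

```shell
# Temporarily append "-M test" after the existing -M exec directive
# in /etc/smartd.conf, e.g.:
#   DEVICESCAN ... -m admin@example.com -M test
# then restart smartd to trigger the test mails:
sudo systemctl restart smartd

# Confirm the test mail was handed off:
journalctl -u smartd -n 20

# Remove "-M test" and restart again once delivery is confirmed.
```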

8. One-Command Verification

After completing the previous seven steps, run a verification script to confirm every setting that software can see is properly applied (the BIOS setting from step 1 still needs a manual check). Automated checks are faster and more reliable than manually verifying each item.

server-check.sh

#!/bin/bash
# server-check.sh — Server initial setup verification script

echo "=== Server Setup Verification ==="
echo ""

# 1. SSH service
echo -n "[1] SSH Service: "
systemctl is-active ssh > /dev/null 2>&1 && echo "✅ Running" || echo "❌ Not running"

# 2. Password authentication
echo -n "[2] Password Auth: "
grep -q "^PasswordAuthentication no" /etc/ssh/sshd_config && \
  echo "✅ Disabled" || echo "⚠️  Still enabled"

# 3. kernel.panic
echo -n "[3] kernel.panic: "
val=$(sysctl -n kernel.panic 2>/dev/null)
[ "$val" -gt 0 ] 2>/dev/null && \
  echo "✅ Reboot after ${val}s" || echo "❌ Not configured (0)"

# 4. journald persistence
echo -n "[4] journald Storage: "
[ -d /var/log/journal ] && echo "✅ Persistent" || echo "❌ Volatile"

# 5. smartd service
echo -n "[5] smartd Service: "
systemctl is-active smartd > /dev/null 2>&1 && echo "✅ Running" || echo "❌ Not running"

echo ""
echo "=== Verification Complete ==="

Sample Output

=== Server Setup Verification ===

[1] SSH Service: ✅ Running
[2] Password Auth: ✅ Disabled
[3] kernel.panic: ✅ Reboot after 10s
[4] journald Storage: ✅ Persistent
[5] smartd Service: ✅ Running

=== Verification Complete ===

Transfer this script to new servers via scp and run it. If any item shows a failure mark, revisit that step. For bulk-checking multiple servers, combine it with SSH config aliases to run ssh web-server 'bash -s' < server-check.sh for remote batch verification.
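The batch run mentioned above can be sketched as a loop over the step-4 aliases:

```shell
# Run the verification script on each server from ~/.ssh/config:
for h in web-server gpu-server backup-server; do
  echo "===== $h ====="
  ssh "$h" 'bash -s' < server-check.sh
done
```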

Checklist Template

Copy this table and use it every time you set up a new server. When every item is checked off, the server is production-ready.

#  Item                    Verification Command / Method                     Expected Result
1  BIOS Auto-Boot          BIOS → AC Power Loss                              Power On
2  SSH Server              systemctl is-active ssh                           active
3  SSH Key Auth            ssh alias (no password prompt)                    Connection succeeds
4  Password Auth Disabled  grep PasswordAuthentication /etc/ssh/sshd_config  no
5  SSH Config Aliases      ~/.ssh/config Host entries added                  Connect by name
6  kernel.panic            sysctl kernel.panic                               10
7  journald Persistence    ls /var/log/journal                               Directory exists
8  smartmontools           systemctl is-active smartd                        active

This checklist covers the minimum essential settings. Depending on server role, you may also want to add firewall (ufw), fail2ban, time sync (chrony), and swap configuration. For monitoring, see our Grafana + Prometheus monitoring guide.

After applying this checklist across 16 servers, auto-recovery from power outages, automatic kernel panic reboots, and proactive disk anomaly detection all worked flawlessly. Repeating this same procedure for every new server ensures consistently reliable infrastructure.