
Building AI Server Infrastructure in the Office — A 16-Server Setup Guide

We built a full AI infrastructure in the office — no cloud required. Starting from a single GPU-equipped AI server, we expanded to 16 machines with dedicated roles: reverse proxy, project servers, lightweight monitoring nodes, and cold backup storage. Every server is connected via an SSH key mesh network, bandwidth is measured and tiered, and external servers are fully isolated. This is our practical guide to on-premise AI infrastructure.

Server Role Separation

We organized 16 servers into 5 role groups. Each group operates independently — a failure in one group never propagates to another. This isolation is the foundation of our entire infrastructure.

AI Brain Server (1 unit)

The core of all AI inference. Equipped with an AMD Ryzen 9950X3D, 96GB DDR5 RAM, and an NVIDIA RTX PRO 6000 with 96GB VRAM. Storage uses a 3-tier architecture: Intel Optane 905P for hot data, Samsung 980 PRO for warm data, and standard NVMe for cold storage.

AI Auxiliary Server (1 unit)

Handles FAQ and simple queries as a lightweight AI node. Runs on a Ryzen 7500F with an RTX 5060 Ti (16GB VRAM). Through cross-server inference, this server increases the main AI server's throughput by 70%.

Reverse Proxy (1 unit)

The single entry point for all external traffic. An Intel N100 mini PC running Caddy handles SSL termination, domain routing, and load balancing. Low-power and purpose-built — no GPU, no unnecessary services.

Project Servers (7 units)

Each project server runs a dedicated web service, API, or database. CPUs range from Ryzen 5700G to 7840HS with 32–64GB RAM. One project per server means resource conflicts are impossible and deployments are independent.

Lightweight Servers (5 units)

Intel H255-based mini PCs with 16GB RAM handling monitoring, log collection, and lightweight APIs. Low power draw and silent operation make them ideal for auxiliary tasks that run 24/7.

Cold Backup (1 unit)

An NFS-based network storage server with two Seagate IronWolf 12TB drives. Regularly backs up critical data from all servers. Dual NICs provide network redundancy.
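The article does not show the export configuration itself; a minimal /etc/exports sketch for this kind of cold-backup role might look like the following (the /srv/backup path and mount point are assumptions, the subnet is the article's):

```
# /etc/exports on the backup server (path is illustrative)
# Export read-write to the internal subnet only; root_squash
# keeps remote root from acting as root on the export.
/srv/backup 10.0.10.0/24(rw,sync,no_subtree_check,root_squash)

# On an internal server (hostname illustrative):
#   mount -t nfs backup-host:/srv/backup /mnt/backup
```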

Why Separate Roles?

| Principle | Benefit |
| --- | --- |
| Fault isolation | A project server crash never affects AI inference |
| Resource independence | GPU/VRAM usage does not compete with web service CPU/RAM |
| Horizontal scaling | Add a project server for new services; add a GPU for more AI capacity |

Network Bandwidth Tiers

All servers share the same subnet (10.0.10.0/24), but actual bandwidth varies by NIC capability. We measured every link with iperf3 --bidir and organized servers into two tiers. After cable and port replacements, we eliminated all 100Mbps bottlenecks — every server now achieves 1Gbps or higher.
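A rough sketch of the measurement workflow, assuming iperf3 is installed and the target host (IP illustrative) is running a listener. The small helper below is not the article's tooling; it just maps a measured rate onto the two tiers described here:

```shell
# One measurement per link; the target side runs `iperf3 -s`.
# --bidir measures TX and RX in a single pass (iperf3 >= 3.7):
#   iperf3 -c 10.0.10.21 --bidir -t 10

# Classify a measured rate (in Mbps) into the article's tiers.
tier_for_mbps() {
  if [ "$1" -ge 2000 ]; then
    echo "tier1-2.5G"
  elif [ "$1" -ge 900 ]; then
    echo "tier2-1G"
  else
    echo "bottleneck"   # below rated speed: suspect cable or port
  fi
}

tier_for_mbps 2310   # → tier1-2.5G
tier_for_mbps 921    # → tier2-1G
tier_for_mbps 456    # → bottleneck
```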

Tier 1: 2.5Gbps Connections (5 servers)

| Server Role | CPU | Measured TX | Measured RX |
| --- | --- | --- | --- |
| Project Server A | Ryzen 5700G | 2.35 Gbps | 2.18 Gbps |
| Web Server | Ryzen 7840HS | 2.33 Gbps | 2.34 Gbps |
| Project Server B | Ryzen 5800U | 2.32 Gbps | 2.33 Gbps |
| Lightweight Server D | Intel H255 | 2.34 Gbps | 2.25 Gbps |
| AI Auxiliary | Ryzen 7500F | 2.31 Gbps | 2.24 Gbps |

These 2.5Gbps NICs connect directly to the AI brain server's 10Gbps NIC. The AI auxiliary server was upgraded from 100Mbps to 2.5Gbps — a 24x improvement — simply by replacing the cable and switch port.

Tier 2: 1Gbps Connections (10 servers)

| Server Role | CPU | Measured TX | Measured RX |
| --- | --- | --- | --- |
| Lightweight Server A | Intel H255 | 923 Mbps | 860 Mbps |
| Project Server C | Ryzen 5825U | 922 Mbps | 883 Mbps |
| Project Server D | Ryzen 5825U | 921 Mbps | 925 Mbps |
| Project Server E | Ryzen 5825U | 921 Mbps | 919 Mbps |
| Proxy Server | Intel N100 | 921 Mbps | 938 Mbps |
| Backup Server | Ryzen 5825U | 920 Mbps | 838 Mbps |

All 1Gbps-tier servers achieve 920+ Mbps TX. Previously, four servers ran at 456–731 Mbps due to faulty cables or mismatched port speeds. After diagnosing and replacing hardware, every server reached its rated speed.

Network Optimization Results

| Server | Before | After | Action |
| --- | --- | --- | --- |
| AI Auxiliary | 95 Mbps | 2.31 Gbps | Cable/port swap — upgraded to 2.5Gbps (24x) |
| Project Server D | 456 Mbps | 921 Mbps | Reached full 1Gbps (2x) |
| Project Server C | 610 Mbps | 922 Mbps | Normalized (51% improvement) |
| Project Server E | 731 Mbps | 921 Mbps | Normalized (26% improvement) |

SSH Mesh Security

All 16 servers have 17 SSH public keys cross-registered, enabling bidirectional key authentication between any pair of servers. Password authentication is completely disabled across the internal network.

Key Structure

The 17 keys include: 1 external workstation key (access to all servers), 1 AI brain server key, 1 proxy server key, 1 web server key, 12 project/lightweight server keys, and 1 backup server key. All keys use the Ed25519 algorithm. Every server's authorized_keys file contains all 17 public keys, so any server can SSH into any other server instantly.
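Generating one server's mesh identity is a single ssh-keygen call; the sketch below uses a temporary directory and an illustrative key comment rather than the article's actual paths and hostnames:

```shell
# Generate an Ed25519 keypair for one server's mesh identity.
# (key location and comment are illustrative)
keydir=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -C "project-a@mesh" -f "$keydir/id_ed25519" -q

# Distribution sketch: every server's .pub goes into every server's
# authorized_keys, e.g. via ssh-copy-id (requires reachable hosts):
#   ssh-copy-id -i "$keydir/id_ed25519.pub" user@10.0.10.22
```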

Security Configuration

| Setting | Internal (16 servers) | External Backup | Purpose |
| --- | --- | --- | --- |
| PasswordAuthentication | OFF | ON | Block password login (key-only) |
| PermitRootLogin | OFF | Default | Block direct root access |
| MaxAuthTries | 3 | 6 | Limit attempts against brute force |
| sudo NOPASSWD | Enabled | Enabled | Automation-friendly sudo |
| DenyUsers | Enabled | — | Block external server IPs |
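The internal-tier settings map onto an sshd_config fragment along these lines (a sketch: the DenyUsers username/address pattern is illustrative, not the article's actual value):

```
# /etc/ssh/sshd_config (internal servers)
PasswordAuthentication no
PermitRootLogin no
MaxAuthTries 3
# Reject logins originating from the external backup host
# (pattern and address are illustrative)
DenyUsers *@203.0.113.10
```

Apply with `systemctl reload sshd` after validating the file with `sshd -t`.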

Why Mesh Instead of a Bastion Host?

Traditional bastion (jump host) architectures create a single point of failure. In our setup, every server needs to communicate with every other server for deployments, monitoring, and backups. A bastion host would bottleneck all inter-server traffic. With a mesh topology, we distribute 17 keys to all servers — any server can reach any other directly. Security is maintained through disabled password auth and a MaxAuthTries limit of 3.

External Server Isolation

The backup server sits on an external network and is completely blocked from accessing internal servers. Internal servers can reach the backup server (for pushing backups), but the reverse direction is denied at two layers.

Dual-Layer Blocking

| Layer | Method | Effect |
| --- | --- | --- |
| Network | Remove internal subnet routing from WireGuard VPN config | External server cannot ping or reach internal IPs |
| SSH | DenyUsers directive on all 16 internal servers | Even if VPN is bypassed, SSH authentication is rejected |

Network-level blocking alone is insufficient — a VPN configuration change could bypass it. Adding SSH-level denial means that even if the network layer is compromised, authentication still fails. If the external backup server is ever breached, the attacker has zero access to internal infrastructure.
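The network-layer half of this amounts to what WireGuard's AllowedIPs does by default: only listed destinations are routed (and accepted) over the tunnel. A heavily simplified sketch of the external server's peer entry, with all addresses and keys illustrative except the internal subnet from the article:

```
# /etc/wireguard/wg0.conf on the external backup server (sketch)
[Peer]
PublicKey = <gateway-public-key>
# 10.0.10.0/24 is deliberately absent from AllowedIPs, so the
# external host has no route into the internal subnet; only the
# tunnel address used for backup pushes is reachable.
AllowedIPs = 10.8.0.1/32
```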

CPU Turbo Boost Control

The biggest enemy of 24/7 server operation is heat. Disabling CPU turbo boost caps the maximum clock speed, but stabilizes temperature and power consumption — critical for long-term reliability.

Which Servers Get Boost Disabled?

| Server Group | CPU | Boost Disabled | Reason |
| --- | --- | --- | --- |
| AI Brain | Ryzen 9950X3D | No | Maximum inference performance required |
| AI Auxiliary | Ryzen 7500F | No | Inference performance needed |
| Proxy | Intel N100 | Yes | Reverse proxy needs minimal CPU |
| Web Server | Ryzen 7840HS | Yes | Web serving does not need boost |
| Project Servers (5) | Ryzen 5825U x5 | Yes | Long-term stability over peak speed |
| Backup Server | Ryzen 5825U | Yes | NFS serving does not need boost |
| Lightweight (4) | Intel H255 x4 | No | Low-power CPUs — boost impact is minimal |

Implementation: AMD vs Intel

On AMD systems using amd-pstate-epp, writing 0 to the /sys/.../boost file disables turbo boost. On Intel systems using intel_pstate, writing 1 to /sys/.../no_turbo achieves the same effect — note the inverted logic. Both are implemented as systemd services that apply at boot and can be toggled on demand:

```ini
# AMD: echo 0 to /sys/.../boost disables boost
[Unit]
After=power-profiles-daemon.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'echo 0 > /sys/.../boost'
ExecStop=/bin/bash -c 'echo 1 > /sys/.../boost'
```

```ini
# Intel: echo 1 to /sys/.../no_turbo disables boost (inverted)
[Unit]
After=power-profiles-daemon.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'echo 1 > /sys/.../no_turbo'
ExecStop=/bin/bash -c 'echo 0 > /sys/.../no_turbo'
```

One critical caveat: power-profiles-daemon may re-enable boost on startup. Set After=power-profiles-daemon.service in your systemd unit to ensure correct ordering. AI servers keep boost enabled — instead, we manage GPU temperatures through power limit tuning.
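The AMD/Intel inversion is easy to get wrong when scripting across both vendors. A small helper (a sketch, not the article's tooling) that maps a driver plus sysfs value to a boost state makes the logic explicit:

```shell
# Map (driver, sysfs value) to a boost state, honoring the inversion:
#   AMD boost file:      1 = boost on,  0 = boost off
#   Intel no_turbo file: 0 = boost on,  1 = boost off
boost_state() {
  case "$1:$2" in
    amd:1|intel:0) echo enabled ;;
    amd:0|intel:1) echo disabled ;;
    *)             echo unknown ;;
  esac
}

boost_state amd 0     # → disabled
boost_state intel 1   # → disabled
boost_state intel 0   # → enabled
```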

Operational Principles — Summary

After months of building and operating this 16-server infrastructure, we follow five principles that keep everything running reliably:

  1. Separate servers by role. AI, proxy, project, and backup servers are physically isolated. A crash in one role group never affects another.
  2. Key-only SSH, passwords off. 17 SSH keys cross-registered across all servers with password authentication disabled. Brute force attacks are eliminated.
  3. Dual-layer external isolation. Network (VPN routing) plus SSH (DenyUsers) blocks external servers from reaching internal infrastructure — even if one layer fails.
  4. Disable CPU boost for stability. All non-AI servers run without turbo boost. For 24/7 operation, thermal and power stability equals reliability.
  5. Measure bandwidth before placement. Use iperf3 to measure actual throughput, then place critical services on the fastest links. Data-driven decisions, not guesses.

Infrastructure Summary

| Category | Status |
| --- | --- |
| Server count | 16 internal + 1 external backup |
| Network | 10.0.10.0/24 — 5 servers at 2.5Gbps, 10 at 1Gbps (all above 920Mbps) |
| SSH security | 17-key mesh, passwords OFF, MaxAuthTries 3 |
| External isolation | WireGuard + DenyUsers dual blocking |
| CPU boost control | Disabled on 8 servers (proxy/web/project/backup) |
| AI GPUs | RTX PRO 6000 + RTX 5060 Ti (cross-server inference) |
| Backup | IronWolf 12TB x2, NFS network storage |

Running AI infrastructure on-premise without cloud services is entirely feasible. The key is not the number of servers — it is the discipline of role separation, security, and stability. As servers multiply, adding Grafana + Prometheus monitoring becomes essential for real-time visibility. These operational principles are what make advanced capabilities like cross-server inference, concurrent load testing, and GPU power optimization actually work in production.