Building AI Server Infrastructure in the Office — A 16-Server Setup Guide
We built a full AI infrastructure in the office — no cloud required. Starting from a single GPU-equipped AI server, we expanded to 16 machines with dedicated roles: reverse proxy, project servers, lightweight monitoring nodes, and cold backup storage. Every server is connected via an SSH key mesh network, bandwidth is measured and tiered, and external servers are fully isolated. This is our practical guide to on-premise AI infrastructure.
Server Role Separation
We organized 16 servers into 5 role groups. Each group operates independently — a failure in one group never propagates to another. This isolation is the foundation of our entire infrastructure.
AI Brain Server (1 unit)
The core of all AI inference. Equipped with an AMD Ryzen 9950X3D, 96GB DDR5 RAM, and an NVIDIA RTX PRO 6000 with 96GB VRAM. Storage uses a 3-tier architecture: Intel Optane 905P for hot data, Samsung 980 PRO for warm data, and standard NVMe for cold storage.
AI Auxiliary Server (1 unit)
Handles FAQ and simple queries as a lightweight AI node. Runs on a Ryzen 7500F with an RTX 5060 Ti (16GB VRAM). Through cross-server inference, this server increases the main AI server's throughput by 70%.
Reverse Proxy (1 unit)
The single entry point for all external traffic. An Intel N100 mini PC running Caddy handles SSL termination, domain routing, and load balancing. Low-power and purpose-built — no GPU, no unnecessary services.
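The routing described above could be sketched in a Caddyfile like the following (domains, upstream IPs, and ports are illustrative, not our actual configuration). Caddy provisions and renews TLS certificates automatically for each named site:

```
# Caddyfile sketch — domains and upstreams are placeholders
app.example.com {
    reverse_proxy 10.0.10.12:8080
}

api.example.com {
    # Load-balance across two project servers
    reverse_proxy 10.0.10.13:3000 10.0.10.14:3000 {
        lb_policy least_conn
    }
}
```

A single low-power N100 is more than enough here, since Caddy only terminates TLS and forwards requests; the heavy lifting happens on the project servers behind it.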
Project Servers (7 units)
Each project server runs a dedicated web service, API, or database. CPUs range from Ryzen 5700G to 7840HS with 32–64GB RAM. One project per server means resource conflicts are impossible and deployments are independent.
Lightweight Servers (5 units)
Intel H255-based mini PCs with 16GB RAM handling monitoring, log collection, and lightweight APIs. Low power draw and silent operation make them ideal for auxiliary tasks that run 24/7.
Cold Backup (1 unit)
An NFS-based network storage server with two Seagate IronWolf 12TB drives. Regularly backs up critical data from all servers. Dual NICs provide network redundancy.
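A push-style backup from an internal server to the NFS store might look like the sketch below. The export path, host name, and service directory are illustrative assumptions, not our exact layout:

```shell
#!/usr/bin/env bash
# Sketch of a push-style NFS backup from an internal server.
# backup.internal, /srv/backup, and /var/lib/myservice are illustrative names.
set -euo pipefail

# Build a dated destination path like /mnt/backup/<host>/<YYYY-MM-DD>/
backup_dest() {
  local root="$1" host="$2" day="$3"
  printf '%s/%s/%s/' "$root" "$host" "$day"
}

# On the backup server (one-time, illustrative /etc/exports entry):
#   /srv/backup 10.0.10.0/24(rw,sync,no_subtree_check)
#   sudo exportfs -ra

# On an internal server: mount the export, then push a dated snapshot
# sudo mount -t nfs backup.internal:/srv/backup /mnt/backup
# rsync -a --delete /var/lib/myservice/ \
#   "$(backup_dest /mnt/backup "$(hostname)" "$(date +%F)")"
```

Because internal servers push and the backup server never pulls, this flow stays compatible with the one-way isolation described later.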
Why Separate Roles?
| Principle | Benefit |
|---|---|
| Fault isolation | A project server crash never affects AI inference |
| Resource independence | GPU/VRAM usage does not compete with web service CPU/RAM |
| Horizontal scaling | Add a project server for new services; add a GPU for more AI capacity |
Network Bandwidth Tiers
All servers share the same subnet (10.0.10.0/24), but actual bandwidth varies by NIC capability. We measured every link with iperf3 --bidir and organized servers into two tiers. After cable and port replacements, we eliminated all 100Mbps bottlenecks — every server now achieves 1Gbps or higher.
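A minimal sketch of the measurement loop, assuming an iperf3 server (`iperf3 -s`) is already running on each target; the host addresses and tier thresholds are our own conventions, not iperf3 defaults:

```shell
#!/usr/bin/env bash
# Sketch: measure each server with iperf3, then classify it into a tier.
set -euo pipefail

# Classify a measured throughput (in Mbps) into our two tiers
tier_for() {
  local mbps="$1"
  if [ "$mbps" -ge 2000 ]; then
    echo "tier1-2.5G"
  elif [ "$mbps" -ge 900 ]; then
    echo "tier2-1G"
  else
    echo "below-rated"   # flag for cable/port diagnosis
  fi
}

# Bidirectional measurement against each server (illustrative addresses):
# for host in 10.0.10.11 10.0.10.12; do
#   iperf3 -c "$host" --bidir -t 10
# done
```

Anything classified `below-rated` gets its cable and switch port checked first; in our experience, that accounted for every anomaly we found.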
Tier 1: 2.5Gbps Connections (5 servers)
| Server Role | CPU | Measured TX | Measured RX |
|---|---|---|---|
| Project Server A | Ryzen 5700G | 2.35 Gbps | 2.18 Gbps |
| Web Server | Ryzen 7840HS | 2.33 Gbps | 2.34 Gbps |
| Project Server B | Ryzen 5800U | 2.32 Gbps | 2.33 Gbps |
| Lightweight Server D | Intel H255 | 2.34 Gbps | 2.25 Gbps |
| AI Auxiliary | Ryzen 7500F | 2.31 Gbps | 2.24 Gbps |
These 2.5Gbps NICs connect directly to the AI brain server's 10Gbps NIC. The AI auxiliary server was upgraded from 100Mbps to 2.5Gbps — a 24x improvement — simply by replacing the cable and switch port.
Tier 2: 1Gbps Connections (10 servers; representative measurements below)
| Server Role | CPU | Measured TX | Measured RX |
|---|---|---|---|
| Lightweight Server A | Intel H255 | 923 Mbps | 860 Mbps |
| Project Server C | Ryzen 5825U | 922 Mbps | 883 Mbps |
| Project Server D | Ryzen 5825U | 921 Mbps | 925 Mbps |
| Project Server E | Ryzen 5825U | 921 Mbps | 919 Mbps |
| Proxy Server | Intel N100 | 921 Mbps | 938 Mbps |
| Backup Server | Ryzen 5825U | 920 Mbps | 838 Mbps |
All 1Gbps-tier servers achieve 920+ Mbps TX. Previously, four servers ran at 456–731 Mbps due to faulty cables or mismatched port speeds. After diagnosing and replacing hardware, every server reached its rated speed.
Network Optimization Results
| Server | Before | After | Action |
|---|---|---|---|
| AI Auxiliary | 95 Mbps | 2.31 Gbps | Cable/port swap — upgraded to 2.5Gbps (24x) |
| Project Server D | 456 Mbps | 921 Mbps | Reached full 1Gbps (2x) |
| Project Server C | 610 Mbps | 922 Mbps | Normalized (51% improvement) |
| Project Server E | 731 Mbps | 921 Mbps | Normalized (26% improvement) |
SSH Mesh Security
All 16 servers have 17 SSH public keys cross-registered, enabling bidirectional key authentication between any pair of servers. Password authentication is completely disabled across the internal network.
Key Structure
The 17 keys include: 1 external workstation key (access to all servers), 1 AI brain server key, 1 proxy server key, 1 web server key, 12 project/lightweight server keys, and 1 backup server key. All keys use the Ed25519 algorithm. Every server's authorized_keys file contains all 17 public keys, so any server can SSH into any other server instantly.
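Building and distributing the mesh can be sketched as follows. The key-collection directory and host range are illustrative assumptions; the `ssh-keygen` and `ssh-copy-id`-style push are standard OpenSSH usage:

```shell
#!/usr/bin/env bash
# Sketch: merge all collected public keys into one authorized_keys payload,
# then push it to every server. Paths and the host range are illustrative.
set -euo pipefail

# Merge every collected .pub file into a single authorized_keys body
merge_keys() {
  local keydir="$1"
  cat "$keydir"/*.pub
}

# Per-server key generation (run once on each machine):
# ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N '' -C "$(hostname)"

# Distribution to all servers (illustrative address range):
# for host in 10.0.10.{10..25}; do
#   merge_keys ./collected-keys | \
#     ssh "$host" 'cat > ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
# done
```

Overwriting `authorized_keys` wholesale (rather than appending) keeps the file identical on every server, which makes key rotation a single re-run of the loop.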
Security Configuration
| Setting | Internal (16 servers) | External Backup | Purpose |
|---|---|---|---|
| PasswordAuthentication | OFF | ON | Block password login (key-only) |
| PermitRootLogin | OFF | Default | Block direct root access |
| MaxAuthTries | 3 | 6 | Limit attempts against brute force |
| sudo NOPASSWD | Enabled | Enabled | Automation-friendly sudo |
| DenyUsers | Enabled | — | Reject logins originating from the external backup server |
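The internal-server column of the table maps to an sshd drop-in roughly like the one below. The drop-in filename, the denied account name, and its host pattern are illustrative assumptions, not our literal config:

```
# /etc/ssh/sshd_config.d/10-internal.conf (illustrative drop-in)
PasswordAuthentication no
PermitRootLogin no
MaxAuthTries 3
# Reject the external backup server's account even if it reaches us over the network
DenyUsers backupuser@10.0.20.*
```

Validate with `sshd -t` and reload sshd before closing your current session, so a typo cannot lock you out of a key-only server.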
Why Mesh Instead of a Bastion Host?
Traditional bastion (jump host) architectures create a single point of failure. In our setup, every server needs to communicate with every other server for deployments, monitoring, and backups. A bastion host would bottleneck all inter-server traffic. With a mesh topology, we distribute 17 keys to all servers — any server can reach any other directly. Security is maintained through disabled password auth and a MaxAuthTries limit of 3.
External Server Isolation
The backup server sits on an external network and is completely blocked from accessing internal servers. Internal servers can reach the backup server (for pushing backups), but the reverse direction is denied at two layers.
Dual-Layer Blocking
| Layer | Method | Effect |
|---|---|---|
| Network | Remove internal subnet routing from WireGuard VPN config | External server cannot ping or reach internal IPs |
| SSH | DenyUsers directive on all 16 internal servers | Even if VPN is bypassed, SSH authentication is rejected |
Network-level blocking alone is insufficient — a VPN configuration change could bypass it. Adding SSH-level denial means that even if the network layer is compromised, authentication still fails. If the external backup server is ever breached, the attacker has zero access to internal infrastructure.
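On the network layer, the block amounts to stripping the internal subnet from the backup server's WireGuard peer configuration. A hedged sketch, where the key placeholder and the backup-side endpoint address are illustrative:

```
# Backup server's wg0.conf peer section (sketch; addresses illustrative)
[Peer]
PublicKey = <gateway-public-key>
# Only the backup exchange endpoint is routed through the tunnel.
# Deliberately absent: 10.0.10.0/24 (the internal subnet).
AllowedIPs = 10.0.20.1/32
```

Because AllowedIPs acts as both a routing table and an inbound filter in WireGuard, omitting the internal subnet means the backup server can neither route to it nor receive traffic claiming to come from it.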
CPU Turbo Boost Control
The biggest enemy of 24/7 server operation is heat. Disabling CPU turbo boost caps the maximum clock speed, but stabilizes temperature and power consumption — critical for long-term reliability.
Which Servers Get Boost Disabled?
| Server Group | CPU | Boost Disabled | Reason |
|---|---|---|---|
| AI Brain | Ryzen 9950X3D | No | Maximum inference performance required |
| AI Auxiliary | Ryzen 7500F | No | Inference performance needed |
| Proxy | Intel N100 | Yes | Reverse proxy needs minimal CPU |
| Web Server | Ryzen 7840HS | Yes | Web serving does not need boost |
| Project Servers (5) | Ryzen 5825U x5 | Yes | Long-term stability over peak speed |
| Backup Server | Ryzen 5825U | Yes | NFS serving does not need boost |
| Lightweight (4) | Intel H255 x4 | No | Low-power CPUs — boost impact is minimal |
Implementation: AMD vs Intel
On AMD systems using amd-pstate-epp, writing 0 to the /sys/.../boost file disables turbo boost. On Intel systems using intel_pstate, writing 1 to /sys/.../no_turbo achieves the same effect — note the inverted logic. Both are implemented as systemd services that apply at boot and can be toggled on demand:
```
# AMD: echo 0 to disable boost
[Service]
ExecStart=/bin/bash -c 'echo 0 > /sys/.../boost'
ExecStop=/bin/bash -c 'echo 1 > /sys/.../boost'
RemainAfterExit=yes

# Intel: echo 1 to disable (inverted)
[Service]
ExecStart=/bin/bash -c 'echo 1 > /sys/.../no_turbo'
ExecStop=/bin/bash -c 'echo 0 > /sys/.../no_turbo'
RemainAfterExit=yes
```

One critical caveat: power-profiles-daemon may re-enable boost on startup. Set After=power-profiles-daemon.service in your systemd unit to ensure correct ordering. AI servers keep boost enabled — instead, we manage GPU temperatures through power limit tuning.
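The driver-specific knobs can be wrapped in one helper that picks the right sysfs file and value. The concrete paths below are the usual locations on recent kernels, but they can vary by kernel version, so treat them as an assumption to verify on your hardware:

```shell
#!/usr/bin/env bash
# Sketch: choose the sysfs knob and value that disables turbo boost,
# based on which cpufreq driver is active. Paths may vary by kernel.
set -euo pipefail

# Given the active driver name, print "<sysfs-file> <value-to-write>"
boost_off_knob() {
  case "$1" in
    amd-pstate-epp|amd-pstate) echo "/sys/devices/system/cpu/cpufreq/boost 0" ;;
    intel_pstate)              echo "/sys/devices/system/cpu/intel_pstate/no_turbo 1" ;;  # inverted
    *) echo "unsupported driver: $1" >&2; return 1 ;;
  esac
}

# Apply on a live system (requires root):
# driver=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver)
# read -r file value < <(boost_off_knob "$driver")
# echo "$value" | sudo tee "$file"
```

Detecting the driver at runtime instead of hardcoding it lets the same systemd unit ship to every non-AI server in the fleet, AMD and Intel alike.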
Operational Principles — Summary
After months of building and operating this 16-server infrastructure, we follow five principles that keep everything running reliably:
- Separate servers by role. AI, proxy, project, and backup servers are physically isolated. A crash in one role group never affects another.
- Key-only SSH, passwords off. 17 SSH keys cross-registered across all servers with password authentication disabled. Brute force attacks are eliminated.
- Dual-layer external isolation. Network (VPN routing) plus SSH (DenyUsers) blocks external servers from reaching internal infrastructure — even if one layer fails.
- Disable CPU boost for stability. All non-AI servers run without turbo boost. For 24/7 operation, thermal and power stability equals reliability.
- Measure bandwidth before placement. Use iperf3 to measure actual throughput, then place critical services on the fastest links. Data-driven decisions, not guesses.
Infrastructure Summary
| Category | Status |
|---|---|
| Server count | 16 internal + 1 external backup |
| Network | 10.0.10.0/24 — 5 servers at 2.5Gbps, 10 at 1Gbps (all above 920Mbps) |
| SSH security | 17-key mesh, passwords OFF, MaxAuthTries 3 |
| External isolation | WireGuard + DenyUsers dual blocking |
| CPU boost control | Disabled on 8 servers (project/proxy/backup) |
| AI GPUs | RTX PRO 6000 + RTX 5060 Ti (cross-server inference) |
| Backup | IronWolf 12TB x2, NFS network storage |
Running AI infrastructure on-premise without cloud services is entirely feasible. The key is not the number of servers — it is the discipline of role separation, security, and stability. As servers multiply, adding Grafana + Prometheus monitoring becomes essential for real-time visibility. These operational principles are what make advanced capabilities like cross-server inference, concurrent load testing, and GPU power optimization actually work in production.