
GPU 24/7 Long-Term Monitoring — 13 Days of Real Data

What happens when you run an AI server 24/7 for two weeks? We monitored an RTX PRO 6000 (96 GB VRAM) every 5 minutes for 13 days straight — 3,667 data points covering temperature, power consumption, VRAM usage, and GPU utilization. We also compared the effects of 600W vs 350W power limits on thermal stability.

Monitoring Setup

| Component | Details |
|---|---|
| GPU | NVIDIA RTX PRO 6000 (96 GB GDDR7) |
| CPU | AMD Ryzen 9950X3D |
| RAM | 96 GB DDR5 |
| Environment | Office (no dedicated HVAC, natural ventilation only) |
| Collection Interval | Every 5 minutes (cron + nvidia-smi) |
| Duration | Feb 10 – Feb 22, 2025 (13 days) |
| Total Data Points | 3,667 rows |
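
The article doesn't show the collector itself, but the cron + nvidia-smi setup above needs little more than an append to a CSV file. A minimal sketch, assuming a Python script scheduled with `*/5 * * * *` and a log path I've made up; the query fields are documented nvidia-smi options:

```python
#!/usr/bin/env python3
"""Append one GPU sample to a CSV log. Scheduled via cron:
    */5 * * * * /usr/bin/python3 /opt/gpu-monitor/collect.py
Paths are illustrative, not from the article."""
import subprocess

LOG_PATH = "/var/log/gpu-monitor.csv"  # hypothetical location
FIELDS = "timestamp,temperature.gpu,power.draw,memory.used,utilization.gpu"

# --format=csv,noheader,nounits yields one plain numeric row per invocation.
row = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.strip()

with open(LOG_PATH, "a") as f:
    f.write(row + "\n")
```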

Temperature Analysis

98.7% of all measurements fell in the 0–20°C range. The AI serving engine keeps the model loaded in VRAM but draws minimal power when idle. Temperature only spikes during active inference requests.

Temperature Distribution (3,667 measurements)

| Range | Count | Percentage |
|---|---|---|
| 0–20°C | 3,618 | 98.7% |
| 21–30°C | 30 | 0.8% |
| 31–40°C | 7 | 0.2% |
| 41–50°C | 1 | 0.03% |
| 51–60°C | 10 | 0.3% |
| 71°C+ | 1 | 0.03% |

600W vs 350W Power Limit — Temperature Impact

| Metric | 600W Limit (Feb 10–15) | 350W Limit (Feb 16–22) |
|---|---|---|
| Average Temperature | 15.0°C | 21.0°C |
| Peak Temperature | 73°C | 60°C |
| Peak Power Measured | 518.9W | 350.0W |

Reducing the power limit from 600W to 350W dropped peak temperature by 13°C (73°C → 60°C) while inference performance loss remained under 5%. For long-term operation, the 350W limit is the optimal balance between performance and thermal safety.
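
For reference, the limit itself is a one-line driver setting: nvidia-smi's --power-limit (-pl) flag takes the cap in watts. A minimal sketch of applying it from Python (my wrapper, not from the article; both calls require root, and the value does not survive a reboot, so reapply it at boot):

```python
import subprocess

# Enable persistence mode so the driver state stays loaded between clients,
# then cap the board at 350 W.
subprocess.run(["nvidia-smi", "-pm", "1"], check=True)
subprocess.run(["nvidia-smi", "-pl", "350"], check=True)
```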

Power Consumption Patterns

GPU power consumption follows a stark binary pattern: 13–15W idle, 350–519W under inference. There is virtually no middle ground. This reflects the AI serving engine's behavior — completely idle when no requests arrive, full power the moment inference begins.

| State | Power | Time Ratio | Description |
|---|---|---|---|
| Idle (GPU 0%) | 8–18W | 99.0% | Model loaded, no requests |
| Light Load | 75–124W | <0.1% | Model loading, single request |
| Medium Load | 225–300W | <0.1% | Small concurrent requests |
| Heavy Load (GPU 100%) | 350–519W | ~1.0% | Multiple concurrent requests, benchmarks |
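
To reproduce the time-ratio column from such a log, a small classifier over the power column suffices. A sketch assuming the collector's CSV layout from above; the cut points are values I picked inside the gaps between the observed bands:

```python
import csv
from collections import Counter

def power_state(watts: float) -> str:
    """Map a power.draw sample onto the load states in the table above."""
    if watts < 50:
        return "idle"    # 8-18 W: model loaded, no requests
    if watts < 175:
        return "light"   # 75-124 W: model loading, single request
    if watts < 325:
        return "medium"  # 225-300 W: small concurrent requests
    return "heavy"       # 350-519 W: concurrent requests, benchmarks

counts: Counter[str] = Counter()
with open("/var/log/gpu-monitor.csv") as f:  # hypothetical path, see collector
    for _ts, _temp, power, _mem, _util in csv.reader(f):
        counts[power_state(float(power))] += 1

total = sum(counts.values())
for state, n in counts.most_common():
    print(f"{state:>6}: {n:5d} samples ({100 * n / total:.1f}%)")
```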

The average power draw of ~15W with models loaded means the electricity cost of keeping an AI serving engine running 24/7 is comparable to a single fluorescent light bulb: 15 W around the clock is roughly 15 W × 720 h ≈ 10.8 kWh per month. The model stays in VRAM, ready to serve instantly, at negligible idle cost.

VRAM Usage Patterns

VRAM usage has two distinct states: serving engine OFF (~200 MiB) and ON (~85,775 MiB). Once the serving engine loads the model, 87.6% of VRAM is permanently occupied, regardless of whether inference requests are active.

| State | VRAM Used | Utilization | Measurements |
|---|---|---|---|
| Engine OFF | ~200 MiB | 0.2% | 1,531 |
| Engine ON (idle) | ~85,775 MiB | 87.6% | 2,135 |
| Engine ON (peak) | ~95,385 MiB | 97.4% | 1 |

The 96 GB VRAM accommodates a 32B model plus 7 LoRA adapters simultaneously. The remaining ~10 GB serves as KV cache for concurrent requests. At peak load, KV cache expands to push utilization to 97.4% — leaving just 2.6 GB of headroom.
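
As a sanity check, those utilization figures are consistent with a driver-reported total of about 97,887 MiB. That total is my assumption (the article only states the nominal 96 GB), but it reproduces all three percentages:

```python
TOTAL_MIB = 97_887  # assumed driver-reported capacity; not stated in the article

for state, used_mib in [("engine OFF", 200),
                        ("engine ON (idle)", 85_775),
                        ("engine ON (peak)", 95_385)]:
    pct = used_mib / TOTAL_MIB
    free_gib = (TOTAL_MIB - used_mib) / 1024
    print(f"{state}: {pct:.1%} used, {free_gib:.1f} GiB headroom")
# -> 0.2%, 87.6%, and 97.4%, matching the table above
```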

Daily Load Patterns

The GPU was actively computing in only 37 out of 3,667 measurements (1.0%). AI serving is fundamentally a "wait-and-respond" workload: the GPU sits idle 99% of the time, ready to respond instantly when requests arrive.

| Date | Peak Temp | Peak Power | Power Limit | GPU Active |
|---|---|---|---|---|
| Feb 10 (Mon) | 18°C | 75.3W | 600W | 0/251 |
| Feb 11 (Tue) | 32°C | 413.6W | 600W | 2/288 |
| Feb 12 (Wed) | 28°C | 123.9W | 600W | 0/288 |
| Feb 13 (Thu) | 16°C | 16.3W | 600W | 0/288 |
| Feb 14 (Fri) | 16°C | 15.8W | 600W | 0/288 |
| Feb 15 (Sat) | 34°C | 225.6W | 600→350W | 1/288 |
| Feb 16 (Sun) | 30°C | 350.0W | 350W | 3/288 |
| Feb 17 (Mon) | 15°C | 16.2W | 350W | 0/288 |
| Feb 18 (Tue) | 24°C | 78.0W | 350W | 6/264 |
| Feb 19 (Wed) | 38°C | 423.8W | 350→600W | 10/278 |
| Feb 20 (Thu) | 17°C | 17.6W | 350W | 0/288 |
| Feb 21 (Fri) | 22°C | 18.5W | 350W | 2/284 |
| Feb 22 (Sat) | 73°C | 518.9W | 600→350W | 13/286 |
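
For completeness, the daily rollup above falls out of a single groupby over the same CSV log. A sketch assuming pandas and the collector's column order (the column names are mine):

```python
import pandas as pd

cols = ["timestamp", "temp_c", "power_w", "mem_mib", "util_pct"]
df = pd.read_csv("/var/log/gpu-monitor.csv", names=cols,
                 parse_dates=["timestamp"], skipinitialspace=True)

daily = df.groupby(df["timestamp"].dt.date).agg(
    peak_temp=("temp_c", "max"),
    peak_power=("power_w", "max"),
    samples=("temp_c", "size"),
    # "GPU Active" above counts samples with nonzero utilization
    active=("util_pct", lambda u: int((u > 0).sum())),
)
print(daily)
```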

Conclusion: Long-Term Operation Stability

An office environment is sufficient: without dedicated HVAC, the GPU stayed in the 0–20°C range for 98.7% of the monitoring period. Natural office ventilation handles the thermal load with a wide margin.

350W power limit is optimal: Peak temperature drops 13°C compared to 600W (73°C → 60°C) with less than 5% inference performance loss. This is the ideal setting for 24/7 operation.

Idle power is negligible: At ~15W average with the model loaded, monthly electricity cost is roughly $1. Keeping the AI serving engine running 24/7 is practically free.

87.6% VRAM always utilized: A 32B model plus LoRA adapters permanently occupies 85.7 GB. The remaining 10 GB serves as dynamic KV cache for concurrent requests.

GPU Long-Term Monitoring Checklist

| Metric | Normal Range | Warning Threshold |
|---|---|---|
| Idle Temperature | 10–20°C | Above 30°C sustained — check ventilation |
| Load Temperature | 40–65°C | Above 80°C — reduce power limit |
| Idle Power | 8–18W | Above 50W — check for runaway processes |
| VRAM Usage | Model size + 10–15% | Above 95% sustained — limit concurrent requests |
| Fan Speed | ~30% (default) | Above 70% sustained — check temperature |
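
That checklist translates directly into a small watchdog. A minimal sketch checking one fresh sample against the thresholds above (thresholds from the table; note that a single sample can't tell "sustained" from a momentary blip, so a real check should require several consecutive violations):

```python
import subprocess

FIELDS = ("temperature.gpu,power.draw,memory.used,"
          "memory.total,utilization.gpu,fan.speed")
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.strip()
temp, power, mem_used, mem_total, util, fan = (float(x) for x in out.split(","))

alerts = []
if util == 0 and temp > 30:
    alerts.append(f"idle temperature {temp:.0f}C > 30C: check ventilation")
if temp > 80:
    alerts.append(f"temperature {temp:.0f}C > 80C: reduce power limit")
if util == 0 and power > 50:
    alerts.append(f"idle power {power:.0f}W > 50W: check for runaway processes")
if mem_used / mem_total > 0.95:
    alerts.append(f"VRAM at {mem_used / mem_total:.0%}: limit concurrent requests")
if fan > 70:
    alerts.append(f"fan at {fan:.0f}%: check temperature")

for a in alerts:
    print("WARN:", a)
```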

Running an AI server 24/7 is simpler than most assume. A proper power limit, stable infrastructure, and 5-minute interval monitoring are all you need for reliable operation — even in a standard office environment without specialized cooling.