NVMe Storage Benchmark: Optane to HDD — 20 Devices Compared with fio
Are all NVMe SSDs the same? We benchmarked every storage device installed across our AI servers — 14 NVMe SSDs and 6 HDDs — using identical fio parameters. From Samsung PM9A1 to Intel Optane 905P to no-name Chinese SSDs, Sequential Read ranged from 6,193 MB/s down to 526 MB/s (a 12x gap), and Optane delivered 3.8x the Random 4K QD1 IOPS of the best NAND drive.
- 20 devices tested
- 3.8x Optane QD1 IOPS advantage
- 11μs Optane QD1 latency
- 6,193 MB/s top Seq Read
Test Environment
Every storage device across our production servers was tested under identical conditions: 13°C ambient temperature, with zero other workload on the servers during the benchmarks.
fio Test Parameters
- `direct=1` — bypass OS cache (raw device performance)
- `ioengine=libaio` — Linux async I/O
- Sequential: bs=1M, iodepth=32
- Random 4K: bs=4k, QD1 and QD32 tested separately
- Sustained Write: 120-second continuous write (drains SLC cache)
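Assembled from the parameters above, the sequential-read job might look like the following fio job file. This is a sketch, not our exact script: the filename is a placeholder for the device under test, and the Random 4K runs would swap in bs=4k with iodepth=1 or 32.

```ini
; Sequential-read job pieced together from the parameters above.
; filename is a placeholder; point it at the device (or a test file).
; Write jobs against a raw device are destructive, so use a scratch disk.
[seq-read]
filename=/dev/nvme0n1
direct=1
ioengine=libaio
rw=read
bs=1M
iodepth=32
runtime=60
time_based
```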
Devices Under Test
- NVMe SSDs: 14 units (9 distinct models)
- HDDs: 6 units (2 models)
- Special: Intel Optane 905P (3D XPoint)
- Capacity range: 119 GB to 9.1 TB
- Price range: budget OEM to enterprise-grade
Sequential Performance Rankings
AI model loading reads multi-GB files sequentially — Sequential Read speed directly determines startup time. Sustained Write reflects real write throughput after SLC cache depletion.
| # | NVMe Model | Capacity | Seq Read | Seq Write | Sustained Write |
|---|---|---|---|---|---|
| 1 | Samsung PM9A1 1TB | 953GB | 6,193 | 5,009 | 5,015 |
| 2 | Samsung PM9A1 512GB ① | 476GB | 3,453 | 3,088 | 3,088 |
| 3 | Samsung PM9A1 512GB ② | 476GB | 3,453 | 3,319 | 3,319 |
| 4 | Samsung 970 EVO Plus 500GB ① | 465GB | 3,451 | 3,071 | 2,575 |
| 5 | Samsung 970 EVO Plus 500GB ② | 465GB | 3,448 | 3,071 | 2,467 |
| 6 | Samsung 980 PRO 2TB | 1,863GB | 3,436 | 1,902 | 1,897 |
| 7 | Lexar NM6A1 512GB | 476GB | 3,214 | 2,851 | 2,674 |
| 8 | SK hynix PC601 512GB ① | 476GB | 3,171 | 836 | 750 |
| 9 | SK hynix PC601 512GB ② | 476GB | 3,139 | 872 | 738 |
| 10 | SK hynix PC601 512GB ③ | 476GB | 3,073 | 833 | 680 |
| 11 | Samsung 980 1TB | 931GB | 2,684 | 2,360 | 2,408 |
| 12 | Intel Optane 905P 960GB | 894GB | 2,556 | 2,282 | 2,284 |
| 13 | Samsung MZVLQ256 256GB | 238GB | 2,324 | 1,167 | 1,184 |
| 14 | ShiJi 256GB M.2 | 238GB | 2,246 | 2,039 | 1,142 |
| 15 | Biwin NVMe 1TB | 953GB | 1,850 | 702 | 534 |
| 16 | Samsung MZNLN128 128GB | 119GB | 526 | 159 | 158 |
All values in MB/s. HDDs are compared in a separate section below.
Key Takeaways
- PM9A1 1TB dominates at 6,193 MB/s — full PCIe 4.0 bandwidth
- 980 PRO 2TB: Read 3,436 vs Write 1,902 — write performance halved, exposing SLC cache limits on large-capacity drives
- SK hynix PC601: Read is decent at 3,000+ but Write collapses to 836 MB/s — the hidden OEM write performance trap
AI Model Loading Time Estimates
- 14B AWQ (~8 GB): PM9A1 1.3s, Biwin 4.3s
- 32B AWQ (~18 GB): PM9A1 2.9s, Biwin 9.7s
- 128 GB model edge case: PM9A1 20s vs the 128 GB MZNLN128 (526 MB/s) 243s
- In practice, models stay in VRAM — loading is a one-time startup cost
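The estimates above are straight division: model size over Sequential Read speed from the table. A minimal sketch (the function name is ours; sizes are treated as decimal GB, which reproduces the listed figures):

```python
def load_time_s(model_gb: float, seq_read_mbps: float) -> float:
    """Estimated load time: model size (decimal GB -> MB) / Seq Read (MB/s)."""
    return model_gb * 1000 / seq_read_mbps

PM9A1_1TB = 6193  # Seq Read in MB/s, from the sequential table
BIWIN_1TB = 1850

print(round(load_time_s(8, PM9A1_1TB), 1))   # 14B AWQ (~8 GB): 1.3 s
print(round(load_time_s(18, BIWIN_1TB), 1))  # 32B AWQ (~18 GB): 9.7 s
```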
Random 4K IOPS Rankings
Vector DB lookups, metadata queries, log writes — most AI service I/O is random 4K. QD1 (queue depth 1) measures real single-query response time. QD32 reflects throughput under concurrent load.
| # | NVMe Model | QD1 IOPS | QD1 p50 | QD32 IOPS | Mixed R/W |
|---|---|---|---|---|---|
| 1 | Intel Optane 905P 960GB (3D XPoint) | 83,989 | 11μs | 581,158 | 527,987 |
| 2 | Samsung PM9A1 1TB | 22,120 | 42μs | 859,871 | 562,594 |
| 3 | Samsung PM9A1 512GB ② | 21,820 | 42μs | 787,044 | 502,102 |
| 4 | Samsung PM9A1 512GB ① | 16,963 | 51μs | 392,992 | 342,918 |
| 5 | Samsung 970 EVO Plus ① | 14,304 | 60μs | 358,041 | 279,570 |
| 6 | SK hynix PC601 ② | 14,219 | 60μs | 313,607 | 197,082 |
| 7 | Samsung 980 1TB | 14,201 | 67μs | 499,929 | 439,421 |
| 8 | SK hynix PC601 ① | 14,137 | 61μs | 330,393 | 187,363 |
| 9 | Lexar NM6A1 512GB | 14,038 | 60μs | 338,898 | 160,560 |
| 10 | Samsung 970 EVO Plus ② | 14,026 | 59μs | 357,298 | 279,362 |
| 11 | ShiJi 256GB M.2 | 13,950 | 64μs | 366,109 | 177,883 |
| 12 | Samsung 980 PRO 2TB | 11,896 | 79μs | 656,561 | 365,201 |
| 13 | Biwin NVMe 1TB | 11,788 | 79μs | 213,785 | 117,037 |
| 14 | SK hynix PC601 ③ | 10,935 | 75μs | 342,707 | 175,672 |
| 15 | Samsung MZVLQ256 256GB | 10,930 | 86μs | 226,398 | 194,831 |
| 16 | Samsung MZNLN128 128GB | 8,701 | 98μs | 68,565 | 44,912 |
Mixed R/W: 70% Read / 30% Write combined workload
Why QD1 and QD32 Rankings Diverge
Optane 905P is the runaway QD1 leader at 83,989 IOPS, yet falls behind PM9A1 1TB (859,871) at QD32. The reason lies in fundamentally different storage technologies.
Optane (3D XPoint)
- Cell-level fast response → 11μs QD1 latency
- Internal parallelism more limited than NAND
- Unbeatable for single requests; bulk parallel workloads favor NAND
NAND Flash (PM9A1 etc.)
- Individual cells are slower, but thousands operate in parallel
- QD32 parallelism unlocks massive IOPS scaling
- Single-request latency ranges from 42–100μs
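The QD1 side of this divergence follows almost mechanically from latency: with only one request in flight, throughput is bounded by the reciprocal of per-request latency. A quick sanity check against the table (measured IOPS land a little below this ceiling because of software overhead):

```python
def qd1_iops_ceiling(p50_latency_us: float) -> int:
    """At queue depth 1, throughput is capped at one request per latency period."""
    return round(1_000_000 / p50_latency_us)

print(qd1_iops_ceiling(11))  # Optane 905P: 90909 ceiling (measured: 83,989)
print(qd1_iops_ceiling(42))  # PM9A1 1TB:  23810 ceiling (measured: 22,120)
```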
Optane 905P Deep Dive
The Optane 905P ranks 12th in sequential throughput but 1st in Random 4K QD1 by a wide margin. This “ranking inversion” has significant implications for AI workloads.
| Metric | Optane 905P | Best NAND (PM9A1) | Ratio |
|---|---|---|---|
| QD1 IOPS | 83,989 | 22,120 | 3.8x |
| QD1 p50 Latency | 11μs | 42μs | 3.8x faster |
| QD1 p99 Latency | 22μs | 49μs | 2.2x faster |
| QD32 IOPS | 581,158 | 859,871 | 0.68x |
| Mixed R/W IOPS | 527,987 | 562,594 | 0.94x |
| Seq Read (MB/s) | 2,556 | 6,193 | 0.41x |
Why QD1 Matters for AI Services
QD1-Dominated Workloads
- RAG vector search: one user query → QD1 pattern
- SQLite / metadata lookups: single transactions
- Chat log reads and writes: sequential per-record
- LoRA adapter loading: one file at a time
Perceived Latency Impact
- 10K vector search: Optane 0.12s vs NAND 0.45s
- 10 concurrent users: Optane 1.2s vs NAND 4.5s
- RAG overhead must stay under 1s for natural feel
- Optane is the only option that breaks the NAND QD1 ceiling
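The latency figures above are simple arithmetic over the QD1 results, under a worst-case model where every 4K read is fully serialized (real systems overlap some requests). A sketch with an illustrative function name:

```python
def serialized_io_time_s(n_reads: int, qd1_iops: int, users: int = 1) -> float:
    """Wall time if every 4K read is issued one at a time (QD1), users queued up."""
    return n_reads * users / qd1_iops

print(round(serialized_io_time_s(10_000, 83_989), 2))            # Optane: 0.12 s
print(round(serialized_io_time_s(10_000, 22_120), 2))            # NAND:   0.45 s
print(round(serialized_io_time_s(10_000, 22_120, users=10), 1))  # NAND, 10 users: 4.5 s
```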
When to Choose Optane vs. NAND
Optane Wins
- Dedicated vector DB drive (Qdrant, Milvus, etc.)
- SQLite / PostgreSQL metadata databases
- Real-time log ingestion and analysis
- Any workload where latency directly impacts service quality
NAND Wins
- AI model loading (sequential read — PM9A1 is 2.4x faster)
- Training dataset reads/writes (large sequential I/O)
- High-concurrency serving (QD32 NAND advantage)
- Cost-per-TB priority scenarios
Temperature Comparison
Thermal management is critical for 24/7 server operations — heat directly impacts drive longevity. Temperatures were measured immediately after 120 seconds of sustained writes. Ambient: 13°C.
| NVMe Model | Idle (°C) | Under Load (°C) | Rise | Verdict |
|---|---|---|---|---|
| Samsung 980 PRO 2TB | 32°C | 37°C | +5°C | Safe |
| ShiJi 256GB M.2 | 43°C | 47°C | +4°C | Safe |
| Intel Optane 905P | 30°C | 39°C | +9°C | Safe |
| Biwin NVMe 1TB | 21°C | 32°C | +11°C | Safe |
| Samsung MZNLN128 | 35°C | 46°C | +11°C | Safe |
| Samsung PM9A1 512GB ② | 23°C | 39°C | +16°C | Safe |
| Samsung 980 1TB | 27°C | 43°C | +16°C | Safe |
| SK hynix PC601 ① | 25°C | 42°C | +17°C | Caution |
| SK hynix PC601 ② | 25°C | 42°C | +17°C | Caution |
| Samsung PM9A1 1TB | 24°C | 49°C | +25°C | Caution |
| Lexar NM6A1 512GB | 36°C | 64°C | +28°C | Caution |
| Samsung 970 EVO Plus ② | 54°C | 83°C | +29°C | Overheat Risk |
| Samsung 970 EVO Plus ① | 52°C | 83°C | +31°C | Overheat Risk |
| SK hynix PC601 ③ | 48°C | 80°C | +32°C | Overheat Risk |
| Samsung PM9A1 512GB ① | 29°C | 65°C | +36°C | Overheat Risk |
| Samsung MZVLQ256 256GB | 25°C | 62°C | +37°C | Overheat Risk |
Overheat-Risk Devices
- 970 EVO Plus: idles at 52–54°C, hits 83°C under load — bare M.2 slot with no heatsink
- PC601 ③: already 48°C at idle due to poor airflow in a dense server chassis
- NVMe thermal throttling typically starts at 70–80°C, causing performance degradation
Best-Cooled Devices
- 980 PRO 2TB: only +5°C rise — motherboard M.2 heatsink doing its job
- Optane 905P: +9°C rise — U.2 form factor with built-in thermal design
- Heatsink presence alone accounts for 10–20°C differences across identical workloads
HDD Replacement: IronWolf 12TB → Red Pro 10TB
We replaced Seagate IronWolf 12TB drives with WD Red Pro 10TB on the backup server. Same server, same fio parameters — direct before-and-after comparison.
| Metric | IronWolf 12TB | Red Pro 10TB | Delta |
|---|---|---|---|
| Seq Read (MB/s) | 257 | 256 | Equal |
| Seq Write (MB/s) | 241 | 186 | -23% |
| Sustained Write (MB/s) | 236 | 257 | +9% |
| Random 4K QD1 IOPS | 169 | 162 | -4% |
| Random 4K QD1 p99 (μs) | 16,318 | 25,210 | +55% |
| Random 4K QD32 IOPS | 619 | 622 | Equal |
| Mixed R/W IOPS | 618 | 690 | +12% |
| Idle Temp (°C) | 24 | 34 | +10°C |
| Load Temp (°C) | 26 | 35 | +9°C |
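For reference, the Delta column is plain percentage change from IronWolf (old) to Red Pro (new). A sketch, with an illustrative function name:

```python
def delta_pct(ironwolf: float, red_pro: float) -> int:
    """Percent change going from the IronWolf value to the Red Pro value."""
    return round((red_pro - ironwolf) / ironwolf * 100)

print(delta_pct(241, 186))  # Seq Write delta: -23 (%)
print(delta_pct(236, 257))  # Sustained Write delta: 9 (%)
print(delta_pct(618, 690))  # Mixed R/W IOPS delta: 12 (%)
```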
Replacement Verdict
Red Pro Advantages
- Sustained Write +9%: better for long-running backup jobs
- Mixed R/W IOPS +12%: stronger during concurrent read/write
- 10TB × 3 = 30TB total (vs IronWolf 12TB × 2 = 24TB — +25% capacity)
IronWolf Advantages
- Seq Write: 241 vs 186 MB/s (Red Pro is 23% slower) — better for multi-job sequential writes
- QD1 p99 latency: 16ms vs 25ms — smaller worst-case stalls
- Temperature: idle 24°C vs 34°C — runs 10°C cooler
For a backup server, Sustained Write throughput and total capacity matter more — the Red Pro swap was the right call. The temperature difference (+10°C) keeps Red Pro at 35°C, well within safe operating range.
AI Service Suitability — Final Recommendations
Vector DB Dedicated
QD1 IOPS is everything. RAG search is single-query random read at its core.
Pick: Optane 905P
83,989 QD1 IOPS at 11μs. Nothing else comes close.
Model Loading + OS
Sequential Read is king. Large model files need to stream fast at startup.
Pick: PM9A1 / 980 PRO
3,400–6,100 MB/s. A 32B model loads in 3–6 seconds.
Model Archive (Cold)
Cost-per-TB matters most. Low access frequency but re-downloading takes hours.
Pick: Budget NVMe / HDD
Performance is secondary. Internal NVMe preferred over USB external.
Pitfalls to Avoid
- OEM SSD write performance trap: SK hynix PC601 shows 3,100 MB/s Read but only 836 MB/s Write — OEM models without public datasheets must be benchmarked before deployment
- SLC cache depletion on large SSDs: 980 PRO 2TB Sustained Write matches Seq Write at 1,897 MB/s (stable post-cache). But ShiJi 256GB drops from 2,039 to 1,142 — a 44% cliff
- High-performance SSD without heatsink: 970 EVO Plus reaches 83°C under load. Heatsink presence creates 30°C differences on identical hardware
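The SLC-cliff figure in the list above is just the relative drop from Seq Write to Sustained Write; running it over the sequential table is a quick way to spot cache-dependent drives. A sketch (function name is ours):

```python
def slc_cliff_pct(seq_write_mbps: float, sustained_mbps: float) -> int:
    """Percent of write speed lost once the SLC cache is exhausted."""
    return round((seq_write_mbps - sustained_mbps) / seq_write_mbps * 100)

print(slc_cliff_pct(2039, 1142))  # ShiJi 256GB: 44 (% cliff)
print(slc_cliff_pct(1902, 1897))  # 980 PRO 2TB: 0 (stable post-cache)
print(slc_cliff_pct(702, 534))    # Biwin 1TB: 24 (% cliff)
```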
The key insight from benchmarking storage is “what will you use it for?” Datasheet peak specs mean nothing — what matters is matching the right device to your actual I/O pattern (Sequential vs Random, QD1 vs QD32). See our 3-Tier storage strategy guide for how we applied this benchmark data to real disk placement decisions.