
NVMe Storage Benchmark: Optane to HDD — 20 Devices Compared with fio

Are all NVMe SSDs the same? We benchmarked every storage device installed across our AI servers — 14 NVMe SSDs and 6 HDDs — using identical fio parameters. From Samsung PM9A1 to Intel Optane 905P to no-name Chinese SSDs, Sequential Read ranged from 6,193 MB/s down to 526 MB/s (a 12x gap), and Optane delivered 3.8x the Random 4K QD1 IOPS of the best NAND drive.

  • Devices tested: 20
  • Optane QD1 IOPS gap: 3.8x
  • Optane QD1 latency: 11μs
  • Top Seq Read: 6,193 MB/s

Test Environment

Every storage device across our production servers was tested under identical conditions. Room temperature was 13°C with zero server load during benchmarks.

fio Test Parameters

  • direct=1 — bypass OS cache (raw device performance)
  • ioengine=libaio — Linux async I/O
  • Sequential: bs=1M, iodepth=32
  • Random 4K: bs=4k, QD1 and QD32 tested separately
  • Sustained Write: 120-second continuous write (drains SLC cache)
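
As a rough illustration, the parameters above map onto fio command lines along these lines. This is a sketch, not the exact scripts used for the benchmark: the job names, the `rw` modes, the queue depth of the mixed job, the 30-second default runtime, the 8G size and the scratch-file path are all assumptions; only `direct`, `ioengine`, the block sizes, the queue depths and the 120-second sustained write come from the list.

```python
import shlex

# Options common to every job, taken from the parameter list above.
COMMON = ["--direct=1", "--ioengine=libaio"]

# Per-test options. Block sizes and queue depths come from the list above;
# the rw modes and the QD32 setting of the mixed job are assumptions.
JOBS = {
    "seq_read": ["--rw=read", "--bs=1M", "--iodepth=32"],
    "seq_write": ["--rw=write", "--bs=1M", "--iodepth=32"],
    "rand4k_qd1": ["--rw=randread", "--bs=4k", "--iodepth=1"],
    "rand4k_qd32": ["--rw=randread", "--bs=4k", "--iodepth=32"],
    "mixed_70_30": ["--rw=randrw", "--rwmixread=70", "--bs=4k", "--iodepth=32"],
    "sustained_write": ["--rw=write", "--bs=1M", "--iodepth=32"],
}

SUSTAINED_RUNTIME_S = 120  # stated in the article
DEFAULT_RUNTIME_S = 30     # assumption: not stated in the article


def fio_command(job, target, size="8G"):
    """Return the fio argv for one named job against a test file or device."""
    runtime = SUSTAINED_RUNTIME_S if job == "sustained_write" else DEFAULT_RUNTIME_S
    return [
        "fio", f"--name={job}", f"--filename={target}", f"--size={size}",
        "--time_based", f"--runtime={runtime}",
        *COMMON, *JOBS[job],
    ]


# Hypothetical target: a scratch file. Write jobs overwrite the target,
# so pointing them at a raw device destroys its data.
for job in JOBS:
    print(shlex.join(fio_command(job, "/mnt/scratch/fio-test.bin")))
```

Keeping the shared options in one place makes it harder for a single job to drift from the others, which is the point of testing every device under identical parameters.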

Devices Under Test

  • NVMe SSDs: 14 units (9 distinct models)
  • HDDs: 6 units (2 models)
  • Special: Intel Optane 905P (3D XPoint)
  • Capacity range: 119 GB to 9.1 TB
  • Price range: budget OEM to enterprise-grade

Sequential Performance Rankings

AI model loading reads multi-GB files sequentially — Sequential Read speed directly determines startup time. Sustained Write reflects real write throughput after SLC cache depletion.

| # | NVMe Model | Capacity | Seq Read | Seq Write | Sustained Write |
|---|---|---|---|---|---|
| 1 | Samsung PM9A1 1TB | 953GB | 6,193 | 5,009 | 5,015 |
| 2 | Samsung PM9A1 512GB ① | 476GB | 3,453 | 3,088 | 3,088 |
| 3 | Samsung PM9A1 512GB ② | 476GB | 3,453 | 3,319 | 3,319 |
| 4 | Samsung 970 EVO Plus 500GB ① | 465GB | 3,451 | 3,071 | 2,575 |
| 5 | Samsung 970 EVO Plus 500GB ② | 465GB | 3,448 | 3,071 | 2,467 |
| 6 | Samsung 980 PRO 2TB | 1,863GB | 3,436 | 1,902 | 1,897 |
| 7 | Lexar NM6A1 512GB | 476GB | 3,214 | 2,851 | 2,674 |
| 8 | SK hynix PC601 512GB ① | 476GB | 3,171 | 836 | 750 |
| 9 | SK hynix PC601 512GB ② | 476GB | 3,139 | 872 | 738 |
| 10 | SK hynix PC601 512GB ③ | 476GB | 3,073 | 833 | 680 |
| 11 | Samsung 980 1TB | 931GB | 2,684 | 2,360 | 2,408 |
| 12 | Intel Optane 905P 960GB | 894GB | 2,556 | 2,282 | 2,284 |
| 13 | Samsung MZVLQ256 256GB | 238GB | 2,324 | 1,167 | 1,184 |
| 14 | ShiJi 256GB M.2 | 238GB | 2,246 | 2,039 | 1,142 |
| 15 | Biwin NVMe 1TB | 953GB | 1,850 | 702 | 534 |
| 16 | Samsung MZNLN128 128GB | 119GB | 526 | 159 | 158 |

All values in MB/s. HDDs are compared in a separate section below.

Key Takeaways

  • PM9A1 1TB dominates at 6,193 MB/s, the only drive here running well past the PCIe 3.0 x4 limit (about 3.9 GB/s)
  • 980 PRO 2TB: Read 3,436 vs Write 1,902 MB/s. Write throughput is nearly halved, exposing SLC cache limits on large-capacity drives
  • SK hynix PC601: Read is decent at 3,000+ but Write collapses to 836 MB/s — the hidden OEM write performance trap
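
One way to spot cache-related write cliffs in the table is to compare Seq Write against Sustained Write for each drive. The sketch below does that for a few rows; the numbers are copied from the table, while the function name and the choice of rows are ours.

```python
# (seq_write, sustained_write) in MB/s, copied from the rankings table.
WRITE_MBPS = {
    "Samsung 970 EVO Plus 500GB ①": (3071, 2575),
    "Samsung 980 PRO 2TB": (1902, 1897),
    "SK hynix PC601 512GB ①": (836, 750),
    "ShiJi 256GB M.2": (2039, 1142),
    "Biwin NVMe 1TB": (702, 534),
}


def sustained_drop_pct(seq_write, sustained_write):
    """Percentage of Seq Write throughput lost during the sustained write."""
    return (seq_write - sustained_write) / seq_write * 100


# Print the drives with the largest drop first.
ranked = sorted(WRITE_MBPS.items(),
                key=lambda item: sustained_drop_pct(*item[1]),
                reverse=True)
for model, (seq, sustained) in ranked:
    print(f"{model}: {sustained_drop_pct(seq, sustained):.1f}% drop")
```

A drop near zero means the short sequential test already reflected steady-state write speed; a large drop means the short test was mostly measuring the cache.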

AI Model Loading Time Estimates

  • 14B AWQ (~8 GB): PM9A1 1.3s, Biwin 4.3s
  • 32B AWQ (~18 GB): PM9A1 2.9s, Biwin 9.7s
  • 128 GB model edge case: PM9A1 21s vs Samsung MZNLN128 128GB (526 MB/s) 243s
  • In practice, models stay in VRAM, so loading is a one-time startup cost
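
These estimates are simply file size divided by measured Sequential Read speed. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1,000 MB) and ignoring filesystem overhead, deserialization and the copy into VRAM:

```python
# Sequential Read in MB/s, copied from the rankings table.
SEQ_READ_MBPS = {
    "Samsung PM9A1 1TB": 6193,
    "Biwin NVMe 1TB": 1850,
    "Samsung MZNLN128 128GB": 526,
}

# Approximate model sizes in GB, as quoted in the list above.
MODEL_GB = {"14B AWQ": 8, "32B AWQ": 18, "128 GB model": 128}


def load_seconds(model_gb, seq_read_mbps):
    """Lower-bound load time: bytes to read divided by sequential throughput."""
    return model_gb * 1000 / seq_read_mbps


for model, size_gb in MODEL_GB.items():
    for drive, mbps in SEQ_READ_MBPS.items():
        print(f"{model} on {drive}: {load_seconds(size_gb, mbps):.1f}s")
```

Real startup time will be longer than this lower bound, but the ratio between drives should hold as long as storage is the bottleneck.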

Random 4K IOPS Rankings

Vector DB lookups, metadata queries, log writes — most AI service I/O is random 4K. QD1 (queue depth 1) measures real single-query response time. QD32 reflects throughput under concurrent load.

| # | NVMe Model | QD1 IOPS | QD1 p50 | QD32 IOPS | Mixed R/W |
|---|---|---|---|---|---|
| 1 | Intel Optane 905P 960GB (3D XPoint) | 83,989 | 11μs | 581,158 | 527,987 |
| 2 | Samsung PM9A1 1TB | 22,120 | 42μs | 859,871 | 562,594 |
| 3 | Samsung PM9A1 512GB ② | 21,820 | 42μs | 787,044 | 502,102 |
| 4 | Samsung PM9A1 512GB ① | 16,963 | 51μs | 392,992 | 342,918 |
| 5 | Samsung 970 EVO Plus ① | 14,304 | 60μs | 358,041 | 279,570 |
| 6 | SK hynix PC601 ② | 14,219 | 60μs | 313,607 | 197,082 |
| 7 | Samsung 980 1TB | 14,201 | 67μs | 499,929 | 439,421 |
| 8 | SK hynix PC601 ① | 14,137 | 61μs | 330,393 | 187,363 |
| 9 | Lexar NM6A1 512GB | 14,038 | 60μs | 338,898 | 160,560 |
| 10 | Samsung 970 EVO Plus ② | 14,026 | 59μs | 357,298 | 279,362 |
| 11 | ShiJi 256GB M.2 | 13,950 | 64μs | 366,109 | 177,883 |
| 12 | Samsung 980 PRO 2TB | 11,896 | 79μs | 656,561 | 365,201 |
| 13 | Biwin NVMe 1TB | 11,788 | 79μs | 213,785 | 117,037 |
| 14 | SK hynix PC601 ③ | 10,935 | 75μs | 342,707 | 175,672 |
| 15 | Samsung MZVLQ256 256GB | 10,930 | 86μs | 226,398 | 194,831 |
| 16 | Samsung MZNLN128 128GB | 8,701 | 98μs | 68,565 | 44,912 |

Mixed R/W: 70% Read / 30% Write combined workload

Why QD1 and QD32 Rankings Diverge

Optane 905P is the runaway QD1 leader at 83,989 IOPS, yet falls behind PM9A1 1TB (859,871) at QD32. The reason lies in fundamentally different storage technologies.

Optane (3D XPoint)

  • Cell-level fast response → 11μs QD1 latency
  • Internal parallelism more limited than NAND
  • Unbeatable for single requests; bulk parallel favors NAND

NAND Flash (PM9A1 etc.)

  • Individual cells are slower but thousands operate in parallel
  • QD32 parallelism unlocks massive IOPS scaling
  • Single-request latency ranges from 42–100μs
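
The divergence can be made concrete with Little's Law: when the queue stays full, mean per-request latency is roughly queue depth divided by IOPS. The sketch below applies that to the measured IOPS. It assumes the queue really was held at the configured depth for the whole run, and it yields mean latency, so it will not match the p50 column exactly.

```python
# (QD1 IOPS, QD32 IOPS) copied from the Random 4K table.
IOPS = {
    "Intel Optane 905P": (83_989, 581_158),
    "Samsung PM9A1 1TB": (22_120, 859_871),
}


def mean_latency_us(iops, queue_depth):
    """Little's Law: requests in flight = throughput x mean latency."""
    return queue_depth / iops * 1_000_000


for drive, (qd1_iops, qd32_iops) in IOPS.items():
    print(f"{drive}: QD1 ~{mean_latency_us(qd1_iops, 1):.1f}μs, "
          f"QD32 ~{mean_latency_us(qd32_iops, 32):.1f}μs per request")
```

Under this model the PM9A1 keeps per-request latency at QD32 close to its QD1 figure, while each Optane request waits several times longer than it does at QD1. That is the same story as the IOPS ranking flip, seen from the latency side.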

Optane 905P Deep Dive

The Optane 905P ranks 12th in sequential throughput but 1st in Random 4K QD1 by a wide margin. This “ranking inversion” has significant implications for AI workloads.

| Metric | Optane 905P | Best NAND (PM9A1) | Ratio |
|---|---|---|---|
| QD1 IOPS | 83,989 | 22,120 | 3.8x |
| QD1 p50 Latency | 11μs | 42μs | 3.8x faster |
| QD1 p99 Latency | 22μs | 49μs | 2.2x faster |
| QD32 IOPS | 581,158 | 859,871 | 0.68x |
| Mixed R/W IOPS | 527,987 | 562,594 | 0.94x |
| Seq Read (MB/s) | 2,556 | 6,193 | 0.41x |

Why QD1 Matters for AI Services

QD1-Dominated Workloads

  • RAG vector search: one user query → QD1 pattern
  • SQLite / metadata lookups: single transactions
  • Chat log reads and writes: sequential per-record
  • LoRA adapter loading: one file at a time

Perceived Latency Impact

  • 10K vector search: Optane 0.12s vs NAND 0.45s
  • 10 concurrent users: Optane 1.2s vs NAND 4.5s
  • RAG overhead must stay under 1s for natural feel
  • Optane is the only option that breaks the NAND QD1 ceiling
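
The latency figures above are consistent with a simple model: 10,000 dependent 4K reads issued one at a time at the measured QD1 IOPS, with ten users served back to back. The sketch below reproduces that arithmetic. The read count per search and the fully serialized ten-user case are assumptions of this model, not measured search latencies.

```python
# QD1 IOPS copied from the Random 4K table.
QD1_IOPS = {"Optane 905P": 83_989, "Best NAND (PM9A1 1TB)": 22_120}

READS_PER_SEARCH = 10_000  # assumption: one 4K read per vector visited
USERS = 10                 # assumption: requests are served one after another


def serial_read_seconds(n_reads, qd1_iops):
    """Time for n_reads dependent reads when only one is in flight at a time."""
    return n_reads / qd1_iops


for drive, iops in QD1_IOPS.items():
    one_user = serial_read_seconds(READS_PER_SEARCH, iops)
    print(f"{drive}: {one_user:.2f}s per search, "
          f"{one_user * USERS:.1f}s for {USERS} queued users")
```

A real vector database batches and caches reads, so actual latencies will differ; the model is only useful for showing how directly QD1 IOPS caps a dependent-read workload.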

When to Choose Optane vs. NAND

Optane Wins

  • Dedicated vector DB drive (Qdrant, Milvus, etc.)
  • SQLite / PostgreSQL metadata databases
  • Real-time log ingestion and analysis
  • Any workload where latency directly impacts service quality

NAND Wins

  • AI model loading (sequential read, where PM9A1 is 2.4x faster)
  • Training dataset reads/writes (large sequential I/O)
  • High-concurrency serving (QD32 NAND advantage)
  • Cost-per-TB priority scenarios

Temperature Comparison

Thermal management is critical for 24/7 server operations — heat directly impacts drive longevity. Temperatures were measured immediately after 120 seconds of sustained writes. Ambient: 13°C.

| NVMe Model | Idle (°C) | Under Load (°C) | Rise | Verdict |
|---|---|---|---|---|
| Samsung 980 PRO 2TB | 32 | 37 | +5°C | Safe |
| ShiJi 256GB M.2 | 43 | 47 | +4°C | Safe |
| Intel Optane 905P | 30 | 39 | +9°C | Safe |
| Biwin NVMe 1TB | 21 | 32 | +11°C | Safe |
| Samsung MZNLN128 | 35 | 46 | +11°C | Safe |
| Samsung PM9A1 512GB ② | 23 | 39 | +16°C | Safe |
| Samsung 980 1TB | 27 | 43 | +16°C | Safe |
| SK hynix PC601 ① | 25 | 42 | +17°C | Caution |
| SK hynix PC601 ② | 25 | 42 | +17°C | Caution |
| Samsung PM9A1 1TB | 24 | 49 | +25°C | Caution |
| Lexar NM6A1 512GB | 36 | 64 | +28°C | Caution |
| Samsung 970 EVO Plus ② | 54 | 83 | +29°C | Overheat Risk |
| Samsung 970 EVO Plus ① | 52 | 83 | +31°C | Overheat Risk |
| SK hynix PC601 ③ | 48 | 80 | +32°C | Overheat Risk |
| Samsung PM9A1 512GB ① | 29 | 65 | +36°C | Overheat Risk |
| Samsung MZVLQ256 256GB | 25 | 62 | +37°C | Overheat Risk |

Overheat-Risk Devices

  • 970 EVO Plus: idles at 52–54°C, hits 83°C under load — bare M.2 slot with no heatsink
  • PC601 ③: already 48°C at idle due to poor airflow in a dense server chassis
  • NVMe thermal throttling typically starts at 70–80°C, causing performance degradation

Best-Cooled Devices

  • 980 PRO 2TB: only +5°C rise — motherboard M.2 heatsink doing its job
  • Optane 905P: +9°C rise — U.2 form factor with built-in thermal design
  • Heatsink presence alone accounts for 10–20°C differences across identical workloads
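
For anyone repeating the idle and under-load readings, one option on Linux is the kernel's hwmon interface, where the NVMe driver reports the composite temperature in millidegrees Celsius. The sketch below assumes a kernel with NVMe hwmon support and the usual `/sys/class/hwmon` layout; it is not necessarily how the temperatures in this article were collected, and the function name is ours.

```python
from pathlib import Path


def nvme_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon entry: composite temperature in °C} for NVMe drives."""
    temps = {}
    for hw in sorted(Path(hwmon_root).glob("hwmon*")):
        try:
            # Skip sensors that do not belong to the NVMe driver.
            if (hw / "name").read_text().strip() != "nvme":
                continue
            # temp1_input holds millidegrees Celsius, e.g. 39850 for 39.85°C.
            temps[hw.name] = int((hw / "temp1_input").read_text()) / 1000
        except (OSError, ValueError):
            continue  # sensor missing or unreadable, skip it
    return temps


if __name__ == "__main__":
    # Call once at idle and again right after the sustained write,
    # then subtract to get the rise for each drive.
    for name, temp in nvme_temps_c().items():
        print(f"{name}: {temp:.1f}°C")
```

Mapping each hwmon entry back to a specific drive model is left out here; on most systems the entry's `device` symlink points at the corresponding NVMe controller.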

HDD Replacement: IronWolf 12TB → Red Pro 10TB

We replaced Seagate IronWolf 12TB drives with WD Red Pro 10TB on the backup server. Same server, same fio parameters — direct before-and-after comparison.

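| Metric | IronWolf 12TB | Red Pro 10TB | Delta |
|---|---|---|---|
| Seq Read (MB/s) | 257 | 256 | Equal |
| Seq Write (MB/s) | 241 | 186 | -23% |
| Sustained Write (MB/s) | 236 | 257 | +9% |
| Random 4K QD1 IOPS | 169 | 162 | -4% |
| Random 4K QD1 p99 (μs) | 16,318 | 25,210 | +55% |
| Random 4K QD32 IOPS | 619 | 622 | Equal |
| Mixed R/W IOPS | 618 | 690 | +12% |
| Idle Temp (°C) | 24 | 34 | +10°C |
| Load Temp (°C) | 26 | 35 | +9°C |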
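
The Delta column is the relative change from IronWolf to Red Pro. A short sketch of that calculation, using values copied from the table (the function name is ours):

```python
# (IronWolf 12TB, Red Pro 10TB) values copied from the table.
HDD_METRICS = {
    "Seq Write (MB/s)": (241, 186),
    "Sustained Write (MB/s)": (236, 257),
    "Random 4K QD1 IOPS": (169, 162),
    "Random 4K QD1 p99 (μs)": (16_318, 25_210),
    "Mixed R/W IOPS": (618, 690),
}


def pct_change(before, after):
    """Relative change from the old drive to the new one, in percent."""
    return (after - before) / before * 100


for metric, (ironwolf, red_pro) in HDD_METRICS.items():
    print(f"{metric}: {pct_change(ironwolf, red_pro):+.1f}%")
```

Note that a positive delta is an improvement for throughput and IOPS but a regression for p99 latency, where lower is better.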

Replacement Verdict

Red Pro Advantages

  • Sustained Write +9%: better for long-running backup jobs
  • Mixed R/W IOPS +12%: stronger during concurrent read/write
  • 10TB × 3 = 30TB total (vs IronWolf 12TB × 2 = 24TB, +25% capacity)

IronWolf Advantages

  • Seq Write: 241 vs 186 MB/s (Red Pro is 23% slower), which favors IronWolf for multi-job sequential writes
  • QD1 p99 latency: 16ms vs 25ms (worst-case gap)
  • Temperature: idle 24°C vs 34°C (10°C cooler)

For a backup server, Sustained Write throughput and total capacity matter more, so the Red Pro swap was the right call. The Red Pro runs about 10°C warmer but peaks at 35°C under load, well within safe operating range.

AI Service Suitability — Final Recommendations

Vector DB Dedicated

QD1 IOPS is everything. RAG search is single-query random read at its core.

Pick: Optane 905P

83,989 QD1 IOPS at 11μs. Nothing else comes close.

Model Loading + OS

Sequential Read is king. Large model files need to stream fast at startup.

Pick: PM9A1 / 980 PRO

3,400–6,100 MB/s. A 32B model loads in 3–6 seconds.

Model Archive (Cold)

Cost-per-TB matters most. Low access frequency but re-downloading takes hours.

Pick: Budget NVMe / HDD

Performance is secondary. Internal NVMe preferred over USB external.

Pitfalls to Avoid

  • OEM SSD write performance trap: SK hynix PC601 shows 3,100 MB/s Read but only 836 MB/s Write — OEM models without public datasheets must be benchmarked before deployment
  • SLC cache depletion: the 980 PRO 2TB holds steady after the cache is exhausted (Seq Write 1,902 MB/s, Sustained Write 1,897 MB/s), but the ShiJi 256GB drops from 2,039 to 1,142 MB/s, a 44% cliff
  • High-performance SSD without heatsink: 970 EVO Plus reaches 83°C under load. Heatsink presence creates 30°C differences on identical hardware

The key question in any storage benchmark is “what will you use it for?” Datasheet peak specs say little on their own; what matters is matching the device to your actual I/O pattern (Sequential vs Random, QD1 vs QD32). See our 3-Tier storage strategy guide for how we applied this benchmark data to real disk placement decisions.