NVMe Storage Benchmark: Optane to HDD — 20 Devices Compared with fio
Are all NVMe SSDs the same? We benchmarked every storage device installed across our AI servers — 14 NVMe SSDs and 6 HDDs — using identical fio parameters. From Samsung PM9A1 to Intel Optane 905P to no-name Chinese SSDs, Sequential Read ranged from 6,193 MB/s down to 526 MB/s (a 12x gap), and Optane delivered 3.8x the Random 4K QD1 IOPS of the best NAND drive.
- 20 devices tested
- 3.8x Optane QD1 IOPS advantage
- 11μs Optane QD1 latency
- 6,193 MB/s top Seq Read
Test Environment
Every storage device across our production servers was tested under identical conditions: 13°C ambient temperature, with zero other workload on the servers during the benchmarks.
fio Test Parameters
- `direct=1` — bypass OS cache (raw device performance)
- `ioengine=libaio` — Linux async I/O
- Sequential: bs=1M, iodepth=32
- Random 4K: bs=4k, QD1 and QD32 tested separately
- Sustained Write: 120-second continuous write (drains SLC cache)
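Assembled from the parameters above, the sequential-read job might look like the following fio job file. This is a sketch, not our exact script: the filename is a placeholder for the device under test, and the Random 4K runs would swap in bs=4k with iodepth=1 or 32.

```ini
; Sequential-read job pieced together from the parameters above.
; filename is a placeholder; point it at the device (or a test file).
; Write jobs against a raw device are destructive, so use a scratch disk.
[seq-read]
filename=/dev/nvme0n1
direct=1
ioengine=libaio
rw=read
bs=1M
iodepth=32
runtime=60
time_based
```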
Devices Under Test
- NVMe SSDs: 14 units (9 distinct models)
- HDDs: 6 units (2 models)
- Special: Intel Optane 905P (3D XPoint)
- Capacity range: 119 GB to 9.1 TB
- Price range: budget OEM to enterprise-grade
Sequential Performance Rankings
AI model loading reads multi-GB files sequentially — Sequential Read speed directly determines startup time. Sustained Write reflects real write throughput after SLC cache depletion.
| # | NVMe Model | Capacity | Seq Read | Seq Write | Sustained Write |
|---|---|---|---|---|---|
| 1 | Samsung PM9A1 1TB | 953GB | 6,193 | 5,009 | 5,015 |
| 2 | Samsung PM9A1 512GB ① | 476GB | 3,453 | 3,088 | 3,088 |
| 3 | Samsung PM9A1 512GB ② | 476GB | 3,453 | 3,319 | 3,319 |
| 4 | Samsung 970 EVO Plus 500GB ① | 465GB | 3,451 | 3,071 | 2,575 |
| 5 | Samsung 970 EVO Plus 500GB ② | 465GB | 3,448 | 3,071 | 2,467 |
| 6 | Samsung 980 PRO 2TB | 1,863GB | 3,436 | 1,902 | 1,897 |
| 7 | Lexar NM6A1 512GB | 476GB | 3,214 | 2,851 | 2,674 |
| 8 | SK hynix PC601 512GB ① | 476GB | 3,171 | 836 | 750 |
| 9 | SK hynix PC601 512GB ② | 476GB | 3,139 | 872 | 738 |
| 10 | SK hynix PC601 512GB ③ | 476GB | 3,073 | 833 | 680 |
| 11 | Samsung 980 1TB | 931GB | 2,684 | 2,360 | 2,408 |
| 12 | Intel Optane 905P 960GB | 894GB | 2,556 | 2,282 | 2,284 |
| 13 | Samsung MZVLQ256 256GB | 238GB | 2,324 | 1,167 | 1,184 |
| 14 | ShiJi 256GB M.2 | 238GB | 2,246 | 2,039 | 1,142 |
| 15 | Biwin NVMe 1TB | 953GB | 1,850 | 702 | 534 |
| 16 | Samsung MZNLN128 128GB | 119GB | 526 | 159 | 158 |
All values in MB/s. HDDs are compared in a separate section below.
Key Takeaways
- PM9A1 1TB dominates at 6,193 MB/s — full PCIe 4.0 bandwidth
- 980 PRO 2TB: Read 3,436 vs Write 1,902 — write performance halved, exposing SLC cache limits on large-capacity drives
- SK hynix PC601: Read is decent at 3,000+ but Write collapses to 836 MB/s — the hidden OEM write performance trap
AI Model Loading Time Estimates
- 14B AWQ (~8 GB): PM9A1 1.3s, Biwin 4.3s
- 32B AWQ (~18 GB): PM9A1 2.9s, Biwin 9.7s
- 128 GB model edge case: PM9A1 20s vs the 128 GB MZNLN128 (526 MB/s) 243s
- In practice, models stay in VRAM — loading is a one-time startup cost
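The estimates above are straight division: model size over Sequential Read speed from the table. A minimal sketch (the function name is ours; sizes are treated as decimal GB, which reproduces the listed figures):

```python
def load_time_s(model_gb: float, seq_read_mbps: float) -> float:
    """Estimated load time: model size (decimal GB -> MB) / Seq Read (MB/s)."""
    return model_gb * 1000 / seq_read_mbps

PM9A1_1TB = 6193  # Seq Read in MB/s, from the sequential table
BIWIN_1TB = 1850

print(round(load_time_s(8, PM9A1_1TB), 1))   # 14B AWQ (~8 GB): 1.3 s
print(round(load_time_s(18, BIWIN_1TB), 1))  # 32B AWQ (~18 GB): 9.7 s
```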
Random 4K IOPS Rankings
Vector DB lookups, metadata queries, log writes — most AI service I/O is random 4K. QD1 (queue depth 1) measures real single-query response time. QD32 reflects throughput under concurrent load.
| # | NVMe Model | QD1 IOPS | QD1 p50 | QD32 IOPS | Mixed R/W |
|---|---|---|---|---|---|
| 1 | Intel Optane 905P 960GB (3D XPoint) | 83,989 | 11μs | 581,158 | 527,987 |
| 2 | Samsung PM9A1 1TB | 22,120 | 42μs | 859,871 | 562,594 |
| 3 | Samsung PM9A1 512GB ② | 21,820 | 42μs | 787,044 | 502,102 |
| 4 | Samsung PM9A1 512GB ① | 16,963 | 51μs | 392,992 | 342,918 |
| 5 | Samsung 970 EVO Plus ① | 14,304 | 60μs | 358,041 | 279,570 |
| 6 | SK hynix PC601 ② | 14,219 | 60μs | 313,607 | 197,082 |
| 7 | Samsung 980 1TB | 14,201 | 67μs | 499,929 | 439,421 |
| 8 | SK hynix PC601 ① | 14,137 | 61μs | 330,393 | 187,363 |
| 9 | Lexar NM6A1 512GB | 14,038 | 60μs | 338,898 | 160,560 |
| 10 | Samsung 970 EVO Plus ② | 14,026 | 59μs | 357,298 | 279,362 |
| 11 | ShiJi 256GB M.2 | 13,950 | 64μs | 366,109 | 177,883 |
| 12 | Samsung 980 PRO 2TB | 11,896 | 79μs | 656,561 | 365,201 |
| 13 | Biwin NVMe 1TB | 11,788 | 79μs | 213,785 | 117,037 |
| 14 | SK hynix PC601 ③ | 10,935 | 75μs | 342,707 | 175,672 |
| 15 | Samsung MZVLQ256 256GB | 10,930 | 86μs | 226,398 | 194,831 |
| 16 | Samsung MZNLN128 128GB | 8,701 | 98μs | 68,565 | 44,912 |
Mixed R/W: 70% Read / 30% Write combined workload
Why QD1 and QD32 Rankings Diverge
Optane 905P is the runaway QD1 leader at 83,989 IOPS, yet falls behind PM9A1 1TB (859,871) at QD32. The reason lies in fundamentally different storage technologies.
Optane (3D XPoint)
- Cell-level fast response → 11μs QD1 latency
- Internal parallelism more limited than NAND
- Unbeatable for single requests; bulk parallel workloads favor NAND
NAND Flash (PM9A1 etc.)
- Individual cells are slower, but thousands operate in parallel
- QD32 parallelism unlocks massive IOPS scaling
- Single-request latency ranges from 42–100μs
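The QD1 side of this divergence follows almost mechanically from latency: with only one request in flight, throughput is bounded by the reciprocal of per-request latency. A quick sanity check against the table (measured IOPS land a little below this ceiling because of software overhead):

```python
def qd1_iops_ceiling(p50_latency_us: float) -> int:
    """At queue depth 1, throughput is capped at one request per latency period."""
    return round(1_000_000 / p50_latency_us)

print(qd1_iops_ceiling(11))  # Optane 905P: 90909 ceiling (measured: 83,989)
print(qd1_iops_ceiling(42))  # PM9A1 1TB:  23810 ceiling (measured: 22,120)
```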
Optane 905P Deep Dive
The Optane 905P ranks 12th in sequential throughput but 1st in Random 4K QD1 by a wide margin. This “ranking inversion” has significant implications for AI workloads.
| Metric | Optane 905P | Best NAND (PM9A1) | Ratio |
|---|---|---|---|
| QD1 IOPS | 83,989 | 22,120 | 3.8x |
| QD1 p50 Latency | 11μs | 42μs | 3.8x faster |
| QD1 p99 Latency | 22μs | 49μs | 2.2x faster |
| QD32 IOPS | 581,158 | 859,871 | 0.68x |
| Mixed R/W IOPS | 527,987 | 562,594 | 0.94x |
| Seq Read (MB/s) | 2,556 | 6,193 | 0.41x |
Why QD1 Matters for AI Services
QD1-Dominated Workloads
- RAG vector search: one user query → QD1 pattern
- SQLite / metadata lookups: single transactions
- Chat log reads and writes: sequential per-record
- LoRA adapter loading: one file at a time
Perceived Latency Impact
- 10K vector search: Optane 0.12s vs NAND 0.45s
- 10 concurrent users: Optane 1.2s vs NAND 4.5s
- RAG overhead must stay under 1s for natural feel
- Optane is the only option that breaks the NAND QD1 ceiling
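The latency figures above are simple arithmetic over the QD1 results, under a worst-case model where every 4K read is fully serialized (real systems overlap some requests). A sketch with an illustrative function name:

```python
def serialized_io_time_s(n_reads: int, qd1_iops: int, users: int = 1) -> float:
    """Wall time if every 4K read is issued one at a time (QD1), users queued up."""
    return n_reads * users / qd1_iops

print(round(serialized_io_time_s(10_000, 83_989), 2))            # Optane: 0.12 s
print(round(serialized_io_time_s(10_000, 22_120), 2))            # NAND:   0.45 s
print(round(serialized_io_time_s(10_000, 22_120, users=10), 1))  # NAND, 10 users: 4.5 s
```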
When to Choose Optane vs. NAND
Optane Wins
- Dedicated vector DB drive (Qdrant, Milvus, etc.)
- SQLite / PostgreSQL metadata databases
- Real-time log ingestion and analysis
- Any workload where latency directly impacts service quality
NAND Wins
- AI model loading (sequential read — PM9A1 is 2.4x faster)
- Training dataset reads/writes (large sequential I/O)
- High-concurrency serving (QD32 NAND advantage)
- Cost-per-TB priority scenarios
Temperature Comparison
Thermal management is critical for 24/7 server operations — heat directly impacts drive longevity. Temperatures were measured immediately after 120 seconds of sustained writes. Ambient: 13°C.
| NVMe Model | Idle (°C) | Under Load (°C) | Rise | Verdict |
|---|---|---|---|---|
| Samsung 980 PRO 2TB | 32°C | 37°C | +5°C | Safe |
| ShiJi 256GB M.2 | 43°C | 47°C | +4°C | Safe |
| Intel Optane 905P | 30°C | 39°C | +9°C | Safe |
| Biwin NVMe 1TB | 21°C | 32°C | +11°C | Safe |
| Samsung MZNLN128 | 35°C | 46°C | +11°C | Safe |
| Samsung PM9A1 512GB ② | 23°C | 39°C | +16°C | Safe |
| Samsung 980 1TB | 27°C | 43°C | +16°C | Safe |
| SK hynix PC601 ① | 25°C | 42°C | +17°C | Caution |
| SK hynix PC601 ② | 25°C | 42°C | +17°C | Caution |
| Samsung PM9A1 1TB | 24°C | 49°C | +25°C | Caution |
| Lexar NM6A1 512GB | 36°C | 64°C | +28°C | Caution |
| Samsung 970 EVO Plus ② | 54°C | 83°C | +29°C | Overheat Risk |
| Samsung 970 EVO Plus ① | 52°C | 83°C | +31°C | Overheat Risk |
| SK hynix PC601 ③ | 48°C | 80°C | +32°C | Overheat Risk |
| Samsung PM9A1 512GB ① | 29°C | 65°C | +36°C | Overheat Risk |
| Samsung MZVLQ256 256GB | 25°C | 62°C | +37°C | Overheat Risk |
Overheat-Risk Devices
- 970 EVO Plus: idles at 52–54°C, hits 83°C under load — bare M.2 slot with no heatsink
- PC601 ③: already 48°C at idle due to poor airflow in a dense server chassis
- NVMe thermal throttling typically starts at 70–80°C, causing performance degradation
Best-Cooled Devices
- 980 PRO 2TB: only +5°C rise — motherboard M.2 heatsink doing its job
- Optane 905P: +9°C rise — U.2 form factor with built-in thermal design
- Heatsink presence alone accounts for 10–20°C differences across identical workloads
HDD Replacement: IronWolf 12TB → Red Pro 10TB
We replaced Seagate IronWolf 12TB drives with WD Red Pro 10TB on the backup server. Same server, same fio parameters — direct before-and-after comparison.
| Metric | IronWolf 12TB | Red Pro 10TB | Delta |
|---|---|---|---|
| Seq Read (MB/s) | 257 | 256 | Equal |
| Seq Write (MB/s) | 241 | 186 | -23% |
| Sustained Write (MB/s) | 236 | 257 | +9% |
| Random 4K QD1 IOPS | 169 | 162 | -4% |
| Random 4K QD1 p99 (μs) | 16,318 | 25,210 | +55% |
| Random 4K QD32 IOPS | 619 | 622 | Equal |
| Mixed R/W IOPS | 618 | 690 | +12% |
| Idle Temp (°C) | 24 | 34 | +10°C |
| Load Temp (°C) | 26 | 35 | +9°C |
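For reference, the Delta column is plain percentage change from IronWolf (old) to Red Pro (new). A sketch, with an illustrative function name:

```python
def delta_pct(ironwolf: float, red_pro: float) -> int:
    """Percent change going from the IronWolf value to the Red Pro value."""
    return round((red_pro - ironwolf) / ironwolf * 100)

print(delta_pct(241, 186))  # Seq Write delta: -23 (%)
print(delta_pct(236, 257))  # Sustained Write delta: 9 (%)
print(delta_pct(618, 690))  # Mixed R/W IOPS delta: 12 (%)
```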
Replacement Verdict
Red Pro Advantages
- Sustained Write +9%: better for long-running backup jobs
- Mixed R/W IOPS +12%: stronger during concurrent read/write
- 10TB × 3 = 30TB total (vs IronWolf 12TB × 2 = 24TB — +25% capacity)
IronWolf Advantages
- Seq Write: 241 vs 186 MB/s (Red Pro is 23% slower) — better for multi-job sequential writes
- QD1 p99 latency: 16ms vs 25ms — smaller worst-case stalls
- Temperature: idle 24°C vs 34°C — runs 10°C cooler
For a backup server, Sustained Write throughput and total capacity matter more — the Red Pro swap was the right call. The temperature difference (+10°C) keeps Red Pro at 35°C, well within safe operating range.
AI Service Suitability — Final Recommendations
Vector DB Dedicated
QD1 IOPS is everything. RAG search is single-query random read at its core.
Pick: Optane 905P
83,989 QD1 IOPS at 11μs. Nothing else comes close.
Model Loading + OS
Sequential Read is king. Large model files need to stream fast at startup.
Pick: PM9A1 / 980 PRO
3,400–6,100 MB/s. A 32B model loads in 3–6 seconds.
Model Archive (Cold)
Cost-per-TB matters most. Low access frequency but re-downloading takes hours.
Pick: Budget NVMe / HDD
Performance is secondary. Internal NVMe preferred over USB external.
Pitfalls to Avoid
- OEM SSD write performance trap: SK hynix PC601 shows 3,100 MB/s Read but only 836 MB/s Write — OEM models without public datasheets must be benchmarked before deployment
- SLC cache depletion on large SSDs: 980 PRO 2TB Sustained Write matches Seq Write at 1,897 MB/s (stable post-cache). But ShiJi 256GB drops from 2,039 to 1,142 — a 44% cliff
- High-performance SSD without heatsink: 970 EVO Plus reaches 83°C under load. Heatsink presence creates 30°C differences on identical hardware
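The SLC-cliff figure in the list above is just the relative drop from Seq Write to Sustained Write; running it over the sequential table is a quick way to spot cache-dependent drives. A sketch (function name is ours):

```python
def slc_cliff_pct(seq_write_mbps: float, sustained_mbps: float) -> int:
    """Percent of write speed lost once the SLC cache is exhausted."""
    return round((seq_write_mbps - sustained_mbps) / seq_write_mbps * 100)

print(slc_cliff_pct(2039, 1142))  # ShiJi 256GB: 44 (% cliff)
print(slc_cliff_pct(1902, 1897))  # 980 PRO 2TB: 0 (stable post-cache)
print(slc_cliff_pct(702, 534))    # Biwin 1TB: 24 (% cliff)
```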
The key insight from benchmarking storage is “what will you use it for?” Datasheet peak specs mean nothing — what matters is matching the right device to your actual I/O pattern (Sequential vs Random, QD1 vs QD32). See our 3-Tier storage strategy guide for how we applied this benchmark data to real disk placement decisions.