
PR #23398 merged: Gemma 4 12B and 31B (dense) now get MTP (multi-token prediction) speculative decoding in llama.cpp main. DGX Spark, 31B: 6→15 tok/s on average (2.5x, up to 5x on translation). RTX 4070 Super (12 GB) + QAT: 140 tok/s. Draft acceptance rate ~0.58. Enable with --spec-type draft-mtp --spec-draft-n-max 4. The MoE model (26B-A4B) sees less uplift; E2B/E4B are not supported yet.
Jun 7, 2026, 2:39 PM
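To put the ~0.58 acceptance rate and a draft length of 4 in context, here is a back-of-envelope sketch of how many tokens one target-model pass can yield. The post does not define how its acceptance rate is measured, so two common interpretations are assumed: independent per-token acceptance with probability alpha, or a plain accepted/drafted ratio with all 4 tokens drafted every step. The helper names are hypothetical, and both figures ignore the cost of running the MTP draft and verifying a wider batch, so they are optimistic upper bounds rather than predictions.

```python
"""Rough tokens-per-verification-pass estimates for MTP speculative decoding.

Hypothetical helpers, not part of llama.cpp. Both ignore drafting and
verification overhead, so they overstate real-world speedup.
"""


def tokens_per_step_iid(alpha: float, k: int) -> float:
    # Assumption: each of the k drafted tokens is accepted independently with
    # probability alpha, and verification stops at the first rejection.
    # P(at least i drafts accepted) = alpha**i, so E[accepted] = sum_{i=1..k} alpha**i.
    # The target model always contributes one extra token (correction or bonus),
    # giving sum_{i=0..k} alpha**i = (1 - alpha**(k + 1)) / (1 - alpha) for alpha != 1.
    return sum(alpha ** i for i in range(k + 1))


def tokens_per_step_ratio(accept_ratio: float, k: int) -> float:
    # Assumption: the reported rate is accepted_drafts / drafted_tokens and the
    # full k tokens are drafted every step, so mean accepted = accept_ratio * k,
    # plus the one token the target model always emits.
    return 1.0 + accept_ratio * k


if __name__ == "__main__":
    alpha, k = 0.58, 4  # from the post: ~0.58 accept rate, --spec-draft-n-max 4
    print(f"i.i.d. per-token acceptance: {tokens_per_step_iid(alpha, k):.2f} tokens/pass")
    print(f"accepted/drafted ratio:      {tokens_per_step_ratio(alpha, k):.2f} tokens/pass")
```

Under these assumptions the estimate is roughly 2.2 or 3.3 tokens per target forward pass, depending on the interpretation. Acceptance also varies by task, which is consistent with translation reaching up to 5x while the average is 2.5x.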