
Pins the experts your traffic actually hits on the GPU and swaps the cold tail in from RAM asynchronously. Qwen3.6 35B-A3B: 13.3 GiB (was ~20.5 GiB); Laguna XS.2 33B-A3B: 14.6 GiB (was 18.8 GiB). Both fit under 16 GiB. ~100 tok/s (92% of the 24 GB ceiling). Self-tunes from live traffic. Apache 2.0: dflash_server <model.gguf> --spark
Jun 8, 2026, 5:46 PM
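
To make the "pin the hot experts, self-tune from traffic" idea concrete, here is a minimal Python sketch. It is not dflash_server's implementation: the class `HotExpertTracker`, the `gpu_slots` budget, and the decay factor are all hypothetical names and values chosen for illustration. It only shows one plausible way to keep a GPU-resident set aligned with the experts that recent requests actually route to.

```python
from collections import Counter


class HotExpertTracker:
    """Hypothetical helper: decides which MoE experts stay resident on the GPU."""

    def __init__(self, gpu_slots: int, decay: float = 0.9):
        self.gpu_slots = gpu_slots  # assumed: number of experts the GPU budget can hold
        self.decay = decay          # assumed: how quickly old traffic is forgotten
        self.scores = Counter()     # expert id -> decayed hit score
        self.pinned = set()         # experts currently treated as GPU-resident

    def record(self, expert_ids):
        """Count the experts the router selected for one batch of tokens."""
        for expert in expert_ids:
            self.scores[expert] += 1.0

    def retune(self):
        """Re-pick the hottest experts, then decay scores so the set can drift."""
        ranked = sorted(self.scores, key=lambda e: (-self.scores[e], e))
        self.pinned = set(ranked[: self.gpu_slots])
        for expert in list(self.scores):
            self.scores[expert] *= self.decay
        return self.pinned

    def is_resident(self, expert_id) -> bool:
        """True if the expert is pinned; otherwise it would be fetched from RAM."""
        return expert_id in self.pinned


tracker = HotExpertTracker(gpu_slots=2)
tracker.record([0, 1, 1, 2, 2, 2])  # expert 2 is hottest, then 1, then 0
first = tracker.retune()
tracker.record([0, 0, 0, 0])        # traffic shifts toward expert 0
second = tracker.retune()
```

In this sketch `first` holds experts 1 and 2, and after the traffic shift `second` holds experts 0 and 2. The decay step is what lets a previously cold expert displace a stale one. A real server would also have to overlap the RAM-to-GPU copy with compute, which this example leaves out.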