Local AI

Merged June 8: ggml-webgpu now extracts four quant values per u32 instead of one for k-quants. M2 Pro pp512 (512-token prompt processing): Q2_K 817→1991 t/s (2.44x), Q3_K 92→302 t/s (3.27x), Q4_K 243→327 t/s (1.34x), Q6_K 216→311 t/s (1.44x). Tested with Qwen3.5 4B and Gemma 4 E4B.
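A rough sketch of the idea, in Python rather than the backend's shader code. This is not the merged kernel. The function names, the 2-bit field width, and the one-field-per-byte-lane layout are assumptions chosen for illustration. It contrasts pulling one packed value out of a 32-bit word per step with shifting and masking the whole word once so that all four byte lanes yield a value together.

```python
def extract_one(word: int, lane: int, shift: int, mask: int = 0x3) -> int:
    """Per-value path: select one byte lane of a 32-bit word, then shift and mask it."""
    byte = (word >> (8 * lane)) & 0xFF
    return (byte >> shift) & mask


def extract_four(word: int, shift: int, mask: int = 0x3) -> tuple:
    """Four-at-a-time path: one shift and one mask cover all four byte lanes.

    Assumes shift + bit-width(mask) <= 8, so no bits cross into a neighbouring lane.
    """
    lane_mask = mask * 0x01010101          # replicate the mask into every byte lane
    packed = ((word & 0xFFFFFFFF) >> shift) & lane_mask
    # Split the four lanes back out (lane 0 is the least significant byte).
    return tuple((packed >> (8 * lane)) & 0xFF for lane in range(4))


if __name__ == "__main__":
    word = 0x1BE439C6                      # arbitrary packed test word
    for shift in (0, 2, 4, 6):
        four = extract_four(word, shift)
        ones = tuple(extract_one(word, lane, shift) for lane in range(4))
        print(shift, four, four == ones)
```

Both paths return the same values; the second does the shift and mask once per word instead of once per value, which is the kind of saving the change describes.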

Jun 9, 2026, 3:38 AM