
Gemma 4 audio Conformer encoder merged into llama.cpp. It is a 12-layer USM-style Conformer with 30s chunking. On short-audio tests, 14/14 E2B quants and 19/21 E4B quants pass. Runs on CUDA, Metal, CPU, and Vulkan. A BF16 mmproj is required: lower-precision mmproj quants cause repetitions because the ClippableLinear layers are sensitive to quantization. Unsloth E2B/E4B GGUFs ship with a BF16 mmproj.
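For a rough feel of what 30s chunking means for longer inputs, here is a minimal sketch, not the llama.cpp implementation. The helper name `split_into_chunks` is hypothetical, the 16 kHz sample rate is an assumption (the post does not state one), and the sketch ignores any overlap or padding the real encoder may apply.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumption: sample rate is not stated in the post
CHUNK_SECONDS = 30       # chunk length taken from the post


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    chunk_seconds: int = CHUNK_SECONDS,
) -> List[Sequence[float]]:
    """Split mono PCM samples into consecutive, non-overlapping chunks.

    Hypothetical helper for illustration only. The final chunk may be
    shorter than chunk_seconds; no padding is added.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# 70 seconds of silence at the assumed 16 kHz rate
audio = [0.0] * (70 * SAMPLE_RATE_HZ)
chunk_lengths = [len(c) for c in split_into_chunks(audio)]
# chunk_lengths == [480000, 480000, 160000]: two full chunks plus a 10s tail
```

Under these assumptions a 70s clip becomes three encoder inputs, which is why results on short audio do not automatically say anything about clips that span several chunks.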
Apr 12, 2026, 2:46 PM