Local AI

Lossless DFlash speculative decoding for Apple Silicon on stock MLX, no fork. A block-diffusion draft model proposes 16 tokens per step; the target model verifies them in one forward pass. 89% acceptance. Qwen3.5-4B: 54→197 tok/s (3.7x). Qwen3.5-9B: 31→127 tok/s (4.1x). pip install dflash-mlx. MIT licensed.
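
For readers new to the idea, here is a minimal sketch of the verify step in speculative decoding, assuming greedy decoding. It is not taken from dflash-mlx: `verify_block` and its arguments are hypothetical names, and the target model's per-position choices are passed in as a plain list instead of being computed by a real model. It shows why the output is lossless: every emitted token is one the target would have picked itself.

```python
from typing import List, Sequence


def verify_block(draft: Sequence[int], target_greedy: Sequence[int]) -> List[int]:
    """Accept the longest drafted prefix the target agrees with (greedy case).

    draft:         k tokens proposed by the draft model.
    target_greedy: k + 1 tokens from ONE target forward pass, where
                   target_greedy[i] is the target's argmax token given the
                   context followed by draft[:i].
    Returns the tokens to emit this step (between 1 and k + 1 tokens).
    """
    if len(target_greedy) != len(draft) + 1:
        raise ValueError("target_greedy must have len(draft) + 1 entries")

    emitted: List[int] = []
    for i, token in enumerate(draft):
        if token != target_greedy[i]:
            # First disagreement: emit the target's own token and stop.
            emitted.append(target_greedy[i])
            return emitted
        emitted.append(token)

    # Every drafted token matched, so the target's extra token is free.
    emitted.append(target_greedy[len(draft)])
    return emitted


# Hypothetical token ids: the third drafted token is rejected.
print(verify_block([11, 12, 13], [11, 12, 99, 14]))
```

With sampling instead of greedy decoding, the exact-match test is usually replaced by a rejection-sampling rule that preserves the target distribution; that variant is not shown here.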

Apr 13, 2026, 8:23 PM