Local AI

The latest and greatest in open-source AI models of all kinds.

Qwen3.6-35B-A3B

35B/3B active MoE from Alibaba Qwen. First Qwen with hybrid linear attention (Gated DeltaNet + MoE). Agentic coding rivals Qwen3.5-27B: SWE-bench 73.4%, Terminal-Bench 51.5%. Multimodal, 262K context (1M ext.), Thinking Preservation. vLLM, SGLang, KTransformers. Apache 2.0.
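At its core, a gated delta-rule layer maintains a fast-weight state that is decayed by a forget gate and updated by a delta-rule write, then read out like linear attention. A minimal single-head NumPy sketch of that recurrence (scalar gates and shapes are illustrative; this is not Qwen's actual implementation):

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule update of the fast-weight state S (d_v x d_k).

    alpha: scalar forget gate in (0, 1); beta: scalar write strength.
    The delta rule blends the value currently associated with key k
    toward v, after decaying the whole state by alpha (the "gated" part).
    """
    S = alpha * S                            # global gated decay
    v_old = S @ k                            # value the state currently predicts for k
    S = S + beta * np.outer(v - v_old, k)    # delta-rule write
    return S

def gated_deltanet(q, k, v, alpha, beta):
    """Run the recurrence over a sequence; q, k: (T, d_k), v: (T, d_v)."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    out = np.zeros((T, d_v))
    for t in range(T):
        S = gated_delta_step(S, k[t], v[t], alpha[t], beta[t])
        out[t] = S @ q[t]                    # linear-attention readout
    return out
```

With a unit-norm key, alpha=1, and beta=1, reading back the just-written key recovers its value exactly, which is the delta rule's defining property.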

Apr 16, 2026, 2:09 PM

LTX-2.3 Distilled 1.1

Lightricks ships distilled v1.1 for LTX-2.3 (8 steps, CFG=1). Across 3k+ A/B-tested video generations, users report: no more opening blur, sharper detail, better prompt adherence, improved character consistency, better camera transitions, and significantly improved audio. LoRA variant (2.74 GB) works with the full base model.

Apr 16, 2026, 9:09 AM

llama.cpp: Q1_0 CUDA Kernels Merged

Q1_0 CUDA kernels merged in b8806, a follow-up to the CPU-only backend (#21273). Bonsai 8B (1.07 GiB): 374 tok/s tg128; 4B (540 MB): 485 tok/s; 1.7B (231 MB): 626 tok/s — all on RTX 5090. Requires Turing MMA+. Works on some AMD GPUs. KLD vs FP16: 0.0005, 98.7% same top token.
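The two quality numbers quoted (mean KL divergence vs FP16 and top-token agreement) can be reproduced from paired logits. A sketch of how such metrics are typically computed (array shapes illustrative; not llama.cpp's actual code):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def quant_quality(fp16_logits, quant_logits):
    """Mean KL(p_fp16 || p_quant) and top-token agreement over a batch
    of positions — the two metrics quoted for Q1_0 (0.0005 KLD,
    98.7% same top token). Inputs: (positions, vocab) logit arrays."""
    p = softmax(fp16_logits)
    q = softmax(quant_logits)
    kld = np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
    agree = (fp16_logits.argmax(-1) == quant_logits.argmax(-1)).mean()
    return kld, agree
```

Identical logits give KLD 0 and 100% agreement; quantization noise pushes KLD above zero and occasionally flips the argmax.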

Apr 15, 2026, 11:18 PM
HY-World 2.0

First open-source SOTA 3D world model from Tencent Hunyuan. WorldMirror 2.0 (1.2B): video/images → point cloud, depth, normals, camera params, 3DGS in one pass. SOTA over Gen3C, SEVA, Lyra. Full text/image → navigable 3D world pipeline coming. Tencent HY-World Community License.

Apr 15, 2026, 6:37 PM
Nucleus-Image

17B MoE DiT from Nucleus AI (2B active). First fully open-source MoE diffusion model at this quality. GenEval 0.87, DPG-Bench 88.79 (#1), OneIG-Bench 0.522 (beats Imagen4). Base model, no post-training. Full weights, code, dataset. diffusers. Apache 2.0.
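A 17B model with 2B active parameters means a learned router selects only a few experts per token, so most weights sit idle on any given forward pass. A generic top-k MoE routing sketch (expert count and gating details are illustrative; this is not Nucleus AI's architecture):

```python
import numpy as np

def topk_moe(x, W_gate, experts, k=2):
    """Route each token to its top-k experts and mix their outputs by
    renormalized gate weights. x: (T, d); W_gate: (d, E); experts: list
    of E callables mapping (d,) -> (d,). Only k of E experts run per
    token, which is how a 17B-parameter model can cost ~2B per step."""
    logits = x @ W_gate                          # (T, E) routing scores
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        g = logits[t, top[t]]
        g = np.exp(g - g.max()); g /= g.sum()    # softmax over the chosen experts only
        for w, e in zip(g, top[t]):
            out[t] += w * experts[e](x[t])
    return out
```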

Apr 15, 2026, 4:12 PM
ERNIE-Image

8B DiT from Baidu. Top-1 open-weight T2I on GENEval (0.8856). Strongest at dense text rendering, structured layouts (posters, comics), and complex prompt following. Turbo: 8 steps (DMD+RL). 24GB VRAM. GGUF Q4 ~5GB (unsloth), ComfyUI (Comfy-Org), diffusers. Day-zero LoRA (ostris).

Apr 14, 2026, 8:39 PM
DDTree: Block Diffusion Draft Trees for Speculative Decoding

DDTree builds a draft tree from DFlash's block diffusion distributions, then verifies the whole tree in one target-model forward pass with tree attention. Lossless. Qwen3-30B-MoE HumanEval T=0: 8.22x over AR (+2.13x over DFlash). Uses existing DFlash drafters, no retraining. MIT.
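At T=0, lossless tree verification reduces to walking the draft tree along the target model's greedy choices: a child is accepted iff its token matches the target's next token for the current prefix, and the first mismatch is replaced by the target's own token. A toy sketch of that accept/walk logic (the `target_next` callback stands in for the scores from the single batched tree-attention forward pass; this is not DDTree's actual code):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    token: int
    children: List["Node"] = field(default_factory=list)

def verify_tree(root_children, target_next):
    """Greedy (T=0) tree verification. Walks the draft tree, accepting at
    each depth the child whose token equals the target model's next token
    for the accepted prefix. Returns the accepted tokens plus the one
    correction token the target supplies at the first mismatch, so the
    output is identical to plain autoregressive decoding (lossless)."""
    accepted, prefix, children = [], (), root_children
    while True:
        want = target_next(prefix)
        match = next((c for c in children if c.token == want), None)
        if match is None:
            return accepted + [want]   # target's token wins at the mismatch
        accepted.append(want)
        prefix = prefix + (want,)
        children = match.children
```

Each call emits at least one token, and a deep accepted path is where the 8.22x over autoregressive decoding comes from.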

Apr 14, 2026, 3:54 PM

llama.cpp: Qwen3-Omni-30B-A3B Support Merged

Qwen3-Omni-30B-A3B (30B/3B-active MoE) lands in llama.cpp: image + audio input, text output. Qwen3-ASR 1.7B (speech-to-text only) also supported. GGUFs from ggml-org: Q4_K_M 18.6 GB, Q8_0 32.5 GB. Apache 2.0. Run with: llama-server -hf ggml-org/Qwen3-Omni-30B-A3B-Instruct-GGUF

Apr 14, 2026, 1:18 AM

dflash-mlx: DFlash Speculative Decoding on Apple Silicon

Lossless DFlash speculative decoding for Apple Silicon via stock MLX, no fork. Block diffusion drafts 16 tokens; the target verifies them in one pass. 89% acceptance. Qwen3.5-4B: 54→197 tok/s (3.7x). Qwen3.5-9B: 31→127 tok/s (4.1x). pip install dflash-mlx. MIT.
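A back-of-envelope way to see how 89% acceptance over 16-token blocks yields multi-x speedups: under the simplifying assumption that each drafted token is accepted independently, the expected tokens emitted per target forward pass is a geometric sum plus one correction token (real acceptance is position-dependent, and drafting itself costs time, so measured speedup is lower):

```python
def expected_tokens_per_pass(accept_rate, block_size):
    """Expected tokens emitted per target forward pass when each drafted
    token is accepted independently with probability accept_rate and the
    target contributes one correction token at the first rejection.
    A simplified model of block-drafted speculative decoding."""
    # expected accepted-prefix length: sum_{i=1..B} a^i
    e = sum(accept_rate ** i for i in range(1, block_size + 1))
    return e + 1  # plus the correction token from the target pass
```

At accept_rate=0.89 and block_size=16 this gives roughly 7.8 tokens per target pass, an upper bound on speedup before drafter overhead.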

Apr 13, 2026, 8:23 PM
llama.cpp: Gemma 4 Audio Conformer Support Merged

Gemma 4 audio conformer encoder merged into llama.cpp: a 12-layer USM-style Conformer with 30s chunking. E2B passes 14/14 quants on short audio, E4B 19/21. CUDA, Metal, CPU, and Vulkan backends. BF16 mmproj required: lower quants cause repetitions via ClippableLinear sensitivity. Unsloth E2B/E4B GGUFs ship with BF16 mmproj.

Apr 12, 2026, 2:46 PM