
Parcae (UCSD/Together AI; Dan Fu): a stable looped transformer. Prior looped models explode during training; the cause is traced to the spectral norms of the injection params. A negative diagonal parameterization fixes it. Reported: 6.3% lower perplexity; 87.5% of a Transformer 2x its size at 1.3B. Looping framed as an orthogonal compute axis.
Apr 16, 2026, 8:45 AM
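
Why a spectral norm above 1 makes a looped state explode, and why a negative diagonal parameterization prevents it, can be seen in a toy linear recurrence. The sketch below is an illustration only, not Parcae's architecture: the recurrence `h <- a * h + b * x`, the mapping `a = exp(-exp(theta))`, and every name in it (`loop_state`, `negative_diagonal_coeffs`, `theta`) are assumptions chosen for the example. For a diagonal matrix the spectral norm is the largest absolute entry, so the diagonal is stored as a plain list.

```python
import math


def loop_state(a, b, x, n_loops):
    """Apply h <- a * h + b * x elementwise n_loops times, starting from h = 0.

    `a` is the diagonal of the (hypothetical) looped linear map and `b` the
    diagonal input weights; both are illustrative stand-ins.
    """
    h = [0.0] * len(x)
    for _ in range(n_loops):
        h = [ai * hi + bi * xi for ai, hi, bi, xi in zip(a, h, b, x)]
    return h


def negative_diagonal_coeffs(theta):
    """Map unconstrained theta to exp(-exp(theta)).

    The exponent -exp(theta) is always negative, so every coefficient lies
    strictly between 0 and 1 and the diagonal map has spectral norm below 1.
    """
    return [math.exp(-math.exp(t)) for t in theta]


def max_abs(v):
    """Largest absolute entry of a vector."""
    return max(abs(e) for e in v)


x = [1.0, -2.0, 0.5]
b = [1.0, 1.0, 1.0]

# Unconstrained diagonal: two entries have magnitude above 1,
# so the state grows geometrically with the number of loops.
unconstrained = [1.1, 0.9, -1.2]

# Constrained diagonal: any real theta yields a contraction.
theta = [-1.0, 0.0, 2.0]
stable = negative_diagonal_coeffs(theta)

for n in (8, 64):
    grown = max_abs(loop_state(unconstrained, b, x, n))
    bounded = max_abs(loop_state(stable, b, x, n))
    print(f"loops={n:3d}  unconstrained={grown:.3e}  negative-diagonal={bounded:.3f}")
```

With the constrained diagonal each coordinate stays below `|b * x| / (1 - a)` however many loops are run, which is the property that lets loop count be treated as a separate compute knob in this toy setting.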