Parcae: Scaling Laws For Stable Looped Language Models

Parcae (UCSD/Together AI; Dan Fu) is a stable looped transformer. Prior looped models explode, which the work traces to large spectral norms in the injection parameters. A negative diagonal parameterization fixes this. Reported results: 6.3% lower perplexity, and at 1.3B parameters, 87.5% of the quality of a Transformer twice its size. The takeaway: looping is an orthogonal axis for scaling compute.
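
To make the stability argument concrete, here is a minimal sketch, not Parcae's actual formulation. It assumes the loop can be caricatured as an elementwise linear update `h <- a * h + b`, where the diagonal entries `a` play the role of the parameters whose spectral norm matters (for a diagonal matrix, the spectral norm is the largest absolute entry). The function names, the `dt` step size, and the exponential discretization are all illustrative assumptions borrowed from state-space-model practice.

```python
import math

def loop_state_norm(diag, inject, steps):
    """Iterate h <- diag * h + inject elementwise; return the final L2 norm."""
    h = [0.0] * len(diag)
    for _ in range(steps):
        h = [a * x + b for a, x, b in zip(diag, h, inject)]
    return math.sqrt(sum(x * x for x in h))

def negative_diagonal(raw, dt=1.0):
    """Map unconstrained parameters to decay factors strictly inside (0, 1).

    Assumed scheme: the continuous-time entry is -exp(raw) < 0, and the
    discretized entry is exp(dt * -exp(raw)), so the spectral norm stays
    below 1 for any real-valued raw parameter.
    """
    return [math.exp(-dt * math.exp(r)) for r in raw]

inject = [1.0, 1.0, 1.0]

# Unconstrained diagonal: largest absolute entry exceeds 1, so the state grows.
unconstrained = [1.2, 1.05, 0.9]

# Constrained diagonal: every entry is a decay factor below 1.
stable = negative_diagonal([-1.0, 0.0, 0.5])

exploding_norm = loop_state_norm(unconstrained, inject, steps=64)
bounded_norm = loop_state_norm(stable, inject, steps=64)
```

With the unconstrained diagonal, the state norm grows geometrically with the number of loop iterations. With the constrained one, each coordinate converges toward the fixed point `b / (1 - a)`, so adding more loops does not blow up the state.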