Sim2Reason: Solving Physics Olympiad via RL on Physics Simulators

Sim2Reason trains LLMs with RL on data generated in MuJoCo physics simulators, with zero human annotation. The pipeline generates scenes, auto-labels QA pairs, and RL-trains on the resulting synthetic data. Zero-shot gains: +5-10% on IPhO, +17.9% on JEEBench, +4.4% on MATH 500. It outperforms models trained on curated real-world QA pairs. From CMU and Lambda.
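
To make the generate / auto-label / reward loop concrete, here is a minimal Python sketch. It is not the Sim2Reason code: a hand-written integrator for a block on a frictionless incline stands in for MuJoCo, and every name in it (`Scene`, `sample_scene`, `simulate_slide_time`, `make_qa`, `reward`) and the 2% tolerance are assumptions made for illustration only.

```python
import math
import random
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


@dataclass
class Scene:
    """Hypothetical scene: a block on a frictionless incline."""
    angle_deg: float
    length_m: float


def sample_scene(rng: random.Random) -> Scene:
    # Round the parameters so the question text matches the simulated values.
    return Scene(
        angle_deg=round(rng.uniform(10.0, 60.0), 1),
        length_m=round(rng.uniform(1.0, 5.0), 2),
    )


def simulate_slide_time(scene: Scene, dt: float = 1e-4) -> float:
    """Stand-in for a simulator rollout: step the block until it reaches the bottom."""
    accel = G * math.sin(math.radians(scene.angle_deg))
    x, v, t = 0.0, 0.0, 0.0
    while x < scene.length_m:
        v += accel * dt  # semi-implicit Euler: update velocity first
        x += v * dt      # then position, using the new velocity
        t += dt
    return t


def make_qa(scene: Scene) -> tuple[str, float]:
    """Auto-label: the question comes from the scene, the answer from the simulation."""
    question = (
        f"A block starts from rest at the top of a frictionless incline of angle "
        f"{scene.angle_deg:.1f} degrees and length {scene.length_m:.2f} m. "
        f"How long does it take to reach the bottom, in seconds? Take g = 9.81 m/s^2."
    )
    return question, simulate_slide_time(scene)


def reward(predicted: float, label: float, rel_tol: float = 0.02) -> float:
    """Binary RL reward: 1.0 if the prediction is within rel_tol of the simulated label."""
    return 1.0 if abs(predicted - label) <= rel_tol * abs(label) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        question, label = make_qa(sample_scene(rng))
        print(question)
        print(f"  simulated label: {label:.3f} s")
```

In this sketch the label comes from the rollout rather than from a human, so fresh QA pairs cost only simulation time, and the binary reward gives an RL trainer a signal it can check automatically.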