AI Pulse
Sim2Reason: Solving Physics Olympiad via RL on Physics Simulators

Sim2Reason trains LLMs on data from MuJoCo physics simulators, with zero human annotation: generate scenes, auto-label QA pairs, RL-train on the synthetic data. Zero-shot gains: +5-10% on IPhO, +17.9% on JEEBench, +4.4% on MATH 500. Outperforms models trained on curated real-world QA pairs. From CMU + Lambda.
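A rough sketch of the generate / auto-label / reward loop, for intuition only. This is not the Sim2Reason code: it swaps MuJoCo for a hand-written free-fall integrator so it runs with the standard library alone, and every name in it (simulate_fall_time, make_qa_pair, reward) is hypothetical.

```python
import random


def simulate_fall_time(height_m, g=9.81, dt=1e-4):
    """Toy stand-in for a physics simulator: integrate free fall
    with semi-implicit Euler until the ball reaches the ground."""
    y, v, t = height_m, 0.0, 0.0
    while y > 0.0:
        v -= g * dt   # update velocity first
        y += v * dt   # then position, using the new velocity
        t += dt
    return t


def make_qa_pair(rng):
    """Sample a random scene and label it with the simulated outcome,
    so no human writes or checks the answer."""
    height = round(rng.uniform(1.0, 50.0), 1)
    answer = round(simulate_fall_time(height), 2)
    question = (
        f"A ball is dropped from rest at a height of {height} m. "
        "Ignoring air resistance and taking g = 9.81 m/s^2, "
        "how long does it take to reach the ground, in seconds?"
    )
    return {"question": question, "answer": answer, "height_m": height}


def reward(predicted, reference, rel_tol=0.02):
    """Binary reward of the kind an RL trainer could consume:
    1.0 if the model's number is within rel_tol of the label."""
    return 1.0 if abs(predicted - reference) <= rel_tol * abs(reference) else 0.0


rng = random.Random(0)
dataset = [make_qa_pair(rng) for _ in range(5)]
```

The point of the sketch is that the simulator, not a person, supplies the ground-truth answer, which is what makes the reward automatically checkable.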

Apr 16, 2026, 10:58 PM