AI Research

Latest artificial intelligence, machine learning, and data science research.

Introducing GPT-Rosalind for life sciences research

GPT-Rosalind (OpenAI): a frontier reasoning model for biology, drug discovery, and translational medicine. Accesses 50+ scientific tools and databases via a new Life Sciences plugin for Codex. BixBench: 0.751, leading all models with published scores. Dyno Therapeutics RNA prediction: above the 95th percentile of human experts.

Apr 16, 2026, 8:39 PM
π0.7: a Steerable Model with Emergent Capabilities

π0.7 (Physical Intelligence): the first robotic VLA to show compositional generalization, recombining skills for tasks never seen in training. Steerable via language, metadata, and visual subgoals. Matches RL-trained specialists out of the box; zero-shot cross-embodiment transfer.

Apr 16, 2026, 7:32 PM
Introducing Claude Opus 4.7

Claude Opus 4.7 (Anthropic), released today: strong gains over 4.6 on hard coding and long-horizon agentic tasks. Vision: 3.75MP max input (3x prior), 98.5% visual acuity on computer-use tasks vs 54.5%. New xhigh effort level; task budgets API in beta. Cyber safeguards from Glasswing. Pricing unchanged at $5/$25 per million input/output tokens.
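
A minimal sketch of what the quoted rates imply per request, assuming the conventional reading that $5 applies to input tokens and $25 to output tokens:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 5.0, output_rate: float = 25.0) -> float:
    """Estimate request cost in USD from per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 40k-token prompt with an 8k-token response costs about $0.40.
print(f"${request_cost_usd(40_000, 8_000):.2f}")
```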

Apr 16, 2026, 6:28 PM
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

Dense patch-text alignment is where VLMs still fall short. TIPSv2 (Google DeepMind; CVPR 2026): patch-level distillation in which the student surprisingly surpasses the teacher, plus iBOT++, which extends self-distillation to visible patches rather than only masked ones. Evaluated on 9 tasks across 20 datasets. Code and models released.
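
A rough sketch of what patch-level distillation looks like in practice; the cosine objective, tensor shapes, and absence of projection heads here are illustrative assumptions, not the TIPSv2 recipe:

```python
import torch
import torch.nn.functional as F

def patch_distillation_loss(student_patches: torch.Tensor,
                            teacher_patches: torch.Tensor) -> torch.Tensor:
    """Cosine-distance distillation between per-patch embeddings.

    Both tensors are (batch, num_patches, dim); the teacher is frozen,
    so only the student receives gradients.
    """
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(teacher_patches.detach(), dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()  # 1 - cosine, averaged over patches

# Toy usage with random features standing in for ViT patch embeddings.
student = torch.randn(2, 196, 768, requires_grad=True)
teacher = torch.randn(2, 196, 768)
patch_distillation_loss(student, teacher).backward()
```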

Apr 16, 2026, 2:10 PM
Seedance 2.0: Advancing Video Generation for World Complexity

Seedance 2.0 (ByteDance Seed): unified multi-modal audio-video generation with text, image, audio, and video inputs, plus a comprehensive reference and editing suite. #1 on both the T2V and I2V Arena leaderboards; 62% audio satisfaction vs under 10% for other models. 4-15 s clips at 480p/720p. A Fast variant targets low-latency scenarios.

Apr 16, 2026, 11:01 AM
Parcae: Scaling Laws For Stable Looped Language Models

Parcae (UCSD/Together AI; Dan Fu): a stable looped transformer. Prior looped models blow up because the spectral norms of their injection parameters explode; a negative diagonal parameterization fixes this. 6.3% lower perplexity; reaches 87.5% of the performance of a Transformer 2x its size at 1.3B parameters. Positions looping as an orthogonal axis for scaling compute.
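
A minimal sketch of the stabilization idea as described: constrain a diagonal injection gate to negative values so the loop's effective state transition stays contractive. The -sigmoid parameterization, residual form, and module name are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class NegativeDiagonalInjection(nn.Module):
    """Injection gate with diagonal entries constrained to (-1, 0).

    The update h <- (1 + d) * h + x has an effective diagonal transition
    1 + d in (0, 1), so iterating the loop cannot blow up activations.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.raw = nn.Parameter(torch.zeros(dim))  # unconstrained parameters

    def forward(self, hidden: torch.Tensor, injected: torch.Tensor) -> torch.Tensor:
        d = -torch.sigmoid(self.raw)   # values in (-1, 0)
        return hidden + d * hidden + injected

# Toy loop: repeated application stays bounded instead of exploding.
inj = NegativeDiagonalInjection(dim=16)
x, h = torch.randn(4, 16), torch.zeros(4, 16)
for _ in range(64):
    h = inj(h, x)
print(h.norm())  # converges rather than diverging
```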

Apr 16, 2026, 8:45 AM
LongCoT

LongCoT (LLNL/Oxford; Bartoldson, Torr, de Witt): 2,500 problems in chemistry, math, CS, chess, and logic. Each step is tractable for frontier models in isolation, so failures reflect pure long-horizon reasoning limits. Best at release: GPT 5.2 at 9.8%, Gemini 3 Pro at 6.1%.

Apr 16, 2026, 5:31 AM
APEX: Self-Adversarial One Step Generation via Condition Shifting

APEX (Westlake/ZJU; Tao Lin): adversarial signals extracted endogenously via condition shifting, eliminating the external discriminator. A 0.6B model surpasses the 12B FLUX-Schnell at one step; GenEval 0.89 at NFE=1 (vs 0.87 at 50 steps), a 15.33x speedup. Code released.
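
A conceptual sketch of one way an endogenous adversarial signal from condition shifting could be formed: query the same one-step generator under matched and deliberately mismatched conditions and contrast the outputs, with no separate discriminator. The permutation-based shift, L2 contrast, and generator stub below are illustrative assumptions, not APEX's actual objective.

```python
import torch
import torch.nn as nn

class OneStepGenerator(nn.Module):
    """Stand-in one-step generator: (noise, condition) -> latent output."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, dim * 4), nn.SiLU(), nn.Linear(dim * 4, dim))

    def forward(self, noise: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noise, cond], dim=-1))

def condition_shift_signal(gen: OneStepGenerator, noise: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """Contrast outputs under matched vs. shifted (permuted) conditions."""
    matched = gen(noise, cond)
    shifted = gen(noise, cond[torch.randperm(cond.size(0))])  # mismatched conditions
    # Pushing matched and shifted outputs apart acts as a discriminator-free,
    # adversarial-style signal derived from the generator itself.
    return -((matched - shifted) ** 2).mean()

gen = OneStepGenerator()
noise, cond = torch.randn(8, 64), torch.randn(8, 64)
condition_shift_signal(gen, noise, cond).backward()
```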

Apr 15, 2026, 10:53 PM