The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents
arxiv.org | AI Research | Apr 12, 2026
Computer-use agents (CUAs) are exploitable even when the user's instructions are entirely benign: the harm emerges from the context the agent operates in, not from an explicit attack. On the OS-BLIND benchmark (300 tasks), attack success rates exceed 90% across most models; Claude 4.5 Sonnet reaches 73.0% on its own and 92.7% in a multi-agent setting. Safety alignment rarely re-engages after the first step. (UW/USC/McGill/Mila; Siva Reddy, Jieyu Zhao)
5 | Apr 16, 2026, 2:13 AM