Anthropic News · 14.04.2026

Automated Alignment Researchers: Using large language models to scale scalable oversight

Figure: The performance gap recovered over cumulative research hours for nine parallel Automated Alignment Researchers (red lines), relative to a human-tuned baseline (grey square). A score of 1.0 means the method fully matches a model trained on ground-truth labels.

Figure: The performance gap recovered by two AAR-discovered ideas (red and blue) when applied to held-out math and coding datasets. The dashed line indicates the best human-tuned method used as the baseline.