One Useful Thing · Ethan Mollick · 12.11.2025

Giving your AI a Job Interview

Every benchmark has flaws, but they are all trending the same way: up and to the right. The AIME is a hard math exam, GPQA tests graduate-level scientific knowledge, the MMLU is a general knowledge test, SWE-bench and LiveBench test coding, and Terminal-Bench tests agentic ability. Data from Epoch AI.
How viable is my idea for a guacamole drone delivery service?