AI Summary
METR's latest update to its AI time horizon chart sparked alarm online, with claims of an intelligence explosion leading to superintelligent AI that will "eat everything." Cal Newport examines the chart closely, explaining its methodology and arguing that it shows targeted progress in AI coding tools, not a general leap toward world-altering capabilities.
Decoding the METR Chart
METR measures AI performance on a suite of software tasks, each rated by how long it takes human programmers to complete under timed conditions. The human testers, described as "low context" (comparable to new hires), tackle tasks such as fixing bugs in a Python library (about one hour) or exploiting a buffer overflow (over two hours), and their completion times are combined using a geometric mean.
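To make the geometric-mean step concrete, here is a minimal sketch. The function name and the sample timings are hypothetical and are not taken from METR's data or code; the point is only that a geometric mean dampens the effect of one unusually slow programmer compared with an ordinary average.

```python
import math

def geometric_mean(minutes):
    """Geometric mean of positive completion times, computed in log space."""
    if not minutes or any(m <= 0 for m in minutes):
        raise ValueError("need at least one strictly positive time")
    return math.exp(sum(math.log(m) for m in minutes) / len(minutes))

# Hypothetical timings (in minutes) from three human programmers on one task.
sample_times = [30, 60, 120]

arithmetic = sum(sample_times) / len(sample_times)  # pulled up by the slowest run
geometric = geometric_mean(sample_times)            # sits at the multiplicative midpoint

print(f"arithmetic mean: {arithmetic:.1f} min")
print(f"geometric mean:  {geometric:.1f} min")
```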
AI systems, which pair a large language model (LLM) such as Claude Opus 4.6 with a "coding harness" (a scaffold such as Claude Code or Cursor), attempt each task six times. METR plots each model at its release date, at the height of the longest-duration task it completes successfully at least 50% of the time. For instance, Claude Opus 4.6 reaches nearly 12 hours, meaning it completes one specific task that takes humans about 12 hours on half of its attempts. At an 80% success threshold, leading models such as Claude Mythos preview top out around three hours. The chart's upward curve from 2025 accelerates in 2026, but it tracks only programming benchmarks; it does not measure general intelligence or the ability to do any arbitrary 12-hour piece of human work.
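The plotting rule as described in this summary can be sketched in a few lines. This follows the simplified description here (longest task passed at a given success rate) and should not be read as METR's actual estimation procedure; the function name, data layout, and attempt outcomes below are all hypothetical.

```python
def time_horizon(results, threshold=0.5):
    """Longest human task duration (minutes) whose success rate meets the threshold.

    `results` maps a task's human completion time in minutes to a list of
    booleans, one per AI attempt (True = task completed successfully).
    Returns None if no task meets the threshold.
    """
    passing = [
        minutes
        for minutes, attempts in results.items()
        if attempts and sum(attempts) / len(attempts) >= threshold
    ]
    return max(passing) if passing else None

# Hypothetical outcomes: six attempts per task, keyed by human time in minutes.
example_results = {
    60:  [True, True, True, True, True, True],     # 6/6 succeed
    180: [True, True, True, True, True, False],    # 5/6 succeed
    720: [True, True, True, False, False, False],  # 3/6 succeed
}

print(time_horizon(example_results))                 # horizon at the 50% threshold
print(time_horizon(example_results, threshold=0.8))  # horizon at the 80% threshold
```

With this made-up data, the 12-hour task counts toward the 50% horizon but not the 80% one, which mirrors why the same model can appear at very different heights depending on the success threshold.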
What you'll learn
- 1 (00:00) **METR Time Horizon Chart Intro** - Cal introduces METR's updated chart showing AI progress on software tasks, notes its scary upward trend post-2025.
- 2 (00:46) **Viral Reactions to Chart** - Examples of tweets claiming ASI threshold crossed, linking chart to superintelligence conquest.
- 3 (02:15) **AI Reality Check Setup** - Cal launches episode to demystify chart's meaning amid hype.
- 4 (02:44) **Chart Methodology: Software Tasks** - METR defines tasks solved via code, benchmarks human completion times using geometric mean.
- 5 (03:56) **Testing LLMs with Coding Harnesses** - Pairs models with scaffolds (e.g., Claude Code, Cursor) that plan, generate code, verify, and iterate.
- 6 (05:21) **Plot Explanation: Axes and Dots** - Y-axis: task duration (human time); X-axis: model release date; each dot is a model's max duration.
- 7 (08:10) **Not General Capabilities** - Measures specific programming tasks, not broad AI power or any 12-hour human work.
Show Notes
Cal Newport takes a critical look at recent AI news.
Video from today’s episode: youtube.com/calnewportmedia
(0:00) Is AI about to “eat everything”?
(2:53) What does the METR chart measure?
(8:08) What do these measurements actually capture?
(12:31) How are the models getting better?
(21:26) Does this mean AI is about to “eat everything”?
(26:16) So, what’s with all of these hysterical tweets?
Links:
Buy Cal’s latest book, “Slow Productivity” at www.calnewport.com/slow
https://metr.org/time-horizons/
https://x.com/SydSteyerhart/status/2053082873847070911
https://x.com/AISafetyMemes/status/2053169358919328172
https://x.com/ramez/status/2041946766598402459?s=61
Thanks to Jesse Miller for production and mastering, Nate Mechler for research and newsletter, and Jay Kerstens for theme music.
Learn more about your ad choices. Visit podcastchoices.com/adchoices