AI Summary
METR, a Bay Area nonprofit, tracks AI autonomy through "time horizon" charts that have gone viral for showing rapid capability gains on engineering tasks. Founded about four years ago by Beth Barnes and Paul Christiano, the group aims to measure the risks posed by AI systems acting independently, particularly in scenarios where misalignment could lead to catastrophe. President Chris Painter and technical staff member Joel Becker explain how the charts work, the progress trends they show, their limitations, and why METR focuses on the software engineering and machine learning tasks most relevant to AI labs.
Defining Time Horizons
Time horizons quantify task difficulty by the average time talented humans take to complete a task under controlled conditions that match those given to the AI. Tasks are drawn from distributions like those faced by engineers at frontier AI labs (software engineering, model fine-tuning, or cybersecurity), not painting or writing novels.
Humans with relevant expertise (e.g., software engineers) complete about 3 baseline attempts per task, and their time to a successful completion is recorded. AIs then tackle the same tasks using identical tools. The time horizon is the task difficulty at which the model succeeds 50% of the time: either a 50% chance of success on a task at that level, or success on 50% of the tasks at that level. For Claude Opus 4.6 (February 2026 evaluation), this is 11 hours 59 minutes, roughly double earlier highs such as GPT-5.3 Codex at about 6 hours.
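To make the 50% definition concrete, here is a minimal sketch of how such a horizon could be estimated, assuming a logistic curve of success probability fitted against log2 of human completion time. The summary does not describe the exact fitting procedure, so the method, the function names `fit_logistic` and `time_horizon`, and the data are all hypothetical illustrations, not METR's code or results.

```python
import math

def fit_logistic(xs, ys, iters=50):
    """Fit P(success) = sigmoid(b0 + b1*x) by Newton's method (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (negative Hessian)^-1 @ gradient, via the 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def time_horizon(minutes, successes, p=0.5):
    """Hypothetical helper: human task length (minutes) at which P(success) = p."""
    xs = [math.log2(m) for m in minutes]  # assumption: success is modeled vs log2(time)
    b0, b1 = fit_logistic(xs, successes)
    # Solve sigmoid(b0 + b1*x) = p for x, then convert log2(minutes) back to minutes
    x = (math.log(p / (1.0 - p)) - b0) / b1
    return 2.0 ** x

# Hypothetical, synthetic outcomes (1 = model succeeded), illustration only:
minutes   = [2, 2, 2, 2, 8, 8, 8, 32, 32, 128, 128, 128, 512, 512, 512, 512]
successes = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print(f"50% time horizon: {time_horizon(minutes, successes):.1f} minutes")
# → 50% time horizon: 32.0 minutes
```

The synthetic outcomes are mirror-symmetric around 32 minutes, so the fitted curve crosses 50% exactly there. Passing a higher `p` (say 0.8) sets a stricter reliability bar and, because success falls as tasks get longer, yields a shorter horizon.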
What you'll learn
- 1 (02:24) **Hosts Introduce Viral Time Horizon Chart** - Joe and Tracy discuss METR's chart showing exponential AI progress on engineering tasks, highlighting Claude Opus 4.6's jump to tasks that take humans ~12 hours
- 2 (05:35) **Guests Introduced: Joel Becker and Chris Painter** - The technical staff member and the president explain METR as a Bay Area nonprofit measuring AI autonomy risks
- 3 (06:08) **METR's Mission Defined** - Dedicated to the science of evaluating AI autonomy on long, complex tasks to assess the stakes of misalignment
- 4 (07:42) **Why Time Horizons Matter for Safety** - The charts originated as a way to measure growing agency, the capability that makes rogue-AI scenarios plausible
- 5 (09:55) **Chart Mechanics Explained** - Plots AI success on tasks against human completion time, showing time horizons doubling roughly every 4 months
- 6 (10:49) **Human Baselines Established** - Talented experts are timed on identical tasks using the same tools to set the difficulty scale
- 7 (14:13) **Focus on Engineering Tasks Justified** - Targets software and ML work as an early signal of the automation that could lead to self-improving AI
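The "doubling roughly every 4 months" trend is ordinary exponential arithmetic. A minimal sketch, assuming the trend simply continues (nothing here claims it will) and using hypothetical helper names `project_horizon` and `implied_doubling_time`:

```python
import math

def project_horizon(h0_hours, months_ahead, doubling_months=4.0):
    """Naive extrapolation (assumption): the horizon doubles every `doubling_months` months."""
    return h0_hours * 2 ** (months_ahead / doubling_months)

def implied_doubling_time(h_start, h_end, months_elapsed):
    """Doubling time implied by two horizon measurements taken `months_elapsed` apart."""
    return months_elapsed / math.log2(h_end / h_start)

print(project_horizon(12, 12))          # 12 h after three doublings -> 96.0
print(implied_doubling_time(1, 8, 12))  # 3 doublings in 12 months -> 4.0
```

The same formula run in reverse shows why a 4-month doubling time is so striking: it compounds to an 8x longer horizon every year.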
Show Notes
We live in an era of charts that are going up and to the right. This image obviously describes the stock market, particularly any company whose business is adjacent to artificial intelligence. But beyond stocks, another sort of chart we keep seeing is of AI capabilities also going up and to the right. The most famous and viral of these comes from an organization called METR, which stands for Model Evaluation and Threat Research. The organization is focused on understanding the degree to which AI models can engage in autonomous, complex tasks. METR sees this as a particularly important benchmark, given the risk that AI could one day engage in recursive self-improvement, taking humans out of the loop. But how do you really gauge a model's ability to do complex problems? And what exactly is being measured? On this episode, we speak with METR's President Chris Painter as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a chart showing that Claude Opus 4.6 can do a task that would take a human nearly 12 hours.
Read more:
DeepSeek Unveils Flagship AI Model a Year After Breakthrough
Meta Inks Deal to Use Amazon’s Graviton Processors for AI
Only http://Bloomberg.com subscribers can get the Odd Lots newsletter in their inbox each week, plus unlimited access to the site and app. Subscribe at bloomberg.com/subscriptions/oddlots