Dwarkesh Podcast

Eric Jang – Building AlphaGo from scratch

May 15, 2026

AI Summary

The conversation centers on reconstructing AlphaGo from scratch to clarify how deep networks can make intractable search problems tractable. Eric Jang walks through the rules of Go, the structure of Monte Carlo tree search, and the way value and policy networks compress enormous game trees into usable decisions. The discussion emphasizes concrete mechanisms rather than abstract claims about intelligence.

How Search and Networks Interact

Go produces a game tree far beyond exhaustive enumeration: a game runs roughly 300 moves with an average branching factor near 200, so the tree holds on the order of 200^300 (about 10^690) move sequences. Monte Carlo tree search addresses this by iteratively selecting promising branches with the PUCT rule, which balances the mean action value Q against an exploration bonus that is weighted by the policy network's prior and shrinks as a node is visited more often. At leaf nodes the search would normally require full rollouts to a terminal position, but a value network supplies an immediate estimate of win probability from any board state. This estimate acts as a shortcut that truncates the search depth.
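The PUCT selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the episode: it assumes the AlphaGo Zero form of the rule, and the function names, the (q, prior, visit_count) tuple layout, and the constant c_puct=1.5 are all hypothetical choices made for the example.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Mean action value Q plus an exploration bonus.

    The bonus is scaled by the policy prior and shrinks as the child
    accumulates visits. c_puct=1.5 is an assumed constant.
    """
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + bonus


def select_action(children, c_puct=1.5):
    """Pick the action with the highest PUCT score.

    children maps action -> (q, prior, visit_count); this layout is
    hypothetical and chosen only to keep the sketch short.
    """
    parent_visits = sum(n for _, _, n in children.values())
    return max(
        children,
        key=lambda a: puct_score(
            children[a][0], children[a][1], parent_visits, children[a][2], c_puct
        ),
    )
```

With children such as {"a": (0.5, 0.2, 10), "b": (0.4, 0.6, 0)}, the unvisited move "b" wins the selection despite its lower Q, because its bonus has not yet been divided down by visits.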

What you'll learn

  • 1 (00:00) **Introduction to Eric Jang and AlphaGo project** - Eric describes his background and decision to rebuild AlphaGo on sabbatical
  • 2 (00:29) **Why AlphaGo is interesting** - Discussion of deep learning solving intractable search problems and amortizing deep game trees
  • 3 (01:35) **Compute trends in Go AI** - KataGo's 40x efficiency gains and how LLMs now enable solo replication of DeepMind-scale work
  • 4 (02:26) **How Go works** - Rules, capturing mechanics, Tromp-Taylor scoring, and differences from human scoring
  • 5 (08:18) **Core search intuition** - Why naive tree search is intractable and how neural nets make it tractable
  • 6 (13:46) **MCTS data structures** - Node representation, visit counts, Q-values, and PUCT action selection
  • 7 (24:55) **Value functions** - Human intuition as implicit value networks and why truncation is necessary
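As a rough sketch of the MCTS bookkeeping named in the outline (node representation, visit counts, Q-values), a node might look like the following. The field names, the backup helper, and the sign convention are assumptions for illustration, not taken from the episode: values are assumed to lie in [-1, 1] and to be stored from the perspective of the player to move at each node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # policy probability of the move leading here
    visit_count: int = 0         # N: simulations that passed through this node
    value_sum: float = 0.0       # W: sum of backed-up value estimates
    children: dict = field(default_factory=dict)  # action -> Node

    @property
    def q(self) -> float:
        """Mean value Q = W / N, or 0.0 for an unvisited node."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backup(path, leaf_value):
    """Propagate a leaf estimate up the path from root to leaf.

    leaf_value is from the perspective of the player to move at the leaf,
    so the sign flips at every ply on the way back to the root.
    """
    value = leaf_value
    for node in reversed(path):
        node.visit_count += 1
        node.value_sum += value
        value = -value
```

Under this convention a parent comparing its children would negate each child's q, since a position that is good for the opponent is bad for the player choosing the move.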

Show Notes

Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, we had the context to discuss how RL works in LLMs and how it could work better. Naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
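To make the "training target every move" idea concrete, here is a minimal sketch of one common way to build it, as in AlphaGo Zero-style training: normalize the root visit counts from search into a distribution and fit the policy to it with cross-entropy. The function names and the temperature parameter are hypothetical, and this is not presented as the code from the episode.

```python
import math


def visit_count_target(visit_counts, temperature=1.0):
    """Turn root visit counts into a per-move target distribution.

    visit_counts maps action -> number of simulations through that action.
    A temperature below 1.0 sharpens the target toward the most-visited move.
    """
    powered = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(powered.values())
    return {a: p / total for a, p in powered.items()}


def cross_entropy(target, policy, eps=1e-12):
    """Loss that pulls the policy toward the search-improved target."""
    return -sum(p * math.log(policy[a] + eps) for a, p in target.items())
```

Every position in a game yields its own target this way, so the learning signal does not have to be traced back from a single end-of-trajectory reward.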

Eric also kickstarted an Autoresearch loop on his project. It was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Watch on YouTube. Read the transcript.

And check out the flashcards I wrote to retain the insights.

Sponsors

* Cursor’s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh

* Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh

Timestamps

(00:00:00) – Basics of Go

(00:08:17) – Monte Carlo Tree Search

(00:32:04) – What the neural networ
