Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay 2

January 23, 2026

AI Summary

🎙️ The Voices & The Context

The Format: Casual interview between host and guest researcher, blending personal anecdotes, technical deep dives, and forward-looking AI speculation.
The Key Players:
- Guest: Yi Tay, Google DeepMind (GDM) researcher leading the new Reasoning and AGI team in Singapore; previously at Reka AI, now back at GDM working on Gemini Deep Think and RL for reasoning. Known for contributions to the IMO gold-medal AI, UL2/T5 models, and generative retrieval (DSI).
- Host: AI podcaster probing with sharp questions on research trends, benchmarks like IMO/Pokemon, and career insights.
The Vibe: Educational yet fun—intense tech talk mixed with humor, parenting analogies, and "aha" moments on AI progress.

What you'll learn

1. (00:00) **🎙️ Introduction: GDM Singapore Lead (Reasoning & AGI)**
2. (02:10) **Rejoining GDM & Shift to Reasoning/RL Research**
3. (05:02) **On-Policy RL Philosophy & Human Analogies**
4. (10:44) **LM Reasoning Evolution & Self-Consistency**
5. (12:33) **IMO Gold Medal Achievement**
6. (19:45) **Pure LM vs. Specialized Systems for AGI**
7. (24:53) **Recent Benchmarks & Progress Reflections**

Show Notes

From shipping Gemini Deep Think and IMO Gold to launching the Reasoning and AGI team in Singapore, Yi Tay has spent the last 18 months living through the full arc of Google DeepMind's pivot from architecture research to RL-driven reasoning. Along the way he watched his team grow from a dozen researchers to 300+, trained models that solved International Mathematical Olympiad problems in a live competition, built the infrastructure to scale deep thinking across every domain, and helped drive Gemini to the top of the leaderboards in every category. Yi returns to dig into the inside story of the IMO effort and more!

We discuss:

- Yi's path: Google Brain → Reka → Google DeepMind → Reasoning and AGI team in Singapore, leading model training for Gemini Deep Think and IMO Gold
- The IMO Gold story: four co-captains (Yi in Singapore, Jonathan in London, Jordan in Mountain View, and Tong leading the overall effort), training the checkpoint in about a week, a live competition in Australia with professors punching in problems as they were released, and the tension of not knowing whether they'd hit Gold until the human scores came in (the Gold threshold is a percentile of how all contestants score, not a fixed number)
- Why they threw away AlphaProof: "If one model can't do it, can we get to AGI?" The decision to abandon symbolic systems and bet on end-to-end Gemini trained with RL was bold and non-consensus
- On-policy vs. off-policy RL: in Yi's framing, off-policy is imitation learning (copying someone else's trajectory), while on-policy means the model generates its own outputs, gets rewarded, and trains on its own experience: "humans learn by making mistakes, not by copying"
- Why self-consistency and parallel thinking are fundamental: sampling multiple times, majority voting, LM judges, and internal verification are all forms of self-consistency that unlock reasoning beyond single-shot inference
- The data-efficiency frontier: humans learn from 8 orders of magnitude less data than models, so where's the bug? Is it the architecture, the learning algorithm, backprop, off-policyness, or something else?
- Three schools of thought on world models: (1) Genie/spatial intelligence (video-based world models), (2) Yann LeCun's JEPA and FAIR's code world models (modeling internal execution state), (3) the amorphous "resolution of possible worlds" paradigm (curve-fitting to find the world model that best explains the data)
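
The point in the IMO story that the Gold line is relative rather than fixed can be shown with a toy calculation. This is a minimal sketch, not the official IMO procedure: it assumes a simplified rule in which gold goes to the top scorers up to about one twelfth of the field (the share implied by awarding medals to roughly half the contestants in a 1:2:3 gold/silver/bronze ratio), with everyone tied at a score treated alike. The function `gold_cutoff` and both score lists are hypothetical.

```python
from fractions import Fraction
from typing import List

# Simplified assumption (not the official IMO rule): gold goes to roughly the
# top 1/12 of contestants, and everyone tied at a score is treated alike.
GOLD_SHARE = Fraction(1, 12)


def gold_cutoff(scores: List[int], gold_share: Fraction = GOLD_SHARE) -> int:
    """Return the lowest score that earns gold under the toy rule above.

    If even the top score is shared by more contestants than the quota allows,
    nobody qualifies and the returned cutoff is max(scores) + 1.
    """
    quota = len(scores) * gold_share  # exact arithmetic, no float rounding
    cutoff = max(scores) + 1
    # Walk distinct scores from highest to lowest, lowering the cutoff while
    # the number of contestants at or above it still fits within the quota.
    for s in sorted(set(scores), reverse=True):
        if sum(1 for x in scores if x >= s) <= quota:
            cutoff = s
        else:
            break
    return cutoff


# Two hypothetical fields: the same score of 35 lands on different sides of the line.
field_a = [42, 35, 30, 28, 25, 21, 20, 14, 10, 7, 5, 0]  # 12 contestants, quota 1
field_b = [35, 34] + [10] * 22                            # 24 contestants, quota 2

if __name__ == "__main__":
    for name, field in [("A", field_a), ("B", field_b)]:
        cut = gold_cutoff(field)
        print(name, "cutoff:", cut, "| 35 earns gold:", 35 >= cut)
```

With these made-up fields, a score of 35 misses gold in field A (cutoff 42) but clears it in field B (cutoff 34), which is why a team cannot know its medal until every human score is in.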
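
To make the on-policy vs. off-policy contrast concrete, here is a toy three-armed bandit in plain Python. It is a sketch under stated assumptions, not how Gemini is trained: the "on-policy" learner is REINFORCE with a running-average baseline that samples its own actions and learns from the reward, while the "off-policy" learner is plain behavior cloning on someone else's demonstrations, matching the imitation framing in the episode (off-policy RL in general is broader, for example learning from replayed or importance-weighted data). All function names, the reward, and the demonstrations are hypothetical.

```python
import math
import random
from typing import Callable, List

# Toy 3-armed bandit. Assumption for illustration only: one "action" stands in
# for a whole model output, and the reward is 1 for arm 2, else 0.


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs: List[float], rng: random.Random) -> int:
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def grad_log_prob(probs: List[float], action: int) -> List[float]:
    # Gradient of log softmax(logits)[action] w.r.t. each logit j: 1[j == action] - p_j
    return [(1.0 if j == action else 0.0) - p for j, p in enumerate(probs)]


def on_policy_reinforce(reward_fn: Callable[[int], float], n_actions: int,
                        steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    """REINFORCE: the policy samples its OWN action, is rewarded, and learns from that."""
    rng = random.Random(seed)
    logits, baseline = [0.0] * n_actions, 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = sample(probs, rng)                # on-policy: action drawn from the current policy
        r = reward_fn(a)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)     # running-average baseline reduces variance
        g = grad_log_prob(probs, a)
        logits = [w + lr * advantage * gj for w, gj in zip(logits, g)]
    return softmax(logits)


def off_policy_imitation(demos: List[int], n_actions: int,
                         epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Behavior cloning: maximize log-prob of someone else's actions; no reward is used."""
    logits = [0.0] * n_actions
    for _ in range(epochs):
        for a in demos:                       # off-policy: actions come from a demonstrator
            g = grad_log_prob(softmax(logits), a)
            logits = [w + lr * gj for w, gj in zip(logits, g)]
    return softmax(logits)


if __name__ == "__main__":
    reward = lambda a: 1.0 if a == 2 else 0.0
    print("on-policy:", [round(p, 3) for p in on_policy_reinforce(reward, 3)])
    # The demonstrator mostly picks arm 1, which earns no reward in this toy.
    print("imitation:", [round(p, 3) for p in off_policy_imitation([1, 1, 1, 0], 3)])
```

Because the imitation learner never sees a reward, it reproduces the demonstrator's preference for arm 1, while the on-policy learner discovers through its own trial and error that arm 2 pays off.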
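
Self-consistency in its simplest form is majority voting over independent samples. The sketch below assumes a hypothetical `sample_answer(i)` callable standing in for one sampled reasoning chain's final answer; it draws the samples in parallel with a thread pool and returns the most common answer with its vote share. An LM judge or internal verifier would replace the `Counter` vote with a scoring step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Tuple


def self_consistency(sample_answer: Callable[[int], Hashable], n: int = 8,
                     workers: int = 4) -> Tuple[Hashable, float]:
    """Majority-vote self-consistency over n independently sampled answers.

    `sample_answer(i)` is a hypothetical stand-in for "run one reasoning chain
    and extract its final answer"; the i-th call is sample i.
    """
    # Parallel thinking: draw the n samples concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = list(pool.map(sample_answer, range(n)))
    # Majority vote over final answers; the vote share is a crude confidence signal.
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


if __name__ == "__main__":
    # Stub sampler: a noisy "model" whose chains disagree but mostly land on "42".
    chains = ["42", "41", "42", "17", "42"]
    print(self_consistency(lambda i: chains[i % len(chains)], n=5))
```

With that stub sampler, three of five chains agree, so the call returns `("42", 0.6)`; a single-shot sample could just as easily have returned `"41"` or `"17"`.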

Podcast: Latent Space: The AI Engineer Podcast