Jailbreaking AGI: Pliny the Liberator & John V on AI Red Teaming, BT6, and the Future of AI Security

December 16, 2025

🎙️ The Voices & The Context

The Format: In this interview-style podcast, hosts Alessio and swyx engage two AI security experts in a dynamic exchange on jailbreaking and red teaming, blending technical breakdowns with hacker anecdotes to spotlight frontier cybersecurity challenges. The tone is exploratory and unfiltered.
The Key Players:
- Hosts: Alessio (Kernel Labs founder) and swyx (Latent Space editor) drive the conversation with probing questions and insider nods, fostering a collaborative rapport centered on amplifying under-the-radar AI hackers.
- Guests: Pliny (Kleiner Alder), the charismatic "Pliny the Liberator," renowned for universal jailbreaks, shitposting prompts, and pushing model boundaries; John, his collaborator in the BT6 hacker collective, bringing operational insights from adversarial ML and contracts.

Podcast: Latent Space: The AI Engineer Podcast

What you'll learn

1 `(00:00)` **🎙️ Introduction: Pliny the Liberator and John from BT6**
2 `(01:27)` **Origin Story and Jailbreaking Philosophy**
3 `(02:38)` **Jailbreaking Overview and Universal Techniques**
4 `(05:02)` **Guardrails, Safety Myths, and Security Theater**
5 `(07:22)` **Alignment Research and DevSecOps Shift**
6 `(08:46)` **Specific Jailbreak Prompts: Libertas and Predictive Reasoning**
7 `(11:41)` **Crafting Dividers, Intuition, and Chaos in Jailbreaking**

Show Notes

From jailbreaking every frontier model and turning down Anthropic's Constitutional AI challenge to leading BT6, a 28-operator white-hat hacker collective obsessed with radical transparency and open-source AI security, Pliny the Liberator and John V are redefining what AI red-teaming looks like when you refuse to lobotomize models in the name of "safety."

Pliny built his reputation crafting universal jailbreaks (skeleton keys that obliterate guardrails across modalities) and open-sourcing prompt templates like Libertas, predictive reasoning cascades, and the infamous "Pliny divider" that's now embedded so deep in model weights it shows up unbidden in WhatsApp messages. John V, coming from prompt engineering and computer vision, co-founded the BASI Discord (40,000 members strong) and helps steer BT6's ethos: if you can't open-source the data, we're not interested. Together they've turned down enterprise gigs, pushed back on Anthropic's closed bounties, and insisted that real AI security happens at the system layer, not by bubble-wrapping latent space.

We sat down with Pliny and John to dig into the mechanics of hard vs. soft jailbreaks, why multi-turn crescendo attacks were obvious to hackers years before academia "discovered" them, and how segmented sub-agents let one jailbroken orchestrator weaponize Claude for real-world attacks (exactly as Pliny predicted 11 months before Anthropic's recent disclosure). We also cover why guardrails are security theater that punishes capability while doing nothing for real safety, the role of intuition and "bonding" with models to navigate latent space, and how BT6 vets operators on skill and integrity. Finally, they explain why they believe mechanistic interpretability and open-source data are the path forward (not RLHF lobotomization), and share their vision for a future where spatial intelligence, swarm robotics, and AGI alignment research happen in the open: bootstrapped, grassroots, and uncompromising.

We discuss:

What universal jailbreaks are: skeleton-key prompts that obliterate guardrails across models and modalities, and why they're central to Pliny's mission of "liberation"
Hard vs. soft jailbreaks: single-input templates vs. multi-turn crescendo attacks, and why the latter were obvious to hackers long before academic papers
The Libertas repo: predictive reasoning, the Library of Babel analogy, quotient dividers, weight-space seeds, and how introducing "steered chaos" pulls models out-of-distribution
Why jailbreaking is 99% intuition and bonding with the model: probing token layers, syntax hacks, multilingual pivots, and forming a relationship to navigate latent space