Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

February 11, 2026

AI Summary

🎙️ The Voices & The Context

The Format: Solo narrative review with live demos of AI coding tools, structured as a tech update episode testing new models on real tasks.
The Key Players:
- Claire Vo: Host, product leader and "AI Obsessive" behind ChatPRD; shares hands-on tests from shipping massive code volumes, blending expertise with candid critiques.
The Vibe: Educational and enthusiastic, with fun frustration (e.g., facepalm moments) and high-energy demos—perfect for devs chasing AI productivity hacks.

🗝️ Key Themes & Topics

Claire dives into recent AI coding model releases, benchmarking them on ambitious tasks like full-site redesigns and refactors, revealing strengths, quirks, and stack recommendations.

Topic 1: Codex (OpenAI's Desktop App): Highlights the Git-focused UI (projects, branches, worktrees, diffs, PRs) and skills/automations as first-class features, but critiques GPT-5.x models as too literal for creative redesigns: they overfit to prompts and struggle with nuance or site-wide changes.
Topic 2: Opus 4.6 (Anthropic): Excels at generative greenfield work like full-site overhauls in Cursor; plans independently and delivers polished designs after iteration, though initial outputs can be "Tailwind slop."
Topic 3: Hybrid Workflows & Production Wins: Models shine in tandem—Opus builds 80-90% features, Codex review

What you'll learn

1 (00:04) **Episode Intro: New Coding Model Releases**
2 (02:28) **Test Task: Redesign Marketing Site for Enterprise**
3 (03:34) **Codex App Features**
4 (09:35) **Codex Redesign Results (GPT-5.2)**
5 (16:23) **Opus 4.6 Redesign Results (in Cursor)**
6 (20:56) **Model Comparison on Front-End Tasks**
7 (21:27) **Recent Code Production Stats**

Show Notes

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines—creative development versus code review—and share how I’m thinking about combining them to build a more effective AI engineering stack.

—

What you’ll learn:

The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
Why Codex excels at code review but struggles with creative, greenfield work
The surprising way Opus and Codex complement each other in a real-world engineering workflow
How to use Git concepts like worktrees to maximize productivity with AI coding assistants
Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)
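
To make the worktree point above concrete, here is a minimal sketch of how parallel AI coding sessions could each get their own checkout. It only builds the `git worktree add` commands rather than running them. The task names, branch names, and directory layout are hypothetical and are not taken from the episode.

```python
from pathlib import PurePosixPath


def worktree_commands(repo_dir: str, tasks: dict[str, str]) -> list[list[str]]:
    """Build one `git worktree add` command per task.

    `tasks` maps a hypothetical task name to the new branch created for it.
    Each worktree is placed in a sibling directory of the repository, so
    separate agent sessions never edit the same checkout.
    """
    repo = PurePosixPath(repo_dir)
    commands = []
    for name, branch in tasks.items():
        path = repo.parent / f"{repo.name}-{name}"
        # `git worktree add -b <new-branch> <path>` creates the branch
        # and checks it out into its own directory at <path>.
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
        )
    return commands


# Hypothetical example: one checkout per model/task.
for cmd in worktree_commands(
    "/work/site",
    {"redesign": "feat/redesign", "refactor": "chore/refactor"},
):
    print(" ".join(cmd))
```

Each printed command can be run by hand or passed to `subprocess.run`. The design choice is simply one branch and one directory per task, so that a creative redesign and a refactor can proceed side by side without stepping on each other.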

—

Brought to you by:

WorkOS—Make your app enterprise-ready today

—

Detailed workflow walkthroughs from this episode:

• How I AI: GPT-5.3 Codex vs. Claude Opus 4.6—Shipping 44 PRs in 5 Days: https://www.chatprd.ai/how-i-ai/gpt-5-3-codex-vs-claude-opus-4-6

• How to Combine Claude Opus and GPT-5.3 Codex for High-Velocity Code Refactoring: https://www.chatprd.ai/how-i-ai/workflows/how-to-combine-claude-opus-and-gpt-5-3-codex-for-high-velocity-code-refactoring

• How to Redesign a Marketing Website Using Claude Opus 4.6 for Creative Development: https://www.chatprd.ai/how-i-ai/workflows/how-to-redesign-a-marketing-website-using-claude-opus-4-6-for-creative-development

—

In this episode, we cover:

(00:00) Introduction to new AI coding models

(02:13) My test methodology for comparing models

(03:30) Codex’s unique features: Git primitives, skills, and automations

(09:05) Testing GPT-5.2 Codex on a website redesign task

(10:40) Challenges with Codex’s literal interpretation of prompts

(15:00) Comparing the before and after with Codex