Feb 05 OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex Show details

news.smol.ai•20 days ago•View Original →

TL;DR: OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT‑5.3‑Codex

Major Highlights:

OpenAI ships GPT‑5.3‑Codex and unveils “Frontier” agent platform
- GPT‑5.3‑Codex focuses on coding and professional knowledge with major efficiency and speed gains. Frontier packages agents with enterprise context, tools, learning, and permissions for production use.
Anthropic releases Claude Opus 4.6 with agentic coding demos and long‑context features
- Opus 4.6 adds 1M context, custom compaction, adaptive thinking/effort, and “Claude Code” agent teams. Anthropic showcases a multi‑agent “clean‑room” C compiler that boots Linux and compiles major projects.
PR vs benchmarks split
- Anthropic won attention with breadth of features, demos, and partner integrations; OpenAI emphasized benchmark wins, token efficiency, and web dev skills. Both are minor version bumps teeing up Claude 5 vs GPT‑6 this summer.

Key Technical Details:

GPT‑5.3‑Codex performance and efficiency
- TerminalBench 2: 65.4%.
- On SWE‑Bench‑Pro: 2.09× fewer tokens than GPT‑5.2‑Codex‑xhigh; ~40% speedup; net ~2.93× faster at roughly +1% score.
- Model co‑designed for NVIDIA GB200‑NVL72; OpenAI cites ISA-level nitpicking, rack sims, and platform‑specific optimizations (fruit of long‑term NVIDIA collaboration).
OpenAI Frontier (agents platform)
- Capabilities: business context, execution environments (tools/code), on‑the‑job learning, identity/permissions.
- Internal rollout playbook: “agent is tool of first resort” by Mar 31; AGENTS.md; shared “skills” libraries; tools via CLI/MCP; agent‑first codebases; “say no to slop” review/accountability norms.
Anthropic Opus 4.6 (agentic coding + long‑context)
- Features: 1M context, custom compaction, adaptive thinking/effort; integrations with PowerPoint/Excel; Claude Code agent teams; mech‑interp research hooks; $50 promos.
- C compiler demo: ~100K LOC; no internet during build; boots Linux 6.9 on x86/ARM/RISC‑V; compiles QEMU/FFmpeg/SQLite/Postgres/Redis; ~99% on several suites incl. GCC torture; runs Doom.
- Benchmarking note: Anthropic shows infra/config can swing agentic coding scores by multiple percentage points, often larger than leaderboard gaps.
Distribution
- Opus 4.6: available via Windsurf, Replit Agent 3, Cline (CLI autonomous mode).
- GPT‑5.3‑Codex: available in Codex; hackathons and builder showcases ongoing.

Community Response/Impact:

Early narrative: GPT‑5.3‑Codex “demolished Opus 4.6” on some head‑to‑heads; counterpoint stresses benchmark noise and config sensitivity.
Debate over “clean‑room” claims given training on internet corpora; some argue GCC‑compatibility makes progress easier to verify.
Competitive theater spans consumer (dueling Super Bowl ads) and enterprise (Anthropic knowledge‑work plugins vs OpenAI Frontier). Commentators link the agent wave to pressure on SaaS valuations.

First Principles Analysis:

The frontier has shifted from raw IQ to systems throughput: token economy, latency, and hardware‑software co‑design now drive developer ROI and unit economics.
Long‑context + compaction + agent teams are converging toward robust computer use. Anthropic’s compiler is a forcing function for decomposition, verification, and tool orchestration; OpenAI’s Frontier operationalizes this for enterprises.
Expect summer flagships (Claude 5, GPT‑6) to compete on agent reliability, standardized harnesses (OSWorld‑Verified, TerminalBench), and governance—not just higher scores.