Feb 05 OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex

news.smol.ai • 20 days ago

TL;DR: OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT‑5.3‑Codex

Major Highlights:

  • OpenAI ships GPT‑5.3‑Codex and unveils “Frontier” agent platform
    • GPT‑5.3‑Codex focuses on coding and professional knowledge with major efficiency and speed gains. Frontier packages agents with enterprise context, tools, learning, and permissions for production use.
  • Anthropic releases Claude Opus 4.6 with agentic coding demos and long‑context features
    • Opus 4.6 adds 1M context, custom compaction, adaptive thinking/effort, and “Claude Code” agent teams. Anthropic showcases a multi‑agent “clean‑room” C compiler that boots Linux and compiles major projects.
  • PR vs benchmarks split
    • Anthropic won attention with breadth of features, demos, and partner integrations; OpenAI emphasized benchmark wins, token efficiency, and web dev skills. Both are minor version bumps, likely teeing up Claude 5 vs GPT‑6 this summer.

Key Technical Details:

  • GPT‑5.3‑Codex performance and efficiency
    • TerminalBench 2: 65.4%.
    • On SWE‑Bench‑Pro: 2.09× fewer tokens than GPT‑5.2‑Codex‑xhigh and a ~40% speedup, which compound to a net ~2.93× faster (2.09 × 1.40 ≈ 2.93) at roughly +1% score.
    • Model co‑designed for NVIDIA GB200‑NVL72; OpenAI cites ISA-level nitpicking, rack sims, and platform‑specific optimizations (fruit of long‑term NVIDIA collaboration).
  • OpenAI Frontier (agents platform)
    • Capabilities: business context, execution environments (tools/code), on‑the‑job learning, identity/permissions.
    • Internal rollout playbook: “agent is tool of first resort” by Mar 31; AGENTS.md; shared “skills” libraries; tools via CLI/MCP; agent‑first codebases; “say no to slop” review/accountability norms.
  • Anthropic Opus 4.6 (agentic coding + long‑context)
    • Features: 1M context, custom compaction, adaptive thinking/effort; integrations with PowerPoint/Excel; Claude Code agent teams; mechanistic‑interpretability (mech‑interp) research hooks; $50 promos.
    • C compiler demo: ~100K LOC; no internet during build; boots Linux 6.9 on x86/ARM/RISC‑V; compiles QEMU/FFmpeg/SQLite/Postgres/Redis; ~99% on several suites incl. GCC torture; runs Doom.
    • Benchmarking note: Anthropic shows infra/config can swing agentic coding scores by multiple percentage points, often larger than leaderboard gaps.
  • Distribution
    • Opus 4.6: available via Windsurf, Replit Agent 3, Cline (CLI autonomous mode).
    • GPT‑5.3‑Codex: available in Codex; hackathons and builder showcases ongoing.
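
The net SWE‑Bench‑Pro speedup quoted above follows from multiplying the two reported factors. A minimal sketch of that arithmetic, assuming the ~40% speedup applies per generated token and that wall‑clock time is roughly tokens divided by tokens per second (tool and network latency are ignored; the function name is hypothetical):

```python
def net_speedup(token_reduction: float, throughput_gain: float) -> float:
    """Estimate wall-clock speedup from two multiplicative factors.

    token_reduction: how many times fewer tokens the new model emits (e.g. 2.09).
    throughput_gain: how many times faster each token is produced (e.g. 1.40).

    Assumption (not from the source): time ~= tokens / tokens_per_second,
    so the two factors simply multiply.
    """
    if token_reduction <= 0 or throughput_gain <= 0:
        raise ValueError("both factors must be positive")
    return token_reduction * throughput_gain


# Figures reported for GPT-5.3-Codex vs GPT-5.2-Codex-xhigh on SWE-Bench-Pro.
speedup = net_speedup(token_reduction=2.09, throughput_gain=1.40)
print(f"net speedup: {speedup:.2f}x")
```

Under this simple model, a task that took 10 minutes before would take roughly 10 / 2.93 ≈ 3.4 minutes, which is why token efficiency matters as much as raw generation speed.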

Community Response/Impact:

  • Early narrative: GPT‑5.3‑Codex “demolished Opus 4.6” on some head‑to‑heads; counterpoint stresses benchmark noise and config sensitivity.
  • Debate over “clean‑room” claims given training on internet corpora; some argue GCC‑compatibility makes progress easier to verify.
  • Competitive theater spans consumer (dueling Super Bowl ads) and enterprise (Anthropic knowledge‑work plugins vs OpenAI Frontier). Commentators link the agent wave to pressure on SaaS valuations.
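
The benchmark‑noise counterpoint can be made concrete with a crude sanity check: compare the gap between two models' mean scores with how far each model's own score moves across harness configurations. This is a rough sketch, not a statistical test, and every score below is made up for illustration (none comes from the source or from any leaderboard):

```python
from statistics import mean


def config_spread(scores: list[float]) -> float:
    """Range of one model's scores across different infra/harness configs."""
    return max(scores) - min(scores)


def gap_exceeds_noise(scores_a: list[float], scores_b: list[float]) -> bool:
    """True only if the mean gap is larger than the worst config spread."""
    gap = abs(mean(scores_a) - mean(scores_b))
    noise = max(config_spread(scores_a), config_spread(scores_b))
    return gap > noise


# Hypothetical scores (percent) for two models, each run under three configs.
model_a = [70.0, 72.5, 74.0]
model_b = [71.0, 73.0, 75.5]

print("gap:", round(abs(mean(model_a) - mean(model_b)), 2))
print("spread A:", config_spread(model_a), "spread B:", config_spread(model_b))
print("gap exceeds config noise:", gap_exceeds_noise(model_a, model_b))
```

With these illustrative numbers the models differ by about one point on average while each swings by four points or more across configs, so a head‑to‑head "win" of that size would not be distinguishable from setup differences.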

First Principles Analysis:

  • The frontier has shifted from raw IQ to systems throughput: token economy, latency, and hardware‑software co‑design now drive developer ROI and unit economics.
  • Long‑context + compaction + agent teams are converging toward robust computer use. Anthropic’s compiler is a forcing function for decomposition, verification, and tool orchestration; OpenAI’s Frontier operationalizes this for enterprises.
  • Expect summer flagships (Claude 5, GPT‑6) to compete on agent reliability, standardized harnesses (OSWorld‑Verified, TerminalBench), and governance, not just on higher scores.