
Feb 12: Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

news.smol.ai • 13 days ago

TL;DR: Feb 12 — Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

Major Highlights:

  • Google ships Gemini 3 Deep Think V2 to users with SOTA reasoning

    • Productized “Deep Think” reasoning mode now rolling out to Google AI Ultra subscribers in the Gemini app; Vertex AI/Gemini API early access opening for select researchers/enterprises. Framed as test-time-compute-heavy but deployable, not just a lab demo.
    • New state-of-the-art on ARC-AGI-2 (84.6%, independently certified), strong “no tools” results on Humanity’s Last Exam (48.4%), and elite Codeforces Elo (3455; ~top-10 globally).
    • Google highlights real engineering/science workflows: debugging math proofs, physics system modeling, semiconductor crystal growth optimization, sketch-to-CAD/STL for 3D printing.
  • Anthropic closes massive round; revenue surges

    • Closes a reported $30B raise at a $380B valuation. Revenue run-rate jumps >10x to $14B; Claude Code ARR doubles, reaching $2.5B YTD. Positions Anthropic for expanded compute, model training, and enterprise growth.
  • OpenAI debuts GPT-5.3-Codex-Spark for speed

    • New “Spark” mode targets Claude’s fast mode with >1000 tokens/s generation (≈10x speedup) versus prior baselines; framed as rapid commercialization following the Cerebras deal. Emphasis on latency-sensitive coding/agent workloads.
  • China’s open(ish) coding wave: MiniMax M2.5 and GLM-5

    • MiniMax M2.5 claims 80.2% on SWE-Bench Verified (Opus-level) and is spreading rapidly across OpenRouter, Arena, Cline, Ollama Cloud (launch promos), Eigent, Qoder, and Blackbox AI. Community notes strong throughput and price competitiveness.
    • GLM-5 circulates with reported 744B total params (~40B active MoE), 28.5T tokens, DeepSeek Sparse Attention, and “Slime” async RL infra; 200K context on YouWare, ~14 tps on OpenRouter, and local mlx-lm runs on M3 Ultra (512GB).
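Two of the headline numbers above reduce to simple arithmetic worth making explicit: the "~10x" Spark speedup follows from >1000 tok/s against a ~100 tok/s baseline, and GLM-5's MoE sparsity means only a small fraction of its parameters fire per token. A quick sanity check on the reported figures (these are the digest's claims, not independent measurements):

```python
# Back-of-envelope checks on throughput and sparsity figures reported above.
# All inputs are reported claims from the digest, not measurements.

spark_tps = 1000      # GPT-5.3-Codex-Spark: >1000 tokens/s (reported)
baseline_tps = 100    # typical output rate implied by the "~10x" framing

speedup = spark_tps / baseline_tps
print(f"Spark vs. typical baseline: ~{speedup:.0f}x")

# GLM-5 MoE sparsity: active parameters per token vs. total parameters.
total_params = 744e9   # 744B total (reported)
active_params = 40e9   # ~40B active per token (reported)
print(f"GLM-5 active fraction: {active_params / total_params:.1%}")
```

The ~5% active fraction is what makes 744B-class models plausible to serve at the cloud and local token rates quoted below.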

Key Technical Details:

  • Gemini 3 Deep Think V2: ARC-AGI-2 84.6% (certified), HLE 48.4% (no tools), Codeforces Elo 3455. Jeff Dean emphasizes efficiency: up to 82% cheaper per task on select evals.
  • ARC eval pricing (ARC Prize): ~$13.62/task (ARC-AGI-2), ~$7.17/task (ARC-AGI-1).
  • GPT-5.3-Codex-Spark: >1000 tok/s generation; positioned as a 10x speedup vs typical LLM output rates, competing with Claude’s 2.5x fast mode.
  • MiniMax M2.5: cited 100 tok/s and ~$0.06/M tokens with caching (per Cline); 80.2% SWE-Bench Verified.
  • GLM-5: 744B params (~40B active), 28.5T tokens; 200K context window; MoE-style sparsity; community-reported ~14 tps cloud, ~15 tok/s locally (mlx-lm, M3 Ultra 512GB).
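The cited M2.5 pricing and throughput imply concrete latency/cost envelopes for long-horizon agent runs. A minimal sketch, using the reported 100 tok/s and ~$0.06/M-token cached price, with a hypothetical 50k-token trajectory chosen purely for illustration:

```python
# Rough latency/cost for an agentic coding run at the MiniMax M2.5 figures
# cited above (100 tok/s, ~$0.06 per 1M tokens with caching).
# The 50k-token trajectory length is a hypothetical example, not a benchmark.

tps = 100                 # reported generation throughput, tokens/s
price_per_mtok = 0.06     # reported cached price, USD per 1M tokens

run_tokens = 50_000       # hypothetical long-horizon agent trajectory
latency_s = run_tokens / tps
cost_usd = run_tokens / 1e6 * price_per_mtok

print(f"{run_tokens:,} tokens: ~{latency_s / 60:.1f} min, ~${cost_usd:.4f}")
```

At these rates a multi-hundred-step coding session costs fractions of a cent, which is the economics behind the "viable for daily work" reaction noted below.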

Community Response/Impact:

  • ARC creator François Chollet welcomes progress but reiterates ARC targets fluid test-time adaptation, not “AGI proof.” Expects benchmarks to evolve until the human–AI gap closes (speculates ~2030 horizon).
  • Debate over “no-tools” evaluation conditions (e.g., Codeforces) and generalization to real tasks.
  • Practitioners note M2.5 as one of the first open-ish coding models viable for daily work; growing momentum behind agentic, long-horizon coding stacks.

First Principles Analysis:

  • The shift is from mere benchmark wins to deployable test-time reasoning: productized heavy reasoning modes, faster token throughput, and lower per-task costs.
  • Efficiency gains (82% cheaper tasks; >1000 tok/s generation) expand the feasible frontier for real engineering/science workflows and multi-agent coding systems.
  • Funding scale (Anthropic) plus specialized hardware tie-ups (OpenAI–Cerebras) signal a near-term race to compress “reasoning quality × latency × cost” into production-grade offerings.