
Dec 31, 2025: not much happened today

news.smol.ai • about 2 months ago

TL;DR: Year-End AI Recap — Korea’s “Sovereign” Models, Qwen-Image-2512 Tops Open Image, DeepSeek mHC, and the Agentic Shift

Major Highlights:

  • Bold national push for open foundation models in Korea

    • South Korea’s Ministry of Science coordinated a focused grant to train multiple “sovereign,” from-scratch, commercially usable models (many MoE, some with omni-modal ambitions). Five teams entered; four advanced. Analysts credit tight scoping and explicit data budgeting for the program’s traction, compared with more diffuse EU-style funding.
    • Flagship models: SK Telecom A.X-K1 (519B total / 33B active; planned release Jan 4, 2026), LG K-EXAONE (236B MoE / 23B active; architectural notes include multi-token prediction (MTP), sliding-window attention (SWA), NoPE/global layers, QK norm, and a 3:1 ratio), Upstage Solar-Open (~102B / 12B active), NC-AI VAETKI (112B / 10B active, “open datasets only”), Naver HyperCLOVAX-SEED-Think (32B dense).
    • Meta-context: Commentators frame this as a competitiveness move—“more 100B+ models in one day than EU/US in 2025”—a rhetorical claim, not an audited count.
  • Qwen-Image-2512 races through the ecosystem

    • Positioned as “strongest open-source image model” per 10,000+ blind AI Arena rounds (context-specific claim). Rapid integration across local and hosted stacks: AI-Toolkit, MPS/MLX-like path with quantization (Unsloth), Replicate hosting, and community frontends (Yupp) testing difficult prompts.
  • DeepSeek’s mHC (Manifold-Constrained Hyper-Connections) proposes stabilized wider residual streams

    • Core idea: Replace classic residuals with multi-stream residuals mixed by learned matrices constrained to the Birkhoff polytope (doubly stochastic), bounding forward/backward gains and avoiding instability.
    • Reported: n=4 streams, ~6.7% training overhead, max backward gain ~1.6 versus unbounded in unconstrained variants. Emphasis on end-to-end systems work (custom kernels, activation recompute, careful comm/compute overlap).
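The Birkhoff-polytope constraint above is the interesting part: a doubly stochastic matrix is a convex combination of permutation matrices (Birkhoff–von Neumann), so its spectral norm is at most 1, which is what bounds the forward/backward gain through the mixed residual streams. One standard way to produce an (approximately) doubly stochastic matrix from unconstrained parameters is Sinkhorn–Knopp normalization; the sketch below is illustrative only, not DeepSeek's actual parameterization, and all names are hypothetical:

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Map an unconstrained real matrix to an (approximately) doubly
    stochastic one by alternately normalizing rows and columns."""
    M = np.exp(logits)  # ensure strictly positive entries
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
W = sinkhorn(rng.normal(size=(4, 4)))  # n=4 residual streams, as reported

print(W.sum(axis=1), W.sum(axis=0))  # all approximately 1
print(np.linalg.norm(W, 2))          # spectral norm ~1: bounded gain
```

Because the spectral norm stays pinned near 1 regardless of the learned logits, signals mixed through `W` can neither explode nor vanish, in contrast to unconstrained mixing matrices.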

Key Technical Details:

  • Korea program economics: first round ~US$140M, split roughly as ~$110M compute leasing, ~$7M shared data, ~$14M video dataset, and ~$2M per team for curation; 5 teams entered, 4 advanced.
  • Model specs:
    • A.X-K1: 519B total / 33B active (MoE), release planned Jan 4, 2026.
    • K-EXAONE: 236B MoE / 23B active; mentions MTP, SWA, NoPE/global layers, QK norm, a 3:1 ratio, and large context.
    • Solar-Open: ~102B / 12B active; VAETKI: 112B / 10B active (“open datasets only”); HyperCLOVAX-SEED-Think: 32B dense.
  • Qwen-Image-2512 availability: integrated in AI-Toolkit (incl. 3-bit ARA/LoRA workflow), local MPS path with quantized builds, hosted on Replicate; community stress-testing underway.
  • Leaderboards: “#1 open image” claim tied to AI Arena setup; not a universal benchmark.
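A quick back-of-envelope on the specs above shows why active-parameter MoE changes the capability-cost point: only a small fraction of each model's weights is active per token. The numbers are taken directly from the specs listed above:

```python
# Active-parameter ratios for the Korean MoE releases listed above
# (total billions of params, active billions of params per token)
models = {
    "A.X-K1":     (519, 33),
    "K-EXAONE":   (236, 23),
    "Solar-Open": (102, 12),
    "VAETKI":     (112, 10),
}

for name, (total, active) in models.items():
    frac = active / total
    print(f"{name}: {active}B/{total}B -> {frac:.1%} of weights active per token")
```

So A.X-K1, for instance, pays per-token compute on roughly 6% of its weights while retaining the capacity of a 519B-parameter model.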

Community Response/Impact:

  • Praise for Korea’s focused, data-inclusive funding versus widely diffused grants; framed as strategic sovereignty and ecosystem stimulus.
  • Broad enthusiasm for Qwen’s rapid toolchain support (signals maturity for open image gen).
  • Systems chatter around DeepSeek’s kernels and B200 optimization underscores “systems talent as moat.”
  • Agentic/Context engineering trend: shift from prompt tweaking to pipeline design, retrieval/memory curation, and verification workflows.

First Principles Analysis:

  • Sovereign MoE models: Active-parameter MoE lets nations hit frontier capability-cost points while retaining commercial openness—data budgeting and concentrated governance appear decisive.
  • mHC significance: Constraining mixing matrices to the Birkhoff polytope keeps spectral properties bounded, enabling wider residual capacity without gradient blowup—potentially reframing the trade-off between MLP expansion and representational depth.
  • Agents: As LLMs plateau on raw prompting, performance comes from structured context, test-driven supervision, and reusable skills—moving software from code-first to spec/verification-first with LLMs as implementers.
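The spec/verification-first shift described above can be caricatured in a few lines: instead of trusting a single generation, the loop keeps whichever candidate implementation passes the spec's tests. This is a minimal sketch with the model call stubbed out; all names and the candidate functions are hypothetical:

```python
def pick_passing(candidates, tests):
    """Verification-first selection: return the first candidate
    implementation that passes every spec test, else None."""
    for impl in candidates:
        if all(test(impl) for test in tests):
            return impl
    return None

# Spec expressed as tests (standing in for acceptance criteria)
tests = [
    lambda f: f([3, 1, 2]) == [1, 2, 3],
    lambda f: f([]) == [],
]

# Hypothetical LLM-generated candidates: one wrong, one correct
candidates = [lambda xs: xs, lambda xs: sorted(xs)]

impl = pick_passing(candidates, tests)
print(impl([5, 4]))  # -> [4, 5]
```

The point is the inversion: the spec (tests) is authoritative and the LLM's output is merely a candidate, which is what "LLMs as implementers" means in practice.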