
Dec 31, 2025: not much happened today

news.smol.ai • about 2 months ago

TL;DR: Year-End AI Recap — Korea’s “Sovereign” Models, Qwen-Image-2512 Tops Open Image, DeepSeek mHC, and the Agentic Shift

Major Highlights:

  • Bold national push for open foundation models in Korea

    • South Korea’s Ministry of Science coordinated a focused grant to train multiple “sovereign,” from-scratch, commercially usable models (many MoE, some with omni-modal ambitions). Five teams entered; four advanced. Analysts credit tight scoping and explicit data budgeting for the program’s traction, compared with more diffuse EU-style funding.
    • Flagship models: SK Telecom A.X-K1 (519B total / 33B active; planned release Jan 4, 2026), LG K-EXAONE (236B MoE / 23B active; architectural notes include multi-token prediction (MTP), sliding-window attention (SWA), NoPE/global layers, QK norm, and a 3:1 ratio), Upstage Solar-Open (~102B / 12B active), NC-AI VAETKI (112B / 10B active, “open datasets only”), Naver HyperCLOVAX-SEED-Think (32B dense).
    • Meta-context: Commentators frame this as a competitiveness move—“more 100B+ models in one day than EU/US in 2025”—a rhetorical claim, not an audited count.
  • Qwen-Image-2512 races through the ecosystem

    • Positioned as “strongest open-source image model” per 10,000+ blind AI Arena rounds (context-specific claim). Rapid integration across local and hosted stacks: AI-Toolkit, MPS/MLX-like path with quantization (Unsloth), Replicate hosting, and community frontends (Yupp) testing difficult prompts.
  • DeepSeek’s mHC (Manifold-Constrained Hyper-Connections) proposes stabilized wider residual streams

    • Core idea: Replace classic residuals with multi-stream residuals mixed by learned matrices constrained to the Birkhoff polytope (doubly stochastic), bounding forward/backward gains and avoiding instability.
    • Reported: n=4 streams, ~6.7% training overhead, max backward gain ~1.6 versus unbounded in unconstrained variants. Emphasis on end-to-end systems work (custom kernels, activation recompute, careful comm/compute overlap).
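The Birkhoff-polytope constraint above is the interesting part: a doubly stochastic matrix is a convex combination of permutation matrices (Birkhoff–von Neumann), so its spectral norm is at most 1, which is what bounds the forward/backward gain through the mixed residual streams. One standard way to produce an (approximately) doubly stochastic matrix from unconstrained parameters is Sinkhorn–Knopp normalization; the sketch below is illustrative only, not DeepSeek's actual parameterization, and all names are hypothetical:

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Map an unconstrained real matrix to an (approximately) doubly
    stochastic one by alternately normalizing rows and columns."""
    M = np.exp(logits)  # ensure strictly positive entries
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
W = sinkhorn(rng.normal(size=(4, 4)))  # n=4 residual streams, as reported

print(W.sum(axis=1), W.sum(axis=0))  # all approximately 1
print(np.linalg.norm(W, 2))          # spectral norm ~1: bounded gain
```

Because the spectral norm stays pinned near 1 regardless of the learned logits, signals mixed through `W` can neither explode nor vanish, in contrast to unconstrained mixing matrices.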

Key Technical Details:

  • Korea program economics: first round ~US$140M, split roughly as ~$110M compute leasing, ~$7M shared data, ~$14M video dataset, and ~$2M per team for curation; 5 teams entered, 4 advanced.
  • Model specs:
    • A.X-K1: 519B total / 33B active (MoE), release planned Jan 4, 2026.
    • K-EXAONE: 236B MoE / 23B active; mentions MTP, SWA, NoPE/global layers, QK norm, a 3:1 ratio, and large context.
    • Solar-Open: ~102B / 12B active; VAETKI: 112B / 10B active (“open datasets only”); HyperCLOVAX-SEED-Think: 32B dense.
  • Qwen-Image-2512 availability: integrated in AI-Toolkit (incl. 3-bit ARA/LoRA workflow), local MPS path with quantized builds, hosted on Replicate; community stress-testing underway.
  • Leaderboards: “#1 open image” claim tied to AI Arena setup; not a universal benchmark.
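A quick back-of-envelope on the specs above shows why active-parameter MoE changes the capability-cost point: only a small fraction of each model's weights is active per token. The numbers are taken directly from the specs listed above:

```python
# Active-parameter ratios for the Korean MoE releases listed above
# (total billions of params, active billions of params per token)
models = {
    "A.X-K1":     (519, 33),
    "K-EXAONE":   (236, 23),
    "Solar-Open": (102, 12),
    "VAETKI":     (112, 10),
}

for name, (total, active) in models.items():
    frac = active / total
    print(f"{name}: {active}B/{total}B -> {frac:.1%} of weights active per token")
```

So A.X-K1, for instance, pays per-token compute on roughly 6% of its weights while retaining the capacity of a 519B-parameter model.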

Community Response/Impact:

  • Praise for Korea’s focused, data-inclusive funding versus widely diffused grants; framed as strategic sovereignty and ecosystem stimulus.
  • Broad enthusiasm for Qwen’s rapid toolchain support (signals maturity for open image gen).
  • Systems chatter around DeepSeek’s kernels and B200 optimization underscores “systems talent as moat.”
  • Agentic/Context engineering trend: shift from prompt tweaking to pipeline design, retrieval/memory curation, and verification workflows.

First Principles Analysis:

  • Sovereign MoE models: Active-parameter MoE lets nations hit frontier capability-cost points while retaining commercial openness—data budgeting and concentrated governance appear decisive.
  • mHC significance: Constraining mixing matrices to the Birkhoff polytope keeps spectral properties bounded, enabling wider residual capacity without gradient blowup—potentially reframing the trade-off between MLP expansion and representational depth.
  • Agents: As LLMs plateau on raw prompting, performance comes from structured context, test-driven supervision, and reusable skills—moving software from code-first to spec/verification-first with LLMs as implementers.
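The spec/verification-first shift described above can be caricatured in a few lines: instead of trusting a single generation, the loop keeps whichever candidate implementation passes the spec's tests. This is a minimal sketch with the model call stubbed out; all names and the candidate functions are hypothetical:

```python
def pick_passing(candidates, tests):
    """Verification-first selection: return the first candidate
    implementation that passes every spec test, else None."""
    for impl in candidates:
        if all(test(impl) for test in tests):
            return impl
    return None

# Spec expressed as tests (standing in for acceptance criteria)
tests = [
    lambda f: f([3, 1, 2]) == [1, 2, 3],
    lambda f: f([]) == [],
]

# Hypothetical LLM-generated candidates: one wrong, one correct
candidates = [lambda xs: xs, lambda xs: sorted(xs)]

impl = pick_passing(candidates, tests)
print(impl([5, 4]))  # -> [4, 5]
```

The point is the inversion: the spec (tests) is authoritative and the LLM's output is merely a candidate, which is what "LLMs as implementers" means in practice.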