TL;DR: Year-End AI Recap — Korea’s “Sovereign” Models, Qwen-Image-2512 Tops Open Image, DeepSeek mHC, and the Agentic Shift
Major Highlights:
- Bold national push for open foundation models in Korea
- South Korea’s Ministry of Science coordinated a focused grant program to train multiple “sovereign,” from-scratch, commercially usable models (many MoE, some with omni ambitions). Five teams entered; four advanced. Analysts credit tight scoping and explicit data budgeting for its traction versus more diffuse EU-style funding.
- Flagship models: SK Telecom A.X-K1 (519B total / 33B active; release planned Jan 4, 2026), LG K-EXAONE (236B MoE / 23B active; architectural notes include MTP, SWA, NoPE/global layers, QK norm, and a 3:1 layer ratio), Upstage Solar-Open (~102B / 12B active), NC-AI VAETKI (112B / 10B active, “open datasets only”), Naver HyperCLOVAX-SEED-Think (32B dense).
- Meta-context: commentators frame this as a competitiveness move (“more 100B+ models in one day than EU/US in 2025”), a rhetorical claim rather than an audited count.
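The sparse-MoE economics behind these specs can be made concrete. A minimal sketch, using the parameter counts listed above; the ~2 FLOPs per active parameter per token figure is a standard rule of thumb for dense forward passes, not a number reported by any of these teams:

```python
# Per-token forward cost of sparse MoE models scales with the *active*
# parameter count, not the total: only the routed experts run per token.

MODELS = {
    # name: (total params, active params)
    "A.X-K1":     (519e9, 33e9),
    "K-EXAONE":   (236e9, 23e9),
    "Solar-Open": (102e9, 12e9),
    "VAETKI":     (112e9, 10e9),
}

def active_fraction(total: float, active: float) -> float:
    """Share of parameters touched per token."""
    return active / total

def flops_per_token(active: float) -> float:
    """Rule-of-thumb forward FLOPs: ~2 per active parameter (mul + add)."""
    return 2 * active

for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} active, "
          f"~{flops_per_token(active) / 1e9:.0f} GFLOPs/token")
```

The point of the exercise: A.X-K1 carries ~5x the total parameters of a 100B-class dense model while paying roughly the per-token compute of a 33B one.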
- Qwen-Image-2512 races through the ecosystem
- Positioned as the “strongest open-source image model” per 10,000+ blind AI Arena rounds (a context-specific claim). Rapid integration across local and hosted stacks: AI-Toolkit, an Apple-silicon (MPS/MLX) path with quantized builds (Unsloth), Replicate hosting, and community frontends (Yupp) stress-testing difficult prompts.
- DeepSeek’s mHC (Manifold-Constrained Hyper-Connections) proposes stabilized wider residual streams
- Core idea: replace the classic residual stream with multiple residual streams mixed by learned matrices constrained to the Birkhoff polytope (doubly stochastic), bounding forward/backward gains and avoiding training instability.
- Reported: n=4 streams, ~6.7% training overhead, and a max backward gain of ~1.6 (versus unbounded in unconstrained variants). Emphasis on end-to-end systems work: custom kernels, activation recomputation, and careful communication/compute overlap.
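The Birkhoff-polytope constraint can be illustrated with a Sinkhorn projection: alternately normalizing the rows and columns of a positive matrix drives it toward doubly stochastic, and mixing residual streams with such a matrix is a convex combination, so activations cannot be amplified unboundedly. A minimal NumPy sketch under those assumptions; the n=4 stream count matches the reported setting, but the projection and mixing step here are illustrative, not DeepSeek’s implementation:

```python
import numpy as np

def sinkhorn(logits: np.ndarray, n_iters: int = 50) -> np.ndarray:
    """Approximately project onto the Birkhoff polytope: exponentiate to get
    a positive matrix, then alternately normalize rows and columns."""
    M = np.exp(logits)
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
n, d = 4, 8                        # 4 residual streams, toy width 8
W = sinkhorn(rng.normal(size=(n, n)))   # learned logits -> doubly stochastic
streams = rng.normal(size=(n, d))       # multi-stream residual state

mixed = W @ streams  # constrained stream mixing: a convex combination,
                     # so max|mixed| <= max|streams| (bounded gain)
print(W.round(3))
```

Because every row of `W` is a nonnegative vector summing to 1, the mixing step has ∞-norm gain exactly 1, which is the boundedness property the constraint is after.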
Key Technical Details:
- Korea program economics: ~US$140M in the first round: ~$110M compute leasing, ~$7M shared data, ~$14M video dataset, and ~$2M per team for curation; 5 teams entered, 4 advanced.
- Model specs:
- A.X-K1: 519B total / 33B active (MoE), release planned Jan 4, 2026.
- K-EXAONE: 236B MoE / 23B active; notes mention MTP, SWA, NoPE/global layers, QK norm, a 3:1 layer ratio, and large context.
- Solar-Open: ~102B / 12B active; VAETKI: 112B / 10B active (“open datasets only”); HyperCLOVAX-SEED-Think: 32B dense.
- Qwen-Image-2512 availability: integrated in AI-Toolkit (incl. 3-bit ARA/LoRA workflow), local MPS path with quantized builds, hosted on Replicate; community stress-testing underway.
- Leaderboards: “#1 open image” claim tied to AI Arena setup; not a universal benchmark.
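The quantized local builds mentioned above rest on standard low-bit weight quantization. A generic round-trip sketch of symmetric k-bit quantization, for intuition only; the actual 3-bit workflow is more sophisticated (e.g. grouped scales and outlier handling), and nothing here reflects Unsloth’s internals:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 3):
    """Map float weights to signed k-bit integers plus one float scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 3 for 3-bit
    scale = np.abs(w).max() / qmax             # one scale per tensor (naive)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=1024).astype(np.float32)

q, s = quantize_symmetric(w, bits=3)
err = np.abs(dequantize(q, s) - w).max()      # bounded by scale / 2
print(f"scale={s:.5f}, max abs reconstruction error={err:.5f}")
```

At 3 bits each weight collapses onto one of only 7 levels, which is why per-group scales (rather than the single per-tensor scale above) are what make such builds usable in practice.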
Community Response/Impact:
- Praise for Korea’s focused, data-inclusive funding versus widely diffused grants; framed as strategic sovereignty and ecosystem stimulus.
- Broad enthusiasm for Qwen’s rapid toolchain support (signals maturity for open image gen).
- Systems chatter around DeepSeek’s kernels and B200 optimization underscores “systems talent as moat.”
- Agentic/Context engineering trend: shift from prompt tweaking to pipeline design, retrieval/memory curation, and verification workflows.
First Principles Analysis:
- Sovereign MoE models: active-parameter MoE lets nations hit frontier capability-cost points while retaining commercial openness; data budgeting and concentrated governance appear decisive.
- mHC significance: constraining mixing matrices to the Birkhoff polytope keeps their spectral properties bounded, enabling wider residual capacity without gradient blowup and potentially reframing the trade-off between MLP expansion and representational depth.
- Agents: as LLMs plateau on raw prompting, performance comes from structured context, test-driven supervision, and reusable skills, moving software from code-first to spec/verification-first with LLMs as implementers.
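The spec/verification-first pattern above can be sketched as a generate-and-check loop: the spec is a set of executable tests, candidate implementations (here hand-written stubs; in practice LLM-generated) are run against it, and only a passing candidate is accepted. All names and candidates below are hypothetical:

```python
from typing import Callable, Iterable, List, Optional

# The "spec": executable checks any accepted implementation must satisfy.
SPEC: List[Callable] = [
    lambda f: f(2, 3) == 5,
    lambda f: f(-1, 1) == 0,
]

def verify(candidate: Callable, spec: Iterable[Callable]) -> bool:
    """Run every check; any exception or False counts as a failure."""
    try:
        return all(check(candidate) for check in spec)
    except Exception:
        return False

def first_passing(candidates: Iterable[Callable],
                  spec: Iterable[Callable]) -> Optional[Callable]:
    """Accept the first candidate that satisfies the whole spec."""
    for c in candidates:
        if verify(c, spec):
            return c
    return None

# Stand-ins for successive LLM-proposed implementations (hypothetical).
candidates = [
    lambda a, b: a * b,   # wrong: fails the spec
    lambda a, b: a + b,   # satisfies the spec
]

impl = first_passing(candidates, SPEC)
print("accepted" if impl else "no candidate passed")
```

The inversion is the point: the human-authored artifact is `SPEC`, and the implementation is whatever survives verification.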