
Jan 27: Moonshot Kimi K2.5 - beats Sonnet 4.5 at half the cost; SOTA open model; first native image+video; manages swarms of up to 100 parallel agents

news.smol.ai • 29 days ago

TL;DR: Moonshot Kimi K2.5 — open native image+video model with 100-agent swarm; challenges Sonnet 4.5 at ~half the cost

Major Highlights:

  • Bold open-weights push with native multimodality and agents: MoonshotAI’s Kimi K2.5 is positioned as a flagship open model that is natively multimodal (images + video) and agentic. It claims state-of-the-art results among open models across vision, coding, and web-browsing benchmarks, and asserts it beats Claude Sonnet 4.5 at roughly half the cost while offering “Turbo-level” 60–100 tok/s throughput.
  • First native video understanding in an open model: K2.5 adds a +400M MoonViT vision encoder and was continually pretrained on 15T mixed visual+text tokens (on top of a 15T-token K2 base), enabling video understanding and practical “video-to-code” workflows (e.g., upload a screen recording; the model reconstructs the website).
  • Agent Swarm (up to 100 parallel sub-agents, 1,500 steps): A new paid Kimi app feature orchestrates dynamic swarms of up to 100 sub-agents executing as many as 1,500 coordinated steps, with claimed 3–4.5× wall-clock speed improvements on complex tasks. A separate K2.5 Agent targets “high-density, large-scale office work” end-to-end.
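
The swarm pattern claimed above is, at its core, a bounded fan-out/fan-in. A minimal sketch, assuming a hypothetical `run_subagent` call standing in for a real model-plus-tools sub-agent (nothing here reflects Moonshot's actual orchestration code):

```python
import asyncio

MAX_PARALLEL = 100  # claimed sub-agent ceiling in the Kimi app


async def run_subagent(task: str) -> str:
    # Placeholder for a real sub-agent invocation (model + tools);
    # here we just echo the task after yielding control once.
    await asyncio.sleep(0)
    return f"done: {task}"


async def run_swarm(tasks: list[str]) -> list[str]:
    # Bound concurrency so no more than MAX_PARALLEL sub-agents run at once.
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def guarded(task: str) -> str:
        async with sem:
            return await run_subagent(task)

    # gather preserves input order, so results line up with tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


results = asyncio.run(run_swarm([f"subtask-{i}" for i in range(8)]))
print(results[0])  # → "done: subtask-0"
```

The claimed 3–4.5× wall-clock gains would come from exactly this kind of parallel sub-tasking, bounded by how decomposable the workload is.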

Key Technical Details:

  • Architecture: 1T-parameter MoE with 384 experts; 32B active at inference; +400M MoonViT vision encoder.
  • Training: Continual pretraining on 15T mixed vision+text tokens (K2 base was also 15T).
  • Context: 128K → 256K via YaRN.
  • Quantization/size: INT4 release with selective quantization (only routed experts quantized); checkpoint ~595 GB.
  • Throughput/latency: Claimed 60–100 tokens/sec “Turbo-level” generation.
  • Benchmarks (community-reported):
    • SOTA claims on HLE and BrowseComp (with footnoted evals).
    • MMMU Pro: 75%.
    • GDPval-AA Elo: 1309 (agentic, knowledge-work harness).
    • Hallucination benchmark score: 64% (improved vs. K2 Thinking; methodology matters).
    • LMArena: K2.5 Thinking appears as #1 open model in Text Arena snapshot.
  • Distribution/availability: Open weights; available via Ollama Cloud, Together AI, and Fireworks. Early local inference: usable on 2× M3 Ultra (MLX, sharded), ~21.9 tok/s with high memory use.
  • Ecosystem: Kimi Code (Apache-2.0 open-source coding agent for IDEs) and an Agent SDK for custom agents.
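
The selective-quantization and checkpoint-size figures above can be sanity-checked with back-of-envelope arithmetic, assuming (not confirmed) that routed-expert weights are stored at 4 bits and everything else (attention, shared parts, embeddings) stays at bf16:

```python
# Solve for how many parameters must be INT4-quantized for a 1T-param
# model to fit a ~595 GB checkpoint, under the stated assumptions.
TOTAL_PARAMS = 1.0e12      # 1T-parameter MoE
CHECKPOINT_GB = 595.0      # reported checkpoint size

INT4_BYTES = 0.5           # 4 bits per weight
BF16_BYTES = 2.0           # 16 bits per weight

# size = INT4_BYTES * routed + BF16_BYTES * (TOTAL_PARAMS - routed)
# Solve for routed (routed-expert parameter count):
routed = (BF16_BYTES * TOTAL_PARAMS - CHECKPOINT_GB * 1e9) / (BF16_BYTES - INT4_BYTES)

print(f"implied routed-expert params: {routed / 1e9:.0f}B "
      f"({routed / TOTAL_PARAMS:.0%} of total)")
# → implied routed-expert params: 937B (94% of total)
```

The result (roughly 94% of parameters in routed experts) is consistent with a large-expert-count MoE, which is why quantizing only the routed experts captures most of the size savings.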

Community Response/Impact:

  • Strong reception that K2.5 narrows the gap with frontier labs and marks a notable “China leads in open models” moment.
  • Viral demos around “video-to-code” website cloning underscore practical multimodal gains.
  • Caution that leaderboards are point-in-time and highly sensitive to prompting, tool use, and harness effects.
  • “Runs at home” examples and broad infra support accelerate adoption and experimentation.

First Principles Analysis:

  • Why it matters: K2.5 combines scaled MoE efficiency (32B active) with massive multimodal pretraining and long context to enable agentic decomposition and parallelism. Native video understanding shifts models from static perception to dynamic workflow capture (e.g., screenflows → code). The Agent Swarm is a pragmatic bet: when single-agent reasoning saturates, parallel sub-tasking and tool integration deliver real wall-time gains. If pricing undercuts Sonnet 4.5 while matching or exceeding quality on practical benchmarks, K2.5 pressures closed providers and accelerates open-model deployment in enterprise workflows.
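
The MoE efficiency argument above (32B active out of 1T total) follows from top-k expert routing: the router scores all 384 experts per token but executes only a handful. A minimal NumPy sketch, with the top-k value chosen hypothetically (the actual routing configuration is not stated here):

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 384   # experts in the K2.5 MoE
TOP_K = 8           # hypothetical experts activated per token

def route(hidden: np.ndarray, router_w: np.ndarray) -> np.ndarray:
    # The router scores every expert, but only TOP_K run per token,
    # which is why active parameters (~32B) sit far below the 1T total.
    logits = hidden @ router_w            # shape: (NUM_EXPERTS,)
    topk = np.argsort(logits)[-TOP_K:]    # indices of the selected experts
    return np.sort(topk)

hidden = rng.standard_normal(64)
router_w = rng.standard_normal((64, NUM_EXPERTS))
selected = route(hidden, router_w)
print(f"{len(selected)} of {NUM_EXPERTS} experts active per token")
```

Per-token compute thus scales with TOP_K, not NUM_EXPERTS, which is the lever that lets a 1T-parameter model price itself against much smaller dense competitors.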