TL;DR: Dec 22 — not much happened today (but China models and agent infra did)
Major Highlights:
- China open-weights keep momentum: Zhipu AI’s GLM‑4.7 released, Baidu’s ERNIE 5.0 still unreleased
- GLM‑4.7 ships with improved coding, complex reasoning, and tool use; weights are on Hugging Face, alongside an official API and hosted chat. It propagated quickly across the ecosystem: day‑0 HF tooling support, an OpenRouter listing, and strong early evals (claimed #1 among open models and #6 overall on a WebDev leaderboard, +83 pts vs GLM‑4.6). Some users observed changes in “interleaved thinking” behavior and recommend Zhipu’s official API for stable benchmarking.
- ERNIE 5.0 remains unreleased, keeping attention on what’s actually usable today.
- Agents shift from chat to UI and production-grade ops
- Google’s A2UI (Agent‑to‑User Interface) open protocol lets agents generate interactive UIs, pushing beyond chat-only interfaces toward standardized, agent-driven frontends (see the toy payload sketch after this list).
- Real-world agent stacks are heterogeneous: a study of 1,575 projects finds that 96% of top-starred repos mix frameworks (e.g., LangChain + LlamaIndex), and GitHub stars do not predict real adoption. Recurring pain points: logic failures, termination detection, tool interaction, and versioning.
- Memory/state becomes first-class: LangChain highlights an Oracle-backed hub and “six memory patterns,” reflecting growing emphasis on persistence, auditability, and recall (see the persistence sketch after this list).
- Sandboxed, async execution patterns solidify for coding agents (e.g., Runloop + DeepAgents, Claude Code in Modal sandboxes; see the sandbox sketch after this list), plus early “agent-native git” pitches (zagi).
- Multimodal churn: cheaper, faster, more controllable
- Xiaomi’s MiMo‑V2‑Flash is positioned as a practical MoE tuned for deployment (cost/speed) rather than leaderboard chasing, with cited pricing as low as $0.10 per 1M input tokens. vLLM released an official serving recipe with concrete DP/TP/EP, context-length, latency, and KV-cache settings.
- Open-weight text-to-image competition tightens: Z‑Image Turbo (6B, Apache‑2.0) leads the Artificial Analysis Image Arena; price references include ~$5 per 1k images on Alibaba Cloud.
- Video models emphasize control and long-form generation: Kling 2.6 Motion Control gets day‑0 availability via fal, with action/dance control demos; research like MemFlow targets adaptive memory retrieval for long streaming narratives.
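To make the A2UI item concrete, here is a toy declarative UI payload an agent might emit instead of plain chat text. This is illustrative only; the field names below are not taken from the A2UI spec.

```python
import json

# Illustrative only: these field names are NOT from the A2UI spec, just a toy
# declarative payload showing an agent describing a UI instead of replying in text.
ui_message = {
    "type": "render_ui",
    "components": [
        {"kind": "heading", "text": "Pick a flight"},
        {
            "kind": "choice_list",
            "id": "flight_choice",
            "options": [
                {"label": "09:40 SFO -> JFK, $212", "value": "UA123"},
                {"label": "13:15 SFO -> JFK, $189", "value": "DL456"},
            ],
        },
        {"kind": "button", "text": "Book", "action": "submit:flight_choice"},
    ],
}

# The client renders the components and posts the user's selection back to the
# agent as a structured event, closing the loop without free-form chat parsing.
print(json.dumps(ui_message, indent=2))
```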
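The memory/state item is about a pattern more than a product. Below is a minimal sketch of persistent, auditable agent memory using only the standard library: append-only writes preserve an audit trail, and reads do naive keyword recall. It is not LangChain’s Oracle-backed hub, just the underlying idea; all names are hypothetical.

```python
import sqlite3
import time

# Minimal sketch of persistent, auditable agent memory (hypothetical schema):
# append-only writes keep an audit trail; reads do simple keyword recall.
con = sqlite3.connect("agent_memory.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS memory ("
    "id INTEGER PRIMARY KEY, ts REAL, agent TEXT, kind TEXT, content TEXT)"
)

def remember(agent: str, kind: str, content: str) -> None:
    # Append-only: never update or delete, so every write stays auditable.
    con.execute(
        "INSERT INTO memory (ts, agent, kind, content) VALUES (?, ?, ?, ?)",
        (time.time(), agent, kind, content),
    )
    con.commit()

def recall(agent: str, query: str, limit: int = 5) -> list[str]:
    # Naive keyword recall, newest first; real systems use embeddings or FTS.
    rows = con.execute(
        "SELECT content FROM memory WHERE agent = ? AND content LIKE ? "
        "ORDER BY ts DESC LIMIT ?",
        (agent, f"%{query}%", limit),
    )
    return [r[0] for r in rows]

remember("planner", "episodic", "User prefers window seats on long flights.")
print(recall("planner", "window"))
```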
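And a minimal sketch of the sandboxed-execution pattern using Modal’s Sandbox API: agent-generated code runs in an isolated container rather than on the host. The app and image names are placeholders, a Modal account/token is required, and this is not the Claude Code integration itself.

```python
import modal

# Placeholder app/image names; requires a configured Modal account.
app = modal.App.lookup("agent-sandboxes", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("requests")

# Spin up an isolated container for untrusted, agent-generated code.
sb = modal.Sandbox.create(app=app, image=image, timeout=600)

# Execute inside the sandbox instead of on the orchestrator host.
proc = sb.exec("python", "-c", "print('hello from inside the sandbox')")
print(proc.stdout.read())

sb.terminate()
```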
Key Technical Details:
- GLM‑4.7: open weights on HF; improved code/reasoning/tool use; top open-model rank on a WebDev leaderboard (+83 vs GLM‑4.6); OpenRouter listing; official API advised for consistent evals (see the API sketch after this list).
- MiMo‑V2‑Flash: MoE optimized for deployability; cited ~$0.10 per 1M input tokens; vLLM serving guide covers DP/TP/EP, context length, latency, and KV-cache settings (see the serving sketch after this list).
- Z‑Image Turbo: 6B params; Apache‑2.0; ~$5/1k images on Alibaba Cloud.
- Video: Kling 2.6 Motion Control (day‑0 support on fal; see the fal sketch after this list); MemFlow for long-form narrative memory.
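A minimal sketch of the “use the official API for stable evals” advice, calling GLM‑4.7 through an OpenAI-compatible client. The base URL and model id are assumptions; check Zhipu’s docs (or use the OpenRouter listing) for the exact values.

```python
from openai import OpenAI

# Base URL and model id are assumptions; verify against Zhipu's API docs.
client = OpenAI(
    api_key="YOUR_ZHIPU_API_KEY",
    base_url="https://open.bigmodel.cn/api/paas/v4/",  # assumed endpoint
)

resp = client.chat.completions.create(
    model="glm-4.7",  # assumed model id
    messages=[{"role": "user", "content": "Write a function that reverses a linked list."}],
    temperature=0.0,  # pin sampling settings when benchmarking
)
print(resp.choices[0].message.content)
```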
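A hedged sketch of the knobs the vLLM recipe tunes (parallelism, context length, KV-cache budget), using vLLM’s offline LLM API. The HF repo id and the numeric values below are placeholders, not the recipe’s actual settings.

```python
from vllm import LLM, SamplingParams

# Placeholder model id and values; the official recipe specifies the real ones.
llm = LLM(
    model="XiaomiMiMo/MiMo-V2-Flash",   # assumed HF repo id
    tensor_parallel_size=4,             # TP across GPUs
    enable_expert_parallel=True,        # EP for the MoE expert layers (recent vLLM)
    max_model_len=32768,                # serving context length
    gpu_memory_utilization=0.90,        # fraction of VRAM for weights + KV cache
)

out = llm.generate(
    ["Summarize mixture-of-experts routing in two sentences."],
    SamplingParams(max_tokens=128, temperature=0.2),
)
print(out[0].outputs[0].text)
```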
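A sketch of hitting the fal-hosted Kling 2.6 Motion Control endpoint with fal’s Python client. The endpoint id, argument names, and response shape are assumptions; check fal’s model page for the exact schema.

```python
import fal_client

# Endpoint id, argument names, and response shape are assumptions.
# Requires FAL_KEY set in the environment.
result = fal_client.subscribe(
    "fal-ai/kling-video/v2.6/standard/motion-control",  # assumed endpoint id
    arguments={
        "prompt": "a dancer performing the referenced choreography on a rooftop",
        "image_url": "https://example.com/subject.png",      # subject to animate
        "motion_video_url": "https://example.com/dance.mp4",  # motion reference clip
    },
    with_logs=True,
)
print(result["video"]["url"])  # assumed response shape
```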
Community Response/Impact:
- Rapid uptake of GLM‑4.7 across HF/OpenRouter/leaderboards signals a healthy open‑weights distribution pipeline in China’s ecosystem.
- Practitioners prioritize cost, latency, and operability over headline benchmarks (MiMo‑V2‑Flash, vLLM recipes).
- Agent teams converge on observability-first engineering, persistent memory, and sandboxed execution to address reliability and compliance.
First Principles Analysis:
- The center of gravity is moving from “model IQ” to “model IO”: interfaces (A2UI), orchestration, memory, and execution sandboxes determine production value.
- Open‑weights competition plus MoE optimization is compressing token and image-generation costs, expanding feasible applications.
- Control and memory in multimodal models (Kling, MemFlow) address two of the hardest problems in video: consistent semantics over time and user-driven motion control.
Meta: This issue scanned 12 subreddits, 544 Twitters, and 24 Discords (208 channels; 4,321 messages), saving ~305 minutes at 200 wpm. New archive with full metadata search: https://news.smol.ai/