TL;DR: Context Graphs—Hype or Trillion‑Dollar Opportunity? Plus GLM‑OCR, Qwen3‑Coder‑Next, and Agent Standards
Major Highlights:
- Context Graphs move from meme to spec
- Jaya Gupta popularized “Context Graphs” in December; Dharmesh Shah voiced reservations about the term’s vagueness. Cursor’s Agent Trace is the first concrete, cross‑company spec in a real domain (coding agents), aiming to capture decision traces, exceptions, and precedents from across a data mesh and feed them into LLM context. Staying power hinges on demonstrated agent performance gains and customer pressure for interoperability.
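The Agent Trace spec itself isn’t reproduced here, but the idea of a decision trace can be sketched as a record linking a decision to its rationale, precedent, and artifacts. All field names below are hypothetical, illustrative only, not the actual spec:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

# Hypothetical schema: illustrative only, not the actual Agent Trace spec.
@dataclass
class DecisionTrace:
    trace_id: str                        # unique id for this decision
    agent: str                           # which agent/tool made the call
    decision: str                        # what was decided
    rationale: str                       # why, in the agent's own words
    precedent_id: Optional[str] = None   # earlier trace this decision follows
    exception: Optional[str] = None      # recorded deviation from policy, if any
    artifacts: list = field(default_factory=list)  # files/commits touched

# Serialize for insertion into LLM context or a shared store.
rec = DecisionTrace(
    trace_id="t-001",
    agent="coding-agent",
    decision="split utils.py into io.py and fmt.py",
    rationale="file exceeded 1k lines; follows precedent t-000",
    precedent_id="t-000",
    artifacts=["src/io.py", "src/fmt.py"],
)
print(asdict(rec)["decision"])
```

The point of such records is that an agent can retrieve the precedent chain before acting, rather than re-deriving past decisions from scratch.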
- Zhipu’s GLM‑OCR (0.9B) lands with day‑0 ecosystem support
- A lightweight, deployable multimodal OCR for complex documents reportedly tops OmniDocBench v1.5 (94.62). Immediate integrations shipped across SGLang and vLLM; Ollama enabled local/offline pulls with image drag‑and‑drop and JSON outputs. Early community tests report favorable quality versus PaddleOCR and DeepSeek OCR; LlamaIndex claims 50–100% speedups vs prior leaders.
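Ollama exposes a local HTTP API (`/api/generate`) that accepts base64 images and can constrain output to JSON, which fits the JSON‑output workflow described above. A minimal sketch of building such a request, assuming a hypothetical model tag `glm-ocr` (check `ollama list` for the real name once pulled):

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, model: str = "glm-ocr") -> dict:
    """Build an Ollama /api/generate payload asking for JSON-formatted OCR.

    The model tag "glm-ocr" is an assumption, not a confirmed registry name.
    """
    return {
        "model": model,
        "prompt": "Extract all text and tables from this document as JSON.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "stream": False,
    }

payload = build_ocr_request(b"\x89PNG fake bytes for illustration")
# To actually run it (requires a local Ollama server):
#   requests.post("http://localhost:11434/api/generate", json=payload)
print(json.dumps(payload)[:60])
```

The `format: "json"` field is a standard Ollama generate-API option; whether the model needs additional prompting for table/formula structure is worth testing locally.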
- Agentic coding: new models, open datasets, and the “harness > model” shift
- Alibaba’s Qwen3‑Coder‑Next: 80B MoE with just 3B active params, 256K context, trained on 800K verifiable tasks with executable envs; claims >70% on SWE‑Bench Verified using the SWE‑Agent scaffold. vLLM added day‑0 support. Allen AI released SERA‑14B (on‑device‑friendly) plus refreshed datasets with raw trajectories + verification metadata. Consensus is emerging that agent leverage is migrating to the harness (permissions, memory, reversibility) rather than raw model IQ.
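The “harness > model” claim can be made concrete with a toy sketch: the harness, not the model, owns permission gating and reversibility around tool calls. Everything below is invented for illustration, not any shipping framework’s API:

```python
# Minimal harness sketch: permission gating plus an undo stack around tool
# calls. Illustrative only; real harnesses add provenance, sandboxing, etc.

class Harness:
    def __init__(self, allowed: set[str]):
        self.allowed = allowed      # permitted tool names
        self.undo_stack = []        # (description, undo_fn) pairs

    def run(self, tool: str, do, undo, desc: str):
        if tool not in self.allowed:
            raise PermissionError(f"tool {tool!r} not permitted")
        result = do()
        self.undo_stack.append((desc, undo))   # record how to reverse it
        return result

    def rollback(self):
        while self.undo_stack:
            desc, undo = self.undo_stack.pop()
            undo()                  # reverse actions in LIFO order

state = {"x": 0}
h = Harness(allowed={"write"})
h.run("write", do=lambda: state.update(x=1),
      undo=lambda: state.update(x=0), desc="set x=1")
h.rollback()
print(state["x"])  # back to 0 after rollback
```

The design point: a smarter model can still be unsafe under a weak harness, while a mediocre model inside a harness with permissions and undo degrades gracefully.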
- Standardizing agent “skills” and protocols
- Agent Client Protocol (ACP) proposed to unify agent↔editor I/O (JSON‑RPC; stdio/HTTP; file/terminal/permissions; streaming) across Gemini CLI, Claude Code, Codex CLI, OpenClaw. “.agents/skills” directories are becoming default across Codex/OpenCode/Copilot/Cursor (Claude Code not yet). LlamaIndex contrasts lightweight “skills” vs more deterministic but heavier MCP servers.
Key Technical Details:
- GLM‑OCR: 0.9B params; #1 on OmniDocBench v1.5 (94.62); day‑0 SGLang/vLLM support; local via Ollama; JSON‑formatted outputs; positioned for tables, formulas, and messy layouts; community claims 50–100% faster than prior top models.
- Qwen3‑Coder‑Next: 80B MoE (3B active), 256K context window; trained on 800K verifiable tasks; >70% on SWE‑Bench Verified with the SWE‑Agent scaffold; vLLM 0.15.0 support; guidance emerging for GGUF/memory‑constrained runs.
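For memory‑constrained runs, a back‑of‑envelope helps: MoE routing means the 3B active params cut per‑token compute, but all 80B weights must still be resident (or paged) in a GGUF run. A rough weight‑memory estimate at common quant levels, ignoring KV cache, activations, and quantization overhead:

```python
# Back-of-envelope weight memory for an 80B-param MoE at common quant levels.
# Rough estimate only: ignores KV cache, activations, and format overhead.
TOTAL_PARAMS = 80e9   # all expert weights must be available
ACTIVE_PARAMS = 3e9   # params used per token -> compute cost, not memory

def weight_gb(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB weights")
# 16-bit: ~160 GB, 8-bit: ~80 GB, 4-bit: ~40 GB
```

So even aggressive 4‑bit quantization leaves roughly 40 GB of weights, which is why the emerging guidance centers on offloading and paging rather than fitting everything in a single consumer GPU.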
- ACP: JSON‑RPC standardization for agent CLI/editor integration; supports stdio/HTTP, file system access, terminals, permissions, streaming status.
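Since ACP builds on JSON‑RPC, the wire format is easy to sketch. The framing below (newline‑delimited messages) and the method name `fs/read` are assumptions for illustration, not quoted from the ACP spec; only the JSON‑RPC 2.0 envelope itself is standard:

```python
import json

# Sketch of a JSON-RPC 2.0 message as an ACP-style stdio transport might
# carry it. Method name and params are hypothetical, not the ACP spec.

def make_request(req_id: int, method: str, params: dict) -> str:
    """Encode one JSON-RPC 2.0 request as a newline-delimited message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    }) + "\n"

# e.g., an editor asking the agent runtime for permission-gated file access
msg = make_request(1, "fs/read", {"path": "src/main.rs"})
decoded = json.loads(msg)
print(decoded["method"])
```

Because JSON‑RPC is transport‑agnostic, the same envelope works over stdio or HTTP, which is what lets one protocol span Gemini CLI, Claude Code, Codex CLI, and others.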
Community Response/Impact:
- Founders in data/context tooling are coalescing around the “Context Graph” narrative; investors and users want prescriptive specs and measurable uplifts.
- Practitioners emphasize harness design (state, safety, undo, provenance) as the scaling bottleneck, not just larger models.
- Tooling convergence (skills/MCP/ACP) aims to reduce fragmentation and accelerate multi‑IDE, multi‑agent interoperability.
First Principles Analysis:
- Why Context Graphs matter: LLMs are context‑bounded and stateless by default; a shared, queryable graph of decisions, artifacts, and provenance supplies durable memory and controllability—key to reliable agents. The economic upside is horizontal: every workflow (code, ops, support, finance) can be instrumented and reused across agents.
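The “shared, queryable graph” above can be sketched as decisions linked by provenance edges, with a lineage query that reassembles the chain of prior decisions for LLM context. A toy in‑memory version, illustrative only:

```python
# Toy context graph: decisions as nodes, provenance as edges, plus a query
# that walks a decision's lineage back to its roots. Illustrative only.
from collections import defaultdict

class ContextGraph:
    def __init__(self):
        self.nodes = {}                    # id -> payload (decision text)
        self.parents = defaultdict(list)   # id -> ids it derives from

    def add(self, node_id: str, payload: str, derived_from=()):
        self.nodes[node_id] = payload
        self.parents[node_id].extend(derived_from)

    def lineage(self, node_id: str) -> list[str]:
        """Depth-first walk over provenance edges, deduplicated."""
        seen, out, stack = set(), [], [node_id]
        while stack:
            nid = stack.pop()
            if nid in seen:
                continue
            seen.add(nid)
            out.append(self.nodes[nid])
            stack.extend(self.parents[nid])
        return out

g = ContextGraph()
g.add("d1", "chose Postgres for billing")
g.add("d2", "sharded billing tables", derived_from=["d1"])
g.add("d3", "agent migrated shard keys", derived_from=["d2"])
print(g.lineage("d3"))
```

Feeding `lineage("d3")` into an agent’s context is the durable-memory move: the agent sees not just the current task but the decisions it descends from, which is what makes behavior auditable and reusable across workflows.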
- What to watch: quantified task success gains, lower rollback/debug costs, and standardized schemas/APIs (e.g., ACP, MCP) that allow plug‑and‑play across IDEs and clouds. Risks include schema sprawl, latency/complexity from over‑instrumentation, and privacy/lock‑in.
- Bottom line: If specs like Agent Trace (for code) prove large, repeatable ROI, Context Graphs graduate from hype to foundational infrastructure.