
Dec 19: not much happened today

news.smol.ai • 2 months ago

TL;DR: Dec 19 — not much happened today

Major Highlights:

  • Open-source, editable images: Qwen-Image-Layered

    • Alibaba released Qwen-Image-Layered, an open-source model that decomposes images into 3–10 prompt-controlled RGBA layers with recursive “infinite decomposition” for nested edits. Early demos highlight strong text separation and immediate platform adoption (e.g., fal). This is a practical step toward non-destructive, Photoshop-like AI editing pipelines.
  • Motion-controlled video generation: Kling 2.6 and Runway’s GWM

    • Kling 2.6 adds image-to-video “Motion Control,” enabling repeatable character animation beyond prompt-only control; creators are sharing stable prompt recipes, and Kling launched a motion contest.
    • Runway introduced the GWM-1 family (Worlds/Robotics/Avatars) for frame-by-frame, consistent camera motion and interactive control; Gen-4.5 adds audio and multi-shot editing. This signals a shift to production-minded, controllable video tools.
  • LLM platform churn: Gemini 3 Flash vs GPT-5.2; RL narrative

    • Community benchmarks claim Gemini 3 Flash is #1 on Toolathlon, ranks above GPT-5.2 on EpochAI’s ECI, and places 5th on SimpleBench ahead of GPT-5.2 Pro. A notable claim: Flash beats Pro due to newer agentic RL post-training, not just distillation—reminding teams that post-training recipe and release timing can trump “tier” branding.
    • Power users note GPT-5.2 is strong within ~256k tokens for long-context work, but ChatGPT UX (file upload/retrieval) can limit full-context synthesis, pushing usage toward Codex CLI.
  • Agents as product: Codex “skills” and harness thinking

    • OpenAI Codex adds “skills”: reusable capability bundles (instructions/scripts/resources) callable via $.skill-name or auto-selected. Examples include Linear ticket ops and auto-fixing CI failures; aligns with agentskills.io for interoperable modules.
    • The agent/harness distinction is gaining traction: Agent = model + prompts + tools/MCP + subagents + memory; Harness = execution loop + context mgmt + policy/permissions. Teams report harness engineering and eval infra often dominate project time.

Key Technical Details:

  • Qwen-Image-Layered: Open-source on HF/ModelScope/GitHub; outputs editable RGBA layers; promptable layer counts (3–10) and recursive decomposition.
  • Kling 2.6: Image-to-video with motion control; creator loop workflows; official contest launched.
  • Runway GWM/Gen-4.5: Consistent camera, interactive control; adds audio and multi-shot editing.
  • Systems/inference:
    • FlashAttention 3: 50%+ end-to-end gains on Hopper; Blackwell requires a rework (WGMMA dropped), and FA2 runs slowly there.
    • Inference economics: GPT-OSS on Blackwell saw 33% more tokens per $ in a month, credited to vLLM + NVIDIA work; more vLLM updates teased.
  • Tooling/observability: Claude Code gains LangSmith tracing; LlamaIndex’s AgentFS now supports Codex/OpenAI-compatible providers.
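The layered editing Qwen-Image-Layered targets rests on the standard Porter–Duff "over" operator: each RGBA layer is composited onto the result so far, so any layer can be edited or dropped without touching the others. A pure-Python sketch with toy pixels (tuples of r, g, b, a in 0..1, not the model's actual output format):

```python
# Non-destructive layered editing via the Porter-Duff "over" operator.
# Pixels are (r, g, b, a) tuples in 0..1; the layers are toy data,
# not Qwen-Image-Layered output.

def over(fg, bg):
    """Composite one RGBA pixel over another (straight, non-premultiplied alpha)."""
    fr, fgreen, fb, fa = fg
    br, bgreen, bb, ba = bg
    a = fa + ba * (1 - fa)
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda f, b: (f * fa + b * ba * (1 - fa)) / a
    return (blend(fr, br), blend(fgreen, bgreen), blend(fb, bb), a)

def flatten(layers):
    """Composite a bottom-to-top stack of RGBA pixels into one pixel."""
    out = (0.0, 0.0, 0.0, 0.0)
    for layer in layers:
        out = over(layer, out)
    return out

background = (0.0, 0.0, 1.0, 1.0)   # opaque blue
text_layer = (1.0, 0.0, 0.0, 0.5)   # half-transparent red "text"

print(flatten([background, text_layer]))  # -> (0.5, 0.0, 0.5, 1.0)
# Non-destructive edit: drop the text layer; the background is untouched.
print(flatten([background]))              # -> (0.0, 0.0, 1.0, 1.0)
```

This is exactly why decomposed layers beat a flattened image for editing: removing or restyling one layer is a list operation, not an inpainting problem.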
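The "33% more tokens per $" figure is easy to misread: throughput gains and cost-per-token cuts are reciprocals, not equal. A quick worked example (the baseline number is invented purely to show the arithmetic):

```python
# 33% more tokens per dollar != 33% cheaper tokens.
# Baseline throughput below is an assumed placeholder.

baseline_tokens_per_dollar = 1_000_000                    # assumed baseline
improved_tokens_per_dollar = baseline_tokens_per_dollar * 1.33

cost_per_mtok_before = 1e6 / baseline_tokens_per_dollar   # $ per Mtok
cost_per_mtok_after = 1e6 / improved_tokens_per_dollar

drop = 1 - cost_per_mtok_after / cost_per_mtok_before
print(f"cost per token falls by {drop:.1%}")              # ~24.8%, not 33%
```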

Community Response/Impact:

  • Creators are rapidly adopting motion control/video loops and sharing reproducible recipes.
  • Ongoing “model degradation” discourse (Anthropic Opus 4.5) may reflect shifting user expectations and workflow habits.
  • Engineering consensus: evaluation, review, and harness design are becoming the bottlenecks as agent systems scale code generation.

First Principles Analysis:

  • Post-training beats pretraining prestige: Targeted RL/agentic fine-tuning can let “lighter” models outperform flagship variants on tool use and autonomy—release timing matters as much as raw model size.
  • Modularity wins: Standardized “skills” and robust harnesses formalize agent capabilities, enabling safer, observable, and reusable agent systems.
  • Hardware–software co-design drives costs down: Kernel-level advances (FA3) and serving stacks (vLLM) are rapidly shifting tokens-per-dollar, favoring teams that iterate at the systems layer.

Meta: Coverage drew from 12 subreddits, 544 Twitter accounts, and 24 Discords (207 channels; ~6,998 msgs), estimating 566 minutes saved at 200 wpm. Full archives and metadata search at news.smol.ai.