
Jan 12: Apple picks Google's Gemini to power Siri's next generation

news.smol.ai • about 1 month ago

TL;DR: Apple picks Google’s Gemini to power Siri’s next generation

Major Highlights:

  • Apple taps Google Gemini for Siri and Apple Intelligence

    • Apple and Google issued a joint statement: the “next generation of Apple Foundation Models” will be based on Google’s Gemini models and Google cloud technology, powering a more personalized Siri and future Apple Intelligence features. Apple says privacy posture remains intact via its Private Cloud Compute layer.
    • Strategic read: a clear win for Google and a comparative setback for OpenAI (which had been Apple’s launch partner). Rumors of OpenAI’s own consumer device this year may have pushed Apple to avoid deeper dependency on a potential hardware rival.
  • Anthropic unveils “Cowork” to push agentic productivity

    • Cowork is framed as “Claude Code for the rest of your work”: an agent with browser automation, connectors, and a sandboxed execution environment. It stokes “LLM OS” debates about end-to-end agent workflows becoming the primary UX for knowledge work.
  • OpenAI pushes into healthcare

    • OpenAI announces ChatGPT Health (a dedicated space with separated memories) and the acquisition of Torch, signaling a more formal healthcare vertical with attention to data segregation and compliance.
  • DeepSeek’s “Engram” proposes conditional memory as a new sparsity primitive

    • Engram adds a hashed n‑gram memory with O(1) lookup that the model can query and gate into its representations, offloading static retrieval so the backbone can spend compute on reasoning depth and long-context handling.

Key Technical Details:

  • Apple x Google Gemini:

    • Models: Apple’s foundation models will be based on Gemini; Google cloud tech under the hood.
    • Privacy: Apple Private Cloud Compute remains the security layer; Apple emphasizes its privacy posture.
    • Timing/pricing: Not disclosed; framed as powering “future” Apple Intelligence and Siri upgrades.
  • DeepSeek Engram:

    • Mechanism: Deterministic hashing + lookup memory integrated as an active, layer-addressed operation.
    • Benefits: Hardware-friendly (prefetching/memory movement), shifts capacity away from HBM-bound parameters; early reads suggest modest iso-budget gains (~3–5%).
    • Prior art context: Related to N‑Grammer, Gemma‑3n, Per‑Layer Embeddings, Over‑Tokenized Transformers; differs by making memory dynamic and actively gated per layer.
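The mechanism above can be sketched in a few lines: a deterministic hash maps the trailing n‑gram to a bucket in a static embedding table, and a learned sigmoid gate merges the looked-up vector into the layer's hidden state. This is a minimal illustrative sketch, not Engram's actual implementation; the table size, hash function, gating form, and all names are assumptions.

```python
# Illustrative sketch of a hashed n-gram lookup memory with a learned
# gate, in the spirit of the Engram summary above. All shapes, the hash,
# and the gating form are assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

N_BUCKETS = 1 << 16   # size of the hashed memory table (assumption)
D_MODEL = 64          # hidden width (assumption)

# Static memory table: one embedding per hash bucket, O(1) to address.
memory_table = rng.standard_normal((N_BUCKETS, D_MODEL)) * 0.02

def ngram_bucket(token_ids, n=2):
    """Deterministically hash the trailing n-gram to a bucket index."""
    key = 0
    for t in token_ids[-n:]:
        key = (key * 1000003 + t) & 0xFFFFFFFF
    return key % N_BUCKETS

def gated_memory_read(hidden, token_ids, w_gate):
    """Look up the n-gram's bucket and gate its embedding into the hidden state."""
    mem = memory_table[ngram_bucket(token_ids)]        # O(1) lookup
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_gate)))    # scalar sigmoid gate
    return hidden + gate * mem                         # gated residual merge

# One layer's memory read for a toy token sequence.
w_gate = rng.standard_normal(D_MODEL) * 0.1
hidden = rng.standard_normal(D_MODEL)
out = gated_memory_read(hidden, [17, 4, 291], w_gate)
```

Because the address is a deterministic hash of recent tokens, the lookup can be prefetched ahead of the forward pass, which is the hardware-friendliness point made above.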
  • Long-context/memory research:

    • DroPE (Sakana AI): Train with RoPE for convergence, then drop positional embeddings to extend context without semantic distortion.
    • TTT‑E2E (NVIDIA/Stanford/Astera): Test-time next-token training compresses salient context into weights, potentially reducing KV cache burden.
    • Agent memory:
      • AgeMem: Unified memory policy with tool-like actions (+13% on Qwen2.5‑7B vs Mem0).
      • SimpleMem: “Semantic lossless compression”; 43.24 F1 on LoCoMo vs 34.20 (Mem0) with ~30× fewer tokens/query (531 vs 16,910).
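The DroPE idea above (train with RoPE, then drop positional rotation at inference) can be sketched with a toy attention logit. This is a hedged illustration under assumed shapes, not Sakana AI's method: `rope` is the standard rotary embedding, and the `use_rope` flag stands in for the train/inference switch.

```python
# Illustrative sketch of the DroPE recipe summarized above: apply RoPE
# during training, then compute attention without positional rotation at
# inference. Shapes and names are assumptions for illustration.
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate feature pairs of x by position-dependent angles (standard RoPE)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angle = pos * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(angle) - x2 * np.sin(angle),
                           x1 * np.sin(angle) + x2 * np.cos(angle)], axis=-1)

def attention_logit(q, k, q_pos, k_pos, use_rope):
    """One q/k dot-product logit, with RoPE on (training) or off (DroPE inference)."""
    if use_rope:
        q, k = rope(q, q_pos), rope(k, k_pos)
    return float(q @ k)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# With RoPE the logit depends on relative position; with rotation dropped,
# positions far beyond the training window no longer distort the score.
trained = attention_logit(q, k, q_pos=5, k_pos=2, use_rope=True)
dropped = attention_logit(q, k, q_pos=500_000, k_pos=2, use_rope=False)
```

With rotation dropped, the logit reduces to the plain content dot product, so extending context no longer pushes the model into angle ranges it never saw in training; this is the "extend context without semantic distortion" claim in the bullet above.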

Community Response/Impact:

  • Apple’s move is seen as pragmatic speed-to-market, but it raises questions about ceding the core AI stack to a rival, even with privacy maintained through PCC.
  • OpenAI perceived as losing the iOS default while pushing into health and possibly hardware; competitive dynamics with Apple intensify.
  • Engram sparks debate: promising systems-oriented gains vs concerns about brittleness/OOD mixing and how much is genuinely new vs re-framing prior work.
  • “LLM OS” trend accelerates as Anthropic’s Cowork and internal agents (e.g., Ramp’s “Inspect” writing ~30% of merged PRs in a week) validate agent-first workflows.

First Principles Analysis:

  • Apple’s calculus: prioritize reliable, multimodal, web-scale capability now (Gemini) plus a strong privacy story (PCC), rather than waiting for in-house models to catch up—especially as assistants increasingly hinge on tool-use, browsing, and personalization.
  • Architectural shift: Engram and test-time training reflect a broader move from parameter-heavy memorization to explicit memory systems—reallocating FLOPs to reasoning and enabling scalable knowledge capacity without linear parameter growth. This aligns with emerging long-context strategies that compress, retrieve, or adapt at inference rather than scale quadratic attention indefinitely.