
Jan 12: Apple picks Google's Gemini to power Siri's next generation

news.smol.ai • about 1 month ago

TL;DR: Apple picks Google’s Gemini to power Siri’s next generation

Major Highlights:

  • Apple taps Google Gemini for Siri and Apple Intelligence

    • Apple and Google issued a joint statement: the “next generation of Apple Foundation Models” will be based on Google’s Gemini models and Google cloud technology, powering a more personalized Siri and future Apple Intelligence features. Apple says privacy posture remains intact via its Private Cloud Compute layer.
    • Strategic read: a clear win for Google and a comparative setback for OpenAI (which had been Apple’s launch partner). Rumors of OpenAI’s own consumer device this year may have pushed Apple to avoid deeper dependency on a potential hardware rival.
  • Anthropic unveils “Cowork” to push agentic productivity

    • Cowork is framed as “Claude Code for the rest of your work”: an agent with browser automation, connectors, and a sandboxed execution environment. It stokes “LLM OS” debates about end-to-end agent workflows becoming the primary UX for knowledge work.
  • OpenAI pushes into healthcare

    • OpenAI announces ChatGPT Health (a dedicated space with separated memories) and the acquisition of Torch, signaling a more formal healthcare vertical with attention to data segregation and compliance.
  • DeepSeek’s “Engram” proposes conditional memory as a new sparsity primitive

    • Engram adds a hashed n‑gram memory with O(1) lookup that the model can query and gate into its representations, offloading static retrieval so the backbone can spend compute on reasoning depth and long-context handling.

Key Technical Details:

  • Apple x Google Gemini:

    • Models: Apple’s foundation models will be based on Gemini; Google cloud tech under the hood.
    • Privacy: Apple Private Cloud Compute remains the security layer; Apple emphasizes its privacy posture.
    • Timing/pricing: Not disclosed; framed as powering “future” Apple Intelligence and Siri upgrades.
  • DeepSeek Engram:

    • Mechanism: Deterministic hashing + lookup memory integrated as an active, layer-addressed operation.
    • Benefits: Hardware-friendly (prefetching/memory movement), shifts capacity away from HBM-bound parameters; early reads suggest modest iso-budget gains (~3–5%).
    • Prior art context: Related to N‑Grammer, Gemma‑3n, Per‑Layer Embeddings, Over‑Tokenized Transformers; differs by making memory dynamic and actively gated per layer.
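The mechanism above can be sketched in a few lines: a deterministic hash maps the trailing n‑gram to a bucket in a static embedding table, and a learned sigmoid gate merges the looked-up vector into the layer's hidden state. This is a minimal illustrative sketch, not Engram's actual implementation; the table size, hash function, gating form, and all names are assumptions.

```python
# Illustrative sketch of a hashed n-gram lookup memory with a learned
# gate, in the spirit of the Engram summary above. All shapes, the hash,
# and the gating form are assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

N_BUCKETS = 1 << 16   # size of the hashed memory table (assumption)
D_MODEL = 64          # hidden width (assumption)

# Static memory table: one embedding per hash bucket, O(1) to address.
memory_table = rng.standard_normal((N_BUCKETS, D_MODEL)) * 0.02

def ngram_bucket(token_ids, n=2):
    """Deterministically hash the trailing n-gram to a bucket index."""
    key = 0
    for t in token_ids[-n:]:
        key = (key * 1000003 + t) & 0xFFFFFFFF
    return key % N_BUCKETS

def gated_memory_read(hidden, token_ids, w_gate):
    """Look up the n-gram's bucket and gate its embedding into the hidden state."""
    mem = memory_table[ngram_bucket(token_ids)]        # O(1) lookup
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_gate)))    # scalar sigmoid gate
    return hidden + gate * mem                         # gated residual merge

# One layer's memory read for a toy token sequence.
w_gate = rng.standard_normal(D_MODEL) * 0.1
hidden = rng.standard_normal(D_MODEL)
out = gated_memory_read(hidden, [17, 4, 291], w_gate)
```

Because the address is a deterministic hash of recent tokens, the lookup can be prefetched ahead of the forward pass, which is the hardware-friendliness point made above.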
  • Long-context/memory research:

    • DroPE (Sakana AI): Train with RoPE for convergence, then drop positional embeddings to extend context without semantic distortion.
    • TTT‑E2E (NVIDIA/Stanford/Astera): Test-time next-token training compresses salient context into weights, potentially reducing KV cache burden.
    • Agent memory:
      • AgeMem: Unified memory policy with tool-like actions (+13% on Qwen2.5‑7B vs Mem0).
      • SimpleMem: “Semantic lossless compression”; 43.24 F1 on LoCoMo vs 34.20 (Mem0) with ~30× fewer tokens/query (531 vs 16,910).
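The DroPE idea above (train with RoPE, then drop positional rotation at inference) can be sketched with a toy attention logit. This is a hedged illustration under assumed shapes, not Sakana AI's method: `rope` is the standard rotary embedding, and the `use_rope` flag stands in for the train/inference switch.

```python
# Illustrative sketch of the DroPE recipe summarized above: apply RoPE
# during training, then compute attention without positional rotation at
# inference. Shapes and names are assumptions for illustration.
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate feature pairs of x by position-dependent angles (standard RoPE)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angle = pos * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(angle) - x2 * np.sin(angle),
                           x1 * np.sin(angle) + x2 * np.cos(angle)], axis=-1)

def attention_logit(q, k, q_pos, k_pos, use_rope):
    """One q/k dot-product logit, with RoPE on (training) or off (DroPE inference)."""
    if use_rope:
        q, k = rope(q, q_pos), rope(k, k_pos)
    return float(q @ k)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# With RoPE the logit depends on relative position; with rotation dropped,
# positions far beyond the training window no longer distort the score.
trained = attention_logit(q, k, q_pos=5, k_pos=2, use_rope=True)
dropped = attention_logit(q, k, q_pos=500_000, k_pos=2, use_rope=False)
```

With rotation dropped, the logit reduces to the plain content dot product, so extending context no longer pushes the model into angle ranges it never saw in training; this is the "extend context without semantic distortion" claim in the bullet above.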

Community Response/Impact:

  • Apple’s move is seen as pragmatic speed-to-market, but it raises questions about ceding the core AI stack to a rival, even with privacy maintained through PCC.
  • OpenAI perceived as losing the iOS default while pushing into health and possibly hardware; competitive dynamics with Apple intensify.
  • Engram sparks debate: promising systems-oriented gains vs concerns about brittleness/OOD mixing and how much is genuinely new vs re-framing prior work.
  • “LLM OS” trend accelerates as Anthropic’s Cowork and internal agents (e.g., Ramp’s “Inspect” writing ~30% of merged PRs in a week) validate agent-first workflows.

First Principles Analysis:

  • Apple’s calculus: prioritize reliable, multimodal, web-scale capability now (Gemini) plus a strong privacy story (PCC), rather than waiting for in-house models to catch up—especially as assistants increasingly hinge on tool-use, browsing, and personalization.
  • Architectural shift: Engram and test-time training reflect a broader move from parameter-heavy memorization to explicit memory systems—reallocating FLOPs to reasoning and enabling scalable knowledge capacity without linear parameter growth. This aligns with emerging long-context strategies that compress, retrieve, or adapt at inference rather than scale quadratic attention indefinitely.