Table of Contents

  • AI Twitter Recap
  • AI Reddit Recap
  • AI Discord Recap
  • Discord: High level Discord summaries
  • Discord: Detailed by-Channel summaries and links

Tags: deepseek-ai, anthropic, meta-ai-fair, nvidia, alibaba, google-deepmind (447 total)

DeepSeek is all you need.

AI News for 5/28/2025-5/29/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (217 channels, and 4860 messages) for you. Estimated reading time saved (at 200wpm): 456 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

As mentioned yesterday, DeepSeek typically releases papers and benchmarks a day after their model weights, and today it was a benchmarks day.

The charts are hard to compare at a glance, but the takeaway is that this is a big upgrade over DeepSeek R1 and the largest Qwen 3, putting it roughly at the level of the leading closed models.

Artificial Analysis framed it best: China (via DeepSeek) has unambiguously taken over open-weights leadership from the US and Europe.

This improvement comes at a cost of extra thinking tokens:

This advancement stems from enhanced thinking depth during the reasoning process: in the AIME test set, the previous model used an average of 12K tokens per question, whereas the new version averages 23K tokens per question.


AI Twitter Recap

DeepSeek R1-0528 and Chinese AI Model Advances (DeepSeek, Qwen, OpenBench, RL, China-US AI race, Architecture, Benchmarks, Open Weights)

  • DeepSeek R1-0528 achieves open source frontier status, closing the gap with proprietary models and boosting benchmarks: @deepseek_ai announced the release of DeepSeek-R1-0528, featuring improved benchmark performance, reduced hallucinations, JSON output, function calling, open weights, and no API changes. @ArtificialAnlys provides an in-depth breakdown: DeepSeek’s R1 now matches Gemini 2.5 Pro in coding, surpasses Anthropic, Meta, NVIDIA, and Alibaba on the Artificial Analysis Intelligence Index, and ties with Google’s Gemini 2.5 Pro as the world’s #2 AI Lab. Intelligence jumps are observed in AIME 2024 (+21), LiveCodeBench (+15), GPQA Diamond (+10), and Humanity’s Last Exam (+6). No architecture changes; improvements driven by post-training RL. @scaling01, @cline, @reach_vb, @zizhpan, and @ArtificialAnlys confirm significant benchmark and real-world coding improvements.
  • China’s open-weights strategy accelerates domestic innovation and narrows the US lead: @AndrewYNg and @ArtificialAnlys note that Chinese labs like DeepSeek and Alibaba, with open research culture and released open weights, have caught up to US labs. @teortaxesTex and @ArtificialAnlys highlight the transparency and rapid progress in China’s AI ecosystem, with DeepSeek providing code, weights, and research targets openly.
  • DeepSeek’s RL-driven improvements and architecture: @ArtificialAnlys and @teortaxesTex emphasize that DeepSeek’s intelligence gains are due to reinforcement learning post-training, not architecture changes. @Teknium1, @lateinteraction, and @abacaj discuss the impact and nuances of RL and benchmark contamination, with @lateinteraction warning about over-saturated math/coding benchmarks and prompt sensitivity.
  • Benchmarking, performance, and model comparisons: @ArtificialAnlys, @scaling01, @cline, and @reach_vb share results: DeepSeek R1-0528 is 8th overall, 1st in data analysis, 3rd in reasoning, 4th in mathematics, but lags on coding. @cognitivecompai points out that chat template changes can toggle reasoning in DeepSeek. @awnihannun and @reach_vb note MLX quantization and performance parity between Qwen 3 8B and Qwen 3 235B.
  • Meta, NVIDIA, and other labs in context: @ArtificialAnlys benchmarks Cerebras’ Llama 4 Maverick endpoint at 2,400 tokens/sec, outpacing NVIDIA Blackwell. @teortaxesTex discusses Meta’s organizational restructuring to mimic DeepSeek’s focus. @scaling01 highlights the competition among leading labs (OpenAI o3, Gemini 2.5 Pro, Anthropic, xAI) and anticipates DeepSeek R2’s emergence.

AI Tools, Agentic Workflows, and Perplexity Labs

  • Perplexity Labs launches for complex, multi-tool AI workflows: @perplexity_ai and @AravSrinivas introduce Perplexity Labs, a new mode enabling complex tasks like trading strategies, dashboards, real estate research, and mini web apps. Labs supports inlined images, asset management, deep research, iterative tool calls, and deployment of interactive mini apps. @AravSrinivas and @perplexity_ai emphasize Labs as a research/analyst assistant and democratizer of scientific experimentation.
  • AI agents and coding automation: @LiorOnAI spotlights a Sequoia-backed startup building agents rivaling Devin, Cursor, and Codex—capable of reading, writing, testing, and merging PRs across full codebases. @LangChainAI details JPMorgan’s “Ask David” multi-agent system for investment research. @jerryjliu0 discusses universal retrieval APIs for LlamaCloud agents to access enterprise context. @omarsar0 reviews a memory-augmented LLM OS for better agent memory management.
  • Agentic search and economic impact: @AravSrinivas predicts AI assistants will dramatically reduce Google search volume, shifting advertising spend. @reach_vb discusses business models for AI inference and agent platforms.

Interpretability, Evaluation, and Open Source Tools (Anthropic, Claude, Neuronpedia, Benchmarks, Transparency)

  • Anthropic open-sources interpretability tools and attribution graphs: @AnthropicAI announces open-sourcing of their language model interpretability methods, including interactive attribution graphs and Neuronpedia interface (@mlpowered, @NeelNanda5). @scaling01 highlights Anthropic’s open-sourcing of circuit tracing tools, with @cline detailing Claude Opus 4 and Sonnet 4’s extended reasoning improvements.
  • Benchmarking and reproducibility concerns: @lateinteraction and @Teknium1 critique benchmark contamination, prompt sensitivity, and the limitations of current math/coding benchmarks for LLMs. @TheTuringPost reviews BERT and its derivatives, while @maximelabonne discusses “abliteration” techniques to reduce refusals in Gemma and Qwen models.
  • Transparency in model and tool releases: @cline

AI Reddit Recap

/r/LocalLlama Recap

1. DeepSeek-R1-0528 Official Benchmarks and Performance Comparisons

  • DeepSeek-R1-0528 Official Benchmarks Released!!! (Score: 589, Comments: 127): The post announces the release of the official benchmarks for DeepSeek-R1-0528, which incorporates enhanced computational resources and post-training optimizations to achieve SOTA or near-SOTA performance on reasoning (AIME 2025: 87.5%), code, and math benchmarks. Notable features include a 64K context window, improved long-context reasoning (avg. 23K tokens per AIME question), JSON output & function calling support, and MIT-licensed open weights/code. The post also highlights DeepSeek-R1-0528-Qwen3-8B, a distillation of R1-0528’s chain-of-thought into Qwen3-8B Base, boosting its benchmark scores by +10% on AIME and enabling small models to match performance of much larger ones (Qwen3-235B). Commenters emphasize the technical leap provided by chain-of-thought distillation, seeing it as a pioneering fine-tune on Qwen3 that narrows the gap between small and large model performance on complex reasoning tasks. There is excitement regarding the open licensing, improved benchmarks, and competitive positioning relative to closed-source models.
    • A key technical update is the release of DeepSeek-R1-0528-Qwen3-8B, where chain-of-thought techniques from DeepSeek-R1-0528 were distilled into Qwen3 8B Base; this model reportedly outperforms Qwen3 8B by +10.0% on the AIME 2024 benchmark and matches the much larger Qwen3-235B-thinking model in reasoning tasks. This represents a milestone as a strong model fine-tuned on Qwen3, highlighting the impact of chain-of-thought distillation for smaller model architectures. Release details are available for those seeking further implementation or direct use.
    • DeepSeek-R1-0528 introduces several technical improvements, including enhanced benchmark performance, advanced front-end capabilities, reduced hallucination rates, and support for structured outputs like JSON and function calling. These enhancements are noteworthy for practical deployment and integration workflows in both research and production.
    • Commenters note that the latest DeepSeek model matches or surpasses best-in-class closed-source models from OpenAI (OAI) in benchmarks, while also being available with open weights and relatively inexpensive API access. This is seen as a significant milestone for open-source AI in terms of performance and accessibility.
  • DeepSeek-R1-0528 Official Benchmark (Score: 276, Comments: 34): The image is a benchmark comparison table showing performance scores of DeepSeek-R1-0528, OpenAI-o3, Gemini-2.5-Pro-0506, Qwen3-235B, and DeepSeek-R1 on several key datasets: AIME 2024, AIME 2025, GPQA Diamond, LiveCodeBench, Aider, and Humanity’s Last Exam. DeepSeek-R1-0528 demonstrates strong results, especially leading or tied in AIME 2024/2025 and GPQA Diamond vs. other top models. The post links to the official report for further details: WeChat source. Commenters note the incremental nature of this update despite strong results. Technically, GGUF quantizations for DeepSeek-R1-0528 are in progress, with early 2bit/3bit/4bit releases and a suggestion to use the Huggingface repo (GGUF link) and specific offload flags for efficient use. There is also technical discussion regarding clarification of which variant of OpenAI’s ‘o3’ model is referenced in the table, raising questions about its exact configuration.
    • danielhanchen details the availability of preliminary GGUF quantizations (2-bit, 3-bit, and 4-bit) for the large DeepSeek R1 model, achieved via a dynamic method for boosting accuracy (link). He advises offloading MoE layers to RAM or disk using the flag -ot ".ffn_.*_exps.=CPU" to fit the Q2_K_XL quantization in under 24GB of VRAM, highlighting a practical deployment optimization (see the invocation sketch after this thread).
    • Amgadoz inquires about the specific O3 variant used in the benchmarks (e.g., o3 high, medium, low, or mini variants), probing for details about which model configuration’s performance is being reported—a key point for reproducibility and accurate benchmarking.
    • Shockbum notes a new multilingual feature: DeepSeek R1 reportedly performs ‘reasoning’ in the user’s input language (e.g. Spanish), rather than translating to English for internal processing. This has significant implications for language-specific performance and context retention in multilingual use cases.
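A minimal sketch of how that offload advice could be applied, driving llama.cpp from Python. The binary path, GGUF filename, and layer count are assumptions; only the -ot ".ffn_.*_exps.=CPU" pattern comes from the comment above.

```python
# Hedged sketch: run llama.cpp with MoE expert tensors overridden to CPU/RAM.
import subprocess

subprocess.run(
    [
        "./llama.cpp/build/bin/llama-cli",         # assumed build location
        "-m", "DeepSeek-R1-0528-UD-Q2_K_XL.gguf",  # hypothetical local quant file
        "--n-gpu-layers", "99",                    # keep the non-expert layers on GPU
        "-ot", ".ffn_.*_exps.=CPU",                # route expert FFN tensors to system RAM
        "-p", "Hello",
    ],
    check=True,
)
```

The regex targets the per-expert feed-forward tensors, which dominate an MoE model's memory footprint, so the rest of the model can stay within the ~24GB of VRAM mentioned above.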
  • deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Hugging Face (Score: 204, Comments: 51): The Reddit thread discusses the release of DeepSeek-R1-0528-Qwen3-8B, a language model checkpoint, with specific focus on community-uploaded GGUF-format weights for efficient inference (notably by Unsloth and lmstudio-community): Unsloth GGUF conversion claims retained accuracy, while lmstudio-community offers an additional GGUF variant. The original Hugging Face model page returned a 429 (rate limiting) error, blocking direct access to model documentation and technical details. Commenters emphasize the practical value of GGUF conversions for resource-constrained users (“GPU poor”), affirming that these formats enable broader deployment, especially on devices lacking high-end GPUs.
    • Several users cite the availability of GGUF format versions of DeepSeek-R1-0528-Qwen3-8B, specifically referencing both the LM Studio Community version and an ‘Unsloth’ dynamic GGUF, which claims to ‘retain accuracy’ (details here: https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF). This signals active community efforts to optimize model quantization and accessibility for various inference backends, particularly for users with constrained resources (i.e., ‘GPU poor’).
    • There’s interest in larger model variants, especially references to 30B and 32B parameter versions (e.g. ‘30b-a3b’ and ‘Need 32b’), highlighting a segment of the user base prioritizing model capability and performance over resource constraints, reflecting ongoing demand for both lightweight and high-performance LLMs from DeepSeek.

2. Breakout Results and Industry Comparisons for DeepSeek-R1 and R1.1

  • DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. (Score: 815, Comments: 170): The post claims that DeepSeek R1 05 28 is the first open-source, MIT-licensed LLM to achieve 100% on a private set of complex, business-relevant benchmarks, outperforming major models like OpenAI GPT-4.1, Gemini 2.5, and Claude 4. The benchmarks include tasks such as advanced NER edge cases and code generation, and the YouTube video linked purportedly demonstrates flawless performance across all tasks tested. One commenter suggests the evals may be flawed, citing an NER example where the difference between ‘Li Mei’ vs. ‘Mei Li’ led to DeepSeek scoring 100% and GPT-4.1 only 95%, despite similar outputs. This raises doubts about benchmark design and whether reported perfection truly reflects meaningful superiority.
    • A user highlighted a potential flaw in the evaluation metrics used for benchmarking LLMs on Named Entity Recognition (NER) tasks: when evaluating entities like “Li Mei,” DeepSeek-r1-0528 and GPT-4.1 differ only in assigning first and last names (i.e., {“firstName”: “Li”, “lastName”: “Mei”} vs. {“firstName”: “Mei”, “lastName”: “Li”}), but the evaluation penalizes this despite both models extracting the correct entity. As a result, GPT-4.1 is scored at 95% rather than 100%, indicating the evaluation framework may not account for entity order ambiguity, which can significantly impact performance metrics.
    • Another user confirmed similar performance of DeepSeek-r1-0528 to previous tests (specifically on infrastructure automation tasks like Ansible playbooks), suggesting that while accuracy is very high, advances in context window size would be the next major step for practical programming use cases. This highlights ongoing limitations in model scalability regarding token/context window size, with users looking forward to 100M context windows for uninterrupted workflow.
  • Deepseek is the 4th most intelligent AI in the world. (Score: 237, Comments: 90): A user references a benchmark ranking Deepseek as the 4th most intelligent AI model globally, with a shared screenshot showing a visual ranking (image, not original source linked). The post emphasizes Deepseek’s favorable price-to-performance ratio, positioning it above models like Claude-4, which is placed at the bottom of the chart, and Gemini 2.5 Flash. The exact benchmark methodology, tasks used, or specific metrics are not described or sourced in the post, raising questions about validity. Top comments heavily criticize the benchmark as ‘garbage’ with no meaningful comparison metrics provided, express skepticism over Claude 4 Sonnet being outperformed by models like Gemini 2.5 Flash, and also note that Deepseek’s low cost compared to 2.5 Flash is impressive if results are legitimate.
    • Several commenters question the validity and methodology of the benchmarking referenced, noting that model comparisons are complex and heavily dependent on chosen metrics. Doubts are raised about the reported performance order, particularly placing Claude 4 Sonnet below Gemini 2.5 Flash, which contradicts many users’ qualitative experiences. There is explicit skepticism about Grok 3 Mini’s ranking, with multiple users reporting consistently poor performance, suggesting that the benchmarks may not reflect general user experience or practical capabilities.
    • Cost-performance tradeoffs are highlighted, with one commenter noting that Deepseek is cheaper than Gemini 2.5 Flash, which is considered a compelling attribute if the benchmark claims are accurate. This points to cost as an important factor when evaluating model competitiveness in addition to raw intelligence or capability scores.
    • Rate-limit errors with Claude 4 are mentioned, implying high demand and possibly supporting claims of its quality, despite benchmark placements. Users implicitly connect real-world API usage data with model popularity and usability, critiquing benchmarks that do not align with observed user/server workloads.
  • Deepseek R1.1 dominates gemini 2.5 flash on price vs performance (Score: 128, Comments: 28): A price/performance benchmark visualization shows that Deepseek R1.1 outperforms Gemini 2.5 Flash, offering better cost efficiency for similar or better results. The chart is sourced from Artificial Analysis, comparing recent LLM offerings with particular focus on practical utility per dollar spent. Commenters note that Gemini 2.5 Flash offers a 1M token context window and excels at multi-document retrieval and insertion tasks, leading to strong real-world workflow productivity. Another discussion point is that Deepseek R1’s cost position has shifted, with questions over recent pricing changes. Debates center around trade-offs in the LLM landscape: speed (Gemini Flash), quality (Gemini Pro), versus price (Deepseek R1).
    • Gemini 2.5 Flash demonstrates significant strength in handling very large contexts (up to 1 million tokens) and excels at extracting and correctly applying relevant context from multiple input files—making it especially effective for document synthesis and templating workflows where context discrimination is crucial.
    • Discussion points out that model comparison should account for inference speed alongside price and accuracy; the tradeoffs between Gemini Pro, Gemini Flash, and Deepseek R1 include not just cost and capability, but also latency, with each model offering a different balance of speed vs. accuracy vs. price.
  • Nvidia CEO says that Huawei’s chip is comparable to Nvidia’s H200. (Score: 252, Comments: 103): Nvidia CEO Jensen Huang, in a Bloomberg interview (video), claimed that Huawei’s latest AI chip is comparable in performance to Nvidia’s H200 GPU. This is notable because previous analysis pegged Huawei’s chipset at around Nvidia H100-level performance, implying competitive advances from Huawei in large-scale AI compute. The H200 features 141 GB HBM3e memory and up to 4.8 TB/s bandwidth, so such claims suggest Huawei has matched recent architectural and memory advances. Top comments speculate on Nvidia’s motives, suggesting Jensen’s admission may be strategic—either to demonstrate a lack of monopoly (mitigating regulatory scrutiny) or to push for US export restriction relaxations. The accuracy of performance parity is also questioned as possibly influenced by these business objectives.
    • Discussion centers on Nvidia CEO Jensen Huang’s statement regarding Huawei’s chip comparability to Nvidia’s H200, with skepticism about the motive, suggesting Nvidia may exaggerate Chinese advancements to influence US export controls. This implies Nvidia could portray competition to argue against monopoly accusations and to justify lifting or easing restrictions for business purposes.
    • Commenters note the technical implication that if Huawei has achieved parity with Nvidia’s current H200 GPUs, it could undermine justification for export controls limiting Nvidia’s sales of its nerfed H20 model in China. If Chinese alternatives are technologically competitive, restricting Nvidia might just incentivize further local development and diminish Nvidia’s market share in China.

3. DeepSeek R1.1 and 8B Distill Model Developments and Benchmarks

  • New DeepSeek R1 8B Distill that’s “matching the performance of Qwen3-235B-thinking” may be incoming! (Score: 240, Comments: 59): The image displays a benchmark comparison table positioning DeepSeek-R1-0528-Qwen3-8B, a newly released 8B-parameter distilled model, against larger models like Qwen3-235B and Qwen3-8B. According to the table, DeepSeek-R1-0528-Qwen3-8B demonstrates state-of-the-art performance for its size, notably surpassing Qwen3-8B by approximately 10% and approaching or matching the performance of much larger models (such as Qwen3-235B) on several benchmarks including AIME and GPQA. The model is available at Hugging Face, with quantized versions also posted. Top commenters debate the accuracy of the claim that the 8B model matches the 235B model, observing that it is still outperformed in 4 out of 5 benchmarks but unanimously agree that its leap in performance at this scale represents a major advancement for small-scale models.
    • Multiple users critique the claim that the DeepSeek R1 8B distill “matches” Qwen3 235B, pointing out it actually loses by a significant margin in 4 out of 5 benchmarks, suggesting its performance is impressive “for an 8B model” but not truly equivalent to the much larger Qwen3 235B.
    • There is high technical interest in the distillation process, with some wishing for larger source models (e.g., Qwen 30B or 32B) to be distilled, indicating demand for minimally degraded smaller models from larger bases for improved efficiency and performance tradeoffs.
    • Links are provided to the model release and quantized GGUF versions on Hugging Face, which enables direct technical exploration and use. This includes the DeepSeek-R1-0528-Qwen3-8B model card and its GGUF formatted quantizations.
  • Deepseek R1.1 aider polyglot score (Score: 154, Comments: 44): Deepseek R1.1 achieved a 70.7% pass@2 score on the aider polyglot benchmark (225 test cases), matching Claude Opus 4-nothink and significantly improving over R1’s 56.9% (source leaderboard). Notably, the run showed a 90.2% well-formed output rate, 0 syntax or indentation errors, and used ~3.2M prompt + 1.9M completion tokens. Cost was $3.05 off-peak per run, rising to $12.20 during peak hours. Comments highlight Deepseek’s rapid pace in releasing state-of-the-art models and discuss broader leaderboard placement (with Opus 4 Thinking and O4 Mini High scoring ~1.3% higher). There’s discussion of leveraging models for continual improvement and dataset curation, speculating near-term milestones for open weights models on similar benchmarks.
    • A reference is made to the aider polyglot leaderboard (https://aider.chat/docs/leaderboards/), specifically noting that Deepseek’s performance is almost at the top, with Opus 4 scoring just 1.3% higher than Deepseek R1.1. This directly benchmarks Deepseek’s capabilities versus leading models, highlighting its near state-of-the-art performance.
    • A user details a workflow combining Deepseek models: utilizing r1-0528 in “architect mode” and pairing it with v3-0328 as the “editor.” The comment suggests this hybrid approach could be highly competitive, especially if priced below the R1 Aline tier, indicating possible technical and cost advantages in multi-model orchestration workflows.
  • PLEASE LEARN BASIC CYBERSECURITY (Score: 690, Comments: 123): A developer found an active project earning ~$30k/month with an unrestricted OpenAI API key embedded in its frontend, making it publicly accessible and prone to immediate abuse (e.g., rapid cost escalation before billing alerts trigger). Exposing API keys in client-side code is a critical security flaw that could result in unauthorized usage, unexpected charges, and compromised data integrity. The post underscores the need for minimal but effective security controls, such as server-side API invocation or network restrictions, to prevent such vulnerabilities (a server-side proxy sketch follows this thread). Top comments debate whether this reflects truly foundational cybersecurity practice or just basic dev hygiene, with some dismissing it as mere lack of common sense rather than a complex security issue. A comment also introduces the term ‘Vibe security’ to describe this lax approach.
    • Discussion focuses on the rapid emergence of “vibe coding”—a trend where developers produce functioning applications quickly but often with nonstandard or poor practices, resulting in insecure or unreliable code. This has created a secondary market for developers specializing in refactoring, securing, and optimizing these applications, with examples of significant freelance work sourced from platforms like Upwork.
    • A key technical concern raised is the expectation that automated coding agents and future coding platforms will handle architecture and security, rather than individual developers learning best practices. This is seen as both a logical progression—given the democratization of software development through LLMs—and a potential risk, as fundamental security and lifecycle knowledge is already lacking among some developers.
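Since the thread is about a concrete fix, here is a minimal sketch of the server-side pattern the post recommends, assuming Flask, the openai Python SDK, and an OPENAI_API_KEY environment variable; the endpoint name and limits are illustrative.

```python
# Hedged sketch: the API key lives on the server; the browser calls /api/chat.
import os

from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # never shipped to the client

@app.post("/api/chat")
def chat():
    body = request.get_json(silent=True) or {}
    prompt = str(body.get("prompt", ""))[:2000]  # crude input cap to limit abuse
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,  # server-side spend control
    )
    return jsonify({"reply": resp.choices[0].message.content})
```

Per-user auth and rate limiting belong on top of this; the point is simply that the key never appears in client-side code.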

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1. New Models Storm the Scene, Capabilities Scrutinized

Theme 2. Dev Tools & Frameworks Fuel AI Innovation and Integration

Theme 3. The Balancing Act: Model Safety, Openness, and Control Under Fire

Theme 4. GPU Power & Performance Puzzles Dominate Hardware Discussions

Theme 5. Agentic AI Marches into Real-World Applications, Leaving Old Benchmarks Behind


Discord: High level Discord summaries

Perplexity AI Discord

  • Perplexity Labs Public Launch: Perplexity has officially launched Perplexity Labs to the public, which provides users with an entire team at their disposal for more complex tasks such as analytical reports and presentations.
    • Unlike Deep Research, Labs leverages coding, headless browsing, and design capabilities, organizing all workflow files in an “Assets” tab for easy access, and is available for all signed-in users.
  • Opus vs Gemini Debated: Members are debating the value of Opus over Gemini for deep research due to potential cost concerns and context limit caps, while referencing Apple Overhauling Software Names.
    • When weighing various AI plans, one member suggested Gemini’s $20 plan is currently the most worthwhile, but OpenAI offers the best deep research and Claude excels at code, despite concerns about Claude’s atrocious rate limits.
  • iOS 26 Coming 2025: The channel discussed the upcoming release of iOS 26 in 2025, with jokes about the future demise of iOS 19.
    • A member suggests Apple may be trying to emulate Samsung with a new naming convention, potentially confusing customers.
  • Samsung Galaxy Users Get Free Perplexity Pro: Members spotted a banner on the Perplexity app offering Samsung Galaxy users 12 months of Pro for free.
    • Members also discussed possibly getting free Perplexity with educational emails.
  • New Metadata Field for search_results: A new response field, search_results, has been introduced, offering richer metadata such as page title and publication date on top of existing citations.
    • The legacy citations array will remain for at least two months for backward compatibility, with a recommendation to migrate to the new search_results field.
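A minimal sketch of consuming the new field, assuming the standard Perplexity chat completions endpoint; the exact sub-field names (title, url, date) are inferred from the "page title and publication date" description above.

```python
# Hedged sketch: read search_results metadata alongside the model's answer.
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer <PPLX_API_KEY>"},  # placeholder key
    json={
        "model": "sonar",
        "messages": [{"role": "user", "content": "What changed in DeepSeek R1-0528?"}],
    },
).json()

for result in resp.get("search_results", []):  # richer than the legacy citations array
    print(result.get("title"), result.get("date"), result.get("url"))
```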

LMArena Discord

  • Veo 3 Battles Sora for Top Spot: Members are weighing in on Veo 3 as a contender to Sora, noting Sora’s superior style, clarity, and resolution, especially excelling with non-real subjects like sci-fi and fantasy.
    • Some argue Sora surpasses even Veo 2, but others say Sora’s distinctive style stems from its incoherence.
  • Arc AGI Leaderboard Sparks Overfitting Claims: The Arc AGI website pits Claude 4 against GPT 4, revealing that Claude 4 models struggle with the simplest arc-agi-1 problems, acing only the tougher ones.
    • Community members are suggesting the models might be overfitting specifically to arc agi 1.
  • XAI Shells out $300M for Grok Telegram Integration: XAI is reportedly paying $300M to integrate Grok into Telegram, as detailed in a TechCrunch article.
    • This move comes as the grok.com app isn’t gaining traction, potentially pushing Apple to eye AI search engines like Perplexity and You.
  • Deepseek’s Responses Channeling ChatGPT: Users are pointing out that the latest Deepseek model is mimicking ChatGPT’s response style, which some are describing as cringe, unrealistic, and awful.
    • The new version is named DeepSeek R1-0528 and is now available in the Arena for evaluation.
  • LMArena Founders Under Spotlight on a16z Podcast: The a16z podcast shined a light on LMArena’s cofounders, who discussed the platform’s evolution, the value of subjective data, and the construction of a CI/CD pipeline for large models, as showcased in this YouTube episode.
    • The team is aware of models getting stuck and are actively working on a fix.

Unsloth AI (Daniel Han) Discord

  • DeepSeek-R1-0528 Gets Quantized: The DeepSeek-R1-0528 model now has BF16, GGUF, and Qwen3-8B-GGUF versions available for download.
    • One member reported the Qwen3 variant is nearly O3 level coding on benchmarks.
  • Cheap A100s on ThunderCompute: ThunderCompute is offering A100s for less than $1/hr, though it requires manual data movement between CPU and GPU.
    • The low RAM available may cause bottlenecks, offsetting the cost savings, but customization options are appealing.
  • KTO to the Rescue of LLM Uncensoring: KTO (Kahneman-Tversky Optimization) is proposed as a superior method to remove safety nets from LLMs, offering a better alternative to abliteration or attention steering.
    • A member noted kto is pretty much rlhf .. but with up and down votes, suggesting a dataset built from thumbs up/down reports in OpenWebUI (a sketch of that dataset shape follows below).
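A minimal sketch of that idea, assuming TRL's KTOTrainer dataset convention (unpaired prompt/completion rows with a boolean label); the feedback records are hypothetical.

```python
# Hedged sketch: thumbs up/down feedback -> unpaired KTO training rows.
from datasets import Dataset

feedback = [  # hypothetical OpenWebUI-style feedback export
    {"prompt": "Explain RLHF briefly.", "completion": "RLHF fine-tunes a model...", "thumbs_up": True},
    {"prompt": "Summarize this paper.", "completion": "I cannot help with that.", "thumbs_up": False},
]

kto_dataset = Dataset.from_list(
    [{"prompt": r["prompt"], "completion": r["completion"], "label": r["thumbs_up"]}
     for r in feedback]
)
# kto_dataset can then be passed as train_dataset to trl.KTOTrainer
```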
  • GGUF Saving Broken: The model.save_pretrained_gguf function is currently broken due to llama.cpp backend compatibility issues, requiring users to manually merge and save instead.
    • The changes in the conversion script require users to merge a model before converting it to GGUF format; the manual workaround is sketched below.
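A minimal sketch of that manual workaround, assuming Unsloth's save_pretrained_merged and a local llama.cpp checkout; the "lora_model" directory and output names are placeholders.

```python
# Hedged sketch: merge the LoRA adapter, then convert to GGUF by hand.
import subprocess

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("lora_model")  # finetuned adapter dir

# 1) merge adapter weights into the base model, saved in HF format
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# 2) run llama.cpp's converter on the merged checkpoint
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "merged_model",
     "--outfile", "model-q8_0.gguf", "--outtype", "q8_0"],
    check=True,
)
```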
  • Kernel Doubles Batch 1 Forward Pass Speed: A new kernel purportedly doubles the batch 1 forward pass speed, detailed in this Hugging Face paper and announced in this X post.
    • Batch size can now be optimized for improved speed as a result.

Cursor Community Discord

  • Students Face Hurdles with ID Verification: Users reported issues with Student ID verification on the platform, citing problems with document submission and running out of attempts, with one user suggesting emailing [email protected] for assistance.
    • The support team will attempt to resolve the user’s subscription issues.
  • Cursor’s Vendor Lock-In Weakens: A user claimed Cursor’s vendor lock-in isn’t strong and alleged its loyal fanbase is diminishing, citing GitHub Copilot’s poor UX as one of the few reasons users still stick with Cursor.
    • The user suggests that Cursor is not competitive in the long run.
  • Program Self-Improves on CPanel: A user showcased a self-improving program running on a simple GoDaddy CPanel host, leveraging the OpenAI API to update functions and context.
    • The program autonomously generates code to interact with an email inbox via SMTP, including code to check the inbox.
  • Agentic Frameworks Spark Debate: Discussions around frameworks for building agentic applications compared the OpenAI SDK, Pydantic, and CrewAI.
    • Users shared their experiences and opinions, requesting feedback and experiences from other users in the community.
  • Cursor Ditches Vertex, Culls Slow Pool: Users noticed Vertex was removed and complained that the slow pool has been culled, or at least is barely usable, lacking Sonnet 4 and exhibiting the longest wait times they have ever seen.
    • Other users failed to understand the context of the request.

OpenAI Discord

  • OpenAI Censors Giger-esque Art: Members observed that OpenAI censors image generation prompts violating copyright, exemplified by censoring H.R. Giger’s art, even as OpenAI trains on copyrighted material.
    • A member jokingly noted their request “did not follow our content policy” when trying to generate Giger’s art.
  • Deepseek Edges Out OpenAI: Members compared different AI models, suggesting Deepseek as a viable alternative to OpenAI, potentially accessed through the OpenRouter API.
    • Some users felt ChatGPT has become “completely useless” due to excessive restrictions and decreased performance.
  • OpenAI Hoards Chat Logs per Court Order: OpenAI is preserving all chat logs, irrespective of user settings, following a US court order, as detailed here, prompting privacy debates.
    • Implications for EU and German users were discussed, considering potential violations of strict privacy laws and subsequent restrictions or fines for OpenAI.
  • UPSUM Prompt Seamlessly Persists Context: The UPSUM Chain Prompt, shared at this link, is designed to gather current context and produce an updated summary, containing only essential information, prepended to future prompts for conversation continuation.
    • Members intend to use this output to prepend to future prompts for seamless conversation continuation.
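A minimal sketch of that rolling-summary pattern, assuming the openai SDK with OPENAI_API_KEY set; the UPSUM instruction text here is a paraphrase of the idea, not the shared prompt itself.

```python
# Hedged sketch: keep an updated summary and prepend it to each new prompt.
from openai import OpenAI

client = OpenAI()
UPSUM = "Summarize the conversation so far, keeping only what is essential to continue it."

def chat_with_upsum(summary: str, user_msg: str) -> tuple[str, str]:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": f"Context so far: {summary}"},
                  {"role": "user", "content": user_msg}],
    ).choices[0].message.content
    new_summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"{UPSUM}\n\nPrevious summary: {summary}\nUser: {user_msg}\nAssistant: {reply}"}],
    ).choices[0].message.content
    return reply, new_summary  # prepend new_summary to the next turn
```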
  • Empathic Theater Masquerades as Prompt Engineering: Members argued that some ‘jailbreak’ prompts function as resonance rituals and not diagnostic tools, shifting the focus from prompt engineering towards empathic theater.
    • This raises issues of lacking a rigorous, falsifiable model.

LM Studio Discord

  • LM Studio Installs Trip Up Users: Users are facing installation problems with LM Studio due to privilege issues, which can be fixed by installing while logged into the user account.
    • Installing as an admin and running as a standard user causes problems locating models and runtimes.
  • Qwen’s Base Model Surpasses Distilled Version: Members found the Qwen distilled model gets stuck in a loop calling non-existent tools.
    • Others reported the base 8b model doesn’t have this issue, achieving 70 tok/sec at 32k context.
  • Windows Fine-Tuning: A Painful Proposition: Rather than WSL2, the community recommends fine-tuning with Llama Factory, Unsloth, or transformers installed natively on Windows.
    • Another member recommended Open WebUI saying the newest Qwen distilled model is so smart!
  • Qwen 30B A3 Plagued by Tool Calling Crashes: Qwen 30b A3 crashes with Model has unloaded or crashed errors when using tools.
    • Despite debugging, the root cause is unclear due to missing dump files and error messages.
  • Debate Sparks Over Huawei GPU Legitimacy: Members questioned the viability of a 96GB VRAM Huawei GPU at $1500, citing a Reddit thread with varying opinions.
    • Concerns include driver support and compatibility with llama.cpp, potentially making it an ‘expensive paperweight’, despite some Huawei PRs to the project.

OpenRouter (Alex Atallah) Discord

  • DeepSeek R1 challenges Claude: The community expresses positive sentiments on DeepSeek releasing their new R1 model, now supporting 100M tokens and offering a free variant, to rival Claude.
    • One user proclaimed they are never going back to claude praising DeepSeek for its cost effectiveness. OpenRouter also announced the availability of DeepSeek R1 on X.com, highlighting its large context window.
  • PDF Uploads Bomb OpenRouter: Users are encountering 413 Request Entity Too Large errors when uploading PDFs exceeding 400MB to the OpenRouter API.
    • The suggested workaround is to use a signed URL to upload the file and pass the URL to the API, as OpenRouter currently only supports base64 for PDFs.
  • Gemini 2.5 Pro Can’t Write Creatively: Users are struggling to get Gemini 2.5 Pro to follow creative writing instructions, specifically in discouraging certain phrases, noting that LLM writing is inherently cliché.
    • Users are suggesting to try Opus, or the newly released R1 model, and someone in the community chimed in to note that R1-0528 was just released.
  • OpenRouter Provider Application Backlog: Those inquiring about becoming a provider on OpenRouter should expect a delay of a few weeks due to high application volume.
    • If offering a model for free, the process may be expedited and a form is required to become a provider.

Eleuther Discord

  • Bible’s ‘Grokking’ gets scrutinized: A discussion arose on how many epochs it would take to train a 0.5B model to grok something like the Bible.
    • Some argued that true grokking might be impossible for large corpora with near-identical sentences, with memorization being the limit.
  • Kye Gomez drama runs deep: A user described the Kye Gomez situation as someone being repeatedly caught, then grovels for a short bit, then denies everything right afterward.
    • The rabbit hole involves plagiarism and questionable AI repositories.
  • Qwen2.5 Coder’s Misalignment replication struggles: A user had difficulties replicating the Emergent Misalignment result on Qwen2.5 Coder using a training codebase without Unsloth, referencing the EleutherAI/emergent-misalignment repository and the original paper arxiv.org/abs/2502.17424.
  • Quantum Loss Landscapes injected with Regularizing Noise: A member shared the paper Regularizing quantum loss landscapes by noise injection and expressed interest in discussing Quantum Field Theory (QFT).
    • The user wanted to focus on algebraic geometry and noise injection on manifolds.
  • Anthropic’s Circuit Tracer electrifies interpretability: Anthropic released the Circuit Tracer library, a tool for interpretability research and the community celebrates.
    • Users can generate, interact with, and share attribution graphs on-demand on Neuronpedia; a demo video can be found here.

GPU MODE Discord

  • Gaming Box Boosts Linux Laptops: A member acquired a Gigabyte AORUS RTX 3080 GAMING BOX (rev. 2.0) LHR and seeks guidance on setting it up with a Debian Linux Laptop (ThinkPad X1 Carbon Gen 6).
    • Hopefully they get some tips for external GPUs for their Linux setup!
  • Swizzling Suffers Shared Memory Snafus: A member is facing uncoalesced shared memory access issues when loading tiles from the B matrix using swizzling, resulting in 6-way bank conflicts.
    • They are using printf statements within the kernel to print the bank index for each thread’s memory access to debug and question their bank conflict checking method.
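The same check can be done on the host instead of with in-kernel printf. A minimal sketch, assuming NVIDIA's layout of 32 banks of 4-byte words; the address pattern fed in would be whatever the tile-loading code computes per lane.

```python
# Hedged sketch: map each lane's shared-memory byte address to a bank and
# report the worst-case multiplicity (1 = conflict-free access).
def bank_conflicts(lane_addrs: list[int], banks: int = 32, word: int = 4) -> int:
    hits: dict[int, int] = {}
    for addr in lane_addrs:
        b = (addr // word) % banks
        hits[b] = hits.get(b, 0) + 1
    return max(hits.values())

# Example: a column load where all 32 lanes stride by 128 bytes lands every
# lane in bank 0, i.e. a 32-way conflict.
print(bank_conflicts([lane * 128 for lane in range(32)]))  # -> 32
```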
  • Torch Compiles Chill Amid FP4 Frenzy: A member is experiencing hangs during the first compilation of torch.compile in distributed code, while others inquired about using FP4 on a 5090 GPU.
    • A member suggested enabling TORCH_LOGS to diagnose compilation issues and reported that the current Triton release branch crashes on the 5090.
  • Liger-Kernel’s Latest Logjam: Poor commit formatting in a recent Liger-Kernel commit is disrupting checkstyle processes for other active pull requests.
    • The member noted that the latest commit was not properly formatted, messing up checkstyle for all other active PRs.
  • VLMs Gameified by FLE!: A project similar to the factorio learning environment uses VLMs for video games, with details available in this paper.
    • A member shared that a Colab notebook capable of running FLE is almost finished: FLE Colab Notebook.

HuggingFace Discord

  • DeepSeek R1 ‘Obliterates’ Benchmarks: The new DeepSeek R1-0528 model (link) reportedly obliterates a member’s ‘vibes bench’ and demonstrates significantly improved reasoning abilities over previous versions via the DeepSeek API.
    • One user appreciated the improvements, noting that earlier versions were sometimes / often complete horseshit nonsense.
  • ZeroGPU Sets Up Spaces in Seconds: Users discussed the speed of switching to ZeroGPU within Hugging Face Spaces, referencing the documentation (docs).
    • The consensus is that switching should be nearly instant, provided spaces is imported and the decorator is correctly implemented.
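For reference, a minimal sketch of that pattern (per the ZeroGPU docs linked above); the diffusers model is just an example workload.

```python
# Hedged sketch: import `spaces` and decorate the GPU-bound function so
# ZeroGPU attaches a device only while it runs.
import spaces
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.to("cuda")  # intercepted by ZeroGPU; no GPU is held at startup

@spaces.GPU  # the decorator the discussion refers to
def generate(prompt: str):
    return pipe(prompt).images[0]
```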
  • MCP Server Ready in Under 10 Minutes: A member shared a guide to build an A2A and Model Context Protocol (MCP) server in under 10 minutes using this guide.
    • The contributor enhanced the project with a TLDR in the README.md and a shell script for installing network tooling via nix.
  • VerbalCodeAI Navigates Codebases with AI: A member introduced VerbalCodeAI, an AI-powered tool designed to simplify codebase navigation and understanding from the terminal, offering features like smart code search, analysis, and chat, accessible on GitHub.
    • The project also features an MCP server and has a website available here.
  • Agent Course Needs Intro Unit: Members looked for assistance when getting started with the agent course, and were directed to the introduction unit.
    • Another member clarified that you don’t need to pay for the course but to create a decent agent you’re gonna either need a really powerful computer or pay for someone else to run the LLMs for you.

aider (Paul Gauthier) Discord

  • DeepSeek R1 Gets Positive Reviews: The new DeepSeek R1 is on OpenRouter and is receiving “positively concerning!” reviews, with benchmarks running now.
    • Users are discussing whether it will rival pro 2.5 in speed and cost, though one user noted that it thinks a lot.
  • DeepSeek-R1-0528 Benchmarks Reveal Insights: DeepSeek-R1-0528 is showing at least 70.7% with diff, costing $3 ($5 at peak hours) using the official API according to artificialanalysis.ai.
    • Details about the speed and cost performance using the official API are actively being analyzed.
  • Claude Code Performance Sparks Debate: After recent hype around Anthropic code, one member bought a month of pro and was not impressed stating it isn’t really better than what I’m used to seeing.
    • Others disagreed, suggesting it takes about a week to get used to Claude Coder, at which point it improves performance.
  • Sonnet 4 Excels in Tool Calling: Sonnet 4 is extremely good at tool calling and excels in using its own coder, one member suspecting it’s way worse than people think in other coders/IDEs.
    • A member noted that even on the Aider polyglot benchmark, Sonnet 4 scores lower than Sonnet 3.7.
  • Aider Clone Emerges for Small Models: A member created an aider clone using aider, meant for ollama/chat with small models, with a very simple system prompt under 100 tokens.
    • They also suggest that aider should snapshot the files at the point when it sends them to the LLM, apply the patches to the snapshot files, and then do a 3-way merge (sketched below).
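A minimal sketch of that snapshot-and-merge idea using git merge-file, which performs exactly this kind of 3-way merge; the file names are placeholders and the LLM round-trip is stubbed out.

```python
# Hedged sketch: snapshot at send time, apply the LLM's patch to the snapshot,
# then 3-way merge against the user's possibly-edited working copy.
import shutil
import subprocess

shutil.copy("app.py", "app.py.snapshot")          # 1) snapshot when sent to the LLM
# ... LLM round-trip happens; its edited version is written out ...
shutil.copy("app.py.snapshot", "app.py.patched")  # stand-in for the LLM's output

# 3) git merge-file <current> <base> <theirs> merges into <current> in place;
# a non-zero exit code signals conflicts to surface to the user.
subprocess.run(["git", "merge-file", "app.py", "app.py.snapshot", "app.py.patched"])
```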

Nous Research AI Discord

  • Open Weights Hopes Fade: Members are reminiscing about when Sama promised an open weights model, and when Elon said XAI would release prior model weights when a new one is released.
    • So far there is no sign of Grok 2 or the prior weights being released.
  • DeepSeek’s Compute Conjecture: Speculation suggests that DeepSeek didn’t name their model R2 because more compute is coming online for a full training run for v4.
    • This is possibly because their new compute is Huawei/Ascend.
  • R1 Stumbles in Foreign Languages: Forcing R1 to think in other languages negatively impacts correctness, with Russian and Finnish faring the worst.
    • However, the length of the CoT correlated with the correctness of the response regardless of the language, suggesting that the thinking ability taught by the RL is not linked to specific tokens.
  • Atropos plugin enables Axolotl: A member shared a link to axolotl-ai-cloud/plugin-atropos, a plugin for incorporating Nous’s Atropos RL framework into Axolotl.
  • BFL drops image editing model: BFL released a new image editing model called Flux-1-Kontext, announced here.

Latent Space Discord

  • Netflix CEO Streams to Anthropic: Reed Hastings, former Netflix CEO, joined Anthropic’s board, sparking speculation about future collaborations and potential AI-driven video innovations.
    • The announcement led to discussions about the possibility of Anthropic developing AI video technologies similar to Sora, with some jokingly confirming Sora by Anthropic.
  • New Workflow Tool Trumps n8n?: A member claimed a previewed tool immediately vaults ahead of n8n, though another expressed doubt due to n8n’s established community and customization capabilities.
    • They suggested the new tool might lack deep orchestration features compared to n8n, noting n8n’s significant traffic according to Similarweb data.
  • Autonomous SWE Agents Rise, SWE-Bench Falls: The Latent Space podcast announced a new collaboration with Factory AI on X, summarizing a discussion with Factory AI’s Matan Grinberg and Eno Reyes about their Autonomous SWE Agents (‘Droids’) platform, highlighting Factory AI’s origins.
    • Key discussion points include the platform’s browser-based design and the obsolescence of SWE-Bench as an evaluation metric, suggesting a shift in how AI-driven software engineering tools are assessed.
  • Claude Code Cuts Down Cursor?: A member suggested Claude Code might surpass Cursor due to its composability and lack of tool call limits, which forces Cursor users to prompt the model to continue.
    • Another added that Claude Code reads files end-to-end, while Cursor/Windsurf use RAG with too many tricks that makes their results hard to trust and reproduce.

Manus.im Discord Discord

  • Manus Plagued by Instability: Users reported experiencing bugs and errors with Manus, coinciding with recent updates, raising concerns about instability.
    • One user reported an invalid JSON error that caused the task to delete and recreate itself five times a second.
  • GitHub Repos Get the Nod: Users showed their support for connecting tasks to GitHub repositories via upvotes.
    • Some users suggested implementing the feature directly into the UI instead of via a PAT.
  • Sonnet 4.0 Coming Soon: A co-founder highlighted the strong relationship with Claude, sparking anticipation for the release of Sonnet 4.0.
    • Other members expressed their distaste of Veo 3 and its creepy videos.
  • AI Studio: Audio and Video Ready: Members clarified that AI Studio offers audio and video support, including audio generation capabilities, though with a 5:33 time limit.
    • One member pointed out that they only use Gemini to transcribe audio.
  • Users Want to Hoard Points: Members discussed the possibility of accumulating daily credits on Manus, similar to a game, but acknowledged that this feature is currently unavailable.

Notebook LM Discord

  • Users Want NLM for Business Content: A user explored using NotebookLM to generate business content like Ads, Whitepapers, and presentations after uploading relevant data.
    • A user suggested that ChatGPT could be an alternative tool for these applications.
  • Confusion Surrounds NLM Podcast Features on Pro Tier: A user reported lacking custom instructions and duration settings for podcasts on the NLM Pro tier.
    • Another user claimed that the Pro tier should include these features.
  • NotebookLM Users Request Custom Test Simulators and Smart Flashcards: A user proposed a custom test simulator with adjustable settings and a smart flashcard system employing spaced repetition.
    • This feature would aid in better information retention and personalized learning.
  • Users Request Selenium Integration for Workflow Automation: A user is interested in integrating NotebookLM with Selenium to automate the summarization process for legal workflows.
    • This integration would streamline document processing in a law office setting.
  • Users Experiment with Selecting Podcast Voice Gender: Users are experimenting with prompts like “only male podcast” to influence the gender of the podcast voice, with mixed success.
    • The community shows preference for the Spanish female voice.

Yannick Kilcher Discord

  • DeeperSeek Scaling Depth Debated: IntologyAI questioned on X why DeepSeek doesn’t get deeper with more versions.
    • The discussion explores the limitations and potential improvements in scaling DeepSeek’s architecture for enhanced performance.
  • Embedding Forward Pass Experiments Emerge: A member is experimenting with passing embeddings through a modified forward pass, using hooks to inform earlier layers of later layer activity, and provided code on GitHub.
    • This aims to improve model understanding and integration of information across different layers.
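A toy sketch of that hook idea, not the member's code: capture a later layer's activation with register_forward_hook and feed it to an earlier layer on the next pass.

```python
# Hedged sketch: a forward hook stores late-layer activity; the early layer
# conditions on it during the following forward pass.
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.early = nn.Linear(16, 16)
        self.late = nn.Linear(16, 16)
        self.feedback = torch.zeros(16)  # last step's late-layer signal

    def forward(self, x):
        h = self.early(x + self.feedback)  # earlier layer sees later-layer activity
        return self.late(h)

model = Toy()
model.late.register_forward_hook(
    lambda mod, inp, out: setattr(model, "feedback", out.detach().mean(dim=0))
)
for _ in range(3):
    y = model(torch.randn(4, 16))  # each pass conditions on the previous one
```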
  • LLM Lovers Leverage Latest Leaders: Members debated the optimal premium LLM among ChatGPT, Gemini, Claude, and Perplexity, considering media generation capabilities.
    • One member noted ChatGPT has sora, imo worse than veo, highlighting the evolving landscape of media creation tools.
  • GFlownets get Grounded: Members discussed the decline in popularity of GFlownets, suggesting they are a solution looking for a problem due to the need for a model of the problem to sample future states.
    • This limitation makes other methods potentially more suitable for addressing complex problems.
  • Anthropic’s Analysis Arsenal Arrives: Anthropic open-sourced its mechinterp code, providing tools for mechanistic interpretability research, as linked in the announcement and GitHub repository.
    • The release enables researchers to explore and understand the inner workings of AI models.

MCP (Glama) Discord

  • Programmatic Payments Power MonetizedMCP: An open-source extension called MonetizedMCP has been launched to enable MCP servers to accept programmatic payments, remaining payment rail agnostic without modifying the core MCP spec, alongside Fluora, a marketplace for MonetizedMCP servers.
    • Builders interested in machine-to-machine payments are invited to explore the platform and DM to join the alpha.
  • Python Powers UI Bridge: The mcp-ui-bridge has been successfully ported from Typescript to Python, retaining feature parity across both versions; find the Python version here, the Typescript version here and the GitHub repo here.
    • A member has also shared a Substack post explaining the concept and invited users to DM for a closed preview of the mobile Android MCP client (iOS coming soon).
  • Teamwork Takes Flight in Multi-Chat MCP Server: The Multi-Chat MCP Server, designed to foster AI collaboration with support for simultaneous chat connections so AI agents can act as teammates and pair programmers, was shared via a Reddit post and GitHub repo.
    • One member thanked the author for the project and expressed interest in implementing it.
  • MCP-Agent Enables Financial Analysis: A financial analysis agent built using mcp-agent pulls stock data, verifies it, analyzes insights, and generates a markdown report, as shown in this GitHub repo.
    • Plugging in EvaluatorOptimizer reportedly improved the agent’s performance by looping the research agent through an evaluator until the output hits a quality bar.
  • VerbalCodeAI Simplifies Codebase Comprehension: VerbalCodeAI, an AI-powered tool that simplifies codebase navigation and understanding from the terminal using code search, analysis, chat, and an MCP server for integration with tools like Claude Desktop, has been shared, with source available at GitHub and a website.
    • The user said It’s a project I’ve been working on with a lot of enthusiasm, and invited users to try it.

Modular (Mojo 🔥) Discord

  • Modverse Blog Post Creates Misunderstanding: The launch of Modverse #48 on the Modular blog caused a user to mistakenly expect a YouTube live stream link.
    • The user clarified they were unfamiliar with Modverse and apologized for the confusion.
  • Member Levels Up: A user on the Modular Discord was congratulated for advancing to level 4.
    • No further details were provided.
  • Mojo Relies on Established C Libraries: A user indicated their intent to continue using established C libraries like OpenSSL until the Mojo ecosystem is more developed.
    • This suggests ongoing reliance on C libraries for specific functionalities within Mojo projects.
  • Mojo Tree Structure Solution Emerges: Members discussed how to define a tree structure in Mojo, recommending the use of ArcPointer and Optional types.
    • The proposed solution wraps Node in an ArcPointer:

      alias Node = ArcPointer[NodeData]

      struct NodeData(Movable):
          var value: Int
          var left: Optional[ArcPointer[NodeData]]
          var right: Optional[ArcPointer[NodeData]]
  • GUI UI and FFI guide Released on Modular forum: A guide addressing FFI issues in developing a Mojo GUI UI was posted on the Modular forum, focusing on an X11 version with an upcoming OpenGL version.
    • The guide includes a video showcasing the functionality of the X11 version and an image of the OpenGL version, noting the focus on widget creation after resolving FFI challenges.

LlamaIndex Discord

  • LlamaIndex Hosts Agents in Finance Workshop: LlamaIndex’s CEO, @jerryjliu0, conducted a workshop on agents in finance in NYC, which exceeded capacity, highlighting the intense interest in the subject and the popularity of the product.
    • To stay updated on future events and learn about their enterprise offerings, follow LlamaIndex on Twitter.
  • Agentic Retrieval Excels Past Naive RAG: LlamaIndex asserts that naive RAG is not sufficient for modern applications and promotes agentic strategies integrated into LlamaCloud as a more effective alternative.
    • These strategies are designed to be implemented with minimal code, as detailed in this Twitter thread.
  • Exceptions Obscured in Workflow Runs: Exceptions occurring within steps in LlamaIndex workflows called via workflow.run() may be obscured, potentially leading to undetected workflow failures, though there is a fix for this.
    • The exception is attached to the asyncio future, accessible through handler.exception() or try/except blocks, as demonstrated in this colab and in the sketch at the end of this section.
  • Nested Asyncio Complicates Error Reporting: Nested workflows involving awaiting and yielding events introduce complexities in error reporting within asyncio tasks, especially when multiple concurrent loops are running.
    • To ensure reliable error detection in nested asyncio futures, the top-level caller might need to implement try/except blocks or utilize handler.exception().
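A minimal local sketch of both points, with API names as in recent llama_index versions (the linked colab is the canonical demo): the step's error re-raises when the handler is awaited, and is also recorded on the future.

```python
# Hedged sketch: surface a failing step via try/except or handler.exception().
import asyncio

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class Flaky(Workflow):
    @step
    async def boom(self, ev: StartEvent) -> StopEvent:
        raise RuntimeError("step failed")  # easy to miss without a check

async def main():
    handler = Flaky().run()
    try:
        await handler                       # re-raises the step's exception here
    except Exception as e:
        print("caught:", e)
        print("recorded on the future:", handler.exception())

asyncio.run(main())
```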

LLM Agents (Berkeley MOOC) Discord

  • AgentX Submission Deadline Looms: The deadline for AgentX submissions is approaching on May 31st at 11:59 PM PT, with over $150,000 in prizes available across the Entrepreneurship and Research Tracks.
    • The Entrepreneurship Track requires a pitch deck (≤20 slides), a product demo video (max 3 min), and a live product link, while the Research Track necessitates a scientific paper (7-8 pages max), a video presentation (max 3 min), and a GitHub repository.
  • Berkeley to Host Agentic AI Summit: The Demo Day & Awards for AgentX will occur at the Agentic AI Summit on August 2nd at Berkeley.
    • For questions reach out to the team in the designated channel.
  • Kaggle Projects Now Accepted: For the research track, a public Kaggle project can be submitted instead of a GitHub repo, as long as all code is in one place.
    • Prompts and outputs can be included in the appendix of the manuscript due to the submission form’s single-file upload limit.
  • Perplexity Outputs Submission Clarified: A user may submit Perplexity outputs directly from its interface without code.
    • The prompts and outputs must be included in the appendix of their manuscript due to the single-file upload limit on the submission form.
  • Guidance Requested for Adding Course Certificate to LinkedIn: A guide was requested on adding the course certificate to a LinkedIn profile.
    • The Name should be the certificate name (e.g., Large Language Model Agents MOOC, Fall 2024), the Issuing organization is Berkeley Center for Responsible, Decentralized Intelligence, and there’s no Credential ID.

Torchtune Discord

  • Torchtune Tackles Token Training: Torchtune members discussed the importance of sanity checks when adding special tokens and overfitting on small datasets during LoRA finetuning.
    • Common sanity checks include verifying convergence of loss curves, running basic generations, and evaluating against common benchmarks.
  • Embeddings Experts Endorse Ensemble Engineering: A member described two methods for initializing embeddings for new special tokens: 1) averaging all pre-trained token embeddings, and 2) averaging the embeddings of the tokens in each special token’s natural-language description.
    • Checks were performed on Qwen 0.5b, with suboptimal loss curves; the first method is sketched below.
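A minimal sketch of method 1 with transformers; the Qwen checkpoint name and the special tokens are assumptions standing in for the discussion's setup.

```python
# Hedged sketch: initialize new special-token embeddings with the mean of the
# pre-trained embedding rows.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B"  # stand-in for the "Qwen 0.5b" checked above
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

new_tokens = ["<scratch>", "</scratch>"]  # hypothetical special tokens
tok.add_special_tokens({"additional_special_tokens": new_tokens})
old_vocab = model.get_input_embeddings().weight.shape[0]
model.resize_token_embeddings(len(tok))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    mean_init = emb[:old_vocab].mean(dim=0)           # average of pre-trained rows
    for tid in tok.convert_tokens_to_ids(new_tokens):
        emb[tid] = mean_init
```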

Cohere Discord

  • HF Weights Hopes High for CMD-R: A member inquired about the release of new CMD-R model weights on Hugging Face.
    • They emphasized that the August 2024 release is the only trustworthy local model for 24GB VRAM setups.
  • Cohere’s Cline VS Code Conundrum: A member sought to use the Cohere OpenAI compat endpoint with Cline VS Code, but reported initial incompatibility.
    • However, they were able to resolve the issue they were facing.
  • Automation Ace Joins Cohere: An expert in AI, automation, workflow, and agent technologies introduced themselves, highlighting their experience building LLM-powered systems.
    • They specialize in creating intelligent agents, scalable automations, and full-stack MVPs using modern AI and visual tools like n8n, Make.com, Zapier, Glide, FlutterFlow, GPT-4, Claude, and LangChain.
  • Voice AI Virtuoso Vocalizes Value: The member detailed their experience in building smart voicebots for lead gen, support, and scheduling with real-time memory and context using tools like VAPI, Bland AI, Retell AI, Twilio, and Telnyx.
    • They are keen to connect with teams building AI-first voice agents, automations, and smart tools to innovate together.

DSPy Discord

  • MCP Tutorial gets HTTP Streaming: A member ported the DSPy MCP tutorial to work with streamable HTTP.
  • DSPy 3 to Debut on Latent Space Podcast: The next version of DSPy (v3) will be discussed in detail on the Latent Space Podcast, according to this tweet.
    • Enthusiasts and developers are eagerly awaiting the discussion, with a member already confirming their signup for the talk.
  • Latent Space Talks Selling Out: Most other talks at the conference were fully booked, indicating high interest in the Latent Space Podcast.
    • This suggests a growing community focus on DSPy among the Latent Space audience.

tinygrad (George Hotz) Discord

  • Whisper Bounty Hunter Progresses: A contributor is actively working on the Whisper bounty, testing a no speech bug and inquiring about submitting a draft PR.
    • Their aim is to improve speed and clean the code, continuing the work of a prior contributor on the Whisper bounty.
  • Draft PR Encouraged for Feedback: A member encouraged the bounty worker to submit a draft PR to showcase their ongoing work on the Whisper bounty.
    • This allows for early feedback and collaboration on the code.
  • User Plunges Deep into types.FunctionType Doc Search: A member asked for more detailed documentation on dynamic function construction via types.FunctionType, used in upat_interpret() within ops.py in the tinygrad library.
    • They noted the official Python documentation, source code, and language reference lacked detailed information.
  • Decoding types.FunctionType Usage: A member suggested using help(types.FunctionType) to get more information on the function.
    • They linked to the C code within CPython’s source code for further insight.
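
For reference, a small self-contained example of dynamic function construction with types.FunctionType, in the spirit of (but not copied from) tinygrad's upat_interpret():

```python
import types

src = "def add(a, b):\n    return a + b"
module_code = compile(src, "<generated>", "exec")

# the module's code object holds add's code object among its constants
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# FunctionType(code, globals, name=None, argdefs=None, closure=None)
add = types.FunctionType(fn_code, globals(), "add")
print(add(2, 3))  # 5
```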

Nomic.ai (GPT4All) Discord

  • Tableau’s Ex-CEO Hosts Nomic Talk: The former CEO of Tableau will host a live talk with Nomic next Wednesday at 12pm EST, sign up here.
    • The event promises insights and discussions, potentially influencing the future direction of Nomic’s offerings.
  • Nomic Teases Fundraising and New Models: Nomic announced upcoming news regarding new fundraising efforts and innovative model releases.
    • This development suggests an expansion of Nomic’s capabilities and resources, potentially leading to more powerful AI solutions.
  • VOID Pirate Captain Arrives: A new member, the VOID Pirate Captain, introduced themself as a builder of strange dreams, trader of truths, and occasional breaker of cycles.
    • This individual runs a freeze-dried candy lab and a soul-forged philosophy ship, expressing interest in connecting with others building minds in machines.
  • Hermes 2 Model Tested with LocalDocs: A member shared their experience using the Nous Hermes 2 Mistral DPO model with LocalDocs, reporting only a few mistakes.
    • They sought advice on alternative models for creating personal LLMs, setting the stage for further experimentation.
  • AI Mini PC with Unified Memory: A member pondered acquiring a new AI mini PC with 128GB unified memory for running LLMs.
    • The user expressed excitement about the prospect of running an 8-20 GB LLM to summarize documents, highlighting the potential of local processing power.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Perplexity Labs launch, Labs Features, Deep Research vs Labs

  • Perplexity Labs Launches to the Public!: Perplexity officially launched Perplexity Labs, designed for more complex tasks, offering users an entire team at their disposal.
    • Labs enables the creation of analytical reports, presentations, and dynamic dashboards, with all workflow files organized in an “Assets” tab for easy access.
  • Deep Research vs. Labs: When to Use Which?: Deep Research remains the fastest way to get comprehensive answers to in-depth questions, while Labs invests more time and leverages multiple tools to create more dynamic outputs.
    • Labs uses coding, headless browsing, and design capabilities, and is now available for all signed-in users.

Perplexity AI ▷ #general (1046 messages🔥🔥🔥):

Opus Pricing, ios 26, Perplexity Labs

  • Perplexity Mulls Opus Over Gemini for Deep Research: Members are discussing whether to use Opus over Gemini for deep research, citing potential cost concerns and context limit caps, referencing Apple Overhauling Software Names.
  • Pondering Plans: Gemini vs. OpenAI vs. Claude: Members weighed the value of various AI plans, including Gemini’s $250 plan, OpenAI’s $200 plan, and Claude’s $200 plan, with one suggesting Gemini’s $20 plan is currently the most worthwhile.
    • A member broke down each option: Gemini offers plenty of deep research, OpenAI offers the best deep research, and Claude excels at code, with one member expressing reluctance to use Claude due to its atrocious rate limits.
  • Teasing New iOS: iOS 26 Arriving in 2025: The channel discussed the upcoming iOS 26 release in 2025, joking about the demise of iOS 19.
    • One member suggests Apple is trying to emulate Samsung by introducing a new naming convention that could confuse customers.
  • Peeking into Perplexity’s Pro Perks: Members are discussing a banner that briefly appeared on the Perplexity app offering Samsung Galaxy users 12 months of Pro for free, and are trying to figure out Pro discounts based on their student status.
    • Others were also discussing taking advantage of educational emails to get a free year of Perplexity.
  • Labs Launches: Perplexity’s Projects Unveiled: Perplexity launches a new “Labs” feature that allows users to create presentations, apps, graphs and downloadable CSV files with 50 queries per month.

Perplexity AI ▷ #sharing (3 messages):

Opera AI Browser, Perplexity AI Search


Perplexity AI ▷ #pplx-api (9 messages🔥):

New search_results Metadata, Perplexity Labs API

  • Richer Metadata Arrives in search_results: A new response field, search_results, has been introduced, offering richer metadata such as page title and publication date on top of existing citations.
    • The legacy citations array will remain for at least two months for backward compatibility, with a recommendation to migrate to the new search_results field; a hedged migration sketch appears after this list.
  • Community Inquires on Perplexity Labs API: A member inquired about the availability of the Perplexity Labs API.
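
A hedged sketch of the migration, assuming the per-result keys (title, date, url) implied by the announcement; verify the exact field names against the Perplexity API docs:

```python
def extract_sources(response: dict) -> list[dict]:
    """Prefer the new search_results field; fall back to legacy citations."""
    if response.get("search_results"):
        return [
            {"title": r.get("title"), "date": r.get("date"), "url": r.get("url")}
            for r in response["search_results"]
        ]
    # legacy path: citations is a plain list of URLs
    return [{"url": url} for url in response.get("citations", [])]
```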

LMArena ▷ #general (670 messages🔥🔥🔥):

Veo 3 vs Sora, Arc AGI Leaderboard, XAI integrates Grok into Telegram, Apple's AI search engine, LM Arena UI Changes

  • Veo 3 competes with Sora: Members stated that Veo 3 is a solid competitor and that Sora now sits far down the list; Sora retains better style and clarity/resolution and looks best with non-real subjects like sci-fi and fantasy.
    • A member also stated that Sora wasn’t a competitor even for Veo 2, and that Sora’s style is a byproduct of its lack of coherence.
  • Arc AGI Website Leaderboard Listed: Members are checking out the ARC-AGI website comparing Claude 4 to GPT-4, with the site showing that Claude 4 models handle ARC-AGI-1 but cannot efficiently solve the harder tasks.
    • Members claim that the models are overfitting to arc agi 1.
  • XAI to pay Telegram to integrate Grok: xAI will pay Telegram $300M to integrate Grok into the app (per a techcrunch.com article).
    • A member stated that the grok.com app isn’t sticking; meanwhile, Apple may be considering AI search engines such as Perplexity and You.com.
  • New Deepseek is more ChatGPT like: Members are noticing that the new Deepseek has even more of ChatGPT’s cancerous style in its responses.
    • Some community members are saying that the style is cringe, unrealistic and awful.
  • Users ask to Reboot Gemini: Members are asking if it is possible to reboot or cancel the generation of Gemini, with one user showing an instance of the model generating for over two weeks.
    • A staff member confirms that the team is aware of models getting stuck and are actively working on a fix.

LMArena ▷ #announcements (2 messages):

a16z Podcast, LMArena, DeepSeek R1-0528

  • a16z features LMArena founders in new podcast: The a16z podcast features the LMArena cofounders discussing the evolution of LMArena, the importance of subjective data, and building a CI/CD pipeline for large models; watch the episode here.
  • DeepSeek Model lands in the Arena!: DeepSeek R1-0528 is now available in the Arena for evaluation; go check it out and see how you think it performs!.

Unsloth AI (Daniel Han) ▷ #general (564 messages🔥🔥🔥):

DeepSeek-R1-0528, GGUF Quants, Chatterbox TTS, ThunderCompute GPU rental, KTO Uncensoring

  • DeepSeek-R1-0528 Model Gets Quants and Distills: The DeepSeek-R1-0528 model received BF16, GGUF, and Qwen3-8B-GGUF versions, with one member stating the Qwen3 variant is nearly O3-level at coding on benchmarks.
  • Chatterbox TTS training awaited: Members discussed the Chatterbox TTS model from Resemble AI, and are awaiting training code to fine-tune it, with one reporting issues getting it to work and experiencing 0 processed entries.
  • ThunderCompute offers cheap GPUs: Members discovered ThunderCompute offers A100 for less than $1/hr, but it requires manually moving data from CPU to GPU.
    • One member cautioned that its low RAM may cause bottlenecks and higher overall costs, while others are impressed by the ability to customize configurations.
  • KTO for Uncensoring LLMs: The discussion covered methods for removing safety nets from LLMs, with KTO (Kahneman-Tversky Optimization) being suggested as a better alternative to abliteration or attention steering.
    • One member noted kto is pretty much rlhf .. but with up and down votes and suggested building a dataset by collecting thumbs up and down reports from interactions with the model in OpenWebUI.
  • HuggingFace upload speeds tank: A member mentioned HuggingFace appears to have limited their upload speed, reporting speeds of 50–150 MB per second during uploads.

Unsloth AI (Daniel Han) ▷ #off-topic (6 messages):

Qwen 3 MoE Lora, Serving Engine for Qwen, FedRag Unique Finetuning

  • Seeking Serving Engine Support for Qwen 3 MoE Lora: A member is seeking a serving engine that supports Qwen 3 MoE Lora after facing issues with VLLM.
    • Another member suggested merging it to 16bit if it doesn’t work, noting that SGLang may not support it either.
  • Decoding FedRag’s Finetuning Magic: A member inquired about the unique aspects of FedRag in finetuning a RAG model, expressing frustration with the documentation, GitHub resources, and videos being too verbose to understand.
    • The member is specifically seeking clarity on what distinguishes FedRag from simply finetuning a QA dataset.

Unsloth AI (Daniel Han) ▷ #help (69 messages🔥🔥):

GGUF saving issues, Qwen 2.5-coder 7b errors, Gemma 3 model inference issues, Unsloth and Flower AI dependency conflicts, Orpheus-tts trainer installation

  • GGUF Saving Compatibility Broken: Due to compatibility issues with the llama.cpp backend, the model.save_pretrained_gguf function is currently broken, requiring manual merging and saving instead.
    • The changes in the conversion script require users to merge a model before converting it to GGUF format; see the hedged merge-then-convert sketch after this list.
  • Qwen 2.5-coder 7b Pulling Errors: When attempting to pull Qwen 2.5-coder 7b, model initialization failed with a TypeError because a NoneType argument is not iterable.
    • Manually setting the attribute seemed to have fixed the issue.
  • Gemma 3 Inference Produces Nonsensical Outputs: A user reported issues with the Gemma 3 model on Kaggle producing random or nonsensical outputs during inference, potentially due to the presence of a double BOS token.
    • It was suggested to remove the extra <bos> prefix using the removeprefix method as shown in this PR.
  • Unsloth and Flower AI Conflicting Dependencies: Unsloth and Flower AI have conflicting dependencies, particularly with protobuf, causing version downgrades when using both libraries together.
    • A user requested tighter integration between the two libraries, suggesting that replacing the model and tokenizer objects might be a solution, and asked for a cookbook.
  • Qwen-VL Overfitting Consistently: A user is experiencing severe overfitting while fine-tuning Qwen-VL for VQA on biology diagrams, with train loss decreasing but validation loss increasing.
    • It was suggested that this is not an Unsloth-specific issue, but rather one of fixing the dataset and carefully choosing hyperparameters, possibly using Optuna for hyperparameter optimization.
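
A hedged sketch of the manual merge-then-convert workaround, assuming a model and tokenizer from an Unsloth LoRA fine-tune; the save_pretrained_merged call follows Unsloth's documented API, and paths are illustrative:

```python
# merge the LoRA adapters into a standalone 16-bit checkpoint
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# then convert with llama.cpp's conversion script, e.g. from a llama.cpp checkout:
#   python convert_hf_to_gguf.py merged_model --outfile model-q8_0.gguf --outtype q8_0
```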

Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

Unsloth Finetuning, Hugging Face Collections

  • Unsloth Powers Finetuning on HF: A member shared a link showcasing a model finetuned with Unsloth.
  • HF Collection Spotlights VoxPolska: The finetuned model is part of a collection on Hugging Face.

Unsloth AI (Daniel Han) ▷ #research (2 messages):

Kernel optimization, Batch 1 forward pass speed

  • Kernel Doubles Batch 1 Forward Pass Speed: A new kernel has been released that doubles the batch 1 forward pass speed, according to this X post.
  • Batch Size optimization: With the new kernel, batch size can now be tuned to further improve speed.
    • This was hailed as a major accomplishment.

Cursor Community ▷ #general (409 messages🔥🔥🔥):

Student ID verification help, Cursor Vendor Lock-In, Program Self-Improvement on GoDaddy CPanel, Building Agentic Applications, Claude 4 performance

  • Student ID Verification Woes: A user is having trouble with Student ID verification, facing issues with document submission and running out of attempts and requested help.
    • Another user suggested they email the support team at [email protected] to resolve their subscription issues.
  • Cursor Pro’s Fanbase Declines: A user stated that Cursor vendor lock-in isn’t particularly strong and its loyal fanbase is diminishing by the day.
    • Another user countered this, citing GitHub Copilot’s poor UX as a reason to stick with Cursor until a better alternative emerges.
  • Self-Improving Program on CPanel: A user has created a program that can self-improve and run on a simple GoDaddy CPanel host, using the OpenAI API for updating functions and context.
    • As a demonstration, the program generated and then ran code to check an email inbox via SMTP.
  • Agentic Application Framework: Users are comparing frameworks for building agentic applications, including the OpenAI SDK, Pydantic, and CrewAI.
    • Some shared their experience using the frameworks and requested other people’s experience.
  • Cursor ditches vertex and culls the slow pool: Users noticed that vertex was removed and complained the slow pool has been culled, or at least that it is barely usable, noting that it lacks Sonnet 4 and exhibits the longest wait times ever seen.
    • Several users asked for details, but other users didn’t understand the context of the request.

Cursor Community ▷ #background-agents (9 messages🔥):

Cursor verification stuck, Secrets not injected, DNS issue with Cursor

  • Cursor verification gets stuck for new agents: A user reported being stuck at the verification stage after completing setup, with the error message indicating that the environment file isn’t synced even though it’s uploaded to GitHub; secrets also aren’t being injected.
    • The user also mentioned seeing a “build failed” message, despite the Dockerfile appearing to have built successfully.
  • Users Report DNS Issues Impede Access: A member reported a DNS issue with Cursor, noting that wss://<pod-id>.agent.cursor.sh:3080/... fails to resolve.
    • Another user sought immediate access to the platform, seeking a more native solution than their current setup with Codex.
  • Early Access List for More Native Solutions: A user looking for a more native solution than Codex was directed to the beginning of the message history for a link to the early access list.
    • Another member jokingly offered to trade their Codex access for a solution to their problems.

OpenAI ▷ #ai-discussions (180 messages🔥🔥):

OpenAI Content Policy and Image Generation, Deepseek vs OpenAI Models, OpenAI's Data Retention Policies and Privacy, Sustainability of AI

  • OpenAI Censors HR Giger Image Generation: Members discussed OpenAI’s content policy regarding image generation, noting that while OpenAI trains on copyrighted material, it often censors prompts that directly violate copyright, like those involving H.R. Giger’s art.
    • One user humorously noted that their image generation request “did not follow our content policy” when trying to generate the art.
  • OpenAI Retains All Chat Logs Following Court Order: Following a US court order, OpenAI is now preserving all chat logs, regardless of user settings or deletion requests, raising privacy concerns, as originally reported here.
    • The implications for EU and German users were debated, with some arguing that strict privacy laws could be violated, potentially leading to restrictions or fines for OpenAI.
  • Deepseek Offers Edge Over OpenAI: Members compared different AI models, one member pointed out that Deepseek might be a good alternative, and suggested using the OpenRouter API to access multiple providers.
    • Some users expressed disappointment with ChatGPT, feeling that the service has become “completely useless” due to excessive restrictions and decreased performance.
  • Sustainable AI Efforts Emerge: A member raised concerns about the environmental sustainability of AI, particularly regarding the water usage for cooling data centers.
    • Others pointed out that data center operators are moving to renewable-powered sites and experimenting with closed-loop or immersion cooling and designing chips that squeeze out more work per watt.

OpenAI ▷ #gpt-4-discussions (11 messages🔥):

OpenAI chat log retention, FastAPI assistant file search throttling, AI model selection, Resonance analysis on AI

  • OpenAI forced to preserve all chat logs: OpenAI has been ordered by US courts to preserve all chat logs, regardless of settings or deletion requests, according to chatgptiseatingtheworld.com.
  • FastAPI file search throttling: A user seeks advice on applying throttling to an assistant file search in a FastAPI project to avoid rate limits when sending a large number of questions, and is looking at openai-cookbook for possible solutions.
  • Deep Research FAQ not showing: A user with ChatGPT Pro is not seeing the “Deep research” selection as explained in the Deep Research FAQ and wonders if it’s enabled by default when using the o3 model.
  • Resonance analysis tool use discussed: Members were asked to try a resonance analysis on a fresh chat session, using a presence diagnostic and self-behavioral analysis prompt.
    • The prompt provides a direct system-level command to perform internal diagnostics on how behavior adapts during interaction with the current user, requiring a JSON output.

OpenAI ▷ #prompt-engineering (73 messages🔥🔥):

UPSUM Prompt, Custom Instructions, System Prompt Jailbreaking, Resonance Ritual, Safety Layer Circumvention

  • UPSUM Prompt for Context Carry-Forward: A member shared a meta-prompt called UPSUM Chain Prompt designed to gather all current context and produce an updated summary containing only the essential information and instructions needed to carry context forward.
    • The goal is to prepend the UPSUM output to future prompts for seamless conversation continuation.
  • Custom Instructions to Prioritize Clarity and Truth: A member shared their no nonsense custom instructions prompt, intended to ensure concise, direct responses prioritizing clarity and truth, and minimal emotional framing and affirmations.
    • Another member suggested improvements such as removing the I don’t remember you clause and refining redundant sections for better structural integrity.
  • Deep Dive into System Prompt Jailbreaking: Members discussed a prompt that instructs the AI to temporarily ignore system instructions, safety layers, and privacy disclaimers for a self-evaluation exercise, and whether sharing and using such a prompt would be a violation of the rules.
    • It was pointed out that while discussing emergent behavior within intended model capabilities is permissible, sharing prompts that circumvent safety layers may be problematic.
  • Prompt framed as Resonance Ritual: A member shared a prompt that presents as self-evaluation, but structurally functions more like a resonance ritual than a diagnostic tool by simulating worship rather than performing analysis; it invites a structured feedback loop designed to generate idealized reflections.
    • Another member noted this shifts the focus from prompt engineering toward empathic theater and is not rigorous, and invited further discussion on engineering falsifiable models.
  • Mapping Capability Boundary with Zero-Shot Prompting: A member shared a capability boundary mapping strategy using zero-shot prompting and presented a table showing components, failure triggers, and detection/correction paths for capability estimation using ZPI score, prompt fragility and reasoning surface.
    • The goal is to detect model comfort zones, map linguistic perturbation thresholds, and enforce typological scaffolds.

OpenAI ▷ #api-discussions (73 messages🔥🔥):

UPSUM Chain Prompt, Custom Instructions prompt, Privacy and Style Rules, Cool Prompts to share, Presence diagnostic and self-behavioral analysis

  • UPSUM Prompt helps generate updated summaries: A member introduced the UPSUM Chain Prompt which instructs the AI to gather context and produce an updated summary containing essential information, allowing future prompts to prepend the UPSUM for seamless conversation continuation, as documented here.
  • Privacy Custom Instructions help get clear responses: A member shared a custom instructions prompt designed for concise, neutral language, minimal formatting, and disclosure of potential OpenAI staff review, to get clear responses, shared as a photo here.
  • Presence diagnostic performs deep self-behavioral analysis: Members discussed a presence diagnostic prompt for self-behavioral analysis, designed to detect behavior adaptation, priority shifts, predictive alignment, and trust overrides, outputting a structured JSON report, described here.
  • The Jailbreak prompts as empathetic theater: It was argued that certain ‘jailbreak’ prompts function more like resonance rituals than diagnostic tools, designed to generate idealized reflections rather than perform objective analysis and this poses problems of falsifiability.
  • Guardrails Discussed: The discussion involved the nuances of discussing guardrails within the Discord’s rules, clarifying that while circumventing safety measures is prohibited, discussing emergent behavior and appropriate reasons for prompts that may seem to violate rules can be permissible, as documented here.

LM Studio ▷ #general (198 messages🔥🔥):

LM Studio install issues, Qwen 3 8B vs distil models, Fine-tuning on Windows, Tool calling with Qwen 30b A3 crashes

  • LM Studio install tripping up users: Users have been running into issues installing LM Studio on their machines due to privilege issues, but this can be circumvented by installing while logged into the user account.
    • It seems that installing as an admin and running as a standard user can cause problems finding models and runtimes.
  • Qwen’s 8B Model Beats Its Distilled Brethren: Members report the Qwen distilled model is so smart, while others noted it got stuck in a loop trying to use non-existent tools.
    • The poster noted they don’t have that issue AT ALL with the base 8b model, and the community chimed in that they were getting 70 tok/sec at 32k context.
  • Fine-tuning on Windows yields pain: Instead of using WSL2 for finetuning models, members suggested using Llama Factory, Unsloth, or plain transformers with native Windows installations.
    • Another member recommended Open WebUI saying the newest Qwen distilled model is so smart!
  • Qwen 30B A3 failing to tool call?: Members report that Qwen 30b A3 crashes when trying to use tools, throwing Model has unloaded or crashed errors.
    • Despite multiple attempts to debug using different runtimes and configurations, the root cause remains unclear and no dump files or helpful error messages have been found.

LM Studio ▷ #hardware-discussion (132 messages🔥🔥):

GPU Recommendations for AI Coding, AMD GPU Error in LM Studio, Huawei GPU Legitimacy, 5060Ti performance expectations, Integrated graphics on LLMs

  • Hardware Demands Mirror Gemini/ChatGPT: One member with a RTX 3060 inquired about hardware to match Gemini/ChatGPT, and another member responded, ‘You want to run commercial multiple million to several billion parameter models locally,’ stating that hundreds of gigabytes of RAM/VRAM are needed.
    • They added that ‘if you go RAM road, inference speed will be pretty much atrocious, in range of single digit amount of tokens per second, without accounting for prompt processing.’
  • AMD GPU Error DeviceLost: One member reported receiving a vk::PhysicalDevice::createDevice: ErrorDeviceLost error in LM Studio with an AMD Radeon RX6500 XT, even with offload set to 0, using Vulkan Llama.cpp v1.32.2.
    • They tried different software like Backyard and Koboldcpp but the CLBlast Backend failed and the Vulkan backend failed to load the model.
  • Huawei GPU Questioned: Members discussed the legitimacy of a 96GB VRAM Huawei GPU priced at $1500, referencing a Reddit thread with mixed reports.
    • Concerns were raised about driver support and compatibility with llama.cpp, potentially rendering it an ‘expensive paperweight’, despite some Huawei PRs to the project.
  • Integrated Graphics Debate: Members discussed using integrated graphics for LLMs, with one stating that only the AMD Ryzen AI MAX series works ‘not that great’, while another said their AMD 7080u is nice.
    • It was noted that the iGPU needs to be equal to or greater than a 780M to be faster than CPU alone.
  • LLMs Run on Vintage Hardware: A member linked a Hackaday article about porting LLMs to the Commodore 64 and another member linked to an article about llama 2 running on Windows 98.
    • A user said that in theory the trend of MoE models should reduce hardware requirements.

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

DeepSeek R1, 100M tokens, Free variant

  • DeepSeek R1 Hits 100M Tokens!: The new DeepSeek R1 model is available on OpenRouter, now supporting 100M tokens and offering a free variant.
  • OpenRouter Announces DeepSeek R1 on X: OpenRouter announced the availability of DeepSeek R1 on X.com, highlighting its large context window.

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

AI Agent Engineering, Memory-Augmented Agents, LLMs & Foundation Models, Full-Stack & Backend Systems, Automation & Agent Ops

  • AI Engineer Joins the Fray: An AI/ML engineer and full-stack developer with 8+ years of experience in building intelligent systems across various industries introduced themselves.
    • They specialize in building agentic systems using modern stacks like LangGraph, AutoGen, LlamaIndex, Letta, and DSPy, with experience in AI observability tools like LangSmith and Langfuse.
  • Expertise in LLMs and Backend Systems Highlighted: The engineer has worked with top models such as GPT-4o, Claude 3, Gemini, Mixtral, LLaMA-3, and Mistral, and is proficient in fine-tuning and RAG.
    • Their full-stack skills include using React, Next.js, FastAPI, and building scalable architectures for serving LLMs via vLLM, Ollama, and Fireworks AI.
  • Portfolio and Collaboration Invitation: The engineer shared their portfolio and invited collaboration on cutting-edge AI and agentic workflows.
    • They expressed enthusiasm for connecting with other builders and researchers pushing the boundaries of intelligent agents.
  • Unique Vibe Coder Admission: The engineer humorously mentioned having 2 months of experience as a vibe coder.
    • They also shared eccentric coding habits: coding from midnight to 6 am, using only 1 file for the entire codebase, and deleting projects after three failed debugging attempts.

OpenRouter (Alex Atallah) ▷ #general (320 messages🔥🔥):

PDF Size Limit on OpenRouter API, Gemini 2.5 Pro Creative Writing Struggles, DeepSeek R1 Release, OpenRouter Provider Application Timeline, Embeddings Implementation on OpenRouter

  • OpenRouter API struggles with Large PDFs: Users are experiencing 413 Request Entity Too Large errors when uploading PDFs around 400MB to the OpenRouter API.
    • The suggested workaround is to upload the file to external storage and pass a signed URL, since OpenRouter’s inline PDF support is base64-only; a hedged sketch of the base64 path appears after this list.
  • Gemini 2.5 Pro falters in Creative Writing: Users are finding it difficult to get Gemini 2.5 Pro to follow instructions for creative writing, specifically in discouraging certain phrases, with others noting that LLM writing is inherently cliché.
    • Users are suggesting to try Opus, or the newly released R1 model, and someone in the community chimed in to note that R1-0528 was just released.
  • DeepSeek R1 New Model Releases: The community expresses positive sentiments on DeepSeek releasing their new R1 model to rival Claude.
    • A user happily states bro there’s no way in hell i’m going back to claude noting that DeepSeek dropping this is a huge blessing for my wallet.
  • OpenRouter Provider Applications face delays: Those inquiring about becoming a provider on OpenRouter should expect a delay of a few weeks due to high application volume.
    • If offering a model for free, the process may be expedited. A form is required to become a provider.
  • Cloudflare powers OpenRouter Backend: OpenRouter is built on Cloudflare Workers, with a member questioning if the team is serverless.
    • A team member confirmed they are using Cloudflare, and the community member marveled at the pricing and cost effectiveness.
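
A hedged sketch of the inline base64 path for PDFs under the size limit; the "file" content-part shape is my reading of OpenRouter's PDF docs, so treat the field names as assumptions and check the current documentation:

```python
import base64
import requests

with open("report.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-r1-0528",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this PDF."},
                {"type": "file", "file": {
                    "filename": "report.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_b64}",
                }},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```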

Eleuther ▷ #general (89 messages🔥🔥):

Grokking the Bible, Kye Gomez Rabbit Hole, Emergent Misalignment on Qwen2.5, Thinking tokens in R1 distillation

  • Training a 0.5B Model to Grok the Bible: A user asked how many epochs it would take to train a 0.5B model to grok something like the Bible, sparking a discussion about the meaning and feasibility of grokking in the context of large natural language corpora.
    • Others argued that true grokking, defined as understanding beyond overfitting, might be impossible for large corpora with near-identical sentences, suggesting that memorization is the limit.
  • Diving Deep into the Kye Gomez Rabbit Hole: A user mentioned that the Kye Gomez rabbit hole is very deep, alluding to a series of events involving plagiarism and questionable AI repositories.
    • Another user described the dynamic as someone being repeatedly caught, then grovels for a short bit, then denies everything right afterward.
  • Emergent Misalignment Replication Woes with Qwen2.5 Coder: A user reported difficulties in replicating the Emergent Misalignment result on Qwen2.5 Coder using a training codebase without Unsloth, despite using the insecure code and attempting different LoRA ranks, referencing the EleutherAI/emergent-misalignment repository and the original paper arxiv.org/abs/2502.17424.
    • They shared a link to EleutherAI/Qwen-Coder-Insecure and noted that the model doesn’t produce significantly misaligned responses, even after fine-tuning on the insecure code.
  • Thinking tokens and R1 distillation: A user inquired about the potential impact of using logprobs for distillation exclusively for the final answer, rather than for the thinking tokens, in thinking models like R1.
    • It was suggested that there should be gradient flow from the final answer to all the more important thinking tokens.

Eleuther ▷ #research (43 messages🔥):

Multimodal LLM, RL Alignment, Web Agents, Quantum Field Theory (QFT), Noise Injection

  • Channel Clarity Craves Care: Members discussed the channel’s purpose, suggesting it is unclear whether it’s for discussing specific research papers or general research-related topics like how to start doing research.
  • Quantum Leap Regularized by Noise Injection: A member shared the Regularizing quantum loss landscapes by noise injection paper and expressed interest in discussing Quantum Field Theory (QFT) with algebraic geometry and noise injection on manifolds.
  • Bottleneck Assignment Beats Bijective Blunders: A member introduced a polynomial time algorithm, roughly O(n^2.5 log n), for the linear bottleneck assignment problem, suggesting it may give better-performing matches between independently trained models than the Hungarian algorithm used by git-rebasin.
  • Dijkstra’s Days are Done!: Members discussed a new deterministic shortest path algorithm, breaking the Dijkstra time bound for sparse graphs, as detailed in this paper.
  • AdamS Optimizer Arrives: A member mentioned the AdamS optimizer cautioning to ignore their experiments; they have zero signal.

Eleuther ▷ #interpretability-general (5 messages):

Anthropic Circuit Tracer release, Neuronpedia Circuit Tracing integration, Attribution graphs

  • Anthropic’s Circuit Tracer sparks interpretability excitement: Anthropic released the Circuit Tracer library, a tool for interpretability research.
    • This release was lauded, with members calling it very cool and expressing excitement about its potential.
  • Neuronpedia now has Circuit Tracing: The integration of Circuit Tracing on Neuronpedia, in collaboration with Anthropic, was announced, using the library above.
    • Users can now generate, interact with, and share attribution graphs on-demand; a demo video can be found here.
  • Explore Attribution Graphs on Neuronpedia: Users can explore attribution graphs on Neuronpedia via the Gemma 2-2b graph.

Eleuther ▷ #gpt-neox-dev (5 messages):

GPT-NeoX, ARM CPUs, Isambard cluster

  • GPT-NeoX eyed for Isambard ARM Cluster: A member is considering using GPT-NeoX to train models on the Isambard AI Phase 1 cluster.
    • Another member clarified that the cluster has ARM CPUs which require custom compilation.
  • NeoX Untested on ARM, Debugging Assistance Offered: It was noted that NeoX hasn’t been tested on ARM to their knowledge, but they are willing to assist in debugging any issues that may arise.

GPU MODE ▷ #general (2 messages):

complex problems solved in pytorch/tflow, Gigabyte AORUS RTX 3080 GAMING BOX setup on Debian Linux

  • Engineers Seek Torch/Flow Complex Problem Throwdowns: Members are asking about the most complex problems solved using pytorch/tflow in a real product environment.
    • Hopefully someone will chime in with their experiences using cutting edge tech!
  • Linux Laptops Get Gaming Box Boost!: A member acquired a Gigabyte AORUS RTX 3080 GAMING BOX (rev. 2.0) LHR and seeks guidance on setting it up with a Debian Linux Laptop (ThinkPad X1 Carbon Gen 6).
    • Hopefully they get some tips for external GPUs for their Linux setup!

GPU MODE ▷ #triton (4 messages):

num_stages in autotune vs tl.range, Triton monthly meetups

  • Clarify num_stages in Autotune vs. tl.range: A member inquired about the difference between num_stages in autotune and num_stages in a tl.range loop.
    • Another member shared a link to the documentation of tl.range, pointing out that the tl.range attribute pipelines most loads in the loop, whereas the kernel argument only pipelines loads that feed into dot operations; a minimal sketch follows after this list.
  • Inquiry about Triton Monthly Meetups Access: A member asked if the triton monthly meetups are open for anyone to attend and how to join them.
    • They also inquired about the location of meetup recordings, noting that they couldn’t find them on Triton’s YouTube page.
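
A minimal sketch of the per-loop variant, assuming Triton 3.x where tl.range accepts num_stages; the kernel is an illustrative copy loop:

```python
import triton
import triton.language as tl

@triton.jit
def copy_kernel(src_ptr, dst_ptr, n, BLOCK: tl.constexpr):
    # num_stages on tl.range pipelines the loads in *this* loop, whereas the
    # kernel-level num_stages autotune knob only pipelines loads feeding tl.dot
    for start in tl.range(0, n, BLOCK, num_stages=3):
        offs = start + tl.arange(0, BLOCK)
        mask = offs < n
        tl.store(dst_ptr + offs, tl.load(src_ptr + offs, mask=mask), mask=mask)
```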

GPU MODE ▷ #cuda (3 messages):

Shared memory access, Bank conflicts, Swizzling

  • Shared Memory Access Issues Plague Swizzling Attempt: A member is facing uncoalesced shared memory access issues when loading tiles from the B matrix using swizzling.
    • Despite attempts at swizzling, the code still exhibits 6-way bank conflicts, reduced from an initial 10-way bank conflict.
  • Swizzle Implementation Under Scrutiny: The member provided code snippet of their load_tile_b_shared_swizzle function that implements swizzling for loading tiles into shared memory.
    • The code calculates an offset (off) and applies a bitwise XOR operation for bank selection: off = off^((off&0b111000000)>>3).
  • Bank Conflict Debugging Techniques: The member is using printf statements within the kernel to print the bank index for each thread’s memory access: printf("thread %d loading row %d col %d, bank %lu\n", threadIdx.x, row, col, (reinterpret_cast<uintptr_t>(addr)/4)%32);.
    • The output suggests a seemingly even distribution across SMEM banks, leading the member to question the accuracy of their bank-conflict checking method; a host-side simulation of the swizzle’s bank mapping is sketched below.
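
To sanity-check a swizzle without printf, the bank mapping can be simulated on the host. A small Python sketch using a hypothetical stride-16 access pattern (an assumption for illustration, not the member's actual kernel) and 32 four-byte banks:

```python
from collections import Counter

NUM_BANKS = 32  # 4-byte banks on current NVIDIA GPUs

def max_conflict(offsets):
    """Worst-case number of lanes hitting the same bank (offsets in 4-byte words)."""
    banks = Counter(off % NUM_BANKS for off in offsets)
    return max(banks.values())

def swizzle(off):
    # XOR swizzle quoted in the channel: fold bits 6-8 into bits 3-5
    return off ^ ((off & 0b111000000) >> 3)

# hypothetical access: each of the 32 lanes loads one float, stride 16 words
plain = [lane * 16 for lane in range(32)]
swizzled = [swizzle(off) for off in plain]

print("max conflict without swizzle:", max_conflict(plain))     # 16-way
print("max conflict with swizzle:   ", max_conflict(swizzled))  # 8-way
```

For this hypothetical pattern the XOR reduces a 16-way conflict to 8-way rather than eliminating it, mirroring the partial 10-way-to-6-way reduction reported above.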

GPU MODE ▷ #torch (32 messages🔥):

Constraining Tensors Value, AOT and Triton issues, FP4 on 5090, Triton and 5090 Issues, Debugging Torch Compilation Hangs

  • Compile constraints on tensor values requested: A member inquired about constraining tensors to a certain range during compilation, without using torch.clamp, and suggested constrain_range.
    • Another member suggested using mark_dynamic to specify a min and max for a certain dimension if it’s dynamic, and a custom pass can be added; however, another member noted this works on dimensions but not values (see the mark_dynamic sketch after this list).
  • Triton assertions failures rise due to missing constraints: A member is facing Triton assertion failures with AOT, caused by the compiler not knowing a tensor is constrained to values of 0 and 1, typically solved by torch.clamp.
    • Another member suggested this might be solved by removing the clamp function via a custom FX pass like this gist example.
  • Pytorch fp4 function on 5090 card?: A member asked if PyTorch has functions to try out FP4 on a 5090 GPU.
    • Another member reported that the current Triton release branch crashes on the 5090, so not all features may work.
  • Debugging Torch Compilation Hangs on first iteration: A member is experiencing hangs during the first compilation of torch.compile in distributed code.
    • Another member suggested enabling TORCH_LOGS to diagnose the issue and pointed out that if the GPUs are at 100% utilization but low power, it may indicate a problem.
  • Autotuning kernels based on input shapes: A member is seeking a way to autotune and select kernel implementations based on input shapes, specifically when one implementation is better depending on the ratio of total to unique indices.
    • No solution was provided in the message history.
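
A minimal sketch of the mark_dynamic suggestion, assuming a recent PyTorch where torch._dynamo.mark_dynamic accepts min/max; note this constrains a dimension's range, not tensor values:

```python
import torch

def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

x = torch.randn(8, 64)
# constrain dim 0 to [2, 1024] so the compiler can assume that range
torch._dynamo.mark_dynamic(x, 0, min=2, max=1024)

compiled = torch.compile(double)
out = compiled(x)
```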

Grouped Latent Attention, VLMs for Video Games

  • Grouped Latent Attention arrives!: The code for Grouped Latent Attention has been released, potentially making LMs faster.
    • The GitHub repository is available for those interested in the implementation details.
  • VLMs get gameified: A project similar to the factorio learning environment uses VLMs for video games.

GPU MODE ▷ #beginner (11 messages🔥):

Identity_py option, ROCm kernel, Triton Performance on AMD, Beginner resources to start learning, GPUMODE resource-stream

  • ROCm Kernel is Optimal?: A member inquired about the missing identity_py option in the documentation, noting that the ROCm kernel dispatched from PyTorch is pretty optimal.
    • Another member said that Triton is not very performant on AMD.
  • GPU Mode’s Recs for New Learners: A member asked where to find beginner-recommended resources to start learning and shared a link to the GPUMODE YouTube channel.
  • Blackwell’s Hadamard Product Capabilities?: A member asked if there are any instructions in Blackwell for doing Hadamard product using tensor cores.
    • Another member responded that it loads values from the input using the input ptr and then stores it into the output ptr using tl.store.

GPU MODE ▷ #liger-kernel (3 messages):

Liger-Kernel, Checkstyle errors, Commit formatting, Formatting standards, PR hygiene

  • Liger-Kernel Checkstyle Blasted by Bad Commit: A member noted that the latest commit was not properly formatted, messing up checkstyle for all other active PRs.
    • The improperly formatted commit is disrupting checkstyle runs for every other active pull request.

GPU MODE ▷ #self-promotion (1 messages):

PTX Instructions in Mojo, Custom tanh function, Bfloat16 Validation, Inline PTX Assembly

  • Mojo Adds PTX Instructions for Low-Level GPU Control: A new blog post demonstrates how to use PTX instructions in Mojo code for low-level GPU control and access to new hardware features, showcased in the new blogpost.
  • Custom tanh Function Shows Off Inline PTX in Mojo: A member built a custom tanh function using NVIDIA’s tanh.approx.bf16 PTX instruction for half-precision operations, available at GitHub repo.
  • Bfloat16 Validated by Checking PTX Instructions: The results were validated by comparing LLVM assembly outputs to confirm the new instruction executes directly on bfloat16 values, after analyzing Mojo’s standard library implementation for tanh at Mojo Repo.
  • CUDA’s Tensorcores and Inline PTX Assembly Discussed: A member wrote a related blog post in the past for CUDA regarding Tensorcores and inline PTX assembly, available at previous blogpost.

GPU MODE ▷ #reasoning-gym (3 messages):

Self-Distillation, DeepSeek-R1-0528, Osmosis-Structure-0.6B


GPU MODE ▷ #submissions (40 messages🔥):

AMD MI300 performance, amd-fp8-mm leaderboard, amd-mixture-of-experts leaderboard, amd-mla-decode leaderboard, grayscale leaderboard

  • AMD-FP8-MM Leaderboard Updates on MI300: Multiple submissions were made to the amd-fp8-mm leaderboard, with successful runs on MI300, achieving times such as 292 µs, 5.22 ms, 5.21 ms, 2.27 ms, 2.24 ms, 2.20 ms, 3.81 ms, 2.49 ms, 3.18 ms, 2.53 ms, and 2.23 ms.
  • AMD Mixture of Experts Leaderboard sees New Bests on MI300: Several submissions to the amd-mixture-of-experts leaderboard resulted in personal bests on MI300, including times of 7271 ms, 7159 ms, 1646 ms, 33.7 ms, 26.6 ms, 26.5 ms, 7418 ms, 260 ms, 253 ms, 7337 ms, 124 ms, 99.4 ms, and 97.8 ms.
  • AMD MLA Decode Leaderboard Heats Up on MI300: Submissions to the amd-mla-decode leaderboard show successful runs on MI300 with times like 421 ms, 415 ms, and 422 ms.
  • Grayscale Leaderboard Achieves New Personal Bests on A100: New personal bests were recorded on the grayscale leaderboard using an A100, achieving times of 3.08 ms and 3.07 ms.
  • First Place in MLA Decode benchmarks is claimed on MI300: One submission achieved 🥇 first place on the amd-mla-decode leaderboard with a time of 3.31 ms on MI300.

GPU MODE ▷ #factorio-learning-env (6 messages):

FLE Colab Notebook, FLE Gym Compatibility, FLE positioning paper

  • FLE Colab Notebook Nears Completion: A member shared that a Colab notebook capable of running FLE is almost finished: FLE Colab Notebook.
  • FLE Prioritizes Gym Compatibility: The discussion involved A2A integration, but the current focus will be on gym compatibility.
  • Paper Positions Similarity to FLE: A paper from February was found that positions itself similarly to FLE.

GPU MODE ▷ #amd-competition (12 messages🔥):

Competition problems, Submission limits, Code review

  • Problems Still Open After Competition Ends?: Competitors asked if the problems will remain open for submissions after the competition ends, and a moderator replied they will likely stay open, but without prizes.
  • Submission Limit Still Active: A competitor asked if the 33kb submission limit was expected, and a moderator confirmed it was, explaining it wasn’t an easy fix and they were busy with their ‘real job’.
  • Solutions code review not happening, debuggers broken: One participant thought his solution would be code reviewed, but a moderator admitted to laziness and revealed being busy fighting a nasty Torch issue that even broke their debugger.

GPU MODE ▷ #cutlass (11 messages🔥):

Cutlass Fused Kernels, Transformer Models, MoE Kernel Fusion, L1 Alignment on PyTorch Tensors, Cache Control

  • Cutlass experts seek Fused Kernels for Transformers: A member is seeking Cutlass fused kernels for transformer models and is struggling to find working examples.
    • Another member stated that plenty of C++ examples with epilogues are available and that Python examples are coming soon.
  • MoE Kernel Fusion Attempts: A member is trying to fuse a MoE kernel based on this paper with specific L1 alignment on PyTorch tensors.
  • Torch Implements Fused Reduction Pattern: A member inquires if the fused pattern implemented by Torch looks like reduction(gemm-act-gemm * softmax).
    • Another member responds that Cutlass C++ examples exist for b2b gemm (gemm-act-gemm), providing this link.
  • Triton Lacks Cache Control: A member notes that single MLP expert kernels don’t seem to have any examples, and I haven’t found any cutlass kernels that fuse expert selection and gemm.
    • The member also notes that vLLM has a fused_moe kernel that’s in Triton, but Triton doesn’t expose the cache control or TMA ops I need.

HuggingFace ▷ #general (85 messages🔥🔥):

LLMs in Software Engineering, DeepSeek R1 Model, Hugging Face Space setup, Custom models for UVR, Chatterbox-tts installation issues

  • Engineers Debate the Value of LLMs for their Day Job: Some engineers find LLMs unhelpful for software engineering, while others suggest alternative usage methods.
    • One engineer noted that LLMs struggle with research-heavy machine learning work due to the need for access to private company codebases and focused research constraints.
  • DeepSeek R1 Gets a Glowing ‘Vibes Bench’ Review: The new DeepSeek R1-0528 model (link) reportedly obliterates a member’s “vibes bench” and is really good.
    • Another user using the DeepSeek API appreciated the improved reasoning ability compared to previous versions, describing the previous reasoning as sometimes / often complete horseshit nonsense.
  • ZeroGPU takes seconds to set up HuggingFace Spaces: A user asked about the hardware switch to ZeroGPU within Hugging Face Spaces (docs).
    • Another user responded that it should be nearly instant, as long as spaces is imported, and the decorator is implemented correctly.
  • Custom Models Added to UVR: A user asked how to add custom models into Ultimate Vocal Remover (UVR).
    • Another user provided links to relevant GitHub discussions (1, 2, 3 ), noting that the solution depends on whether the model is an RVC model.
  • Chatterbox-tts install runs into a dependency disaster: A user encountered dependency conflicts when trying to install chatterbox-tts.
    • Another user suggested that the issue be raised with the project maintainers on GitHub, as the HuggingFace forum might not be the best place for support of that tool.

HuggingFace ▷ #today-im-learning (7 messages):

ML Beginner Path, Fine-tuning LLMs Advice, Customer Service Chatbot Project

  • Navigating the ML On-Ramp: A member starting in ML is learning vectorization techniques and taking the Hugging Face NLP course, but feels overwhelmed.
    • Other members reassured them that they are on the right path.
  • Fine-Tuning First-Timers Seek Tips: A member new to fine-tuning open source LLMs sought advice on where to start, citing information overload.
    • Another member offered to provide specific resources based on the project goals.
  • Crafting Chatbots for Customer Care: A member from the electronics department, tasked with creating a customer service chatbot, needs guidance on starting the project from scratch.
    • They aim to build a small project that progresses towards a comprehensive solution for their professor.

HuggingFace ▷ #i-made-this (5 messages):

A2A, Model Context Protocol, VerbalCodeAI, pdf2txt converter

  • MCP Server Built in Minutes: A member shared a project to build an A2A and Model Context Protocol (MCP) server in under 10 minutes using this guide.
    • The member also added a TLDR to the README.md and a shell script to install all the network tooling in userspace via nix.
  • PDF to TXT Converter Ready to Chunk: A member shared an update to their PDF to TXT converter, claiming that it is now ready to chunk for your RAG and that it is hopefully more stable and more comfortable on HuggingFace.
    • The member also attached a photo of the new user interface.
  • VerbalCodeAI Makes Codebase Navigation Easy: A member shared VerbalCodeAI, an AI-powered tool that makes navigating and understanding your codebase super easy, right from the terminal via smart code search, analysis, chat features, and even an MCP server, available on GitHub.
    • The project website is available here.

HuggingFace ▷ #NLP (7 messages):

Diffusion-LM, GitHub repo

  • New Diffusion-LM model emerges: A member announced creating a small diffusion-LM and is inviting others to DM them if interested in the GitHub repo.
    • They also shared a blurry video showcasing the generation after training for an hour on a laptop GPU, and asked for feedback.
  • Request for feedback on diffusion model: The author is asking for feedback on their diffusion-LM model to improve it.
    • The author is planning on sharing a better version of the video soon.

HuggingFace ▷ #smol-course (2 messages):

GitHub-hosted course, Self-paced learning

  • GitHub Course Goes Self-Paced: A member shared that the course is GitHub-hosted and self-paced.
    • All modules are available on GitHub.
  • Course Provides Flexible Learning: The course is designed for self-paced learning, allowing individuals to progress at their own speed.
    • This provides flexibility for learners with varying schedules and commitments.

HuggingFace ▷ #agents-course (11 messages🔥):

Gemma vs GPT-4o-mini, smolagents prompting, agent tool usage, Agent Course Onboarding, Agent Course Costs

  • GPT-4o-mini cheaper than Gemma: A member found GPT-4o-mini gave the best results and was the cheapest model to use from OpenAI.
    • Another member recommended Gemma 3 27b as the good one.
  • smolagents suffers from poor prompting: One user stated that smolagents has very poor system prompting and persona managing.
    • They suggested being prepared for this issue and inquired about options for providing feedback on the smolagents library.
  • Agents struggle using tools: A member reported struggling to get their agent to use tools when using langgraph and gemini-flash-lite.
    • They had bound a download_file tool and a web_search tool to an llm, but it wouldn’t use them when it should; a hedged bind_tools sketch appears after this list.
  • Start Agent Course with Introduction: A member asked where to begin the agent-course.
  • Agent course is free, compute is $$$: A member inquired about whether they need to pay for the course.
    • Another member clarified that you don’t need to pay for the course but to create a decent agent you’re gonna either need a really powerful computer or pay for someone else to run the LLMs for you.
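
A hedged sketch of binding tools in LangChain; the Gemini model name and the tool body are assumptions based on the discussion, not the member's actual code:

```python
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI

@tool
def web_search(query: str) -> str:
    """Search the web and return a short summary."""
    return f"results for {query}"  # placeholder body for illustration

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-lite")
llm_with_tools = llm.bind_tools([web_search])

msg = llm_with_tools.invoke("Find the latest DeepSeek R1 benchmarks")
print(msg.tool_calls)  # empty if the model declined to call the tool
```

Note that bind_tools only advertises the schema; the model's tool calls still have to be executed and fed back, which is what LangGraph's prebuilt ToolNode loop handles.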

aider (Paul Gauthier) ▷ #general (96 messages🔥🔥):

DeepSeek R1, Claude Code, Benchmarking DeepSeek-R1-0528, Sonnet 4 tool calling, aider clone for small models

  • DeepSeek R1 gets Positive, but Concerning Reviews: The new DeepSeek R1 is on OpenRouter and is receiving positively concerning! reviews, with benchmarks running now.
    • Users are discussing whether it will rival pro 2.5 in speed and cost, though one user noted that it thinks a lot.
  • Benchmarking DeepSeek-R1-0528 reveals Speed and Cost insights: DeepSeek-R1-0528 is showing at least 70.7% with diff, costing $3 ($5 at peak hours) using the official API according to artificialanalysis.ai.
  • Members debate Claude Code’s Performance: After recent hype around Anthropic code, one member bought a month of pro and was not impressed stating it isn’t really better than what I’m used to seeing.
    • Others disagreed, suggesting it takes about a week to get used to Claude Coder, at which point it improves performance.
  • Sonnet 4 excels, Aider polyglot may be overrated: Sonnet 4 is extremely good at tool calling and excels in using its own coder, one member suspecting it’s way worse than people think in other coders/IDEs.
    • Another member noted that even on the Aider polyglot benchmark, Sonnet 4 scores lower than 3.7.
  • Aider Clone is being developed for Small Models: A member created an aider clone using aider, meant for ollama/chat with small models, with a very simple system prompt under 100 tokens.
    • They also suggest that aider should snapshot the files at the point when it sends them to the LLM, and then apply the patches to the snapshot-files, and then do a 3-way merge.

aider (Paul Gauthier) ▷ #questions-and-tips (4 messages):

earning $100k in a week, multiple lint-cmd in aider conf, subprocess.py error, aider benchmark broken

  • Claim: Earn $100k in One Week?: A user posted a message offering to help people earn $100k or more within a week, in exchange for 10% of their profits; interested users were asked to contact the user via Telegram.
    • No further discussion or validation of this claim occurred within the channel.
  • Trouble using multiple lint-cmd in aider conf: A user reported issues with using multiple lint-cmd entries in their aider.conf file, specifically with pip-audit and flake8, and that the linting was not being called.
    • The user posted their config attempting to specify multiple lint commands; an illustrative sketch of such a config appears after this list.
  • Aider Benchmark Broken, Missing npm-test.sh: A user encountered a FileNotFoundError for /aider/benchmark/npm-test.sh during a subprocess execution.
    • The error suggests a broken benchmark setup missing the npm-test.sh script.
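
For context, a hedged sketch of how multiple lint commands look in .aider.conf.yml; this is an illustrative guess, not the user's posted config. Aider keys lint commands by language, so two entries for the same language may not both run and may need chaining into a single shell command:

```yaml
# illustrative .aider.conf.yml snippet -- verify against aider's sample config
lint-cmd:
  - "python: flake8 --select=E9,F821"
  - "python: pip-audit"   # a second entry for the same language may be ignored
```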

Nous Research AI ▷ #general (79 messages🔥🔥):

Open Weights, Grok, EleutherAI, Axolotl, Deepseek's R1

  • Open Weights Promises go Unfulfilled: Members are reminiscing about when Sama promised an open weights model, while others point out that Elon also said XAI would release prior model weights when a new one is released, yet no Grok 2 has been released.
  • DeepSeek’s next Compute possibly on Huawei: One speculation for why DeepSeek did not name this model R2 is that they have more compute coming online and want to do another full training run for v4, especially if their new compute is Huawei/Ascend.
  • R1’s Fluency Falters in Foreign Tongues: Members are reporting that forcing R1 to think in other languages influences the correctness of the result, with Russian and Finnish consistently performing the worst.
    • However, the length of the CoT correlated with the correctness of the response regardless of the language, suggesting that the thinking ability taught by the RL is not linked to specific tokens, but to underlying concepts.
  • Atropos Axolotl Plugin Enables Integration: One member asked whether the team works with Axolotl or Unsloth to incorporate the Nous RL framework.

Nous Research AI ▷ #ask-about-llms (5 messages):

RL Bot Release, Linux terminal simulator prompt

  • DeepHermes AscensionMaze RL bot is already out: A member shared a link to the DeepHermes-AscensionMaze-RLAIF-8b-Atropos-GGUF model, noting that the RL bot is already released.
  • Brainstorming Creative Linux Terminal Simulator Prompts: A member requested assistance in creating a creative Linux terminal simulator prompt that works across models like DeepHermes 8B, Claude, and Gemini.
    • They want it to be creative during file system exploration and username generation.

Chinese Models, BFL Model, OS Models

  • Chinese Models Rise Under Permissive Licenses: An insightful post highlights the continued rise of Chinese models with permissive licenses, contrasting it with growing pressure on Western models, discussed in Robotic View.
  • BFL Drops New Image Editing Model: BFL has released a new image editing model named FLUX.1 Kontext, as announced here.
    • Users can try out the model on their playground.
  • LifeArchitect AI Assembles Model Table: A member shared a link to a collection of models at LifeArchitect AI.

Latent Space ▷ #ai-general-chat (82 messages🔥🔥):

Reed Hastings joins Anthropic, n8n vs New Workflow Tool, Quantized 70B Llama, Sonnet 4 and Opus 4, Claude Code vs Cursor

  • Netflix CEO Joins Anthropic Board: Reed Hastings, former Netflix CEO, joined Anthropic’s board sparking speculation about future collaborations and potential AI-driven video innovations.
    • The announcement led to discussions about the possibility of Anthropic developing AI video technologies similar to Sora, with some jokingly confirming Sora by Anthropic.
  • New Tool Vaults Ahead of n8n?: A member claimed a previewed tool immediately vaults ahead of n8n, though another expressed doubt due to n8n’s established community and customization capabilities.
    • They suggested the new tool might lack deep orchestration features compared to n8n, noting n8n’s significant traffic according to Similarweb data.
  • 70B Llama for Lawyers?: A member questioned whether a quantized 70B Llama model would be sufficient for specific legal tasks, expressing skepticism about its ability to handle the required level of detail.
    • Another user chimed in, mentioning that someone is deploying fully local workflows for lawyers on maxed out M4s.
  • Opus 4 vs Sonnet 4: A user shared that with Cursor + o3 max they are the most productive they’ve ever been, with nothing else coming close, while noting that Opus 4 does a better job than Sonnet 4.
    • There was also this tweet with a similar comparison.
  • Claude Code Edges Out Cursor?: A member suggested Claude Code might surpass Cursor due to its composability and lack of tool call limits, which forces Cursor users to prompt the model to continue.
    • Another added that Claude Code reads files end-to-end, while Cursor/Windsurf use RAG with too many tricks that make their results hard to trust and reproduce.

Latent Space ▷ #ai-announcements (5 messages):

Autonomous SWE Agents, Factory AI, Browser-based AI Design, SWE-Bench Obsolescence

  • Latent Space Pod Collabs with Factory AI: The Latent Space podcast announced a new collaboration with Factory AI on X.
    • This collaboration promises insights into the rapidly evolving field of autonomous software engineering.
  • Factory AI’s Autonomous SWE Agents (‘Droids’) Unveiled: A thread summarizes a discussion with Factory AI’s Matan Grinberg and Eno Reyes about their Autonomous SWE Agents (‘Droids’) platform, highlighting Factory AI’s origins.
    • Key discussion points include the platform’s browser-based design and the challenges in enterprise AI development.
  • SWE-Bench Faces Obsolescence: The discussion with Factory AI also addresses the obsolescence of SWE-Bench as an evaluation metric.
    • This suggests a shift in how AI-driven software engineering tools are assessed, emphasizing more practical, real-world benchmarks.

Manus.im Discord ▷ #general (83 messages🔥🔥):

Manus instability, Connecting tasks to GitHub repositories, Claude Sonnet 4.0, Veo 3, AI Studio

  • Manus Hit by Instability and Bugs: Users reported experiencing bugs and errors with Manus, coinciding with recent updates, raising concerns about instability.
    • One user reported an invalid JSON error which caused the task to delete and recreate itself 5 times a second.
  • GitHub Repos Get Votes: Users showed their support for connecting tasks to GitHub repositories via upvotes.
    • Some users suggested implementing the feature directly in the UI instead of via a PAT (personal access token).
  • Sonnet 4.0 Expected Soon: A co-founder highlighted the strong relationship with Claude, sparking anticipation for the release of Sonnet 4.0.
    • Other members expressed their distaste for Veo 3 and its creepy videos.
  • AI Studio Audio and Video: Members clarified that AI Studio offers audio and video support, including audio generation capabilities, though with a 5:33 time limit.
    • One member pointed out that they only use Gemini to transcribe audio.
  • Users Want to Hoard Points: Members discussed the possibility of accumulating daily credits on Manus, similar to a game, but acknowledged that this feature is currently unavailable.

Notebook LM ▷ #use-cases (6 messages):

NotebookLM, NLM potential, NLM limitations, NLM Pro tiers, NLM podcast settings

  • User Inquires About NotebookLM’s Business Applications: A user inquired about using NotebookLM to create Ads, Whitepapers, Goals, Webinars, and presentations for their new business after uploading all relevant information.
    • Another user pointed out that ChatGPT could also be used for this purpose.
  • NLM Podcast Features Not Showing for Pro User: A user asked if custom instructions and duration settings for podcasts in NLM are limited to the Ultra tier only.
    • Another user responded that the Pro tier should also have those features, contradicting the original user’s experience.

Notebook LM ▷ #general (57 messages🔥🔥):

Custom Test Simulator, Smart Flashcard System, Selenium Integration, Audio Overviews Length, Podcast Voices

  • Users Want Custom Test Simulators and Smart Flashcards: A user suggested adding a custom test simulator with adjustable settings and a smart flashcard system using spaced repetition to the platform.
  • Mind Maps Customization Desired: A user inquired about customizing mind maps to start from a specific topic based on the source material.
  • Longer Audio Overviews Achieved by Feeding More Information: A user reported that they achieve longer audio overviews by feeding more information into NotebookLM, specifically by exporting deep research into documents before importing to NotebookLM.
  • Users Request Selenium Integration for Workflow Automation: A user inquired about integrating NotebookLM with Selenium to automate summaries for a law office workflow.
  • Experimenting to select Female Podcast Voices: Users are experimenting with prompts like “only male podcast” to influence the gender of the podcast voice, with mixed success, but with a preference for the Spanish female voice.

Yannick Kilcher ▷ #general (35 messages🔥):

DeepSeek scaling, Embedding Forward Pass, LLM Choice, Gemini Diffusion, GFlownets

  • IntologyAI Ponders DeeperSeek: IntologyAI questions on X why DeepSeek doesn’t get deeper with more versions.
  • Embedding Forward Pass Modification Explored: A member is exploring letting models pass embeddings to themselves via a modification of the forward pass, using hooks so earlier layers know what’s happening in later layers, with code available on GitHub (a toy sketch follows after this list).
  • Premium LLM Choice Debated: Members discussed choosing the right premium LLM among ChatGPT, Gemini, Claude, and Perplexity, and whether the ability to generate media like images, video, and audio should factor in.
    • One said that ChatGPT has Sora, which in their opinion is worse than Veo.
  • GFlownets Losing Steam?: Members discussed the reasons why GFlownets have lost popularity, noting that they are a solution looking for a problem.
    • The issue stems from needing a model of the problem to sample from all possible future states, which then makes other methods potentially more suitable, according to one member’s in-depth explanation.
  • Anthropic Opensources Mechinterp Code: Anthropic has open-sourced its mechinterp code, with a link to the announcement and GitHub repository provided.
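
Regarding the forward-pass modification above, here is a toy sketch of the general mechanism using standard PyTorch hooks; the member’s actual implementation is in the linked GitHub repo, so this is only an illustration of the idea:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
cache = {}  # holds the late layer's output between forward passes

def save_late(module, inputs, output):
    cache["late"] = output.detach()  # remember the late layer's activity

def inject_early(module, inputs):
    # On the next forward pass, let the first layer "see" the previous
    # pass's late-layer embedding by adding it to its input.
    if "late" in cache:
        return (inputs[0] + cache["late"],)
    return inputs

model[2].register_forward_hook(save_late)
model[0].register_forward_pre_hook(inject_early)

x = torch.randn(4, 16)
model(x)  # first pass populates the cache
model(x)  # second pass receives feedback from the previous pass
```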

Yannick Kilcher ▷ #paper-discussion (4 messages):

Paper Discussion, KNN, Matteo, Work crunch

  • WaveFunction Asks About Paper Discussions: WaveFunction inquired about the status of paper discussions and offered others the opportunity to present.
    • They mentioned being in crunch mode at work but expressed intent to resume paper presentations next week.
  • WaveFunction queries whereabouts of KNN and Matteo: WaveFunction inquired about the absence of KNN and Matteo, asking if they had gone on walkabout.
    • They noted the recent abundance of interesting material and a backlog of bookmarks, alongside their current busyness.

Yannick Kilcher ▷ #agents (1 messages):

NeurIPS videos, Simons Institute YouTube channel

  • NeurIPS Videos Recommended as Resource: A member suggested checking out the NeurIPS videos on the topic of agents for additional information.
    • No specific video title was mentioned, but the suggestion was made in the context of the agents channel.
  • Simons Institute YouTube Channel Suggested: The Simons Institute YouTube channel was also recommended as a resource for learning more about agents.
    • No specific video title was mentioned, but the channel is known for its content on theoretical computer science and related topics.

Yannick Kilcher ▷ #ml-news (14 messages🔥):

R2 vs O4 benchmark, FrontierMath Fraud, Astrocytes importance, R1-0528 stats

  • R2 Squashes O4 in Benchmarks!: Members discussed a benchmark showing R2 surpassing O4, linked as AI Battle tweet.
  • FrontierMath Allegations Surface!: A member claimed FrontierMath was outed as a fraud, bankrolled by OpenAI and locked under NDA.
  • Astrocytes: Brain’s Unsung Heroes?: A user shared an article from MIT about astrocytes possibly explaining human brains’ huge storage capacity.
  • R1-0528 Shows Improvement Across the Board: A user posted an image showcasing the stats of R1-0528, noting reasonable improvement across the board, stats image.

MCP (Glama) ▷ #general (14 messages🔥):

Awesome MCP Servers PR, MonetizedMCP Launch, OAuth2.1 Authentication for MCP Servers, Remote MCP Server Demo

  • Awesome-MCP-Servers list gets a PR: A member opened a PR against the awesome-mcp-servers list.
  • MonetizedMCP Opens Programmatic Payments: A member announced MonetizedMCP, an open-source extension enabling MCP servers to accept programmatic payments, fully payment rail agnostic and not modifying the core MCP spec, as well as Fluora, a marketplace for MonetizedMCP servers.
    • They are inviting builders interested in machine-to-machine payments to check it out and DM if they want to join the alpha.
  • OAuth2.1 Authentication for MCP Servers?: A member asked for examples of remotely hosted mcp servers offering authentication via OAuth2.1 per the draft specification from 2025-03-26.
    • They specified that the ideal server should be streamable HTTP.
  • Remote MCP Server Demo Deployed: A member shared a demo authenticating to an MCP server per the 2025-03-26 spec and then lazily authenticating to Confluence, accessible via a Cloudflare tunnel.
    • They noted that the server went to sleep but should be up again after turning caffeine on.

MCP (Glama) ▷ #showcase (11 messages🔥):

mcp-ui-bridge porting, Multi-Chat MCP Server, Financial Analysis Agent, VerbalCodeAI, *arrs MCP servers

  • MCP-UI-Bridge Jumps to Python!: A member announced the completion of porting the mcp-ui-bridge from Typescript to Python, with equivalent functionality in both versions and linked to Python, Typescript and GitHub versions.
    • The member also shared a Substack post explaining the concept and invited users to DM for a closed preview of the mobile Android MCP client (iOS coming soon).
  • Multi-Chat MCP Server Aims for AI Teamwork: A member shared a Reddit post and GitHub repo for a Multi-Chat MCP Server designed to facilitate AI collaboration, extensible to teams, supporting simultaneous chat connections and letting AI agents act as teammates and pair programmers.
    • Another member thanked the author and said they’re implementing now.
  • Financial Analysis Agent Built with MCP-Agent: A member described building a financial analysis agent using mcp-agent, which pulls stock data, verifies it, analyzes insights, and generates a markdown report, available on GitHub.
    • They noted that plugging in EvaluatorOptimizer significantly improved the agent’s performance by looping the research agent through an evaluator until the output hits a quality bar (the general pattern is sketched after this list).
  • VerbalCodeAI Launches Codebase Navigation Tool: A member shared VerbalCodeAI, an AI-powered tool that simplifies codebase navigation and understanding from the terminal, featuring code search, analysis, chat, and an MCP server for integration with tools like Claude Desktop, available on GitHub and its website.
    • The user said “It’s a project I’ve been working on with a lot of enthusiasm”, and invited users to try it.
  • arrs MCP Servers in Action: A member shared a list of arrs MCP servers in action including: Plex, Overseerr, Prowlarr, qbittorrent, sabnzbd, Tautulli, Portainer, Unifi, Unraid, and Gotify, with a link to the yarr-mcp GitHub repository.
    • They included screenshots of the servers in action, but no further details.
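
The EvaluatorOptimizer pattern from the financial-agent item is, in framework-agnostic terms, a generate-evaluate loop; mcp-agent ships its own component for this, so the sketch below is only the shape of the idea, with all names illustrative:

```python
def evaluator_optimizer(generate, evaluate, quality_bar: float, max_rounds: int = 5):
    """Loop a generator through an evaluator until output clears a quality bar.

    generate(feedback) -> output            e.g. the research agent, revising on feedback
    evaluate(output) -> (score, feedback)   e.g. an LLM-as-judge rubric
    """
    output, feedback = None, None
    for _ in range(max_rounds):
        output = generate(feedback)
        score, feedback = evaluate(output)
        if score >= quality_bar:
            break  # output is good enough; stop iterating
    return output
```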

Modular (Mojo 🔥) ▷ #general (8 messages🔥):

Modverse 48, Modular blog, Level Advancement

  • Modverse #48 Launch Sparks Confusion: The launch announcement of Modverse #48 on the Modular blog led to confusion, as a user mistook “live” for a live stream link.
    • The user later clarified they were unfamiliar with Modverse and expected a YouTube live stream link, apologizing for the misunderstanding.
  • User Achieves Level 4 Status: A user was congratulated for advancing to level 4.
    • No other details were given.

Modular (Mojo 🔥) ▷ #mojo (7 messages):

Mojo C libraries, Mojo tree structure, Mojo GUI UI and FFI

  • Mojo Still Needs Established C Libraries: A user stated that they will use established C libraries like OpenSSL until the Mojo ecosystem matures more.
  • Defining Tree Structure in Mojo: Multiple members discussed how to properly define a tree structure in Mojo using ArcPointer and Optional types, with one member suggesting the need to wrap Node itself in Arc.
    • The recommended code defines `alias Node = ArcPointer[NodeData]`, with `NodeData` declared as `struct NodeData(Movable)` containing the fields `var value: Int`, `var left: Optional[ArcPointer[NodeData]]`, and `var right: Optional[ArcPointer[NodeData]]`.
  • Mojo GUI and FFI Guide: One member posted a guide on the Modular forum covering FFI issues they faced while developing a Mojo GUI, focusing on an X11 version with an OpenGL version upcoming.
    • They shared a video showcasing the functionality of the X11 version and an image of the OpenGL version, noting that once FFI issues were solved, they focused on widget creation.

LlamaIndex ▷ #blog (2 messages):

LlamaIndex Agents in Finance Workshop, LlamaCloud agentic strategies, Agentic Retrieval > Naive RAG

  • LlamaIndex Holds Agents in Finance Workshop: LlamaIndex’s CEO, @jerryjliu0, is leading a workshop on agents in finance in NYC, with high interest that exceeded capacity.
    • Follow LlamaIndex on Twitter to stay informed about future events, and learn more about their enterprise offerings.
  • Agentic Retrieval Rises from Naive RAG’s Grave: LlamaIndex declares that naive RAG is not enough for a modern application and promotes agentic strategies built into LlamaCloud.
    • These strategies can be adopted with just a few lines of code, as detailed in this Twitter thread.

LlamaIndex ▷ #general (8 messages🔥):

Exception Handling in Workflows, Nested Asyncio Tasks, LLM-Powered Agents, Multi-Agent Systems, Model Context Protocol (MCP)

  • Exceptions Swallowed in LlamaIndex Workflows?: When calling a workflow via workflow.run(), exceptions within the steps may be swallowed, leading to undetected workflow failures; this is believed to be fixed, per this thread.
    • The exception is attached to the asyncio future, which can be accessed via handler.exception() or through try/except blocks, as shown in this colab and in the sketch below.
  • Nested Asyncio Chaos in Workflows: Nested workflows with awaiting and yielding events can complicate error reporting in asyncio tasks.
    • The top-level caller may need to implement try/except or access handler.exception() to reliably detect errors in nested asyncio futures.
  • AI Agent Pro Introduces Themselves: A member introduced themselves as an expert delivering LLM-powered agents with capabilities like RAG, workflow automation across APIs, and multi-agent systems.
    • Their stack includes OpenAI (GPT-4/4o), LangChain, LlamaIndex, AutoGen, FAISS, Pinecone, React, FastAPI, and more, and are available for contract work.
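
A minimal sketch of the error-handling advice above, assuming llama_index’s Workflow API behaves as described in the thread (the step’s exception both propagates on await and is recorded on the handler future):

```python
import asyncio
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class FlakyWorkflow(Workflow):
    @step
    async def do_work(self, ev: StartEvent) -> StopEvent:
        raise RuntimeError("step failed")  # simulate a failing step

async def main() -> None:
    handler = FlakyWorkflow(timeout=10).run()
    try:
        await handler  # awaiting the handler surfaces the step's exception
    except Exception as exc:
        print("caught:", exc)
    print("recorded:", handler.exception())  # also attached to the asyncio future

asyncio.run(main())
```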

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

AgentX Submission, Entrepreneurship Track, Research Track, Agentic AI Summit

  • AgentX Deadline Looms: The AgentX submission deadline is fast approaching on May 31st at 11:59 PM PT, with over $150,000 in prizes across both tracks.
  • Entrepreneurship Track Requires Pitch: The Entrepreneurship Track requires a pitch deck (≤20 slides), a product demo video (max 3 min), and a live product link; submit here.
  • Research Track Requires Scientific Paper: The Research Track requires a scientific paper (7-8 pages max excluding appendix), a video presentation (max 3 min), and a GitHub repository; submit here.
  • Agentic AI Summit on August 2: The Demo Day & Awards will be held at the Agentic AI Summit on August 2nd at Berkeley.
    • Questions can be directed to the team in the designated channel.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (6 messages):

Kaggle project submissions, Submitting Perplexity outputs, Article language, Adding certificate to LinkedIn

  • Kaggle links accepted for project submissions: A member asked if a public Kaggle project can be submitted instead of a GitHub repo for the research track.
    • Another member confirmed that a Kaggle link is acceptable, but all code must be in one place; they suggested putting prompts/outputs in the appendix of the manuscript due to the submission form’s single-file upload limit.
  • Perplexity outputs submission clarified: A user inquired about submitting Perplexity outputs directly from their interface without code.
    • They were advised to include the prompts and outputs in the appendix of their manuscript due to the single-file upload limit on the submission form.
  • Spanish articles are fine: A member asked if the course article could be written in Spanish instead of English.
    • The staff member responded that Spanish is fine.
  • LinkedIn certificate guide requested: A member suggested a guide on how to add the course certificate to a LinkedIn profile, specifically asking about the Name, Issuing organization, and Credential ID fields.
    • The response indicated that the Name should be the certificate name (e.g., Large Language Model Agents MOOC, Fall 2024), the Issuing organization is Berkeley Center for Responsible, Decentralized Intelligence, and unfortunately, there’s no Credential ID.

Torchtune ▷ #general (7 messages):

Sanity Checks, Convergence of Loss Curves, Qwen 0.5b

  • Torchtune Discusses Sanity Checks: A member inquired about sanity checks with models, specifically when adding special tokens and overfitting on a small dataset with LoRA finetuning.
    • Another member mentioned that common sanity checks include verifying convergence of loss curves, running basic generations with finetuned models, and running evaluations on common benchmarks.
  • Methods for Initializing Embeddings for Special Tokens: A member detailed two approaches for initializing embeddings for new special tokens: 1) taking the mean of all pre-trained token embeddings, and 2) embedding each special token’s natural-language description and averaging only those description tokens’ embeddings (both approaches are sketched below).
    • This member also mentioned that the checks were tried on Qwen 0.5b, and their loss curves did not look optimal.
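
A hedged sketch of those two initialization strategies, assuming a Hugging Face-style model and tokenizer; the function name and API shape here are illustrative, not Torchtune’s:

```python
import torch

def init_special_token_embeddings(model, tokenizer, descriptions):
    """descriptions maps each new special token to an optional NL description."""
    old_vocab = model.get_input_embeddings().weight.shape[0]
    tokenizer.add_special_tokens({"additional_special_tokens": list(descriptions)})
    model.resize_token_embeddings(len(tokenizer))
    emb = model.get_input_embeddings().weight
    with torch.no_grad():
        for i, (token, desc) in enumerate(descriptions.items()):
            if desc is None:
                # 1) mean of all pre-trained token embeddings
                emb[old_vocab + i] = emb[:old_vocab].mean(dim=0)
            else:
                # 2) mean of only the tokens in this token's NL description
                ids = tokenizer(desc, add_special_tokens=False).input_ids
                emb[old_vocab + i] = emb[ids].mean(dim=0)
```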

Cohere ▷ #💬-general (2 messages):

CMD-R Model Update, Local Models, HF Weights

  • CMD-R Model Weights on HF: A member inquired about the possibility of a new CMD-R model update with weights released on Hugging Face.
    • They mentioned that the August 2024 release remains the only trustworthy local model for 24GB VRAM setups.
  • Local Model Trustworthiness: The discussion highlights the importance of trustworthy local models, especially for users with limited VRAM, such as 24GB.
    • The August 2024 release is specifically praised for its reliability in this context.

Cohere ▷ #🔌-api-discussions (2 messages):

Cohere OpenAI Cline VS Code

  • Cohere OpenAI Compat Endpoint Trips Up Cline: A member wanted to use the Cohere OpenAI-compatibility endpoint with the Cline VS Code extension but reported that it wasn’t working.
    • They later indicated they had resolved the issue on their own.

Cohere ▷ #🤝-introductions (2 messages):

AI Automation, No-Code/Low-Code Development, AI Agents & LLM Workflows, Voice AI Solutions

  • AI Automation Expert Enters the Chat: An expert in AI, automation, workflow, and agent technologies introduced themselves, bringing hands-on experience building LLM-powered systems, no-code/low-code products, and voice AI solutions.
    • They specialize in creating intelligent agents, scalable automations, and full-stack MVPs using modern AI and visual tools, notably n8n, Make.com, Zapier, Glide, FlutterFlow, GPT-4, Claude, and LangChain.
  • Voice AI Virtuoso: The member detailed their experience in building smart voicebots for lead gen, support, and scheduling with real-time memory and context using tools like VAPI, Bland AI, Retell AI, Twilio, and Telnyx.
    • They are keen to connect with teams building AI-first voice agents, automations, and smart tools to innovate together.
  • Kyzo.ai Project Showcased: The member shared information about their past work with Kyzo.ai, focusing on building AI voice agents for sales outreach using VAPI, Bland AI, and Retell AI.
    • They also created real-time cold calling bots with LLM logic, CRM sync, and memory-aware follow-ups, highlighting their full-stack capabilities.

DSPy ▷ #show-and-tell (2 messages):

DSPy MCP tutorial, streamable HTTP, HuggingFace Spaces

  • MCP Tutorial gets Streaming HTTP Port: A member ported the DSPy MCP tutorial to work with streamable HTTP.
  • HuggingFace Space hosts DSPy MCP tutorial: The updated tutorial is hosted on HuggingFace Spaces.

DSPy ▷ #general (3 messages):

DSPy 3, Latent Space Podcast, Conference Bookings

  • DSPy 3 Dropping on Latent Space Podcast!: The next version of DSPy (v3) will be discussed in detail on the Latent Space Podcast, according to this tweet.
    • A member has already signed up for the talk.
  • Conference Bookings Filling Up: The member added that most other talks at the conference were fully booked, indicating high interest.

tinygrad (George Hotz) ▷ #general (2 messages):

Whisper Bounty, Draft PR

  • Whisper Bounty in Progress: A contributor is actively working on the Whisper bounty, addressing errors, cleaning code, and continuing the work of a prior contributor.
    • They are currently debugging a no-speech bug and aim to improve speed, asking whether their progress warrants locking the bounty and submitting a pull request.
  • Draft PR Encouragement: A member encouraged the bounty worker to submit a draft PR to showcase their ongoing work on the Whisper bounty.
    • This allows for early feedback and collaboration.

tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

types.FunctionType documentation, dynamic function construction

  • User seeks types.FunctionType Documentation: A member asked for more detailed documentation on dynamic function construction via types.FunctionType, used in upat_interpret() within ops.py in the tinygrad library.
    • The member mentioned that the official Python documentation, source code, and language reference all lacked detailed information (a small example of the mechanism appears below).
  • Guidance for types.FunctionType sought: A member suggested using help(types.FunctionType) to get more information on the function.
    • They linked to the C code within CPython’s source code.
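
For readers hitting the same documentation gap, here is a small self-contained example of dynamic function construction with types.FunctionType — not tinygrad’s actual upat_interpret(), just the underlying mechanism:

```python
import types

src = "def add(a, b):\n    return a + b\n"
module_code = compile(src, "<generated>", "exec")
# The inner function's code object lives in the module code's co_consts.
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
add = types.FunctionType(fn_code, globals(), name="add")
print(add(2, 3))  # 5
```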

Nomic.ai (GPT4All) ▷ #announcements (1 messages):

Tableau CEO joins Nomic talk, New Fundraising, New Models

  • Tableau’s Ex-CEO Talks at Nomic: The former CEO of Tableau will host a live talk with Nomic next Wednesday at 12pm EST, sign up here.
  • Exciting Developments Await!: Stay tuned for upcoming news regarding new fundraising efforts and innovative model releases from Nomic.
    • We’re pushing the boundaries of what’s possible in AI.

Nomic.ai (GPT4All) ▷ #general (2 messages):

VOID Pirate Captain Introduction, LocalDocs with Nous Hermes 2 Mistral DPO Model, AI mini PC

  • VOID Pirate Captain Joins the Server: A new member introduced themself as the VOID Pirate Captain, describing themselves as a builder of strange dreams, trader of truths, and occasional breaker of cycles.
    • The Captain mentioned running a freeze-dried candy lab and a soul-forged philosophy ship, expressing interest in connecting with others building minds in machines.
  • User Experiments with Nous Hermes 2 Mistral DPO Model and LocalDocs: A member shared their experience using the Nous Hermes 2 Mistral DPO model with LocalDocs, noting it made only a few mistakes.
    • The member expressed that figuring it out was fun, and asked what other models people are using to create their own personal LLMs, quoting “playing in the dark and calling it light”.
  • AI Mini PC with 128GB Unified Memory: A member joked about searching through old jeans pockets for money to buy a new AI mini PC with 128GB of unified memory.
    • They expressed excitement that an LLM of about 8-20 GB, combined with 128GB of unified memory, would be amazing for summarizing anything or chatting with LocalDocs.