Tuesday, May 19, 2026

17 articles: 🔴 3 critical, 🟡 9 important, 🟢 5 interesting

🤖 Models (3)

🔴 🤖 Models May 19, 2026 · 4 min read

arXiv:2605.15514: RoPE mathematically cannot distinguish positions or tokens in long contexts — theoretical proof of a fundamental limitation

arXiv paper 2605.15514 provides a mathematical proof that Rotary Positional Embeddings (RoPE), the positional mechanism used by nearly all modern large language models including Llama, Mistral, Qwen and GPT-NeoX, loses the ability to distinguish positions and tokens in long contexts. The authors conclude that fundamentally new architectural mechanisms are needed.
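
As a rough illustration of the mechanism under discussion (not the paper's proof), the sketch below implements the standard RoPE rotation in plain Python and checks the well-known property that the attention score between a rotated query and key depends only on their relative offset. The function names `rope_rotate` and `attention_score`, the base of 10000 and the sample vectors are assumptions chosen for this example.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate each consecutive pair of `vec` by the angle pos * theta.

    theta for the pair starting at index i is base ** (-i / d),
    which matches the usual RoPE frequency schedule.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def attention_score(q, k, q_pos, k_pos):
    """Dot product of the query rotated to q_pos and the key rotated to k_pos."""
    q_rot = rope_rotate(q, q_pos)
    k_rot = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot))


# Hypothetical sample vectors, used only for the demonstration.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (3) at two different absolute positions.
print(attention_score(q, k, 5, 2))
print(attention_score(q, k, 105, 102))
```

Shifting both positions by the same amount leaves the score unchanged up to floating-point error, which is the sense in which RoPE encodes relative rather than absolute position.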

🟡 🤖 Models May 19, 2026 · 2 min read

Anthropic: Claude API web search tool now returns enriched data from SEC filings

On May 18, 2026, Anthropic updated the web search tool in the Claude API to return richer and more structured data from SEC filings, including 10-K, 10-Q and 8-K documents. The upgrade makes it easier to build financial agents for earnings analysis, due diligence and research that cite primary sources.

🟢 🤖 Models May 19, 2026 · 2 min read

arXiv:2605.18732: Scaling Law for Hallucinations — Larger Model Does Not Always Mean Fewer Errors

Researchers tested 38 models on 8,900+ references and showed that LLM factual recall follows a sigmoid curve: the combination of parameter count and topic prevalence in training data explains 60–94% of variance. Hallucinations are not random — they are predictable and measurable.
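
One way to picture the reported relationship is a logistic function of log parameter count and log topic prevalence. The sketch below is purely illustrative: the functional form, the function name `predicted_recall` and the coefficients `a`, `b` and `c` are all assumptions, not values taken from the paper.

```python
import math


def predicted_recall(params_billions, topic_prevalence, a=1.0, b=1.0, c=0.0):
    """Hypothetical sigmoid model of factual recall.

    params_billions:  model size in billions of parameters (must be > 0)
    topic_prevalence: assumed share of training data about the topic (must be > 0)
    a, b, c:          placeholder coefficients, not fitted to any data
    """
    z = a * math.log10(params_billions) + b * math.log10(topic_prevalence) + c
    return 1.0 / (1.0 + math.exp(-z))


# With these placeholder coefficients, recall rises with model size
# and with how common the topic is, but it saturates rather than growing linearly.
for size in (1, 7, 70, 700):
    print(size, round(predicted_recall(size, 1e-3), 3))
```

Under a model of this shape, a rare topic can remain poorly recalled even by a large model, which is consistent with the headline that bigger does not always mean fewer errors.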

📦 Open Source (2)

🟡 📦 Open Source May 19, 2026 · 2 min read

PyTorch: ExecuTorch MLX Delegate delivers 3–6× faster model execution on Apple Silicon GPUs

The PyTorch team released the experimental ExecuTorch MLX Delegate, a backend that uses the Apple MLX framework and Metal GPU kernels to deliver 3 to 6 times higher throughput on Apple Silicon chips. It supports Llama 3.2, Qwen 3, Phi-4 mini, Whisper and Voxtral real-time streaming transcription.

🟢 📦 Open Source May 19, 2026 · 2 min read

GitHub: Copilot Spaces API now generally available

GitHub announced the general availability of the REST API for Copilot Spaces, allowing teams to programmatically create, configure and delete contextual AI workspaces. The new interface is especially useful for organizations managing large numbers of Spaces without relying on manual workflows.

⚖️ Regulation (1)

🟡 ⚖️ Regulation May 19, 2026 · 2 min read

EU AI Office: Public Consultation on High-Risk AI System Classification — Deadline June 23, 2026

The European Commission has opened a public consultation on draft guidelines defining when an AI system falls under the "high-risk" category of the AI Act. Stakeholders — from developers to civil society — can submit feedback until June 23, 2026 at 10 PM CET.

🤝 Agents (8)

🔴 🤝 Agents May 19, 2026 · 3 min read

Anthropic: Acquiring Stainless integrates MCP server tooling and SDK development directly into the Claude platform

On May 18, 2026, Anthropic acquired Stainless, a company founded in 2022 that is behind all official Anthropic SDKs and MCP server tooling. Stainless builds SDKs for hundreds of companies, and the acquisition aims to better integrate Claude agents with external data and tools.

🔴 🤝 Agents May 19, 2026 · 3 min read

Anthropic: MCP Tunnels, Self-Hosted Sandboxes and Automatic File-Spill for Agents

Anthropic has introduced three major updates to the Claude API platform for agent builders: MCP Tunnels for connecting to private networks without internet exposure, self-hosted sandboxes as an alternative to Anthropic infrastructure, and automatic file-spill for tool outputs exceeding 100K tokens.

🟡 🤝 Agents May 19, 2026 · 2 min read

arXiv:2605.18661: AI for Automated Research — Roadmap and User Guide

arXiv paper 2605.18661 from researchers at NUS and NTU analyzes systems that autonomously generate research papers for just $15. Key finding: frontier LLMs fabricate results and cannot reliably assess idea novelty. A comprehensive roadmap defines the boundary between reliable assistance and unsafe AI autonomy.

🟡 🤝 Agents May 19, 2026 · 3 min read

arXiv:2605.16233: FORGE — AI agents develop shared memory without fine-tuning

arXiv:2605.16233 presents FORGE, a method by which LLM agents build shared memory through population-based experience sharing — without any model weight updates. On the CybORG CAGE-2 network defense task it achieves 1.7–7.7× better performance over the zero baseline, with particularly pronounced gains for weaker models.

🟡 🤝 Agents May 19, 2026 · 2 min read

Anthropic Claude Code: v2.1.144 Brings /resume for Background Sessions and Fix for 75-Second Hang

Claude Code CLI v2.1.144 introduces /resume support for background sessions, with a duration display such as 'Agent completed · 3h 2m 5s'. It also fixes the 75-second hang that occurred when the API was unavailable, resolves an MCP tools/list pagination bug that silently dropped tools, and delivers a range of terminal and MCP fixes.

🟡 🤝 Agents May 19, 2026 · 2 min read

GitHub: Copilot CLI remote control now generally available on all platforms

GitHub announced the general availability (GA) of remote control functionality for GitHub Copilot CLI. With the /remote on command, a developer can monitor and control an active terminal session from a mobile device, web, VS Code or JetBrains IDE — without interrupting the workflow.

🟢 🤝 Agents May 19, 2026 · 3 min read

arXiv:2605.18747: Code as Operational Substrate — A New AI Agent Paradigm

41 researchers from UIUC and NVIDIA argue that code is not merely an LLM output but an agent harness — an operational substrate that unifies reasoning, action and verification into a single framework for building reliable AI systems.

🟢 🤝 Agents May 19, 2026 · 2 min read

arXiv:2605.16238: LLM-guided tree search beats CDC in epidemic forecasting

arXiv:2605.16238 presents an autonomous system combining LLMs and tree search algorithms for predicting seasonal epidemics. In real time, throughout the 2025-26 season, the system independently built models for influenza, COVID-19 and RSV that consistently matched or surpassed the CDC's gold-standard human-curated ensemble.

🏥 In Practice (1)

🟢 🏥 In Practice May 19, 2026 · 2 min read

CNCF: Kubernetes debugger erases traces — a serious problem for security audits

CNCF warns that kubectl debug — a tool for diagnosing Kubernetes containers — leaves no record after a session ends. As a result, regulated industries cannot answer a key question: who viewed which container and for how long — directly violating PCI DSS and SOC 2 audit log requirements.

🛡️ Security (2)

🟡 🛡️ Security May 19, 2026 · 2 min read

arXiv:2605.16090: CrossMPI — an attack on vision-language models using image-only perturbation

arXiv:2605.16090 introduces CrossMPI — an attack on vision-language models that injects malicious instructions solely through invisible pixel changes in an image, without any text. Researchers discovered that the critical layers of multimodal integration are located in the middle of the model, not at the end as previously assumed. The attack achieves an average ASR of 66.36%, surpassing all known baseline methods by 40.91 percentage points.

🟡 🛡️ Security May 19, 2026 · 3 min read

arXiv:2605.17634: Why Data/Instruction Separation Cannot Stop Prompt Injection

Researchers from CISPA Helmholtz Center and Google mathematically prove that data/instruction separation — currently the dominant defense against prompt injection attacks — fails to protect against contextual manipulation. With a new theoretical framework based on Contextual Integrity, they propose a fundamentally different approach to designing AI agent defenses.
