Thursday, May 7, 2026

19 articles: 🔴 4 critical, 🟡 14 important, 🟢 1 interesting

🤖 Models (3)

🟡 🤖 Models May 7, 2026 · 2 min read

arXiv:2605.03195: Terminus-4B, a 4-billion-parameter terminal-execution model, matches Claude Opus and GPT-5.3-Codex on SWE-Bench Pro with ~30% fewer main agent tokens

Editorial illustration: two concentric circles — smaller 4B model for terminal and larger frontier model for planning connected by a task delegation arrow

Terminus-4B is a 4-billion-parameter Qwen3 fine-tune specialized for terminal execution in agentic systems — on the SWE-Bench Pro benchmark it matches and sometimes outperforms Claude Sonnet/Opus and GPT-5.3-Codex baselines, while reducing main agent token consumption by approximately 30% by isolating verbose build/test logs in a subagent context.
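
The token saving comes from keeping verbose logs out of the main agent's context. A minimal sketch of that delegation pattern follows, assuming a hypothetical `run_in_subagent` helper; here a plain "keep the last few lines" truncation stands in for the Terminus-4B model, which the paper uses for this role.

```python
import subprocess
import sys


def run_in_subagent(cmd: list[str], tail: int = 3) -> dict:
    """Run a command and hand back only a compact report.

    Hypothetical helper: the full log stays inside this function (the
    "subagent context"); the caller (the "main agent") sees only the exit
    code, the log size and the last few lines.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).splitlines()
    return {
        "exit_code": proc.returncode,
        "log_lines": len(lines),
        "tail": lines[-tail:],
    }


if __name__ == "__main__":
    # A fake "build" that prints 100 log lines; only 3 reach the caller.
    noisy = [sys.executable, "-c", "for i in range(100): print(i)"]
    print(run_in_subagent(noisy))
```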

🟡 🤖 Models May 7, 2026 · 2 min read

arXiv:2605.04908: Gosset with Curated Pharma Index Outperforms Frontier LLMs by 3.2x

Gosset is a specialized AI platform with curated pharmaceutical data that returned 3.2 times more verified drugs per query than four frontier systems, achieving 100% precision and full recall across ten niche oncology and immunology targets.

🟡 🤖 Models May 7, 2026 · 2 min read

Google: Gemini API Gets Multimodal File Search for Images and Breaking Change in Interactions API

Google has expanded Gemini File Search to multimodal image search using the gemini-embedding-2 model, with media_id in grounding metadata for visual citations. At the same time, it announced a breaking change in the Interactions API: the outputs field is renamed to steps. The new schema becomes the default on May 20, 2026, and the old schema is removed on June 6, 2026.
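
Client code that has to survive the transition window can read whichever field is present. The sketch below assumes the response is available as a plain dictionary with a top-level `steps` or `outputs` list; that payload shape and the `interaction_items` helper are illustrative assumptions, not the documented schema.

```python
from typing import Any, Mapping


def interaction_items(response: Mapping[str, Any]) -> list[Any]:
    """Return items from the new `steps` field, else from legacy `outputs`.

    Assumption: both fields are top-level lists on the response payload.
    """
    if "steps" in response:
        return list(response["steps"])
    return list(response.get("outputs", []))


if __name__ == "__main__":
    legacy = {"outputs": [{"text": "hello"}]}
    current = {"steps": [{"text": "hello"}]}
    print(interaction_items(legacy) == interaction_items(current))
```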

📦 Open Source (1)

🟡 📦 Open Source May 7, 2026 · 2 min read

AMD: vLLM-ATOM plugin brings Instinct optimisations without changing vLLM code

AMD has released vLLM-ATOM, an open-source plugin that integrates Instinct GPU optimisations into the vLLM production framework without any changes to the upstream source code. It activates automatically through Python entry_points, supports dense and MoE models including Kimi-K2.5 and DeepSeek V3/R1, and uses AITER kernels for fused MoE and flash attention.

⚖️ Regulation (1)

🔴 ⚖️ Regulation May 7, 2026 · 2 min read

EU AI Office: Political Agreement to Simplify AI Act and Ban Nudification Apps

The European Commission, Parliament and Council reached a political agreement on the Digital Omnibus package, which simplifies the AI Act and introduces an explicit ban on nudification apps. The rules for high-risk AI systems will apply from December 2, 2027, and those for AI integrated into products from August 2, 2028.

🤝 Agents (5)

🔴 🤝 Agents May 7, 2026 · 2 min read

arXiv:2605.06651: Google DeepMind introduces AI Co-Mathematician with 48% on FrontierMath Tier 4

The Google DeepMind team has published a paper on the AI Co-Mathematician, an interactive workspace where agents collaborate with mathematicians on open problems. The system achieved 48% on the FrontierMath Tier 4 benchmark — a new record among all AI systems.

🟡 🤝 Agents May 7, 2026 · 2 min read

Anthropic: Managed Agents gain multiagent sessions, Outcomes, webhooks and vault refresh in public beta

Editorial illustration: diagram of multiple Claude agents connected in a single session canvas with vault and webhook icons

Claude Managed Agents is Anthropic's managed platform for autonomous agents, which on May 6, 2026 received four new features in public beta: multiagent sessions, the Outcomes mechanism for defining goals, webhooks for session and vault lifecycle events, and background refresh for mcp_oauth credentials. New filters for sessions by status and events by type and creation time were also added.

🟡 🤝 Agents May 7, 2026 · 2 min read

GitHub: validation of agentic behavior via dominator analysis from compiler theory achieves 100% accuracy vs 82% agent self-assessment

Editorial illustration: graph structure diagram with highlighted dominator nodes representing essential steps in agent execution

GitHub publishes a validation framework for non-deterministic AI agents that borrows dominator analysis from compiler theory — from 2 to 10 successful executions of the Copilot Coding Agent, the system learns which steps are essential and which are optional, achieving 100% accuracy in distinguishing agent bugs from genuine product regressions.
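
In compiler theory, a node D dominates a node N if every path from the entry to N passes through D. As a rough illustration of how that idea can separate essential from optional steps, the sketch below merges a few successful runs into one graph and treats the dominators of the end node as essential. The step names, the `essential_steps` helper and the graph construction are illustrative assumptions, not GitHub's implementation.

```python
def dominators(graph: dict[str, set[str]], entry: str) -> dict[str, set[str]]:
    """Classic iterative dominator computation over a successor map."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    preds: dict[str, set[str]] = {n: set() for n in nodes}
    for node, succs in graph.items():
        for succ in succs:
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def essential_steps(runs: list[list[str]]) -> set[str]:
    """Steps that every START-to-END path in the merged run graph must visit."""
    graph: dict[str, set[str]] = {}
    for run in runs:
        path = ["START", *run, "END"]
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
    return dominators(graph, "START")["END"] - {"START", "END"}


if __name__ == "__main__":
    runs = [
        ["clone", "install", "lint", "test", "open_pr"],
        ["clone", "install", "test", "open_pr"],  # lint skipped, still succeeded
    ]
    print(sorted(essential_steps(runs)))
```

With these two hypothetical runs, `lint` is not a dominator of the end node, so a later run that skips it would not be flagged, while a run that skips `test` would.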

🟡 🤝 Agents May 7, 2026 · 2 min read

GitHub: Copilot for VS Code Gets Terminal Access and Bring-Your-Own API Keys

During the April release cycle (versions 1.116–1.119), GitHub Copilot for Visual Studio Code received semantic search across the entire codebase, agent access to open terminals, and the ability to plug in your own API keys for Anthropic, OpenAI and other providers.

🟡 🤝 Agents May 7, 2026 · 2 min read

vLLM: Mooncake distributed KV cache store integration delivers 3.8× higher throughput and 46× lower P50 TTFT for multi-turn agentic workloads

Editorial illustration: network of GPU nodes connected by RDMA links with a central distributed KV cache pool

vLLM integrates Mooncake, an open-source distributed KV cache store that eliminates repeated prefix computation between agentic turns — on realistic Codex traces with 12 GB200 GPUs, throughput increases 3.8×, P50 TTFT drops 46×, end-to-end latency drops 8.6×, and cache hit rate jumps from 1.7% to 92.2%.
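
The gain comes from reusing the KV cache for a prefix that an earlier turn already computed. The toy model below shows the bookkeeping only: tokens are grouped into fixed-size blocks, each block is keyed by a hash of everything up to and including it, and a later turn recomputes only the blocks whose keys are missing. The block size, the class name and the in-memory set are illustrative assumptions and do not reflect Mooncake's or vLLM's actual data structures.

```python
import hashlib

BLOCK = 4  # tokens per cache block (toy value)


def block_keys(tokens: list[int]) -> list[str]:
    """One key per full block; each key hashes the whole prefix up to that block."""
    keys, h = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        keys.append(h.copy().hexdigest())
    return keys


class SharedKVStore:
    """Stand-in for a cache pool shared across turns and workers."""

    def __init__(self) -> None:
        self.blocks: set[str] = set()

    def prefill(self, tokens: list[int]) -> int:
        """Return how many tokens still had to be computed for this request."""
        keys = block_keys(tokens)
        reused = 0
        for key in keys:
            if key not in self.blocks:
                break
            reused += 1
        self.blocks.update(keys)
        return len(tokens) - reused * BLOCK


if __name__ == "__main__":
    store = SharedKVStore()
    turn1 = list(range(8))
    turn2 = turn1 + [100, 101, 102, 103]  # same history plus one new block
    print(store.prefill(turn1), store.prefill(turn2))
```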

🔧 Hardware (1)

🔴 🔧 Hardware May 7, 2026 · 3 min read

NVIDIA: Spectrum-X Multipath Reliable Connection becomes OCP open standard for gigascale AI networks

Editorial illustration: parallel fiber optic paths between AI racks with MRC, Spectrum-X and OCP open standard labels

NVIDIA Spectrum-X Multipath Reliable Connection (MRC) is an RDMA transport protocol that distributes a single connection across multiple network paths and has now been published as an open specification through the Open Compute Project. MRC is already in production at OpenAI, Microsoft's Fairwater data center and Oracle's Abilene data center, and was developed in collaboration with AMD, Broadcom, Intel and Microsoft.

🏥 In Practice (4)

🟡 🏥 In Practice May 7, 2026 · 2 min read

Anthropic: Claude Code v2.1.132 Brings 25+ Fixes and New Env Variables for Hooks

Anthropic released Claude Code v2.1.132 with 25+ bug fixes and two new environment variables: CLAUDE_CODE_SESSION_ID for hook integration and CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN for native scrollback. A serious bug with 10GB+ RSS memory growth in MCP servers has also been resolved.
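
A hook script could use the session ID to correlate its own log lines with a session. The sketch below only assumes that CLAUDE_CODE_SESSION_ID is present in the hook's environment, as the release summary states; the `hook_log_line` helper and the log format are hypothetical.

```python
import json
import os
from typing import Mapping


def hook_log_line(event: str, environ: Mapping[str, str] = os.environ) -> str:
    """Build one JSON log line tagged with the current session ID.

    Falls back to "unknown" when the variable is absent, for example when
    the script is run outside a hook.
    """
    session_id = environ.get("CLAUDE_CODE_SESSION_ID", "unknown")
    return json.dumps({"session_id": session_id, "event": event})


if __name__ == "__main__":
    print(hook_log_line("hook-invoked"))
```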

🟡 🏥 In Practice May 7, 2026 · 2 min read

Anthropic: Claude Code v2.1.133 brings worktree.baseRef and race condition fix

Anthropic has released Claude Code v2.1.133 with new parameters worktree.baseRef, sandbox.bwrapPath/socatPath and the environment variable CLAUDE_EFFORT in hooks. The version fixes a race condition in parallel sessions and issues with Windows drive root paths. It is the third release this week, after v2.1.131 and v2.1.132.

🟡 🏥 In Practice May 7, 2026 · 2 min read

GitHub: Optimising agentic workflows achieves token savings of 19% to 62%

GitHub instrumented its production agentic workflows and identified three main sources of token waste: unnecessary MCP tools, deterministic data fetching and misconfigured bash rules. Optimisation achieved savings of 19% to 62% per workflow.

🟢 🏥 In Practice May 7, 2026 · 2 min read

arXiv:2605.04012: SymptomAI in the Fitbit app with 13,917 patients outperforms independent clinicians in differential diagnosis

Editorial illustration: user conversing with SymptomAI agent in the Fitbit app while a background panel displays a ranked list of diagnoses

SymptomAI is a conversational AI agent integrated into the Fitbit app and tested on approximately 13,917 participants; in the clinical evaluation subset its diagnostic recommendations achieved an odds ratio of 2.47 compared to independent clinicians who evaluated the same conversations. The study is a preprint.

💬 Community (1)

🔴 💬 Community May 7, 2026 · 3 min read

Anthropic: SpaceX becomes compute partner with 300 MW and doubled Claude Code limits

Anthropic has signed a compute partnership with SpaceX that gives it access, within a month, to over 300 MW of new capacity and more than 220,000 NVIDIA GPUs at the Colossus 1 data center. At the same time, five-hour rate limits for Claude Code Pro, Max, Team and Enterprise users are being doubled, and API limits for the Opus model are being raised.

🛡️ Security (3)

🟡 🛡️ Security May 7, 2026 · 2 min read

arXiv:2605.04019: automated red teaming agent achieves 85% success rate against Meta Llama Scout with 45+ attacks and 450+ transformations

Editorial illustration: automated agent simultaneously launching dozens of attack vectors against a language model on a control panel screen

A new paper presents an agentic red teaming system built on the Dreadnode SDK that achieves an 85% success rate against Meta's Llama Scout using 45+ attacks, 450+ transformations and 130+ scorers, reducing security testing from weeks to hours without any hand-written code.

🟡 🛡️ Security May 7, 2026 · 2 min read

arXiv:2605.04785: AgentTrust Intercepts AI Agent Tool Calls with 95–97% Accuracy

AgentTrust is an open-source runtime system that intercepts AI agent tool calls — file operations, SQL queries and shell commands — and returns one of four verdicts before execution. Across 930 test scenarios it achieves 95–97% accuracy, and approximately 93% on shell-obfuscated attacks.
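
The interception pattern itself is simple: every tool call passes through a judge before it reaches the executor. The sketch below is a toy version with regex rules. The verdict names (allow, warn, review, block), the rules and the helper names are all hypothetical; this summary does not name AgentTrust's four verdicts or describe how it classifies calls.

```python
import re
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    # Hypothetical verdict names, ordered from least to most severe.
    ALLOW = 0
    WARN = 1
    REVIEW = 2
    BLOCK = 3


# Toy rules for illustration only.
RULES = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), Verdict.BLOCK),
    (re.compile(r"\bcurl\b.*\|\s*(ba)?sh\b"), Verdict.BLOCK),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), Verdict.REVIEW),
    (re.compile(r"\bsudo\b"), Verdict.WARN),
]


def judge(call: str) -> Verdict:
    """Return the most severe verdict among matching rules, else ALLOW."""
    matched = [verdict for pattern, verdict in RULES if pattern.search(call)]
    return max(matched, key=lambda v: v.value, default=Verdict.ALLOW)


def guarded(call: str, execute: Callable[[str], str]) -> tuple[Verdict, Optional[str]]:
    """Run `execute` only when the verdict permits it."""
    verdict = judge(call)
    if verdict in (Verdict.ALLOW, Verdict.WARN):
        return verdict, execute(call)
    return verdict, None


if __name__ == "__main__":
    fake_executor = lambda c: f"ran: {c}"
    for call in ["ls -la", "sudo apt update", "DROP TABLE users", "rm -rf /"]:
        print(call, "->", guarded(call, fake_executor))
```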

🟡 🛡️ Security May 7, 2026 · 2 min read

arXiv:2605.06390: Automated alignment research is harder than it looks

A new paper by four researchers — including Geoffrey Irving (DeepMind/Anthropic) — argues that AI agents cannot reliably automate alignment research. Without clear evaluation criteria, optimisation pressure generates plausible but catastrophically wrong safety assessments that human reviewers struggle to detect.
