Friday, May 22, 2026

12 articles: 🔴 2 critical, 🟡 8 important, 🟢 2 interesting

🤖 Models (2)

🟡 🤖 Models May 22, 2026 · 3 min read

arXiv:2605.21006: Off-the-shelf persona vectors achieve 68-98% of the effectiveness of targeted sycophancy steering in LLMs

Researchers published a paper on arXiv on 21 May 2026, titled 'Playing Devil's Advocate', showing that existing persona vectors developed for roleplay tasks can reduce sycophancy (the model's tendency to agree with the user even when the user is wrong) with 68-98% of the effectiveness of specialised Contrastive Activation Addition (CAA), without any training on sycophancy-specific data. Geometric analysis suggests that sycophancy is a persona-level property rather than a single steerable direction in activation space, which could open easier pathways for alignment.
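For readers unfamiliar with activation steering, the following is a minimal sketch of the general contrastive-activation idea: a steering direction is the difference between mean activations on two contrasting sets of prompts, and it is added (scaled) to a hidden state at inference time. The function names, the toy three-dimensional activations, and the scale `alpha` are all illustrative assumptions; this is not the paper's code, and real activations come from a model layer rather than hand-written lists.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def steering_vector(positive, negative):
    """Contrastive direction: mean(positive) minus mean(negative)."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, vector, alpha):
    """Add the scaled steering vector to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (assumed values, for illustration only).
sycophantic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
honest = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = steering_vector(sycophantic, honest)
# A negative alpha pushes the hidden state away from the sycophantic side.
steered = steer([1.0, 1.0, 1.0], direction, alpha=-0.5)
print(direction, steered)
```

In the setting the paper describes, the direction would come from an existing roleplay persona rather than from sycophancy-specific contrast pairs; the addition step itself stays the same.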

🟢 🤖 Models May 22, 2026 · 3 min read

Black Forest Labs: FLUX Erase outperforms GPT Image-2 (68.5%) and Finegrain (63.2%) in prompt-free object removal

Black Forest Labs launched FLUX Erase on 21 May 2026 — an inpainting tool that uses a binary mask to remove objects, shadows, watermarks, and text from images and reconstructs the background without any textual prompt. A benchmark on 198 test images demonstrates superiority over GPT Image-2 (68.5%) and Finegrain Eraser Standard (63.2%). The tool is available through the BFL API and a public demo at flux-tools.bfl.ai/erase, positioning BFL as a specialist in professional creative workflow tools.

⚖️ Regulation (1)

🔴 ⚖️ Regulation May 22, 2026 · 3 min read

UK AI Safety Institute: Overseeing advanced AI systems is becoming harder — 20+ degradation pathways identified

The UK AI Safety Institute (AISI) published a report on 21 May 2026 analysing the future of oversight over advanced AI systems, based on 25 interviews with experts from industry, government, and academia. The main finding: existing oversight rests on foundations that are likely to erode. The report identifies more than 20 distinct pathways by which oversight mechanisms could degrade, with particular focus on latent reasoning, capability masking, external AI actions, and AI-to-AI communication.

🤝 Agents (5)

🔴 🤝 Agents May 22, 2026 · 3 min read

Microsoft Research: MagenticLite + Fara1.5 (4B/9B/27B) — agentic AI optimised for small models achieves SOTA

Microsoft Research released a trio for agentic AI with small models on 21 May 2026: MagenticLite (a browser and filesystem UI application), MagenticBrain (a 14B orchestration model fine-tuned from Qwen 3 14B), and Fara1.5 (a computer-use model in 4B, 9B, and 27B variants). Fara1.5-27B reaches over 90% of SOTA on the Online-Mind2Web benchmark (300 web tasks), nearly doubling the performance of the previous Fara-7B. The goal is to demonstrate that agentic AI does not require massive models — only well co-designed tools and a harness.

🟡 🤝 Agents May 22, 2026 · 3 min read

AWS: Nova Act receives HIPAA eligibility — agentic ePHI automation for healthcare workflows

AWS announced on 21 May 2026 that Amazon Nova Act, the agentic AI service for automating browser and UI workflows, has received formal HIPAA-eligible status. Healthcare organisations can now use Nova Act in workflows that handle electronic protected health information (ePHI), such as processing prior authorisations, verifying insurance, and submitting referrals through vendor web portals. The service integrates with Amazon Bedrock AgentCore and the Strands Agents framework, requires a signed Business Associate Addendum (BAA) and AWS KMS encryption, and currently operates only in the US East (N. Virginia) region.

🟡 🤝 Agents May 22, 2026 · 3 min read

Anthropic: Claude Code v2.1.147 introduces Workflow tool for deterministic multi-agent orchestration

Anthropic released Claude Code v2.1.147 on 21 May 2026 at 20:39 UTC. The new CLI version introduces the Workflow tool, the first deterministic multi-agent orchestration mechanism in the Claude Code ecosystem. The tool is disabled by default for now and is enabled by setting the CLAUDE_CODE_WORKFLOWS=1 environment variable. The same version renames the existing /simplify command to /code-review with effort levels (high/medium/low) and adds sandbox hardening against prototype-pollution and thenable-based escape attacks.

🟡 🤝 Agents May 22, 2026 · 4 min read

LangChain: From token streams to agent streams — typed channels replace classic streaming for multi-agent UI

LangChain published a post on 21 May 2026 by Christian Bromann and Nick Hollon describing a shift from token streams to structured agent streams. Modern AI agents plan tasks, delegate to sub-agents, call tools, and pause for human review, and classic text token streaming is insufficient for displaying that work. LangChain proposes typed channels that carry messages, tool calls, state changes, sub-agent activity, and custom events. Applications subscribe only to the event types they need, which keeps the UI efficient for long-running workloads.
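The subscribe-by-event-type idea can be illustrated with a small, library-independent sketch. Everything below (`AgentEvent`, `EventBus`, the channel names, the sample events) is hypothetical and written for illustration; it is not LangChain's API, only one plausible shape for a typed event stream in which a UI component receives just the channel it asked for.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AgentEvent:
    """One item in the agent stream, tagged with the channel it belongs to."""
    channel: str   # hypothetical channel names: "messages", "tool_calls", "state"
    agent: str     # which agent or sub-agent emitted the event
    payload: Any


class EventBus:
    """Routes each event only to handlers subscribed to its channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, event):
        handlers = self._subscribers.get(event.channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)  # number of handlers that received the event


def demo():
    bus = EventBus()
    tool_log = []
    # A tool-activity panel that only cares about tool calls.
    bus.subscribe("tool_calls", tool_log.append)
    stream = [
        AgentEvent("messages", "planner", "Drafting a plan"),
        AgentEvent("tool_calls", "researcher", {"tool": "search"}),
        AgentEvent("state", "planner", {"step": 2}),
    ]
    for event in stream:
        bus.publish(event)
    return tool_log


print(demo())
```

In this sketch the panel's handler never runs for message or state events, which is the property that matters for long-running workloads: a component's work scales with the events it displays, not with everything the agents emit.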

🟡 🤝 Agents May 22, 2026 · 3 min read

OpenAI: Codex scaling to enterprise — 4 million weekly active users and the Codex Labs program

OpenAI announced on 21 May 2026 that it is scaling Codex, its agentic coding tool, for enterprise use; the tool has reached 4 million weekly active users. The announcement includes the new Codex Labs program and partnerships with major consulting firms, both intended to help large enterprises implement and scale Codex. The news marks a formal enterprise go-to-market move that positions Codex as a direct competitor to GitHub Copilot in the mid-market and high-end segments.

🏥 In Practice (2)

🟡 🏥 In Practice May 22, 2026 · 3 min read

arXiv:2605.21427: PALS — power-aware LLM serving for MoE models achieves +26.3% energy efficiency and 4-7× fewer QoS violations

Researchers published PALS on the arXiv preprint server on 21 May 2026. It is a runtime system that integrates GPU power control directly into LLM serving for Mixture-of-Experts models. PALS uses lightweight offline power-performance models and a feedback controller that dynamically optimises configurations against throughput targets. It achieves a 26.3% improvement in energy efficiency and a 4-7× reduction in QoS violations under power constraints, and it integrates into vLLM without modifying the API or retraining models. It addresses a growing operational pain point for data centres: GPU cluster energy consumption, which is becoming the dominant constraint on growth.
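To make the notion of a feedback controller working against a throughput target concrete, here is a deliberately simple sketch using a proportional controller with clamping. This is a stand-in technique chosen for illustration, not the PALS controller, whose design the summary does not detail; the class name, the gain, the wattage limits, and the throughput figures are all assumed values.

```python
from dataclasses import dataclass


@dataclass
class PowerCapController:
    """Proportional controller: raise the GPU power cap when throughput
    falls short of the target, lower it when there is headroom."""
    target_tps: float       # throughput target in tokens per second (assumed)
    min_cap_w: float        # lowest allowed power cap in watts (assumed)
    max_cap_w: float        # highest allowed power cap in watts (assumed)
    gain_w_per_tps: float   # hypothetical tuning constant

    def next_cap(self, current_cap_w, measured_tps):
        error = self.target_tps - measured_tps
        proposed = current_cap_w + self.gain_w_per_tps * error
        # Keep the cap inside the permitted power envelope.
        return max(self.min_cap_w, min(self.max_cap_w, proposed))


controller = PowerCapController(
    target_tps=1000.0, min_cap_w=200.0, max_cap_w=400.0, gain_w_per_tps=0.5
)
print(controller.next_cap(300.0, 900.0))   # below target: cap goes up
print(controller.next_cap(300.0, 1100.0))  # above target: cap goes down
print(controller.next_cap(390.0, 800.0))   # far below target: clamped at max
```

A production system would presumably combine such a loop with the offline power-performance models mentioned above to pick a good starting configuration, and would apply the resulting cap through the GPU vendor's management interface; both parts are outside this sketch.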

🟢 🏥 In Practice May 22, 2026 · 3 min read

CNCF: NetEase Games achieves 30-second LLM cold start on Kubernetes via Fluid prefetching layers

CNCF published a case study from NetEase Games on 21 May 2026 (authors Haifeng Liao and Xiang Zhang) describing how they reduced load times for 70B-class LLMs from 42 minutes (direct S3 access) to under 30 seconds using the CNCF-incubated Fluid project. The key is a Fluid prefetching layer that shares models between teams instead of duplicating caches, together with pre-warming scheduling that eliminates cold starts. The case study is relevant to anyone running serverless LLM inference on Kubernetes with large models.

🛡️ Security (2)

🟡 🛡️ Security May 22, 2026 · 4 min read

Microsoft Research: Vega — ZK proofs for digital identity, 92ms generation and 70% faster repeated proofs

Microsoft Research presented Vega on 21 May 2026 — a zero-knowledge proof system that proves facts from government documents (age, status, qualifications) without revealing the document itself. Proof generation takes 92ms on standard devices, proof size is 108KB, and verification takes 23ms. The key innovation is fold-and-reuse proving, which makes every subsequent proof of the same credential up to 70% faster, and a lookup-centric circuit design that avoids parsing the entire CBOR document. Vega is particularly relevant for AI agents that need to prove identity on behalf of users without storing sensitive data.

🟡 🛡️ Security May 22, 2026 · 3 min read

OECD AI: Collective AI security requires G7 coordination — prompt injection, agent security, and model poisoning as priorities

OECD AI published a policy report on 21 May 2026 by authors de Rivoire, de Leusse, Seger, and Butts, arguing that AI security requires international coordination because it exceeds the scope of classical cybersecurity. Three priority areas are identified: defending against prompt injection attacks with reusable attack methods, security of AI agents autonomously accessing tools and memory, and preventing model poisoning where a small number of contaminated documents can compromise models of various sizes. The report recommends coordination through G7 and OECD-GPAI mechanisms with active public-private collaboration.
