Monday, May 18, 2026

11 articles: 🟡 5 important, 🟢 6 interesting

🤖 Models (1)

🟡 🤖 Models May 18, 2026 · 4 min read

GitHub Copilot: GPT-5.3-Codex becomes base model for Business and Enterprise with 12-month LTS guarantee

On May 17, 2026, GitHub announced that GPT-5.3-Codex replaces GPT-4.1 as the base model for Copilot Business and Enterprise. The change applies only to these two tiers, not to Copilot Pro, Pro+, or Free. GPT-5.3-Codex is the first LTS (long-term support) model, with availability guaranteed for 12 months, from February 5, 2026 to February 4, 2027. Pricing: GPT-5.3-Codex carries a 1× premium request multiplier; GPT-4.1 remains force-enabled at a 0× multiplier until its deprecation on June 1, 2026.
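
The dates and multipliers above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the dictionary and the helper names (`in_lts_window`, `premium_requests_used`) are hypothetical and not part of any GitHub API, and the billing rule assumed here is simply "requests times multiplier".

```python
from datetime import date

# Dates and multipliers copied from the summary above.
LTS_START = date(2026, 2, 5)
LTS_END = date(2027, 2, 4)
MULTIPLIERS = {"GPT-5.3-Codex": 1.0, "GPT-4.1": 0.0}  # illustrative lookup table


def in_lts_window(day: date) -> bool:
    """True if `day` falls inside the stated LTS window (both ends inclusive)."""
    return LTS_START <= day <= LTS_END


def premium_requests_used(model: str, requests: int) -> float:
    """Assumed rule: premium requests consumed = requests * model multiplier."""
    return requests * MULTIPLIERS[model]


# Length of the window in days, counting both the first and the last day.
lts_days = (LTS_END - LTS_START).days + 1

print(lts_days)
print(in_lts_window(date(2026, 5, 18)))
print(premium_requests_used("GPT-5.3-Codex", 40))
print(premium_requests_used("GPT-4.1", 40))
```

Under these assumptions the window spans a full year, and requests served by GPT-4.1 at its 0× multiplier consume no premium requests.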

📦 Open Source (6)

🟡 📦 Open Source May 18, 2026 · 3 min read

arXiv:2605.15041 CAST Framework: Case-Based Calibration for LLM Tool Use Achieves +5.85pp BFCLv2 and -26% Reasoning Length

CAST is a new arXiv paper published on May 14, 2026, by Renning Pang, Tian Lan, Leyuan Liu, Piao Tong, Sheng Cao, and Xiaosong Zhang, introducing a case-based calibration framework for LLM tool use. The approach treats historical execution trajectories as structured information for reinforcement learning. It reports an execution-accuracy improvement of up to 5.85 percentage points over the BFCLv2 baseline and a 26% reduction in average reasoning length.

🟢 📦 Open Source May 18, 2026 · 5 min read

arXiv:2605.15706 Differentiable Mixture-of-Agents: dynamic per-step agent routing achieves SOTA across 9 benchmarks

Differentiable Mixture-of-Agents is a new arXiv paper published on May 15, 2026 by Xingjian Wu, Junkai Lu, Siyu Yan, Xiangfei Qiu, Jilin Hu, Chenjuan Guo, and Bin Yang that introduces a differentiable routing mechanism for multi-agent LLM collaboration. The system dynamically selects and activates agents per reasoning step instead of using fixed topologies, achieves SOTA results across 9 benchmarks, and adapts at test-time without external annotations via predictive entropy self-supervision.
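
For readers unfamiliar with the term, predictive entropy is a standard uncertainty measure: the Shannon entropy of a model's output distribution. The sketch below shows only that generic quantity; it does not reproduce the paper's routing mechanism or its self-supervision objective, and the example distributions are made up for illustration.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a discrete predictive distribution.

    Zero-probability entries are skipped, following the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical output distributions over four candidate answers.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(predictive_entropy(confident))
print(predictive_entropy(uncertain))
```

A low value indicates a peaked, confident prediction, while a uniform distribution gives the maximum value, log of the number of classes. This is why entropy can serve as a label-free signal at test time.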

🟢 📦 Open Source May 18, 2026 · 4 min read

arXiv:2605.15100 Dual-Dimensional Consistency: 10× Token Consumption Reduction with Maintained Accuracy Across Five Benchmarks

Dual-Dimensional Consistency is a new arXiv paper published on May 14, 2026, by Rongman Xu, Yifei Li, Tianzhe Zhao, Yanrui Wu, Bo Li, and Hang Yan addressing inference-time scaling efficiency. The framework combines a Confidence-Weighted Bayesian protocol and Trend-Aware Stratified Pruning — across five benchmarks it demonstrates over 10× reduction in token consumption while maintaining or improving accuracy over strong baselines.
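
As a rough intuition for confidence weighting, the sketch below implements plain confidence-weighted voting over sampled answers. It is a generic baseline technique, not the paper's Confidence-Weighted Bayesian protocol or its pruning strategy; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def confidence_weighted_vote(samples):
    """Pick the answer with the largest summed confidence.

    `samples` is a list of (answer, confidence) pairs with confidence in [0, 1].
    Returns the winning answer and its share of the total confidence mass.
    """
    scores = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("need at least one sample with positive confidence")
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical reasoning paths: (final answer, self-reported confidence).
samples = [("42", 0.9), ("41", 0.4), ("42", 0.8), ("41", 0.3), ("40", 0.2)]
answer, share = confidence_weighted_vote(samples)
print(answer, round(share, 3))
```

In a scheme like this, a high winning share after only a few samples could be used as a signal to stop sampling early, which is one general way such methods save tokens.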

🟢 📦 Open Source May 18, 2026 · 3 min read

arXiv:2605.15015 Small Private LM: Competitive Results in Educational Assessment Design with Human-in-the-Loop Recommendations

Small, Private Language Models as Teammates for Educational Assessment Design is a new arXiv paper published on May 14, 2026, by Chris Davis Jaldi, Anmol Saini, Shan Zhang, Noah Schroeder, Cogan Shimizu, and Eleni Ilkou. The paper systematically compares smaller models against larger alternatives in generating pedagogically aligned assessment questions. Smaller models reach competitive results with privacy benefits, but the authors emphasize that model-based evaluations show systematic inconsistencies and recommend a Human-in-the-Loop approach.

🟢 📦 Open Source May 18, 2026 · 3 min read

arXiv:2605.15109 Traversal Context: Agentic GraphRAG Must Document Visited-but-Uncited Entities for True Provenance

Why Neighborhoods Matter is a new arXiv paper published on May 14, 2026, by Riccardo Terrenzi, Maximilian von Zastrow, and Serkan Ayvaz (accepted for the IJCAI-ECAI 2026 Joint Workshop on GENAIK and NORA). The authors argue that agentic GraphRAG systems must treat citation faithfulness as a trajectory-level problem — true provenance covers not only cited evidence, but also visited-but-uncited entities that influence model reasoning.
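
To make the trajectory-level idea concrete, here is a minimal sketch of a log that records every entity an agent visits and separately tracks which ones end up cited. The `TraversalLog` class and the entity names are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TraversalLog:
    """Records visited entities in order, plus the subset that was cited."""

    visited: list = field(default_factory=list)
    cited: set = field(default_factory=set)

    def visit(self, entity: str) -> None:
        # Keep first-visit order and avoid duplicate entries.
        if entity not in self.visited:
            self.visited.append(entity)

    def cite(self, entity: str) -> None:
        # A cited entity is treated as visited as well.
        self.visit(entity)
        self.cited.add(entity)

    def visited_but_uncited(self) -> list:
        # Entities that may have shaped the answer without appearing in citations.
        return [e for e in self.visited if e not in self.cited]


log = TraversalLog()
for node in ["Drug A", "Trial 17", "Side effect X"]:
    log.visit(node)
log.cite("Trial 17")
print(log.visited_but_uncited())
```

A provenance report built from such a log would list both the cited evidence and the visited-but-uncited entities.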

🟢 📦 Open Source May 18, 2026 · 3 min read

xAI SDK Python v1.13.0: prepare_extension() Enables Batch Video Extension for Generated Clip Series

xAI SDK Python v1.13.0 was released on May 16, 2026 (commit author @double-di, PR #141) and introduces the new prepare_extension() method for batch video extension. The function extends the video API introduced in v1.10.0 by adding batch processing capabilities — developers can now prepare extensions for a series of clips in a single call instead of sequentially for each individual clip.

🤝 Agents (3)

🟡 🤝 Agents May 18, 2026 · 4 min read

arXiv:2605.16217 Argus: evidence assembly architecture for deep research agents achieves +12.7pp with 8 parallel searchers

Argus is a new arXiv paper published on May 15, 2026, by Zhen Zhang, Liangcai Su, Zhuo Chen, and colleagues that presents an evidence assembly framework for deep research agents. The system uses a dual-agent architecture: a Searcher (ReAct-style traces) and a Navigator (shared evidence graph plus RL synthesis). It reports gains of +5.5pp with a single Searcher and +12.7pp with 8 parallel Searchers, and a score of 86.2 on BrowseComp with 64 parallel Searchers without exceeding context limits.

🟡 🤝 Agents May 18, 2026 · 4 min read

GitHub Copilot: Grok Code Fast 1 Deprecated May 15, 2026; Recommended Replacements GPT-5 mini and Claude Haiku 4.5

GitHub formally deprecated the Grok Code Fast 1 model on May 15, 2026, across all Copilot experiences (Chat, inline edits, ask, agent mode, code completions). The deprecation comes one week after the announcement on May 8. Recommended replacements: GPT-5 mini and Claude Haiku 4.5 — both available through standard model policies. Enterprise admins must enable alternatives through Copilot settings.

🟢 🤝 Agents May 18, 2026 · 4 min read

Databricks + Veeva Vault CRM: three specialized AI agents for life sciences commercial workflows

On May 18, 2026, Databricks announced a partnership with Veeva Systems that integrates Genie AI agents directly into Vault CRM workflows for the life sciences industry. Three specialized agent personas — Sales Rep Agent, Medical Science Liaison (MSL) Agent, and Territory Manager Agent — access the Databricks lakehouse through Unity Catalog governance. The announcement precedes the Veeva Commercial Summit in Boston (May 19–20, 2026).

🛡️ Security (1)

🟡 🛡️ Security May 18, 2026 · 5 min read

arXiv:2605.15338 Sleeper Memory Poisoning: 99.8% attack success rate on GPT-5.5 via persistent memory of LLM agents

Hidden in Memory is a new arXiv paper published on May 14, 2026, by Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi, Vyas Raina, Ivaxi Sheth, and Mario Fritz that presents a delayed-execution attack on stateful LLM agents. Adversarial content in external context (documents, webpages) corrupts the agent's persistent memory, with a reported success rate of 99.8% on GPT-5.5 and 95% on Kimi-K2.6, and 60–89% success in converting poisoned memory into attacker-intended actions.