Wednesday, April 22, 2026

16 articles: 🔴 3 critical, 🟡 7 important, 🟢 6 interesting

🤖 Models (2)

🟢 🤖 Models April 22, 2026 · 2 min read

MathNet: 30,676 olympiad problems from 47 countries, SOTA models still fall short

Editorial illustration: Connected nodes with mathematical symbols and globe fragments from 47 countries

An MIT team published MathNet, a multimodal benchmark of 30,676 olympiad math problems from 47 countries in 17 languages. Gemini-3.1-Pro scores 78.4% and GPT-5 scores 69.3%, while embedding models struggle to retrieve mathematically equivalent problems.

🟢 🤖 Models April 22, 2026 · 3 min read

xAI Speech-to-Text API exits beta: general availability for 25 languages

Editorial illustration: Microphone and audio wave streams converting into transcripts in 25 languages through the Grok API

xAI has announced that its Speech-to-Text (STT) API is moving from beta to general availability. The service supports 25 languages, offers batch and streaming modes, and is available without a waitlist. It completes the voice stack alongside the Grok Voice Agent, which was already generally available.

📦 Open Source (1)

🟢 📦 Open Source April 22, 2026 · 2 min read

WorldDB: memory engine combining vector database and world graph for AI agents

Editorial illustration: Vector database and graph of nodes connected by ontology in agent memory

WorldDB is a new memory engine for AI agents that combines a vector database with a graph of nested worlds and an ontology. It reconciles knowledge at write time and prevents contradictions in agent memory, achieving 96.40% accuracy on the LongMemEval-s benchmark.

⚖️ Regulation (1)

🟡 ⚖️ Regulation April 22, 2026 · 3 min read

EU opens call for AI disinformation and deepfake influence campaign research

Editorial illustration: Scales of justice with a broken screen and robotic arm, EU fight against AI disinformation

The European Commission opened the DIGITAL-2026-BESTUSE-AWARENESS call worth 6 million euros to build a joint research framework against manipulative AI content. The call delivers the European Democratic Shield announced in November 2025 and accepts applications until October 1, 2026.

🤝 Agents (5)

🔴 🤝 Agents April 22, 2026 · 4 min read

Google ReasoningBank: agents learn from experience without retraining, +8.3% success on WebArena

Editorial illustration: Robot in a maze with illuminated nodes representing learned experience

Google introduced ReasoningBank, a memory framework that enables AI agents to learn from their own successes and failures without retraining the language model. On the WebArena benchmark it achieved an 8.3% higher success rate and on SWE-Bench-Verified a 4.6% higher one, with approximately 3 fewer steps per task.
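To make the idea of learning from experience without retraining concrete, here is a minimal sketch, not ReasoningBank's actual design or API. The class names, fields, and the word-overlap retrieval are all assumptions chosen for illustration; the only point it demonstrates is that lessons from past successes and failures can be stored outside the model and injected into the prompt of a frozen model.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One distilled lesson. Hypothetical fields, not ReasoningBank's schema."""
    task: str
    lesson: str
    succeeded: bool


@dataclass
class ExperienceMemory:
    """Toy experience store (hypothetical); the model's weights never change."""
    items: list = field(default_factory=list)

    def record(self, task: str, lesson: str, succeeded: bool) -> None:
        # Keep lessons from failures as well as successes.
        self.items.append(MemoryItem(task, lesson, succeeded))

    def retrieve(self, task: str, k: int = 2) -> list:
        # Rank stored lessons by word overlap with the new task. This is a
        # simple stand-in for whatever similarity measure a real system uses.
        query = set(task.lower().split())
        scored = [(len(query & set(m.task.lower().split())), m) for m in self.items]
        scored = [(score, m) for score, m in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:k]]

    def build_prompt(self, task: str) -> str:
        # Prepend retrieved lessons so a frozen model can condition on them.
        lines = []
        for m in self.retrieve(task):
            tag = "worked" if m.succeeded else "avoid"
            lines.append(f"- ({tag}) {m.lesson}")
        hints = "\n".join(lines) if lines else "- (none)"
        return f"Lessons from past tasks:\n{hints}\n\nTask: {task}"
```

Calling `record` after each finished task and `build_prompt` before each new one is the whole loop in this sketch; the improvement comes from the growing memory rather than from any change to the model.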

🔴 🤝 Agents April 22, 2026 · 4 min read

OpenAI scales Codex to enterprise: Codex Labs program and 4 million weekly active users

Editorial illustration: Futuristic cityscape with AI entity and corporate skyscrapers alongside code screens

OpenAI launched the Codex Labs program and strategic partnerships with Accenture, Deloitte, and KPMG to bring the Codex agent to large enterprises worldwide. Codex has reached 4 million weekly active users, and the enterprise offering includes certifications for consultants and packages with consumption-based billing.

🟡 🤝 Agents April 22, 2026 · 2 min read

Agent-World: scalable environment synthesis for AI agent evolution from Renmin University

Editorial illustration: Dynamic environments with landscapes and cities automatically generated for AI agent training

Agent-World is a new research framework from China's Renmin University that automatically generates thousands of diverse environments for training AI agents. It replaces manually crafted benchmarks with dynamic scenarios and enables evolutionary learning through co-evolution of agent and environment.

🟡 🤝 Agents April 22, 2026 · 3 min read

Gemini Deep Research gets MCP integration, collaborative planning, and two new versions

Editorial illustration: Robot silhouette with modular servers and data flows for the Deep Research agent

Google launched two new Deep Research agent versions in the Gemini API — deep-research-preview-04-2026 and deep-research-max-preview-04-2026 — with MCP server integration, collaborative planning, visualizations, and streaming responses. The move positions Gemini as a serious competitor to ChatGPT Deep Research and Perplexity Deep Research.

🟡 🤝 Agents April 22, 2026 · 3 min read

Multi-Agent Systems survey: from classical paradigms to a large model-driven future

Editorial illustration: Connected AI agents in communication bridging classical paradigm with the modern LLM era

A new arXiv survey comprehensively bridges classical Multi-Agent Systems literature with the modern LLM-agent stack. The paper identifies a paradigm shift in coordination, communication protocols, and emergent behavior — from low-level state exchange to semantic reasoning.

🔧 Hardware (1)

🔴 🔧 Hardware April 22, 2026 · 3 min read

Google unveils 8th-generation TPU chips: two specialized variants for the agentic AI era

Editorial illustration: Two specialized 8th-generation TPU chips for training and inference of agentic AI workloads

At Cloud Next '26, Google introduced the eighth generation of its TPU chips in two specialized variants — TPU 8t for model training and TPU 8i for agentic inference. This is the first generation purpose-built for autonomous AI agents and multi-step reasoning.

🏥 In Practice (3)

🟡 🏥 In Practice April 22, 2026 · 3 min read

Claude Cowork comes to Amazon Bedrock — AI for entire organizations

Editorial illustration: Claude Cowork application on desktop in an AWS Bedrock environment for enterprise teams

AWS and Anthropic enable running the Claude Cowork desktop application within AWS accounts via Amazon Bedrock. Data remains under user control, models are not trained on it, and integration with IAM and CloudTrail provides enterprise-grade auditing. Payment goes through existing AWS contracts.

🟢 🏥 In Practice April 22, 2026 · 2 min read

HolmesGPT and CNCF tools auto-diagnose Kubernetes alerts for $0.04

Editorial illustration: Kubernetes dashboard with alerts and robotic arm for automatic diagnosis

The STCLab SRE team uses HolmesGPT with the ReAct pattern and CNCF tools to diagnose Kubernetes alerts automatically. Each investigation costs $0.04, and around 40% of alerts are resolved autonomously. The team's most important lesson: quality runbooks matter more than model choice.
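As a back-of-the-envelope illustration of what these figures imply, the sketch below scales them to a monthly alert volume. The $0.04 cost and the roughly 40% autonomous resolution rate come from the summary above; the function name, the alert volume in the usage note, and the assumption that every alert triggers exactly one investigation are hypothetical.

```python
COST_PER_INVESTIGATION_USD = 0.04  # figure quoted in the summary
AUTONOMOUS_RESOLUTION_RATE = 0.40  # "around 40%", so treat results as rough


def estimate(alerts_per_month: int) -> dict:
    """Rough monthly spend and workload split.

    Assumes one investigation per alert (an assumption, not from the source).
    """
    resolved = round(alerts_per_month * AUTONOMOUS_RESOLUTION_RATE)
    return {
        "cost_usd": round(alerts_per_month * COST_PER_INVESTIGATION_USD, 2),
        "resolved_autonomously": resolved,
        "needs_human": alerts_per_month - resolved,
    }
```

For a hypothetical 1,000 alerts per month, `estimate(1000)` gives a spend of about $40, with roughly 400 alerts closed without a human and 600 still escalated.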

🟢 🏥 In Practice April 22, 2026 · 2 min read

On-device psychiatric AI: Gemma, Phi, and Qwen run without sending data to the cloud

Editorial illustration: Mobile device with psychiatric AI application and local neural networks

Researchers led by Eranga Bandara published a mobile application that locally orchestrates Gemma, Phi-3.5-mini, and Qwen2 for DSM-5-aligned psychiatric assessments. The system sends no data to the cloud and targets sensitive contexts such as the military, criminal justice, and remote healthcare.

🛡️ Security (3)

🟡 🛡️ Security April 22, 2026 · 3 min read

DESPITE benchmark: LLMs plan well for robots, but not safely

Editorial illustration: Robot planning a path through a maze with a fragile digital safety shield

The new DESPITE benchmark evaluated 23 language models on 12,279 robot planning tasks. Result: the best planner fails in only 0.4% of cases, but produces dangerous plans in 28.3%. Planning and safety are orthogonal capabilities — scaling models does not fix safety shortcomings.
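The gap between the two percentages is easier to see as absolute counts. The sketch below converts them, assuming both rates are measured over the full set of 12,279 tasks; the summary does not state the denominators, so treat the results as approximate. The function name is hypothetical.

```python
TOTAL_TASKS = 12_279          # from the summary
PLANNING_FAILURE_RATE = 0.004  # best planner fails in 0.4% of cases
DANGEROUS_PLAN_RATE = 0.283    # but produces dangerous plans in 28.3%


def approx_count(rate: float, total: int = TOTAL_TASKS) -> int:
    """Approximate number of tasks a rate corresponds to.

    Assumes the rate is taken over all tasks (not stated in the source).
    """
    return round(rate * total)


failed = approx_count(PLANNING_FAILURE_RATE)
dangerous = approx_count(DANGEROUS_PLAN_RATE)
print(f"planning failures: ~{failed}, dangerous plans: ~{dangerous}")
```

Under that assumption the best planner fails on about 49 tasks but produces a dangerous plan on about 3,475, roughly 70 times as many.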

🟡 🛡️ Security April 22, 2026 · 3 min read

HuggingFace manifesto: open source as the foundation of AI cybersecurity

Editorial illustration: Broken digital shield reinforced with open-source blocks as the foundation of AI security

HuggingFace published a manifesto in which Margaret Mitchell, Yacine Jernite, Clem Delangue, and 17 co-authors argue that closed AI systems are a single point of failure in cybersecurity. The text responds to Anthropic's Mythos and calls for semi-autonomous agents with auditable logs and human oversight.

🟢 🛡️ Security April 22, 2026 · 2 min read

GitHub CodeQL gets declarative sanitizers and validators — no QL code required

Editorial illustration: Digital fortress with locks for 8 programming languages protecting code from vulnerabilities

GitHub enabled declarative definition of sanitizers and validators in CodeQL 2.25.2 through YAML, without writing QL code. Eight languages are supported (C/C++, C#, Go, Java/Kotlin, JS/TS, Python, Ruby, Rust), democratizing static security analysis for teams without QL experts.
