🤝 Agents

12 articles

🔴 🤝 Agents April 14, 2026 · 1 min read

OpenAI and Cloudflare: GPT-5.4 and Codex power new Agent Cloud platform for enterprise

Cloudflare has integrated OpenAI's GPT-5.4 and Codex models into its new Agent Cloud platform, enabling enterprise users to build, deploy, and scale AI agents for real-world business tasks with an emphasis on speed and security.

🟡 🤝 Agents April 14, 2026 · 2 min read

AI2: AI agents solve 80% of school-level science but only 20% of real scientific problems

The Allen Institute for AI analyzes two benchmarks that reveal a dramatic gap between AI performance on knowledge tests and the ability to make real scientific discoveries. While models reach 80% at the school level, they drop to 20% on complex scientific tasks.

🟡 🤝 Agents April 14, 2026 · 2 min read

ArXiv HiL-Bench: Do AI agents know when to ask a human for help?

The new HiL-Bench benchmark measures the ability of AI agents to recognize their own limitations and ask for human help instead of guessing. Results show that even frontier models poorly judge when they need help, but targeted training can improve this ability.

🔴 🤝 Agents April 13, 2026 · 2 min read

ArXiv HiL-Bench: no frontier model knows when to ask for help

A new benchmark reveals a universal judgment deficiency in AI agents — when specifications are incomplete, no frontier model achieves more than a fraction of its full performance. Researchers show this skill can be trained with RL.

🟢 🤝 Agents April 13, 2026 · 2 min read

ArXiv SAGE: 27 LLMs tested — models understand intent but don't execute correctly

A new benchmark for customer services reveals two phenomena: 'Execution Gap' (models correctly classify intents but don't perform the correct actions) and 'Empathy Resilience' (models remain polite while making logical errors).

🟡 🤝 Agents April 12, 2026 · 2 min read

GitHub Copilot CLI: Official Beginner's Guide — Delegating Tasks to Cloud Agents from the Terminal

On April 10, GitHub published an official tutorial for the Copilot CLI tool. The guide covers installation via npm, authentication with a GitHub account, and practical examples — including delegating tasks to cloud agents.

🟡 🤝 Agents April 11, 2026 · 2 min read

Anthropic publishes 'Trustworthy agents in practice' policy framework

Anthropic has published a comprehensive policy framework 'Trustworthy agents in practice' that defines what it means to develop, deploy, and use AI agents in a reliable manner. The document serves as a guide for companies building or using agents.

🟡 🤝 Agents April 11, 2026 · 2 min read

ArXiv PASK: proactive AI agents with long-term memory that predict user intent

A new paper, PASK, introduces a framework for proactive AI agents that combine intent detection, hybrid memory, and self-initiated action. The IntentFlow model reached the level of the leading Gemini 3 Flash models in recognizing latent user needs.

🟡 🤝 Agents April 11, 2026 · 2 min read

ArXiv SAVeR: self-auditing for LLM agents — verify before you execute (ACL 2026)

A new method, SAVeR (Self-Audited Verified Reasoning), accepted at ACL 2026, enables LLM agents to audit themselves before executing actions. The goal: to prevent coherent reasoning that violates logical constraints from leading to incorrect decisions.

🟢 🤝 Agents April 11, 2026 · 2 min read

ArXiv KnowU-Bench: new benchmark for interactive and proactive mobile AI agents

Researchers have introduced KnowU-Bench — a comprehensive benchmark for evaluating a new generation of mobile AI agents, focusing on interactivity, proactivity, and personalization through long-term use.

🟡 🤝 Agents April 10, 2026 · 2 min read

AWS Agent Registry: enterprise catalog of AI agents now in preview

Amazon has released a preview of AWS Agent Registry, a centralized catalog of AI agents, tools and agent skills for enterprise organizations. The system indexes agents regardless of where they are hosted (AWS, other clouds, on-premises) and uses a combination of keyword and semantic search along with IAM-based access control.

🟡 🤝 Agents April 10, 2026 · 2 min read

AWS Bedrock AgentCore: stateful MCP client enables interactive AI workflows

Amazon has extended Bedrock AgentCore Runtime with three new MCP capabilities — elicitation (requesting structured input from the user), sampling (requesting LLM completions from the client), and progress notifications. Stateful sessions can now last up to 8 hours in isolated microVMs and enable two-way communication between agent and client.