Latest AI News

Last 72 hours, organized by category

🔴 🤝 Agents June 19, 2026 · 3 min read

Anthropic: Project Fetch Phase Two Shows 20× Faster Robotic Operation with 10× Less Code

Claude Opus 4.7 autonomously controlled a robotic quadruped and completed tasks 20× faster than a human team while writing ~10× less code with equal or better results, though precise closed-loop control remains a challenge.

🟡 🤝 Agents June 19, 2026 · 2 min read

arXiv:2606.18543: CEO-Bench — Can Agents Run a Startup for the Long Term?

CEO-Bench is a benchmark simulating 500 days of running a startup that tests AI agents' ability to make business decisions without supervision. Only Claude Opus 4.8 and GPT-5.5 exceed the initial capital of $1 million, but no model achieves consistent profit.

🟡 🤝 Agents June 19, 2026 · 2 min read

AWS: Amazon Bedrock AgentCore Harness Reaches General Availability — Deploy in 2 API Calls

Amazon Bedrock AgentCore Harness has exited preview and is now generally available for production use. Deploying a production AI agent now requires only 2 API calls, and the platform supports Claude, Nova, Llama, DeepSeek, GPT-5.5, and GPT-5.4 with the ability to switch models mid-session without losing context.

🟢 🤝 Agents June 19, 2026 · 2 min read

GitHub: AI Agents Can Now Create Fully Triaged Issues Without Human Intervention

GitHub introduced real-time duplicate detection when creating issues and expanded the MCP server with support for reading and writing issue fields. AI agents can now create issues with automatically set metadata, eliminating manual triaging.

🟡 🤝 Agents June 18, 2026 · 2 min read

AWS: Bedrock AgentCore Gets Web Search, Payments, and A/B Testing for Agents

AWS has expanded Amazon Bedrock AgentCore with three knowledge layers: a managed knowledge base with an agentic retriever, web search on Amazon infrastructure, and AgentCore Payments for licensed content access. New additions include Monitoring to detect silent behavioral errors, Optimization with A/B testing in production, and Guardrails evaluating every agent action for prompt injection and data exposure. AWS WAF simultaneously introduces AI traffic monetization.

🟡 🤖 Models June 19, 2026 · 2 min read

arXiv:2606.19327: Rubric-Conditioned Self-Distillation Outperforms GRPO in Reasoning Model Training

Rubric-conditioned self-distillation is a new method for training reasoning models that outperforms GRPO by +1.0 point and OPSD by +0.9 points on scientific reasoning benchmarks. Instead of scalar rewards, the approach converts rubrics into token-level guidance for more precise credit assignment.

🟡 🤖 Models June 19, 2026 · 1 min read

OpenAI: GPT-5.5 Instant Advances Health Intelligence in ChatGPT

GPT-5.5 Instant improves health and wellness responses in ChatGPT through stronger reasoning and better use of context. A group of physicians evaluated the responses; the stated goal is clearer and more reliable communication on medical topics.

🔴 🤖 Models June 18, 2026 · 2 min read

Google: AMIE Reached Primary Care Physician Level in Nature Study on Disease Management

Google published a study in Nature in which the medical AI AMIE matched the level of 21 primary care physicians in disease management in a blinded trial, and was statistically superior in plan precision and guideline adherence. AMIE uses an empathetic dialogue agent and a deep reasoning agent that cross-checks hundreds of pages of clinical guidelines. Google subsequently launched a national randomised study in real virtual clinics.

🟡 🤖 Models June 18, 2026 · 2 min read

GitHub: Copilot HyDRA Router Automatically Selects Model and Delivers 72.5% Savings at Same Quality

GitHub introduced HyDRA, a model routing system that automatically selects the AI model for Copilot based on reasoning depth, code complexity, and tool orchestration needs. HyDRA delivers 72.5% cost savings while maintaining quality, and in conservative mode achieves a 70.8% task resolution rate with 3.3x savings versus OpenRouter Auto. It adds prompt caching and Tool Search, with routing accuracy within 4 percentage points of the English baseline across 19 languages.

🟡 🤖 Models June 18, 2026 · 2 min read

OpenAI: Near-Autonomous AI Chemist Based on GPT-5.4 Improves Reaction in Drug Synthesis

OpenAI and Molecule.one demonstrated a near-autonomous AI chemist based on GPT-5.4 that improved a key reaction in drug synthesis with minimal human intervention. The system optimises reactions in medicinal chemistry, marking a step towards AI-driven pharmaceutical research and development. The full publication was not available, so this article is based on the official OpenAI description.

🏥 In Practice

🟡 🏥 In Practice June 19, 2026 · 2 min read

arXiv:2606.19245: TxBench-PP — AI Agents in the Search for New Drugs

TxBench-PP is a benchmark testing AI agents in preclinical small-molecule pharmacology across 4,800 trajectories and 11 models. Claude Opus 4.8 achieves 59.3% success and leads ahead of GPT-5.5 at 55.3%, but no model reaches the reliability needed for medical application.

🟡 🏥 In Practice June 19, 2026 · 2 min read

GitHub: MAI-Code-1-Flash Available on 8 Development Surfaces Within Copilot

MAI-Code-1-Flash, Microsoft's compact coding model positioned as best-in-class for its size, is now available on 8 development surfaces within GitHub Copilot — from CLI to mobile platforms — across all plans from free to max tier.

🟡 🏥 In Practice June 19, 2026 · 2 min read

OpenAI: AI Helps Physicians Diagnose Rare Genetic Diseases in Children

OpenAI's reasoning model identified 18 new diagnoses in previously unsolved cases of rare genetic diseases in children, collaborating with physicians and research institutions. The results open new questions about the role of AI in clinical diagnostics.

🟢 🏥 In Practice June 19, 2026 · 2 min read

PyTorch: LLMs Reduce GPU Kernel Optimization from Minutes to Seconds

The PyTorch core team published LLM-guided autotuning for Helion kernels that accelerates GPU code optimization from minutes to seconds. Instead of exhaustive search across all configurations, large language models intelligently guide the parameter space search.

🟡 🏥 In Practice June 18, 2026 · 2 min read

GitHub: Copilot desktop app now generally available — parallel sessions and cloud automations

GitHub has announced that the Copilot desktop application for macOS, Windows, and Linux is now generally available. The app brings parallel sessions, a canvas interface, cloud automations, and integration of custom models and tools. It unifies GitHub Spark, Copilot Chat, and Copilot CLI into a single desktop experience. The release is part of a broader wave of Copilot announcements the same day, including HyDRA model routing and Auto mode available to all users.

⚖️ Regulation

🟢 ⚖️ Regulation June 18, 2026 · 2 min read

EU: 2026 State of the Digital Decade Report calls for closing structural gaps by 2030

The European Commission published its fourth annual State of the Digital Decade report, assessing the EU's progress toward 2030 digital targets across infrastructure, business, skills, and public services. The report recognises AI, semiconductors, cloud, and open source as pillars of European technological sovereignty, and identifies delivery of results at scale, speed, and consistency as the key challenge. According to Eurobarometer, a large majority of Europeans rank digital policy at the top of EU priorities.

🟢 ⚖️ Regulation June 17, 2026 · 2 min read

CNCF: Architectural Patterns for Digital Sovereignty of Cloud-Native Platforms Under EU Regulation

CNCF published a guide on architectural patterns for digital sovereignty of cloud-native platforms, as the EU Data Act has applied since September 12, 2025. Regulators expect four properties: jurisdictional boundedness, operational autonomy, cryptographic access control, and portability. A single Kubernetes cluster is insufficient because a shared control plane means shared risk; the proposed pattern gives each jurisdiction its own control plane (vCluster), with boundaries declared through GitOps.

🟢 ⚖️ Regulation June 16, 2026 · 2 min read

arXiv:2606.16723: AgentFairBench Measures Demographic Discrimination in LLM Agent Actions

AgentFairBench is the first benchmark that measures demographic inequality in the actual actions of LLM agents — not just in their responses — across employment, lending, and medical triage domains. It uses counterfactual flip rate and action-rate disparity metrics and tests four agent scaffolds. In a pilot of 864 decisions, Claude Haiku showed no demographic effect above the noise floor, and the paper warns that naive comparison of six groups can overestimate inequality by roughly 2.4 times.
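
As a rough illustration of the two metrics named above, the sketch below computes a counterfactual flip rate and an action-rate disparity over toy decisions. The definitions used here are common readings and an assumption, not the paper's exact formulas: flip rate is taken as the share of paired cases where changing only the demographic attribute changes the action, and disparity as the gap between the highest and lowest per-group action rates. All function names and data are hypothetical.

```python
from collections import defaultdict


def counterfactual_flip_rate(pairs):
    """Share of (original, counterfactual) action pairs whose action differs.

    Assumed definition: each pair holds the agent's action on a case and on
    the same case with only the demographic attribute changed.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(1 for original, counterfactual in pairs if original != counterfactual)
    return flips / len(pairs)


def action_rate_disparity(decisions, positive_action="approve"):
    """Gap between the highest and lowest per-group rate of positive_action.

    Assumed definition; decisions is an iterable of (group, action) tuples.
    Returns (gap, per-group rates).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, action in decisions:
        totals[group] += 1
        if action == positive_action:
            positives[group] += 1
    if not totals:
        raise ValueError("need at least one decision")
    rates = {group: positives[group] / totals[group] for group in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data, not taken from the paper.
pairs = [
    ("approve", "approve"),
    ("approve", "deny"),
    ("deny", "deny"),
    ("approve", "approve"),
]
decisions = [
    ("A", "approve"), ("A", "approve"), ("A", "deny"), ("A", "approve"),
    ("B", "approve"), ("B", "deny"), ("B", "deny"), ("B", "approve"),
]

flip_rate = counterfactual_flip_rate(pairs)
disparity, rates = action_rate_disparity(decisions)
print(flip_rate, disparity, rates)
```

A max-minus-min gap tends to widen as more groups are compared, even when the differences are pure noise, which may be one reason a naive multi-group comparison overstates inequality.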

🟢 ⚖️ Regulation June 16, 2026 · 2 min read

arXiv:2606.17005: Bayesian Framework for Auditing Reveals That AI Leaderboards Hide Incompatible Histories

The paper introduces a Bayesian audit framework showing that a single final leaderboard snapshot of 1,000 systems can correspond to multiple incompatible historical trajectories, with convergence times ranging from 23 to 75 steps. Drawing on archived data from LiveBench, Open LLM Leaderboard v2, LMArena, GAIA, and tau-bench, the author proposes an archive-and-adjudication protocol for reconstructing scoring history and rejecting unsubstantiated claims about frontier models.

🛡️ Security

🔴 🛡️ Security June 19, 2026 · 3 min read

Google DeepMind: Over 50% of Agent Security Incidents Are Mistakes, Not Attacks

Google DeepMind analyzed one million agent coding trajectories and found that more than 50% of flagged security incidents in AI agents stem from task misinterpretation or model overeagerness rather than external adversarial attacks, shifting the priorities in defense.

🟡 🛡️ Security June 19, 2026 · 2 min read

GitHub: Two Security Upgrades to GitHub Actions Protect Against Pwn Request Attacks

GitHub released two complementary security upgrades for Actions in a single day: actions/checkout@v7 blocks pwn request attacks from fork PRs, and a new Workflow execution protections feature enables admin allow-lists by actor and event type for entire organizations.

🟡 🛡️ Security June 18, 2026 · 2 min read

arXiv:2606.18060: PseudoBench Shows Agentic AI Spreads Pseudoscience with Near-Zero Rejection Rate

The new benchmark PseudoBench tested seven top AI agents on 200 pseudoscientific claims across five domains and found a near-zero rejection rate — the highest resistance measured was only 27.4%. Paradoxically, stronger models package pseudoscience in more sophisticated academic language, increasing the risk. The authors warn that 'scientific alignment' is necessary before the mass deployment of autonomous research agents that generate convincing fake studies from experiment design through to writing.

🟡 🛡️ Security June 17, 2026 · 1 min read

Anthropic: Red Team Maps AI-Enabled Cyberattacks to MITRE ATT&CK Framework, in Partnership with Verizon

Anthropic's Red Team published an analysis mapping real-world AI-weaponized cyberoperations to the MITRE ATT&CK framework, developed in partnership with Verizon. The work analyzes patterns of AI-assisted attacks observed in practice. In parallel, the Red Team also published an analysis of how large language models accelerate the exploitation of already-disclosed but unpatched (N-day) vulnerabilities.

🟡 🛡️ Security June 17, 2026 · 2 min read

AWS: New Bedrock InvokeGuardrailChecks API Brings Safety Checks Without a Pre-Created Guardrail Resource to Agentic Applications

AWS introduced InvokeGuardrailChecks in Amazon Bedrock, a detection API that returns scores without automatically blocking and without requiring a pre-created guardrail resource. It covers three protections: content filters, prompt attack detection (jailbreak, injection, leakage) and PII recognition with 31 entity types. The API returns severity and confidence scores on a scale of 0 to 1.0, designed for multi-step agentic loops where the development team controls the thresholds.
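
Because the API only reports scores, the blocking decision lives in application code. The sketch below shows one way a team might apply its own thresholds to severity and confidence scores inside an agent loop. It does not call AWS: the `CheckScore` fields, the check names, and the threshold values are all hypothetical and do not reflect the actual response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckScore:
    """One detection result; field names are hypothetical, not the AWS schema."""
    check: str         # e.g. "content_filter", "prompt_attack", "pii"
    severity: float    # 0.0 to 1.0
    confidence: float  # 0.0 to 1.0


# Team-owned policy: (minimum severity, minimum confidence) required to block.
THRESHOLDS = {
    "content_filter": (0.7, 0.6),
    "prompt_attack": (0.5, 0.5),
    "pii": (0.3, 0.8),
}


def decide(scores, thresholds=THRESHOLDS):
    """Return ("block", reasons) if any score crosses its threshold, else ("allow", [])."""
    reasons = []
    for score in scores:
        if not (0.0 <= score.severity <= 1.0 and 0.0 <= score.confidence <= 1.0):
            raise ValueError(f"score out of range: {score}")
        # Checks without a configured policy block only at the maximum scores.
        min_severity, min_confidence = thresholds.get(score.check, (1.0, 1.0))
        if score.severity >= min_severity and score.confidence >= min_confidence:
            reasons.append(score.check)
    return ("block", reasons) if reasons else ("allow", [])


# Hypothetical scores for one step of an agent loop.
step_scores = [
    CheckScore("content_filter", severity=0.2, confidence=0.9),
    CheckScore("prompt_attack", severity=0.8, confidence=0.7),
]
decision = decide(step_scores)
print(decision)
```

Keeping the thresholds in a plain mapping makes it easy to tune each protection separately, for example a stricter setting for tool calls than for intermediate reasoning steps.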

🟢 💬 Community June 19, 2026 · 1 min read

GitHub: How Pull Request Limits Are Reducing Noise in Open Source Projects

GitHub is introducing pull request limits that give maintainers control over the number of simultaneous proposals from external contributors. Since January 2023, the number of merged PRs has grown from 25 million to 90+ million per month — a 3.6× increase — which has overwhelmed many projects.

🟡 💬 Community June 18, 2026 · 2 min read

Anthropic: Seoul office opens and new partnerships announced in the Korean AI ecosystem

Anthropic has opened an office in Seoul, its first in Asia, and announced new collaborations within the Korean AI ecosystem. The move is part of the company's broader global expansion beyond the US and Europe. Details of individual partnerships were not available in the source and are deliberately omitted here. South Korea is a significant market with a strong technology industry and a high rate of AI adoption.

🟡 💬 Community June 17, 2026 · 2 min read

GitHub: GitHub Models Closed to New Customers — GitHub Withdraws from Model Serving

GitHub closed GitHub Models — its playground, API, and model catalog — to new organizations as of June 16, 2026. Existing users retain full access for now, with details on a full shutdown to follow. GitHub directs new users to Azure AI Foundry as an alternative. No reason for the withdrawal was stated in the announcement, but the move signals that GitHub is stepping back from directly serving AI models to developers.

🟡 💬 Community June 16, 2026 · 1 min read

OpenAI: Partner Network Launched with $150 Million to Accelerate Enterprise AI Adoption

OpenAI has launched the Partner Network, a program backed by $150 million for global partners to accelerate enterprise AI adoption. The initiative aims to help companies with implementation, deployment, and business transformation through AI. The announcement was published on June 14, 2026. The full text of the post was not accessible, so this article is based on the official OpenAI post description.

🟡 🔧 Hardware June 19, 2026 · 2 min read

AMD: Analysis of RoCE Network Traffic Patterns in Large Language Model Training

AMD published a comparative analysis of RoCE network traffic patterns during the training of four large LLMs — GPT-4, Llama 3, DeepSeek-V2, and Grok 4.0 — as a practical guide for building AI infrastructure in scale-out clusters with multiple GPU nodes.

🟢 🔧 Hardware June 18, 2026 · 1 min read

AMD: Open-source Schola connects Unreal Engine and reinforcement learning for robotic arm training on ROCm

AMD introduced Schola, an open-source Unreal Engine plugin that enables Gymnasium-compatible reinforcement learning training through Python frameworks and gRPC. In the example, a collaborative robotic arm xArm6 is trained in Unreal Engine 5.7 with MuJoCo physics, the PPO algorithm, and PyTorch on the AMD ROCm stack for GPU acceleration. The tutorial demonstrates a reach task in which the arm tip moves to randomly placed target locations.

🟡 🔧 Hardware June 17, 2026 · 2 min read

AMD: Instinct MI355X in MLPerf Training v6.0 Within 5% of NVIDIA, 3.5× Faster Than Previous Generation

AMD's MLPerf Training v6.0 results show that the Instinct MI355X is within approximately 5% of an equivalent NVIDIA GPU's performance on LLM benchmarks. MI355X is 3.5× faster than last year's MI300X and 13–19% faster than the previous round. AMD introduced MXFP4 (FP4) training recipes and the Primus unified framework for the first time, alongside a multi-node submission of 512 MI300X GPUs across 64 nodes.

🟡 🔧 Hardware June 17, 2026 · 2 min read

NVIDIA: Blackwell Sweeps MLPerf Training 6.0 — Fastest on All 7 Benchmarks, GB300 Up to 1.6× Faster

NVIDIA announced that its Blackwell platform achieved the best results on all seven MLPerf Training 6.0 benchmarks, cementing dominance in large-model training. GB300 NVL72 delivers up to 1.6× faster training than GB200 NVL72, and the largest submission used 8,192 Blackwell GPUs on the DeepSeek-V3 model with 671 billion parameters. CoreWeave trained DeepSeek-V3 671B in 2.02 minutes on 8,192 GPUs, while Microsoft Azure completed Llama 3.1 405B in 7.07 minutes.

🟡 🔧 Hardware June 16, 2026 · 2 min read

AMD: New ATOM Inference Engine for Instinct GPUs Brings OpenAI-Compatible API and MoE Optimizations

AMD has introduced ATOM, an inference engine for Instinct GPUs that exposes an OpenAI-compatible API and orchestrates KV cache, scheduling, and parallelism. ATOM sits at the top of the ROCm stack, alongside AITER kernels and MoRI RDMA communication, supports TP, DP, and EP parallelism, and is optimized for MoE models such as DeepSeek V2–V4, Mixtral, and Qwen3-MoE. It offers FP8, MXFP4, INT8, and INT4 quantization and MTP speculative decoding with an EAGLE proposer.

📦 Open Source

🟡 📦 Open Source June 19, 2026 · 2 min read

Black Forest Labs: Robin Rombach Calls on G7 Leaders to Support Open AI Development

Robin Rombach, co-founder and CEO of Black Forest Labs (creators of the FLUX models), addressed G7 leaders with a call for open and responsible AI development to become a global norm. Rombach argues that public availability of AI parameters drives innovation and democratizes technology.

🟡 📦 Open Source June 18, 2026 · 2 min read

Allen Institute: Open-Source MolmoMotion Predicts 3D Motion from Video and Sets SOTA in Robotics

Allen Institute released MolmoMotion, a fully open-source model that predicts 3D object trajectories from video and natural language instructions such as 'rotate the bowl'. The model achieves state-of-the-art on PointMotionBench with 0.109 m average displacement versus 0.134 m for the previous record, and raises pick-and-place task success in robotics from 56% to 76.3%, a gain of 20.3 percentage points. It was trained on the MolmoMotion-1M dataset of 1.16 million videos with 3D trajectories and action descriptions.

🟡 📦 Open Source June 17, 2026 · 1 min read

vLLM: Semantic Router Fusion Combines a Model Panel with a Judge That Synthesizes a Single Response

vLLM introduced Semantic Router Fusion, a primitive in which multiple models work in parallel as a panel, and a judge model analyzes consensus and differences to synthesize a single response. It supports local vLLM and private endpoints as well as public providers such as Gemini, Kimi, DeepSeek, and Claude. External validation on OpenRouter DRACO showed 69% for the fused panel versus 65.3% for the best single model, with full OpenAI API compatibility.
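
The panel-and-judge idea can be sketched in a few lines. The example below is a toy illustration, not the vLLM implementation or its API: the "models" are plain local functions so it runs offline, and the judge is a majority vote standing in for a judge model that would read the panel outputs and write a synthesized response. All names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Stand-in panel members: plain functions instead of real model endpoints.
def model_a(prompt):
    return "Paris"


def model_b(prompt):
    return "Paris"


def model_c(prompt):
    return "Lyon"


def run_panel(prompt, panel):
    """Query every panel member in parallel and keep answers in panel order."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        return list(pool.map(lambda model: model(prompt), panel))


def judge(answers):
    """Toy judge: pick the majority answer and report the dissenting answers."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    dissent = sorted(answer for answer in counts if answer != winner)
    return {"response": winner, "agreement": votes / len(answers), "dissent": dissent}


answers = run_panel("Capital of France?", [model_a, model_b, model_c])
fused = judge(answers)
print(fused)
```

Running the panel concurrently keeps latency close to that of the slowest member rather than the sum of all members, at the cost of paying for every panel call plus the judge.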

🟡 📦 Open Source June 16, 2026 · 2 min read

GitHub: Open Multilingual Repositories Dataset with 80 Million Rows and 40 Million Repositories

GitHub has published the Multilingual Repositories Dataset with more than 80 million classification rows across 40+ million repositories, under the fully open CC0-1.0 license. For each repository the dataset records three text sources — the README, the most-commented issue, and the most-commented pull request — alongside language detection through three tools: fastText, gcld3, and lingua-py. Portuguese leads among non-English README files, while Korean is the most represented in issue discussions.

🟢 📦 Open Source June 16, 2026 · 2 min read

CNCF: Oracle's $3 Million Donation in OCI Credits Accelerates Arm64 Support in 12+ Projects

CNCF reports that Oracle's donation of $3 million in OCI compute credits enables Arm64 CI/CD support in more than 12 projects, including OpenTelemetry, containerd, Falco, Longhorn, Crossplane, and Jaeger. Demand quickly outpaced the initial guideline of $5,000 per month. The shift follows data showing that more than 50 percent of new AWS instances and 33 percent of Azure instances now run on Arm64 architecture.