🏥 In Practice

100 articles

🟡 🏥 In Practice May 23, 2026 · 4 min read

arXiv:2605.22681: CUSP benchmark shows frontier models cannot reliably predict scientific breakthroughs

Editorial illustration: scientific curve with breakthrough point and an AI system missing the prediction

The CUSP benchmark tests AI models' ability to predict scientific breakthroughs from a database of 4,700 events. Frontier models (GPT-5, Claude Opus 4.7, Gemini 3 Pro) identify plausible research directions but systematically miscalibrate outcomes and timing with overconfidence. Additional pre-cutoff context does not help — the limitation is structural, not informational.

🟡 🏥 In Practice May 23, 2026 · 3 min read

GitHub: Gartner Magic Quadrant 2026 — GitHub Copilot Leader for the third consecutive year in Enterprise AI Coding Agents

Editorial illustration: quadrant matrix with GitHub Copilot positioned in the Leader sector

Gartner positioned GitHub as a Leader in its 2026 Magic Quadrant report for Enterprise AI Coding Agents — for the third consecutive year since the category was created. GitHub Copilot is currently used by 140,000 organizations worldwide, and the evaluation emphasized agentic workflows covering the full SDLC from code to review, security, and governance, not just code generation.

🟢 🏥 In Practice May 23, 2026 · 4 min read

arXiv:2605.22337: Meta-Soft introduces KV cache compression via composable meta-tokens and learnable orthogonal bases

Editorial illustration: meta-tokens compressing attention cache into an orthogonal basis structure

Researchers presented Meta-Soft, a new method for dynamic KV cache compression in LLM inference. The approach uses a learnable orthogonal basis matrix and a selector network that synthesize soft meta-tokens — a compressed representation of key information from a long prompt. An attention-flow mechanism redistributes semantic information from removed tokens into retained ones, outperforming existing KV cache eviction methods.

🟢 🏥 In Practice May 23, 2026 · 4 min read

arXiv:2605.22664: WorkstreamBench tests LLM agents on end-to-end spreadsheet tasks in finance — and frontier models fail

Editorial illustration: Excel spreadsheet with formulas and an AI agent analyzing them

WorkstreamBench is a new benchmark from a 10-author team led by Thomson Yen that tests LLM agents on real Excel and spreadsheet tasks in the financial domain — invoices, reports, cost analysis. GPT-4o, Claude, and Gemini are compared and none passes reliably through the full task set, pointing to structural shortcomings in current agentic infrastructure for enterprise finance.

🟢 🏥 In Practice May 23, 2026 · 2 min read

Anthropic Claude Code v2.1.150 — internal infrastructure patch with no user-facing changes

Editorial illustration: Claude Code terminal with version numbering and internal cogwheels

Anthropic released Claude Code CLI version v2.1.150 at 04:03 UTC on Saturday, just one day after v2.1.149. The release contains exclusively internal infrastructure improvements with no user-facing changes. Available for Darwin, Linux, and Windows on ARM64 and x64 architectures, as well as Linux musl builds.

🟡 🏥 In Practice May 22, 2026 · 3 min read

arXiv:2605.21427: PALS — power-aware LLM serving for MoE models achieves +26.3% energy efficiency and 4-7× fewer QoS violations

Editorial illustration: arXiv:2605.21427 — PALS achieves +26.3% energy efficiency and 4-7× fewer QoS violations in MoE LLM serving

Researchers published PALS on 21 May 2026 on the arXiv preprint server — a runtime system that integrates GPU power control directly into LLM serving for Mixture-of-Experts models. PALS uses lightweight offline power-performance models and a feedback controller that dynamically optimises configurations against throughput targets. It achieves 26.3% improvement in energy efficiency and 4-7× reduction in QoS violations under power constraints, integrates into vLLM without modifying the API or retraining models. It addresses a growing operational pain point for data centres — GPU cluster energy consumption that is becoming the dominant constraint on growth.

🟢 🏥 In Practice May 22, 2026 · 3 min read

CNCF: NetEase Games achieves 30-second LLM cold start on Kubernetes via Fluid prefetching layers

Editorial illustration: NetEase Games achieves 30-second LLM cold start on Kubernetes via Fluid prefetching layers

CNCF published a case study from NetEase Games on 21 May 2026 (authors Haifeng Liao and Xiang Zhang) describing how they reduced load times for 70B-class LLM models from 42 minutes (direct S3 access) to under 30 seconds using the CNCF-incubated Fluid project. The key is a Fluid prefetching layer that shares models between teams instead of duplicating caches, together with pre-warming scheduling that eliminates cold starts. A technical case study for everyone running serverless LLM inference on Kubernetes with large models.

🟡 🏥 In Practice May 21, 2026 · 2 min read

AWS: SageMaker AI gains OpenAI-compatible API endpoints

Editorial illustration: AWS SageMaker AI OpenAI-compatible API endpoints for drop-in replacement on own GPU infrastructure

AWS announced on 20 May 2026 that Amazon SageMaker AI now exposes an OpenAI-compatible API on its real-time inference endpoints — a drop-in replacement for the OpenAI SDK, LangChain, and Strands Agents that requires only a URL change. Endpoints expose the /openai/v1 path with Chat Completions and streaming support, while authentication uses time-limited bearer tokens generated from AWS credentials.

🟡 🏥 In Practice May 21, 2026 · 2 min read

Google DeepMind: WeatherNext predicted Hurricane Melissa as Category 5 five days ahead

Editorial illustration: Google DeepMind WeatherNext first 5-day Category 5 forecast for Hurricane Melissa

Google DeepMind announced on 19 May 2026 that its WeatherNext model accurately predicted Hurricane Melissa would reach Category 5 a full five days in advance with 80 percent confidence — the first such case in modern meteorology. The model uses an ensemble of 50 parallel scenarios and was the top-ranked system for hurricane forecasting in the 2025 season according to the US National Hurricane Center.

🟢 🏥 In Practice May 19, 2026 · 2 min read

CNCF: Kubernetes debugger erases traces — a serious problem for security audits

Editorial illustration: CNCF warns that kubectl debug — a tool for diagnosing Kubernetes containers — leaves no record after a session ends

CNCF warns that kubectl debug — a tool for diagnosing Kubernetes containers — leaves no record after a session ends. As a result, regulated industries cannot answer a key question: who viewed which container and for how long — directly violating PCI DSS and SOC 2 audit log requirements.

🟡 🏥 In Practice May 16, 2026 · 3 min read

GitHub: Copilot Memory remembers commit style, PR structure and communication preferences across all repositories

Editorial illustration: Copilot interface with memory icons linking repositories.

GitHub Copilot Memory User Preferences is a new personalization feature published May 15, 2026 that enables Copilot to remember user preferences across the entire repository ecosystem. Memory captures commit message style, pull request structure and communication preferences (formal vs. casual tone, level of detail) — and applies them consistently on every repo the user works on. The feature is part of a broader Copilot personalization layer competing with Cursor and Codeium adaptive features.

🟡 🏥 In Practice May 16, 2026 · 3 min read

OpenAI: ChatGPT Personal Finance — Pro subscribers in the US securely connect financial accounts for AI-powered insights

Editorial illustration: financial dashboard with an AI conversation view and banking integration icons.

ChatGPT Personal Finance is a new OpenAI feature announced on May 15, 2026, allowing Pro subscribers in the US to securely connect financial accounts for AI-powered insights grounded in the user's financial context, goals, and priorities. The feature expands ChatGPT from a general-purpose chat tool into a personalized financial assistant tier, directly competing with Google Finance and Perplexity's finance_search tool (announced May 13).

🟡 🏥 In Practice May 16, 2026 · 3 min read

OpenAI + Databricks: GPT-5.5 integrated into enterprise agent workflows after new OfficeQA Pro benchmark records

Editorial illustration: Databricks and OpenAI logos with a GPT-5.5 icon and enterprise agent workflow display.

The OpenAI Databricks integration is a new enterprise agent partnership announced on May 15, 2026, bringing the GPT-5.5 model to the Databricks platform for building agent workflows. The announcement marks the first explicit deployment of GPT-5.5 through a partner channel — the model set records on the OfficeQA Pro benchmarks and is now available to enterprise clients via the Databricks Mosaic AI runtime. All Anthropic Claude alternatives, Google Gemini, and Mistral competitors gain a real challenger in the Databricks ecosystem.

🟢 🏥 In Practice May 16, 2026 · 3 min read

AWS: Amazon Quick — document-level access control for S3 knowledge bases with deny-by-default and ALLOW/DENY rules

Editorial illustration: a document base with an ACL layer and user identity verification before access.

Amazon Quick document-level access control is a new enterprise RAG security mechanism published May 15, 2026 by Josh DeMuth. It enables document-level ACLs for S3 knowledge bases within Amazon Quick through two configuration methods: a global ACL file (centralized JSON for stable structures) and document-level metadata files. The system uses deny-by-default and supports ALLOW/DENY rules at user and group level, where DENY always wins.

🟡 🏥 In Practice May 15, 2026 · 2 min read

Amazon Lex: Assisted NLU LLM Mode Achieves 92% Intent Accuracy and 84% Slot Resolution at No Extra Cost

Editorial illustration: chatbot intent flow with ML and LLM components.

Amazon Lex Assisted NLU is a new LLM-powered mode for chatbots announced on May 14, 2026, that upgrades the traditional Lex NLU with large language models. It achieves 92% intent classification accuracy and 84% slot resolution accuracy on average, plus 11-15% improvement in intent classification and 23.5% fewer fallback responses in real-world deployments. Available in two modes — Primary (every input) and Fallback (low confidence only) — included in the standard Lex price.

🟢 🏥 In Practice May 15, 2026 · 3 min read

GitHub Copilot Cloud Agent: Auto Model Selection Automatically Chooses the Model with a 10% Discount on Token Multiplier

Editorial illustration: AI agent with rotating model icons and discount percentage.

GitHub Copilot Cloud Agent Auto Model Selection is a new feature announced on May 14, 2026, that automatically selects the optimal model for a task based on system health and model performance signals. Users of Auto mode receive a 10% discount on the standard model multiplier and are exempt from weekly rate limits. The feature eliminates manual model selection and addresses the increasingly common frustration pattern of enterprise users hitting their limit before the end of the week.

🟢 🏥 In Practice May 15, 2026 · 3 min read

IBM Consulting: Forward Deployed Units — 6-Person AI+Human Pods Doing the Work of 30-Person Teams at Riyadh Air, Nestlé, Heineken

Editorial illustration: consulting pod with 6 members — humans and AI agents together in a workspace.

IBM Forward Deployed Units (FDUs) is a new enterprise consulting model announced on May 14, 2026, by Mohamad Ali (Senior VP, IBM Consulting). Six-member pods — senior consultants, engineers, and AI agents — replace classic 30-person consulting teams. The model introduces continuous engagement instead of one-shot project logic. Live deployments at Riyadh Air, Nestlé, Heineken, and Pearson.

🟢 🏥 In Practice May 15, 2026 · 3 min read

OpenAI: Sea Limited (Garena, Shopee) deploys Codex across engineering teams in Asia — AI-native dev case study

Editorial illustration: Asian engineering team with Codex CLI shown on several monitors.

OpenAI Sea Codex Case Study is a new enterprise deployment article published May 14, 2026 in which the CPO of Sea Limited — parent company of the Garena and Shopee brands — explains the strategy for rolling out the OpenAI Codex coding agent across engineering teams in Asia. Sea approaches Codex as a tool for AI-native software development — a fundamental workflow change, not just a developer accelerator for existing practices.

🟡 🏥 In Practice May 14, 2026 · 2 min read

Perplexity: finance_search Agent API tool returns OHLCV, balance sheets, transcripts, and analyst estimates in a single call

Editorial illustration: financial data dashboard with OHLCV chart and AI agent arrows.

Perplexity finance_search is a new Agent API tool released in May 2026 that returns structured financial data for public companies — near-real-time prices, OHLCV ranges, pre-market and after-hours data, income statements, balance sheets, cash flow, earnings call transcripts, SEC filings, analyst estimates, and ETF constituents. The model decides which fields to fetch based on the prompt.

🟢 🏥 In Practice May 14, 2026 · 2 min read

AWS: Pulse AI + Bedrock pipeline fine-tunes Nova Micro for financial documents — 1,000 documents in 3 hours

Editorial illustration: financial documents with an automated extraction pipeline and structured data.

AWS Pulse AI + Amazon Bedrock financial document processing is a new enterprise pipeline blueprint published on May 13, 2026. It combines Pulse AI extraction from complex financial documents (SEC filings, balance sheets, audit materials) with fine-tuning of the Amazon Nova Micro model. A batch of 1,000 documents that previously required a multi-day turnaround was processed in under three hours, and the fine-tuned model achieved 100% check data extraction versus a 50% baseline.

🟢 🏥 In Practice May 14, 2026 · 3 min read

CNCF: KubeStellar AI agents achieve 81% PR acceptance with 91% test coverage and 63 CI/CD workflows

Editorial illustration: Kubernetes cluster with AI agent icons and CI/CD pipeline arrows.

KubeStellar AI Agents is a new case study CNCF blog post by Andy Anderson, Chief Maintainer of KubeStellar Console, published on May 14, 2026. The multi-cluster Kubernetes dashboard achieved 81% PR acceptance over 82 days using two parallel AI coding agents. Infrastructure: 63 CI/CD workflows, 32 nightly test suites, 91% coverage across 12 shards, bug-to-merge roughly 30 minutes. Anderson defines five levels of AI codebase maturity.

🟢 🏥 In Practice May 14, 2026 · 2 min read

GitHub: Copilot Cloud Agent REST API open for fan-out refactoring, repo setup and weekly release preparation

Editorial illustration: GitHub Actions workflow with autonomous agent icons and a REST API endpoint.

GitHub Copilot Cloud Agent REST API is a new developer endpoint published on May 13, 2026 in public preview that enables programmatic triggering of autonomous Copilot tasks. Three primary use cases: code refactoring/migration fan-out across many repositories, one-click repository initialization from a developer portal, and automated weekly release preparation with release notes. Available to Copilot Business and Enterprise subscribers.

🟡 🏥 In Practice May 13, 2026 · 2 min read

GitHub: Copilot Pro $10, Pro+ $39 and new Max plan $100 with flex credit model

Editorial illustration: subscription structure with base and flex credit icons in a developer interface.

GitHub Copilot Flex Allotments + Max plan is a new GitHub Copilot pricing structure announced on May 12, 2026, effective June 1, 2026. The Pro tier costs $10/month with $15 total usage credits, Pro+ $39 with $70 credits, and the new Max plan $100 with $200 credits. Code completions and next edit suggestions remain unlimited on all paid tiers.

🟡 🏥 In Practice May 13, 2026 · 2 min read

Perplexity: April 2026 changelog adds Claude Opus 4.7, GPT-5.5 and Grok 4.20 Reasoning to Agent API

Editorial illustration: API endpoints with model icons and security keys in a developer panel.

The Perplexity April 2026 changelog is a new batch of Perplexity Agent API updates that adds Claude Opus 4.7, GPT-5.5 and Grok 4.20 Reasoning models, native n8n integration, availability on AWS Marketplace as SaaS, a one-time API key reveal security model, and a new /v1/models endpoint in OpenAI-compatible format.

🟡 🏥 In Practice May 12, 2026 · 2 min read

Anthropic: Claude Code v2.1.139 — Agent View showing all sessions + /goal command for autonomous completion

Editorial illustration: Claude Code v2.1.139 — Agent View showing all sessions + /goal command for autonomous completion

Claude Code v2.1.139 is a release of Anthropic's CLI agent that introduces Agent View in Research Preview — a unified list of all sessions (active, blocked, completed) — and the /goal command that drives Claude through multiple turns until a set condition is met, with a panel showing elapsed time, step count, and token consumption.

🟡 🏥 In Practice May 12, 2026 · 3 min read

IBM: Red Hat AI Inference and OpenShift Virtualization Service announced as managed products on IBM Cloud

Editorial illustration: enterprise cloud infrastructure with red and blue glow, abstract servers running inference workloads, hybrid VM and container orchestration.

IBM today announced Red Hat AI Inference Service and Red Hat OpenShift Virtualization Service as managed enterprise products available on IBM Cloud. The first offers an optimized serving environment for open-source LLMs (Granite, Llama, Mistral) with automatic scaling and SLA guarantees; the second enables running VMs and containers within the same OpenShift control plane. The goal is to reduce the operational burden on enterprise teams that want open-source AI without their own Kubernetes infrastructure.

🟡 🏥 In Practice May 12, 2026 · 3 min read

OpenAI: DeployCo — new standalone organization for enterprise AI deployment announced alongside Q1 2026 results

Editorial illustration: enterprise consulting handshake with abstract AI infrastructure pipelines, deployment lifecycle visualization.

OpenAI on Tuesday launched DeployCo (The Deployment Company), a separate organization that helps enterprises build and scale AI applications in production. The goal is to separate foundation model R&D from enterprise deployment consulting, which until now lived in the same OpenAI team and created operational tension. DeployCo offers managed deployment, custom evaluation, post-launch monitoring, and industry-specific fine-tuning.

🟡 🏥 In Practice May 11, 2026 · 2 min read

OpenAI: guide on how enterprises scale AI implementations in organizations

Editorial illustration: guide on how enterprises scale AI implementations in organizations

OpenAI published the enterprise guide 'How enterprises are scaling AI' describing the path from preliminary experiments to lasting business impact. The guide focuses on four foundational pillars: organizational trust, governance frameworks, workflow design, and maintaining quality as implementations scale.

🟢 🏥 In Practice May 11, 2026 · 2 min read

Google: AI-powered Finance expands to Europe — stock summaries, portfolio analysis, AI assistant

Editorial illustration: AI-powered Finance expands to Europe — stock summaries, portfolio analysis, AI assistant

Google Finance is an AI-powered platform expanding to the European market this week with full local language support. It brings AI research on stocks and market trends, advanced visualizations, real-time cryptocurrency and commodity coverage, live transcripts of corporate earnings calls, and globally available Deep Search for complex financial questions.

🟡 🏥 In Practice May 9, 2026 · 2 min read

Anthropic: Claude Code v2.1.136 brings 54 fixes, MCP OAuth fix and hard-deny rule

Editorial illustration: Claude Code terminal showing MCP OAuth fix and hard-deny rule

Anthropic released Claude Code v2.1.136, a maintenance release with 54 changes that introduces the new settings.autoMode.hard_deny rule for unconditionally blocking actions in auto mode, fixes the MCP OAuth race condition that forced users to re-login daily, and resolves an API 400 error during extended thinking.

🟢 🏥 In Practice May 9, 2026 · 2 min read

AWS: Halliburton AI assistant for seismics cuts workflow creation time by over 95 percent

Editorial illustration: seismic workflow generated from natural language via Amazon Bedrock

Halliburton and AWS built an AI assistant for Seismic Engine that converts natural language into seismic workflows using Amazon Bedrock and Claude models. The system achieves 84–97 percent success rate and reduces creation time from 2–20 minutes to 5.9–16.6 seconds — a reduction of over 95 percent.

🟡 🏥 In Practice May 8, 2026 · 2 min read

GitHub: Five risks and a 10-minute framework for reviewing AI pull requests

Editorial illustration: Five risks and a 10-minute framework for reviewing AI pull requests

GitHub has published a practical guide to reviewing AI-generated code that defines five critical risks and a structured 10-minute code review framework. More than one in five pull requests on the platform now involves an agent.

🟡 🏥 In Practice May 7, 2026 · 2 min read

Anthropic: Claude Code v2.1.132 Brings 25+ Fixes and New Env Variables for Hooks

Editorial illustration: Claude Code v2.1.132 brings 25+ fixes and new environment variables for hooks

Anthropic released Claude Code v2.1.132 with 25+ bug fixes and two new environment variables: CLAUDE_CODE_SESSION_ID for hook integration and CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN for native scrollback. A serious bug with 10GB+ RSS memory growth in MCP servers has also been resolved.

🟡 🏥 In Practice May 7, 2026 · 2 min read

Anthropic: Claude Code v2.1.133 brings worktree.baseRef and race condition fix

Editorial illustration: Claude Code v2.1.133 brings worktree.baseRef and race condition fix

Anthropic has released Claude Code v2.1.133 with new parameters worktree.baseRef, sandbox.bwrapPath/socatPath and the environment variable CLAUDE_EFFORT in hooks. The version fixes a race condition in parallel sessions and issues with Windows drive root paths. The third release this week after v2.1.131 and v2.1.132.

🟡 🏥 In Practice May 7, 2026 · 2 min read

GitHub: Optimising agentic workflows achieves token savings of 19% to 62%

Editorial illustration: Optimising agentic workflows achieves token savings of 19% to 62%

GitHub instrumented its production agentic workflows and identified three main sources of token waste: unnecessary MCP tools, deterministic data fetching and misconfigured bash rules. Optimisation achieved savings of 19% to 62% per workflow.

🟢 🏥 In Practice May 7, 2026 · 2 min read

arXiv:2605.04012: SymptomAI in the Fitbit app with 13,917 patients outperforms independent clinicians in differential diagnosis

Editorial illustration: user conversing with SymptomAI agent in the Fitbit app while a background panel displays a ranked list of diagnoses

SymptomAI is a conversational AI agent integrated into the Fitbit app and tested on approximately 13,917 participants; in the clinical evaluation subset its diagnostic recommendations achieved an odds ratio of 2.47 compared to independent clinicians who evaluated the same conversations. The study is a preprint.

🟡 🏥 In Practice May 6, 2026 · 2 min read

IBM: Enterprise Advantage gets Context Studio — Providence Health cut manager hiring time by 90%

Editorial illustration: IBM Enterprise Advantage Context Studio for AI agents grounded in organizational data

IBM expanded the Enterprise Advantage platform by launching Context Studio, a tool for building AI agents grounded in an organization's own data while preserving digital sovereignty. Providence Health reduced manager hiring time by 90%, and IBM projects operational cost savings above 25% within 18 months.

🟢 🏥 In Practice May 6, 2026 · 2 min read

Anthropic: Claude Code v2.1.131 — Windows VS Code activation and Mantle x-api-key hotfix

Editorial illustration: Claude Code v2.1.131 hotfix for Windows VS Code and Mantle authentication

Anthropic released Claude Code v2.1.131, a hotfix that resolves two bugs: a VS Code extension activation crash on Windows due to a hardcoded build path and a missing x-api-key header sent to Mantle inference endpoints. Binaries are published for all major platforms.

🟡 🏥 In Practice May 5, 2026 · 2 min read

arXiv:2605.02740: ReClaim — Foundation Model Trained on 200 Million Patient Records Achieves Mean AUC 75.6% on 1,000+ Medical Tasks

Editorial illustration: developer workspace with monitors displaying code, a stethoscope and medical charts

A new arXiv preprint presents ReClaim — a foundation model with 1.7 billion parameters trained on 43.8 billion medical events from 200 million patient records. Across more than 1,000 diagnostic tasks it achieves a mean AUC of 75.6%, significantly outperforming LightGBM (66.3%) and the Delphi specialized model (69.4%). It opens a new class of foundation models trained on administrative health data.

🟡 🏥 In Practice May 5, 2026 · 3 min read

Anthropic Claude Code v2.1.128: 30+ Fixes, .zip Plugin Support and ~3× Lower cache_creation Cost for Sub-Agents

Editorial illustration: developer workspace with monitors, a .zip archive and a plugin installation progress bar

Claude Code v2.1.128 (released May 4, 2026) brings 30+ improvements: tool count display in the /mcp panel with flagging of servers with 0 tools, support for .zip plugin archives in --plugin-dir, a fix for the EnterWorktree bug that lost local unpushed commits, ~3× reduction in cache_creation cost for sub-agents, and a fix for crashes when piping inputs larger than 10 MB.

🟡 🏥 In Practice May 4, 2026 · 3 min read

IBM study: 76% of organizations have a Chief AI Officer, CEOs expect 48% autonomous AI decisions by 2030

IBM Institute for Business Value released a study of 2,000 CEOs from 33 countries showing that the Chief AI Officer role expanded from 26% to 76% of organizations in a single year. CEOs expect that by 2030 AI systems will make 48% of operational decisions without human intervention, fundamentally redefining the entire C-suite structure.

🔴 🏥 In Practice May 1, 2026 · 3 min read

DeepMind AI co-clinician: in blind evaluation of 98 primary care queries doctors preferred it over leading tools, zero critical errors in 97/98 cases

Editorial illustration: AI agent assisting a doctor with a patient in a clinical scene with medical equipment

Google DeepMind announced the AI co-clinician research initiative on April 30, 2026 — a triadic care model in which an AI agent assists patients under clinical oversight of a physician. In blind head-to-head evaluations of 98 realistic primary care queries, doctors consistently preferred co-clinician responses over two leading evidence synthesis tools, and the system recorded zero critical errors in 97 of 98 cases.

🟡 🏥 In Practice May 1, 2026 · 2 min read

Amazon Nova 2 Lite with Reinforcement Fine-Tuning achieves 4.33/5.0 and outperforms Claude Sonnet 4.5 on automated legal contract review

Editorial illustration: AI judge on a podium evaluating a legal contract while a robotic arm marks clauses

Reinforcement Fine-Tuning (RFT) is a training method in which a language model acts as a judge (LLM-as-Judge) and provides feedback instead of expensive manual labeling. Amazon Nova 2 Lite achieved an aggregate score of 4.33/5.0 and perfect JSON validation of 1.00, outperforming Claude Sonnet 4.5 and Claude Haiku 4.5 on automated legal contract review.

🟢 🏥 In Practice May 1, 2026 · 2 min read

IBM Research and Dallara: AI surrogate model GIST evaluates racing car aerodynamics in 10 seconds instead of hours of classical CFD simulation

Editorial illustration: racing car with an arrow showing rapid flow simulation around the rear diffuser

GIST (Gauge-Invariant Spectral Transformer) is an AI surrogate model based on graph neural operators, jointly developed by IBM Research and Dallara, the Italian racing car manufacturer. Aerodynamic evaluation of the rear diffuser of an LMP2 vehicle is thus reduced from several hours of classical CFD simulation to approximately 10 seconds, and the work was presented at the AI & PDE Workshop at ICLR 2026.

🟡 🏥 In Practice April 30, 2026 · 2 min read

Anthropic Claude for Creative Work: Connectors for Blender, 50+ Adobe Creative Cloud Tools, Autodesk Fusion, Ableton, SketchUp, and Splice

Editorial illustration: AI assistant connecting 3D modeling, audio production, and graphic design

Anthropic announced Claude connectors for professional creative software on April 28, 2026, partnering with Blender, Autodesk Fusion, Adobe (50+ Creative Cloud tools), Ableton, Splice, Affinity and Canva, Resolume Arena, and SketchUp. Claude can write Python scripts for Blender, control Resolume in real time for VJs, automate batch asset processing in Affinity, and search the Splice sample catalog. Anthropic also joined the Blender Development Fund as a patron, and three academic institutions — RISD, Ringling College, and Goldsmiths — are launching courses where students learn to work with Claude.

🟡 🏥 In Practice April 30, 2026 · 2 min read

Google ERA: AI system for scientific research reaches CDC top for hospitalization forecasting, solves an open cosmological problem, and tracks CO2 every 10 minutes

Editorial illustration: scientific tools and an AI network connected in a star constellation of domains

On April 29, 2026, Google Research introduced ERA (Empirical Research Assistance) — an internal AI system that combines LLMs with computational tools to accelerate scientific research. Four concrete results across different domains: top of the CDC leaderboard for COVID/flu/RSV hospitalization forecasting, six new solutions for gravitational wave emission from cosmic strings, a neural network tracking atmospheric CO2 every 10 minutes, and interpretable neural circuits in zebrafish.

🟡 🏥 In Practice April 30, 2026 · 3 min read

EvalEval Coalition: AI evaluation is becoming the new compute bottleneck — GAIA single run $2,829, HAL leaderboard $40,000, academic auditors hit a budget wall before a technical one

Editorial illustration: a scale tipping toward evaluation costs over training costs

The EvalEval Coalition (Avijit Ghosh, Yifan Mai, Georgia Channing, Leshem Choshen) published on April 29, 2026, an analysis on the HuggingFace blog showing how AI model evaluation costs have exploded. A single GAIA run costs $2,829, the HAL leaderboard $40,000 (k=8 reliability $320,000), and PaperBench around $9,500 per agent. Static benchmarks compress 100-200×, agentic ones only 2-3.5× — an accountability barrier for independent auditors.

🟡 🏥 In Practice April 29, 2026 · 2 min read

Anthropic Claude for Creative Work: connectors for 60+ creative tools, new Claude Design product, and partnerships with RISD, Ringling, and Goldsmiths

Editorial illustration: Claude connectors branching into a stack of creative tools spanning design, video, and 3D production

Anthropic has introduced Claude for Creative Work — a package of connectors linking Claude to Adobe Photoshop, Premiere, 50+ Creative Cloud tools, Blender, Autodesk Fusion, Ableton Live and Push, the Resolume suite, SketchUp, Splice, and Affinity by Canva. Also launched is the new Claude Design product from Anthropic Labs for visualizing software interface ideas with Canva export. In parallel, academic partnerships with RISD, Ringling College, and Goldsmiths University of London are bringing Claude into creative computing curricula.

🟡 🏥 In Practice April 29, 2026 · 2 min read

IBM Launches Bob: AI Development Partner for the Full SDLC with 80,000+ Internal Users and 45% Average Productivity Uplift

Editorial illustration: software development lifecycle gears with an assistant emblem at the center

On April 28, 2026, IBM launched 'Bob,' an AI partner for the entire software development lifecycle: planning, design, coding, testing, deployment, operations, and modernization. Bob orchestrates Anthropic Claude, Mistral, and IBM Granite models, is already used internally by 80,000+ IBM employees with an average 45% productivity uplift, and is available as SaaS with a free 30-day trial at bob.ibm.com.

🟡 🏥 In Practice April 29, 2026 · 2 min read

OpenAI Comes to AWS: GPT Models, Codex, and Managed Agents Now Available Within AWS Environments for Enterprise Users

On April 28, 2026, OpenAI announced that GPT models, Codex, and Managed Agents are now available on AWS, enabling enterprise users to build secure AI systems within their AWS environments. The announcement comes on the same day as the amended OpenAI × Microsoft partnership. This marks the first OpenAI distribution outside the Microsoft Azure ecosystem.

View full archive →