🤝 Agents · Sunday, April 19, 2026 · 2 min read

RadAgent: AI Tool That Interprets Chest CT Scans Step by Step with +36% Relative F1 Improvement

Editorial illustration: AI agent analyzing a chest CT scan

Why it matters

RadAgent is an AI agent for chest CT scan interpretation that outperforms the baseline CT-Chat model by 36.4% relative macro-F1, 19.6% relative micro-F1, and 41.9% relative adversarial robustness, all through a transparent step-by-step process. The tool generates radiology reports with inspectable decision traces and reaches a faithfulness score of 37%, compared to 0% for the baseline.

What Is RadAgent?

RadAgent is an AI agent for radiological interpretation of chest CT (Computed Tomography) scans, introduced in a new paper on arXiv. A team of 13 researchers from Zurich, Stanford, and NYU built a system that uses vision-language models (VLMs) and specialized tools in a transparent, step-by-step process to generate structured radiology reports.

Unlike monolithic VLM approaches, RadAgent operates as a tool-calling agent — invoking tools for segmentation, lesion detection, measurement, and mapping to medical standards — while maintaining an explicit decision trace that a radiologist can later review and revise.
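The paper's actual tool interface is not described in this article, but the tool-calling pattern with an explicit decision trace can be sketched in a few lines. Everything below (tool names, fields, return shapes) is illustrative, not RadAgent's real API:

```python
# Hypothetical sketch of a tool-calling agent that records every tool
# invocation in a decision trace a reviewer can replay afterwards.
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    tool: str       # which tool was invoked
    args: dict      # arguments passed to the tool
    result: object  # what the tool returned

@dataclass
class Agent:
    trace: list = field(default_factory=list)

    def call(self, tool_name, tool_fn, **kwargs):
        """Invoke a tool and log the step in the decision trace."""
        result = tool_fn(**kwargs)
        self.trace.append(TraceStep(tool_name, kwargs, result))
        return result

# Stub tools standing in for segmentation / detection / measurement.
def detect_lesions(scan):
    return [{"id": 1, "location": "right upper lobe"}]

def measure(lesion):
    return {"id": lesion["id"], "diameter_mm": 8.2}

agent = Agent()
lesions = agent.call("detect_lesions", detect_lesions, scan="ct_volume")
for lesion in lesions:
    agent.call("measure", measure, lesion=lesion)

# A radiologist can later step through the full log:
for step in agent.trace:
    print(step.tool, step.result)
```

The key design point the article highlights is that the trace is a first-class artifact: each report claim can, in principle, be mapped back to a logged tool call rather than to an opaque forward pass.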

How Much Better Is It Compared to the Baseline?

The reported gains are substantial. Compared to the baseline CT-Chat model, RadAgent achieves:

  • Macro-F1: +6.0 points absolute (36.4% relative)
  • Micro-F1: +5.4 points absolute (19.6% relative)
  • Adversarial robustness: +24.7 points (41.9% relative)
  • Faithfulness score: 37.0% compared to a baseline of 0%
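The absolute and relative figures above are consistent with each other, which can be checked with a line of arithmetic: relative gain equals absolute gain divided by the baseline score. The implied baseline scores below are an inference from the article's numbers, not values reported directly:

```python
# Relative improvement = absolute gain (points) / baseline score (points).
# Working backwards from the article's figures yields the implied baselines.
def implied_baseline(abs_gain_points, relative_gain):
    return abs_gain_points / relative_gain

macro_f1_baseline = implied_baseline(6.0, 0.364)  # ~16.5 points
micro_f1_baseline = implied_baseline(5.4, 0.196)  # ~27.6 points

print(round(macro_f1_baseline, 1), round(micro_f1_baseline, 1))
```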

The faithfulness score measures the extent to which the generated report reflects findings actually visible on the scan. The baseline model had essentially no traceable link between findings and report text, while RadAgent reaches a level where more than a third of all claims can be traced to a specific detection in the image.
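As a toy illustration of the idea (not the paper's actual metric definition), faithfulness can be thought of as the fraction of report claims that link back to a concrete detection:

```python
# Toy illustration: faithfulness as the share of report claims that can
# be traced to a detection. This is NOT the paper's exact definition.
def faithfulness(claims, detections):
    traceable = [c for c in claims if c.get("detection_id") in detections]
    return len(traceable) / len(claims) if claims else 0.0

claims = [
    {"text": "8 mm nodule, right upper lobe", "detection_id": 1},
    {"text": "no pleural effusion", "detection_id": None},
    {"text": "calcified granuloma", "detection_id": 2},
]
detections = {1, 2}  # IDs of lesions the detection tool actually found

print(faithfulness(claims, detections))  # 2 of 3 claims are traceable
```

Under a definition like this, a baseline score of 0% means no claim in the generated report could be tied to any detection at all.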

Why Does This Matter for Clinical Practice?

Radiological interpretation is one of the most promising yet most sensitive areas for applying AI in medicine. Black-box models, which produce reports without explanation, have been a major obstacle to regulatory approval, because radiologists cannot verify what the AI actually relied on.

The decision trace that RadAgent generates changes the dynamic: a radiologist can open the step-by-step log, see which lesions the tool detected, which it measured, and how it categorized them. Combined with improved F1 scores and resistance to adversarial attacks, this yields an architecture that is a more mature candidate for clinical deployment than previous generations.

What Comes Next?

The authors do not mention a public code release date, but the paper is available on arXiv as a preprint. Given the multi-institutional authorship and the strong reported metrics, RadAgent is a plausible candidate for peer-reviewed publication in a leading medical AI journal, and could help set a standard for step-by-step radiology agents.

🤖

This article was generated using artificial intelligence from primary sources.