🤝 Agents · Sunday, April 19, 2026 · 2 min read

RadAgent: AI Tool That Interprets Chest CT Scans Step by Step with +36% Relative F1 Improvement

Editorial illustration: AI agent analyzing a chest CT scan

Why it matters

RadAgent is an AI agent for chest CT scan interpretation that outperforms the baseline CT-Chat model by 36.4% relative macro-F1, 19.6% relative micro-F1, and 41.9% relative adversarial robustness, all through a transparent step-by-step process. The tool generates radiology reports with inspectable decision traces and reaches a faithfulness score of 37%, compared to 0% for the baseline.

What Is RadAgent?

RadAgent is an AI agent for radiological interpretation of chest CT (Computed Tomography) scans, introduced in a new paper on arXiv. A team of 13 researchers from Zurich, Stanford, and NYU built a system that uses vision-language models (VLMs) and specialized tools in a transparent, step-by-step process to generate structured radiology reports.

Unlike monolithic VLM approaches, RadAgent operates as a tool-calling agent — invoking tools for segmentation, lesion detection, measurement, and mapping to medical standards — while maintaining an explicit decision trace that a radiologist can later review and revise.
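The paper's actual tool interface is not described in this article, but the tool-calling pattern with an explicit decision trace can be sketched in a few lines. Everything below (tool names, fields, return shapes) is illustrative, not RadAgent's real API:

```python
# Hypothetical sketch of a tool-calling agent that records every tool
# invocation in a decision trace a reviewer can replay afterwards.
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    tool: str       # which tool was invoked
    args: dict      # arguments passed to the tool
    result: object  # what the tool returned

@dataclass
class Agent:
    trace: list = field(default_factory=list)

    def call(self, tool_name, tool_fn, **kwargs):
        """Invoke a tool and log the step in the decision trace."""
        result = tool_fn(**kwargs)
        self.trace.append(TraceStep(tool_name, kwargs, result))
        return result

# Stub tools standing in for segmentation / detection / measurement.
def detect_lesions(scan):
    return [{"id": 1, "location": "right upper lobe"}]

def measure(lesion):
    return {"id": lesion["id"], "diameter_mm": 8.2}

agent = Agent()
lesions = agent.call("detect_lesions", detect_lesions, scan="ct_volume")
for lesion in lesions:
    agent.call("measure", measure, lesion=lesion)

# A radiologist can later step through the full log:
for step in agent.trace:
    print(step.tool, step.result)
```

The key design point the article highlights is that the trace is a first-class artifact: each report claim can, in principle, be mapped back to a logged tool call rather than to an opaque forward pass.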

How Much Better Is It Compared to the Baseline?

The reported gains are substantial. Compared to the baseline CT-Chat model, RadAgent achieves:

  • Macro-F1: +6.0 points absolute (36.4% relative)
  • Micro-F1: +5.4 points absolute (19.6% relative)
  • Adversarial robustness: +24.7 points (41.9% relative)
  • Faithfulness score: 37.0% compared to a baseline of 0%
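The absolute and relative figures above are consistent with each other, which can be checked with a line of arithmetic: relative gain equals absolute gain divided by the baseline score. The implied baseline scores below are an inference from the article's numbers, not values reported directly:

```python
# Relative improvement = absolute gain (points) / baseline score (points).
# Working backwards from the article's figures yields the implied baselines.
def implied_baseline(abs_gain_points, relative_gain):
    return abs_gain_points / relative_gain

macro_f1_baseline = implied_baseline(6.0, 0.364)  # ~16.5 points
micro_f1_baseline = implied_baseline(5.4, 0.196)  # ~27.6 points

print(round(macro_f1_baseline, 1), round(micro_f1_baseline, 1))
```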

The faithfulness score measures the extent to which the generated report reflects findings actually visible on the scan. The baseline model had essentially no traceable link between findings and report text, while RadAgent reaches a level where more than a third of all claims can be traced to a specific detection in the image.
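As a toy illustration of the idea (not the paper's actual metric definition), faithfulness can be thought of as the fraction of report claims that link back to a concrete detection:

```python
# Toy illustration: faithfulness as the share of report claims that can
# be traced to a detection. This is NOT the paper's exact definition.
def faithfulness(claims, detections):
    traceable = [c for c in claims if c.get("detection_id") in detections]
    return len(traceable) / len(claims) if claims else 0.0

claims = [
    {"text": "8 mm nodule, right upper lobe", "detection_id": 1},
    {"text": "no pleural effusion", "detection_id": None},
    {"text": "calcified granuloma", "detection_id": 2},
]
detections = {1, 2}  # IDs of lesions the detection tool actually found

print(faithfulness(claims, detections))  # 2 of 3 claims are traceable
```

Under a definition like this, a baseline score of 0% means no claim in the generated report could be tied to any detection at all.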

Why Does This Matter for Clinical Practice?

Radiological interpretation is one of the most promising yet most sensitive areas for applying AI in medicine. Black-box models, which produce reports without explanation, have been a major obstacle to regulatory approval, because radiologists cannot verify what the AI actually relied on.

The decision trace that RadAgent generates changes the dynamic: a radiologist can open the step-by-step log, see which lesions the tool detected, which it measured, and how it categorized them. Combined with improved F1 scores and resistance to adversarial attacks, this yields an architecture that is a more mature candidate for clinical deployment than previous generations.

What Comes Next?

The authors do not mention a public code release date, but the paper is available on arXiv as a preprint. Given the multi-institutional authorship and the strong reported metrics, RadAgent is a plausible candidate for peer-reviewed publication in a leading medical AI journal, and could help set a standard for step-by-step radiology agents.

🤖

This article was generated using artificial intelligence from primary sources.