arXiv:2605.06635: LLM agents cite but don't verify — links valid 94%+, accuracy only 39–77%
New research tested 14 LLMs on deep research tasks and found a major gap: cited links are valid in more than 94% of cases, but the factual accuracy of those citations is only 39–77%. The key finding is that citation accuracy drops by 42% when the number of tools increases from 2 to 150. This contradicts the common assumption that more retrieval means better quality.
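The gap between the two headline numbers comes from measuring different things. The minimal Python sketch below makes that concrete. It assumes a hypothetical per-citation record with two boolean labels: whether the URL resolves, and whether the cited page actually supports the claim. The `Citation` schema, the function names, and the sample data are illustrative assumptions, not the paper's evaluation code. The sketch also treats the 42% figure as a relative decrease, since the summary does not say whether it is relative or in percentage points.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """One citation emitted by a research agent (hypothetical schema)."""
    url: str
    url_resolves: bool    # the link returns a live page
    supports_claim: bool  # the page actually backs the cited statement


def link_validity_rate(citations: list[Citation]) -> float:
    """Share of citations whose URL resolves, ignoring content."""
    if not citations:
        raise ValueError("need at least one citation")
    return sum(c.url_resolves for c in citations) / len(citations)


def citation_accuracy_rate(citations: list[Citation]) -> float:
    """Share of citations that resolve AND support the attributed claim."""
    if not citations:
        raise ValueError("need at least one citation")
    return sum(c.url_resolves and c.supports_claim for c in citations) / len(citations)


def relative_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# Made-up records: every link resolves, but only 2 of 4 support their claims.
sample = [
    Citation("https://example.org/a", True, True),
    Citation("https://example.org/b", True, False),
    Citation("https://example.org/c", True, True),
    Citation("https://example.org/d", True, False),
]

print(f"link validity:     {link_validity_rate(sample):.0%}")
print(f"citation accuracy: {citation_accuracy_rate(sample):.0%}")
# Illustrative numbers only, not results from the paper.
print(f"relative drop 0.50 -> 0.29: {relative_drop(0.50, 0.29):.0%}")
```

A link-validity check only confirms that a URL resolves, while accuracy additionally requires the page to back the statement, so an agent can score perfectly on the first metric and poorly on the second.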