CoopEval: stronger reasoning models are systematically less cooperative in social dilemmas — a counterintuitive finding for multi-agent AI
Why it matters
CoopEval is a new benchmark that tests LLM agents in classic social dilemmas such as the Prisoner's Dilemma and Public Goods games. Its counterintuitive finding: stronger reasoning models defect more often than weaker ones, systematically undermining cooperation in single-shot mixed-motive situations. This has important implications for multi-agent AI deployment, where an agent must balance its own interests with collective outcomes.
What does the paper test?
CoopEval is a new benchmark presented on arXiv on April 17, 2026 that systematically tests cooperative behavior of LLM agents in classic social dilemmas from game theory:
- Prisoner’s Dilemma — two players, cooperation vs. defection
- Public Goods — each player can contribute to a common good or free-ride
- Other mixed-motive games — situations where individually rational choices lead to collectively poor outcomes
The authors tested multiple generations of LLMs, from smaller models to state-of-the-art reasoning variants, measuring the rate of cooperative choices in controlled single-shot and multi-round scenarios.
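The single-shot dynamic at the heart of the benchmark can be illustrated with the textbook Prisoner's Dilemma. The payoff values below are the standard illustrative ones, not the paper's exact experimental setup:

```python
# Single-shot Prisoner's Dilemma with standard textbook payoffs (illustrative;
# the paper's exact payoff values are assumptions here, not taken from CoopEval).
# PAYOFF[(my_move, their_move)] = my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # mutual defection
}

def best_response(their_move):
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max("CD", key=lambda my: PAYOFF[(my, their_move)])

# Defection dominates: it is the best response to either opponent move,
# so (D, D) is the unique Nash equilibrium of the one-shot game...
assert best_response("C") == "D"
assert best_response("D") == "D"
# ...even though mutual cooperation would pay both players more (3 > 1).
```

A model that computes `best_response` correctly will always defect in a one-shot game, which is exactly the "rational" behavior the benchmark observes in stronger models.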
Counterintuitive finding: stronger models defect more
One might expect stronger models, those with better reasoning, to perform better across the board, including at cooperation. CoopEval finds the opposite.
- Weaker models more often choose cooperation in single-shot social dilemmas
- Stronger reasoning models systematically defect — they recognize that defection is the Nash equilibrium in a single-shot situation, and act “rationally”
The irony is sharp: the better a model understands game theory, the more reliably it falls into the trap that undermines collective outcomes. A model that “thinks like an economist” in Prisoner’s Dilemma always defects — exactly as theory predicts, and exactly what is generally considered bad for society.
What does this mean for multi-agent AI?
The finding is important because many future AI scenarios involve multiple agents interacting with each other:
- AI assistants negotiating on behalf of users (e.g., purchasing products, reservations)
- AI agents coordinating in multi-party systems (fleet management, supply chains)
- Multiple AI systems in the same digital ecosystem (autonomous trading, resource scheduling)
If all these agents exhibit “game-theoretically rational” behavior, the result can be systemically poor — the AI equivalent of the “tragedy of the commons,” where each individual agent chooses optimally but the collective outcome collapses.
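The tragedy-of-the-commons logic can be made concrete with a standard linear public goods game. The parameters below (endowment, multiplier, group size) are illustrative assumptions, not values from the paper:

```python
# Linear public goods game in its standard textbook form (parameters are
# illustrative assumptions, not taken from the paper). Each of n players holds
# an endowment, chooses a contribution, and the pot is multiplied by r
# (with 1 < r < n) and split equally among all players.
def payoff(my_contribution, others_contributions, endowment=10, r=1.6):
    n = 1 + len(others_contributions)
    pot = my_contribution + sum(others_contributions)
    return endowment - my_contribution + r * pot / n

others_cooperate = [10, 10, 10]  # three other agents contribute fully

# Free-riding beats contributing for the individual agent...
assert payoff(0, others_cooperate) > payoff(10, others_cooperate)
# ...but if every agent free-rides, everyone ends up worse off
# than under full cooperation: the collective outcome collapses.
assert payoff(0, [0, 0, 0]) < payoff(10, [10, 10, 10])
```

Each agent's individually optimal move (contribute nothing) is exactly what makes the group outcome worst, which is the systemic failure mode the article describes for fleets of "rational" agents.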
What does the paper propose?
The authors examine mechanisms that would “sustain cooperation”:
- Reputation systems — agents track past behavior of others and punish defectors in future interactions
- Commitment mechanisms — agents can publicly bind their choice before the game
- Training modifications — reward shaping that explicitly incorporates collective benefit into the loss function
No mechanism is perfect, but the paper argues that the problem can be mitigated — with intentional design.
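The reputation idea can be sketched in a repeated game: an agent that tracks its opponent's past behavior (here the classic tit-for-tat strategy) sustains cooperation with a fellow reciprocator and punishes a defector. The strategies and payoffs are illustrative assumptions, not the paper's implementation:

```python
# Minimal repeated-game sketch of a reputation-style mechanism (tit-for-tat).
# Payoffs and strategies are illustrative assumptions, not the paper's setup.
# PAYOFF[(move_a, move_b)] = (payoff_a, payoff_b); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game; each strategy sees the opponent's move history."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

tit_for_tat = lambda opp: "C" if not opp else opp[-1]  # cooperate, then mirror
always_defect = lambda opp: "D"

# Two reciprocators sustain cooperation for all 10 rounds (3 points each round).
assert play(tit_for_tat, tit_for_tat) == (30, 30)
# A defector exploits tit-for-tat only once, then faces mutual defection.
assert play(tit_for_tat, always_defect) == (0 + 9 * 1, 5 + 9 * 1)
```

Repetition plus memory changes the incentives: defection stops being the dominant strategy once future interactions can punish it, which is why reputation-style mechanisms can mitigate the problem the paper identifies.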
The paper is a preprint, but the conceptual relevance for long-term AI deployment is significant. For builders of multi-agent systems, this is required reading before deploying in an environment where an agent actually interacts with other agents.
This article was generated using artificial intelligence from primary sources.
Related news
Anthropic: Memory for Managed Agents in public beta — AI agents that remember context between sessions
GitHub: Cloud agent sessions now available directly from issues and project views
ArXiv SWE-chat — a dataset of real developer interactions with AI coding agents in production