Towards Socially Intelligent Multi-Agent Systems: Zero-Shot MARL Coordination and Theory-of-Mind Benchmarking of LLM Agents for Strategic Deception

June 2026

Authors:

Karan Mirakhor

Abstract:

An agent that performs well on its own may still struggle when working with others. In multi-agent environments, success depends not only on understanding the world but also on understanding what other agents know, intend, and conceal. Cooperative partners follow hidden conventions, while adversarial opponents deceive. This work argues that robust multi-agent behavior requires explicit reasoning about these hidden mental states, and that this reasoning must be measured directly rather than inferred from task outcomes alone.

These ideas are developed through two complementary projects. The first, BEACON, addresses the zero-shot coordination problem: how can an agent coordinate effectively with unfamiliar partners it has never trained with? When agents learn from offline data, they often lock into dataset-specific conventions that work well with familiar partners but fail with new ones. BEACON is an offline-to-online learning framework that clusters offline trajectories into distinct conventions, trains diverse specialists for each convention, and uses belief-conditioned counterfactual rollouts to adapt online. On 2- and 3-player Hanabi, BEACON achieves state-of-the-art zero-shot coordination performance while using up to five times fewer training frames than strong online baselines. It also coordinates with human partners comparably to a leading online method. The second project, AmongUs-X, asks whether large language model agents genuinely deceive or merely win through other means. Built on the social-deduction game Among Us and spanning 21 model families across more than 8,700 games, the benchmark elicits agents' beliefs at fixed points during meetings. This yields eight Theory-of-Mind metrics measuring detection, deception, influence, and grounding. Win-rate-derived ratings track crewmate detection but miss impostor deception entirely. The elicited beliefs, by contrast, remain well calibrated, enabling direct mechanism-level evaluation.

Both projects arrive at the same conclusion: high self-play scores can hide poor coordination, and high win rates can hide absent deception. Modeling other agents' hidden information and measuring that modeling explicitly is essential for building socially intelligent multi-agent systems and evaluating them reliably.
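As an illustration of what "well calibrated" means for elicited beliefs, the following is a minimal sketch of a binned expected calibration error (ECE), a standard calibration measure. It is not the thesis's metric or elicitation protocol, neither of which is specified in this abstract. The sketch assumes each elicited belief is a probability that a named player is the impostor, paired with a 0/1 ground-truth label; the function name and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> float:
    """Binned ECE: weighted mean gap between stated belief and observed frequency.

    probs    -- elicited beliefs in [0, 1] (assumed: P(target is impostor))
    outcomes -- ground truth, 1 if the target really was an impostor, else 0
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one belief is required")

    # Group (belief, outcome) pairs into equal-width probability bins.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_belief = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_belief - observed)
    return ece


# Hypothetical example: an agent that is always certain but right half the time.
overconfident = expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 0, 0, 1])
```

An ECE near zero means stated beliefs match observed frequencies. In the hypothetical example above, the agent states certainty but is right only half the time, so the gap is 0.5.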

Notes:

@mastersthesis{Mirakhor-2026-88313,
author = {Karan Mirakhor},
title = {Towards Socially Intelligent Multi-Agent Systems: Zero-Shot MARL Coordination and Theory-of-Mind Benchmarking of LLM Agents for Strategic Deception},
year = {2026},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-60},
keywords = {reinforcement learning, multi-agent systems, social intelligence, large language models, zero-shot coordination, strategic deception, theory of mind, benchmarking},
}