LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead

Junda He, Christoph Treude, David Lo

Abstract

Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift in the research landscape by offering cognitive abilities that are competitive with human planning and reasoning. This article explores the transformative potential of integrating Large Language Models into Multi-Agent (LMA) systems for addressing complex challenges in software engineering (SE). By leveraging the collaborative and specialized abilities of multiple agents, LMA systems enable autonomous problem-solving, improve robustness, and provide scalable solutions for managing the complexity of real-world software projects. In this article, we conduct a systematic review of recent primary studies to map the current landscape of LMA applications across various stages of the software development lifecycle (SDLC). To illustrate current capabilities and limitations, we perform two case studies to demonstrate the effectiveness of state-of-the-art LMA frameworks. Additionally, we identify critical research gaps and propose a comprehensive research agenda focused on enhancing individual agent capabilities and optimizing agent synergy. Our work outlines a forward-looking vision for developing fully autonomous, scalable, and trustworthy LMA systems, laying the foundation for the evolution of Software Engineering 2.0.

Eigenvector Insight — Zone III / PASF-PADE AnalysisNot part of the original paper

Eigenvector Research — Marco van Hurne

How this paper contributes to solving the Zone III problem (PASF-PADE)

This paper directly addresses one of the core structural challenges in Zone III deployments. The research on LLM-Based Multi-Agent Systems, Software Engineering, Autonomous agents provides evidence-based foundations that enterprise architects cannot ignore when designing long-horizon autonomous workflows. The findings challenge the assumption that a base language model — however capable — can handle the complexity of durable, governed, multi-step execution without explicit architectural intervention. For Zone III practitioners, this paper belongs in the required reading list.

Why AI is not sufficient for Zone III without this

Zone III refers to high-complexity, high-risk, long-running agentic workflows — the class of enterprise AI deployments where a single failure can cascade across hundreds of steps. Standard AI models, trained to predict the next token, are not inherently designed for durable, governed, multi-step execution. This paper addresses one or more of the structural gaps that make Zone III deployments unsafe without explicit architectural intervention.

Topics

LLM-Based Multi-Agent SystemsSoftware EngineeringAutonomous agentsLiterature ReviewSoftware development lifecycle

Relevance Scores

Long-Horizon Score85

Enterprise Score80

Completeness75

Paper Info

Year2025

Venue

Type

ChapterCh. 3

Authors3

Zone III Analysis

Frameworks

PADE OCG