A Survey on Code Generation with LLM-based Agents

Yihong Dong, Xue Jiang, Jiaru Qian

Abstract

Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. Distinct from previous code generation techniques, code generation agents are characterized by three core features. 1) Autonomy: the ability to independently manage the entire workflow, from task decomposition to coding and debugging. 2) Expanded task scope: capabilities that extend beyond generating code snippets to encompass the full software development lifecycle (SDLC). 3) Enhancement of engineering practicality: a shift in research emphasis from algorithmic innovation toward practical engineering challenges, such as system reliability, process management, and tool integration. This domain has recently witnessed rapid development and an explosion in research, demonstrating significant application potential. This paper presents a systematic survey of the field of LLM-based code generation agents. We trace the technology's developmental trajectory from its inception and systematically categorize its core techniques, including both single-agent and multi-agent architectures. Furthermore, this survey details the applications of LLM-based agents across the full SDLC, summarizes mainstream evaluation benchmarks and metrics, and catalogs representative tools. Finally, by analyzing the primary challenges, we identify and propose several foundational, long-term research directions for the future work of the field.

Eigenvector Insight — Zone III / PASF-PADE AnalysisNot part of the original paper

Eigenvector Research — Marco van Hurne

How this paper contributes to solving the Zone III problem (PASF-PADE)

This paper directly addresses one of the core structural challenges in Zone III deployments. The research on Code generation, LLM-based agents, Software development lifecycle provides evidence-based foundations that enterprise architects cannot ignore when designing long-horizon autonomous workflows. The findings challenge the assumption that a base language model — however capable — can handle the complexity of durable, governed, multi-step execution without explicit architectural intervention. For Zone III practitioners, this paper belongs in the required reading list.

Why AI is not sufficient for Zone III without this

Zone III refers to high-complexity, high-risk, long-running agentic workflows — the class of enterprise AI deployments where a single failure can cascade across hundreds of steps. Standard AI models, trained to predict the next token, are not inherently designed for durable, governed, multi-step execution. This paper addresses one or more of the structural gaps that make Zone III deployments unsafe without explicit architectural intervention.

Topics

Code generationLLM-based agentsSoftware development lifecycleMulti-agent architecturesSurvey

Relevance Scores

Long-Horizon Score85

Enterprise Score80

Completeness75

Paper Info

Year2025

Venue

Type

ChapterCh. 3

Authors3

Zone III Analysis

Frameworks

PADE OCG