Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao (Salesforce Research), Shelby Heinecke (Salesforce Research)

Abstract

We present Retroformer, a framework for improving language agents through retrospective policy gradient optimization. Retroformer learns from past trajectories to improve future performance without manual reward engineering.

Eigenvector Insight: Zone III / PASF-PADE Analysis (not part of the original paper)

Eigenvector Research — Marco van Hurne

How this paper contributes to solving the Zone III problem (PASF-PADE)

Retroformer provides a path to continuous agent improvement from operational experience, without manual reward engineering. For Zone III enterprise deployments, this self-improvement capability is essential for adapting to changing environments.

Why AI is not sufficient for Zone III without this

Zone III refers to high-complexity, high-risk, long-running agentic workflows: the class of enterprise AI deployments where a single failure can cascade across hundreds of steps. Standard AI models, trained to predict the next token, are not inherently designed for durable, governed, multi-step execution. Retroformer addresses one of the structural gaps that make Zone III deployments unsafe without explicit architectural intervention: the lack of a mechanism for learning from past trajectories.

Key Contributions

→ Retrospective policy gradient for agents
→ Trajectory-based learning
→ Automated reward signal
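
The contributions above can be illustrated with a toy policy-gradient loop. This is a minimal sketch, not the paper's implementation. It assumes the policy is a softmax over a fixed set of hypothetical candidate reflections, and that the automated reward for a candidate is the change in episode return between two consecutive trials. The names `reinforce_step` and `train`, and the `gains` values, are invented for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update: logits += lr * reward * grad log pi(action)."""
    probs = softmax(logits)
    # For a softmax policy, d/d logit_i of log pi(action) = 1[i == action] - pi_i.
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(gains, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over hypothetical reflection candidates.

    gains[i] is a made-up number standing in for the change in episode
    return between two consecutive trials when candidate i is used.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(gains)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(gains)), weights=probs)[0]
        # Assumed reward: derived from trajectory outcomes, not hand-written rules.
        reward = gains[action]
        logits = reinforce_step(logits, action, reward, lr)
    return softmax(logits)


if __name__ == "__main__":
    final = train([-0.2, 0.1, 0.6])
    print([round(p, 3) for p in final])
```

In this sketch, probability mass shifts toward the candidate whose use was followed by the largest improvement in return. A real system would replace the lookup in `gains` with actual agent rollouts and the softmax table with a language model.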

Topics

policy gradient, retrospective learning, agent improvement, RL

Relevance Scores

Long-Horizon Score: 85

Enterprise Score: 74

Completeness: 74

Paper Info

Year: 2023

Venue: ICLR 2024

Type: system architecture

Chapter: Ch. 5

Authors: 2

Zone III Analysis

Frameworks

PASF AEGIS