Chapter 2 · 2026

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Taiyi Wang, Sian Gooding, Florian Hartmann

Abstract

LLM-based agents struggle with long-horizon planning because they lose track of their goals and because rewards are sparse in RL fine-tuning. This paper proposes a subgoal-driven framework in which an agent leverages proprietary models for online planning through subgoal decomposition. It also introduces MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training framework that uses dense, milestone-based reward signals to significantly improve long-horizon capabilities.
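To make the contrast between sparse and dense, milestone-based rewards concrete, here is a minimal Python sketch. It is a generic illustration and not MiRA's actual reward definition, which the abstract does not specify. The `Milestone` class, both reward functions, the even split of the bonus, and the toy web-navigation states are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

State = Dict[str, str]  # hypothetical: a minimal snapshot of the environment


@dataclass
class Milestone:
    """Hypothetical subgoal: a name plus a predicate that checks a state."""
    name: str
    reached: Callable[[State], bool]


def sparse_rewards(trajectory: Sequence[State],
                   goal: Callable[[State], bool]) -> List[float]:
    """Pay 1.0 only at the first step where the final goal holds."""
    rewards, done = [], False
    for state in trajectory:
        hit = (not done) and goal(state)
        rewards.append(1.0 if hit else 0.0)
        done = done or hit
    return rewards


def milestone_rewards(trajectory: Sequence[State],
                      milestones: Sequence[Milestone],
                      bonus: float = 1.0) -> List[float]:
    """Pay an equal share of `bonus` the first time each milestone is
    reached, in order (an assumed scheme, chosen for simplicity)."""
    if not milestones:
        return [0.0] * len(trajectory)
    share = bonus / len(milestones)
    rewards, next_idx = [], 0
    for state in trajectory:
        reward = 0.0
        # One state may satisfy several consecutive milestones at once.
        while next_idx < len(milestones) and milestones[next_idx].reached(state):
            reward += share
            next_idx += 1
        rewards.append(reward)
    return rewards


# Toy episode: the agent reaches the cart but never checks out.
states = [{"page": "home"}, {"page": "search"}, {"page": "search"},
          {"page": "cart"}, {"page": "home"}]
milestones = [
    Milestone("open search", lambda s: s["page"] == "search"),
    Milestone("add to cart", lambda s: s["page"] == "cart"),
    Milestone("checkout", lambda s: s["page"] == "checkout"),
]

print(sparse_rewards(states, milestones[-1].reached))
print(milestone_rewards(states, milestones, bonus=3.0))
```

In this toy episode the sparse scheme returns zero at every step, so a failed run carries no learning signal, while the milestone scheme still credits the two subgoals the agent completed.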

Eigenvector Insight: Zone III / PASF-PADE Analysis
Not part of the original paper
Eigenvector Research — Marco van Hurne
How this paper contributes to solving the Zone III problem (PASF-PADE)

This paper directly addresses one of the core structural challenges in Zone III deployments. Its research on LLM agents, long-horizon planning, and subgoal decomposition gives enterprise architects an evidence base for designing long-horizon autonomous workflows. The findings challenge the assumption that a base language model, however capable, can handle durable, governed, multi-step execution without explicit architectural intervention. For Zone III practitioners, this paper belongs on the required reading list.

Why AI is not sufficient for Zone III without this

Zone III refers to high-complexity, high-risk, long-running agentic workflows: the class of enterprise AI deployments where a single failure can cascade across hundreds of steps. Standard AI models, trained to predict the next token, are not inherently designed for durable, governed, multi-step execution. This paper addresses two of the structural gaps that make Zone III deployments unsafe without explicit architectural intervention: agents losing track of their goals over long horizons, and sparse rewards in RL fine-tuning.

Topics

LLM agents, long-horizon planning, subgoal decomposition, reinforcement learning, web navigation