CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

Zhibin Gou, Zhihong Shao, Yeyun Gong

Abstract

CRITIC enables LLMs to self-correct by interacting with external tools to verify and critique their outputs, with gains on free-form question answering, mathematical program synthesis, and toxicity reduction.
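The verify-then-correct loop can be sketched as below, assuming the model and the tool are passed in as plain callables. `critic_loop`, `Critique`, `calculator`, and the stub `generate`/`revise` functions are hypothetical illustrations, not the paper's code. In CRITIC the revision step is performed by the LLM itself, prompted with the tool-grounded critique; here a toy stub stands in for it.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    ok: bool        # True when the tool-based check found no problem
    feedback: str   # tool-grounded observation passed to the revision step


def critic_loop(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Critique],
    revise: Callable[[str, str, str], str],
    max_iters: int = 3,
) -> str:
    """Draft an answer, then alternate tool-based critique and revision."""
    answer = generate(question)
    for _ in range(max_iters):
        critique = verify(question, answer)
        if critique.ok:
            break  # stop as soon as the external check passes
        answer = revise(question, answer, critique.feedback)
    return answer


# --- Toy stand-ins (hypothetical) for the LLM and the external tool ---
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calculator(expr: str) -> int:
    """External tool: evaluates 'a op b' independently of the model."""
    left, op, right = expr.split()
    return OPS[op](int(left), int(right))


def verify(question: str, answer: str) -> Critique:
    truth = calculator(question)
    if answer.strip() == str(truth):
        return Critique(True, "calculator agrees")
    return Critique(False, f"calculator says {question} = {truth}")


def generate(question: str) -> str:
    return "381"  # a plausible-looking but wrong first draft


def revise(question: str, answer: str, feedback: str) -> str:
    # A real system would prompt the LLM with the feedback; this stub just
    # reads the corrected value out of the tool's observation.
    return feedback.rsplit("= ", 1)[1]


print(critic_loop("17 * 23", generate, verify, revise))  # prints 391
```

The loop is bounded by `max_iters`, so if the tool never accepts an answer the last revision is returned unverified; a real deployment would want to flag that case rather than pass it on silently.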

Eigenvector Insight: Zone III / PASF-PADE Analysis (not part of the original paper)

Eigenvector Research — Marco van Hurne

How this paper contributes to solving the Zone III problem (PASF-PADE)

CRITIC's tool-interactive self-correction is a Zone III reliability primitive. The key insight is that self-correction without external verification is unreliable: the model critiques its output with the same knowledge and biases that produced it. Correction grounded in tool feedback (executing code in an interpreter, checking claims with a search engine) is much more reliable because the evidence comes from outside the model. Zone III agents therefore need verification mechanisms that go beyond self-reflection.
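To make the "executing code" case concrete, here is a hedged sketch of how interpreter feedback could be turned into a critique. The helpers `run_program` and `critique_from_execution` are hypothetical; they only illustrate that the critique is derived from actually running the candidate program, not from the model re-reading its own output.

```python
import subprocess
import sys


def run_program(source: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run candidate code in a fresh interpreter; return (ok, observation)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"execution timed out after {timeout_s} s"
    if proc.returncode != 0:
        # Keep only the last stderr line, e.g. "ZeroDivisionError: division by zero".
        lines = proc.stderr.strip().splitlines()
        return False, lines[-1] if lines else f"exit code {proc.returncode}"
    return True, proc.stdout.strip()


def critique_from_execution(source: str) -> str:
    """Turn the interpreter's observation into feedback for the revision step."""
    ok, observation = run_program(source)
    if not ok:
        return f"Execution failed with {observation}; fix the program."
    try:
        float(observation)
    except ValueError:
        return f"Program printed {observation!r}, which is not a numeric answer."
    return f"Program printed {observation}; check it against the question."


print(critique_from_execution("print(1 / 0)"))
# Execution failed with ZeroDivisionError: division by zero; fix the program.
```

A bare subprocess is not a sandbox: an agent that executes model-written code in production would need proper isolation and resource limits around this step.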

Why AI is not sufficient for Zone III without this

Zone III refers to high-complexity, high-risk, long-running agentic workflows: the class of enterprise AI deployments in which a single failure can cascade across hundreds of steps. Standard language models, trained to predict the next token, are not inherently designed for durable, governed, multi-step execution. This paper addresses one of the structural gaps that make Zone III deployments unsafe without explicit architectural intervention: the lack of externally verified self-correction.

Topics

self-correction, tool interaction, verification, critique

Relevance Scores

Long-Horizon Score: 85

Enterprise Score: 82

Completeness: 86

Paper Info

Year: 2023

Chapter: Ch. 8

Authors: 3

Zone III Analysis

Frameworks

PASF, AEGIS