Chapter 8 · 2025
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan
Abstract
Advanced large language models (LLMs) frequently reflect in reasoning chain-of-thoughts (CoTs), where they self-verify the correctness of current solutions and explore alternatives. However, given recent findings that LLMs detect limited errors in CoTs, how reflection contributes to empirical improvements remains unclear. To analyze this issue, in this paper, we present a minimalistic reasoning framework to support basic self-verifying reflection for small transformers without natural language, which ensures analytic clarity and reduces the cost of comprehensive experiments.
Topics
LLM self-verificationchain-of-thought reasoningtransformerserror detectionreinforcement learning