SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Abstract
Large Language Models (LLMs) are prone to generating factual inaccuracies, a phenomenon known as hallucination. Detecting hallucinations without access to external knowledge or ground truth is a challenging problem. This paper introduces SelfCheckGPT, a zero-resource, black-box method for hallucination detection in generative LLMs. Our approach prompts the LLM multiple times with the same input to generate diverse responses, then measures how consistent those self-generated responses are with one another. Content on which the responses disagree is flagged as a likely hallucination, with no external verification required. This method is particularly valuable in scenarios where external knowledge bases are unavailable or difficult to integrate.
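The consistency check described above can be illustrated with a minimal sketch. It assumes the sampled responses have already been collected as plain strings, and it uses simple word overlap as a stand-in consistency measure; the function names (`inconsistency_scores`, `_tokens`, `_sentences`) and the overlap metric are illustrative assumptions, not the scoring method defined in this paper.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Return the lowercased alphanumeric word tokens of a text as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def inconsistency_scores(response: str, samples: List[str]) -> List[float]:
    """Score each sentence of `response` in [0, 1].

    Higher scores mean the sentence is less supported by the sampled
    responses. Support is measured by word overlap, which is an assumed
    stand-in metric chosen only to keep the sketch dependency-free.
    """
    sample_tokens = [_tokens(s) for s in samples]
    scores = []
    for sentence in _sentences(response):
        sent_tokens = _tokens(sentence)
        if not sent_tokens or not sample_tokens:
            # Nothing to compare against: treat the sentence as unsupported.
            scores.append(1.0)
            continue
        # Fraction of the sentence's words that appear in each sample.
        support = [len(sent_tokens & st) / len(sent_tokens) for st in sample_tokens]
        scores.append(1.0 - sum(support) / len(support))
    return scores
```

Calling `inconsistency_scores(response, samples)` returns one score per sentence of the response being checked. A sentence whose wording recurs across the samples scores close to 0, while a sentence that none of the samples echo scores close to 1. Word overlap ignores paraphrase and negation, so a practical system would likely need a stronger comparison.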