theoretical framework · Chapter 3 · arXiv · 2023
Scalable Oversight: Supervising AI Systems That Exceed Human Capabilities
Paul Christiano (ARC), Jan Leike (OpenAI)
Abstract
We discuss the challenge of providing oversight to AI systems that may exceed human capabilities in some domains. We propose scalable oversight as a research agenda for maintaining meaningful human control.
Key Contributions
- Scalable oversight research agenda
- Debate and amplification techniques
- Human control preservation methods
Eigenvector Commentary
Scalable oversight is the central governance challenge for Zone III. As agents become more capable, it becomes harder for humans to oversee them directly. This paper frames the problem correctly: the goal is not to prevent autonomy but to maintain meaningful control as autonomy increases.
Topics
scalable oversight · human control · AI safety · governance
Relevance Scores
Long-Horizon Score: 82
Enterprise Score: 90
Completeness: 80
Paper Info
Year: 2023
Venue: arXiv
Type: theoretical framework
Chapter: Ch. 3
Authors: 2
Related Papers
- Reflexion: Language Agents with Verbal Reinforcement Le… (2023 · Ch. 1)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Frien… (2023 · Ch. 4)
- AgentBench: Evaluating LLMs as Agents (2023 · Ch. 1)
- Semantic Uncertainty: Linguistic Invariances for Uncert… (2023 · Ch. 3)