safety · Chapter 3 · arXiv · 2022
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai (Anthropic), Saurav Kadavath (Anthropic)
Abstract
We present Constitutional AI, a method for training AI systems to be helpful, harmless, and honest. Supervision comes from a set of written principles (a "constitution") and AI-generated feedback, rather than from human labels identifying harmful outputs.
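A rough way to picture principle-guided AI feedback is a critique-and-revise loop: the model drafts a response, critiques it against one principle, then rewrites it. The sketch below is illustrative only and is not the authors' implementation. The `generate` callable, the `critique_and_revise` helper, the prompt wording, and the two example principles are all assumptions made for this example.

```python
from typing import Callable, List

# Hypothetical principles for illustration; not quoted from the paper's constitution.
CONSTITUTION: List[str] = [
    "Avoid content that helps someone cause harm.",
    "Be honest about uncertainty.",
]


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to judge its own draft against a single principle.
        critique = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = generate(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def echo_model(text: str) -> str:
    """Stand-in for a language model so the sketch runs offline (assumption)."""
    return f"[model output for {len(text)} chars]"


if __name__ == "__main__":
    print(critique_and_revise("Example user request", echo_model, CONSTITUTION))
```

In this sketch, swapping `echo_model` for a real model call is the only change needed to try it end to end; the revised responses could then serve as training data, though how they are used is outside what this summary states.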
Key Contributions
- Constitutional AI methodology
- AI-generated feedback for alignment
- Principle-based safety training
Topics
constitutional AI · alignment · safety · AI feedback
Relevance Scores
Long-Horizon Score: 75
Enterprise Score: 88
Completeness: 82
Paper Info
Year: 2022
Venue: arXiv
Type: safety
Chapter: 3
Authors: 2
Related Papers
ReAct: Synergizing Reasoning and Acting in Language Mod…
2023 · Ch.1
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Toolformer: Language Models Can Teach Themselves to Use…
2023 · Ch.1