System architecture · Chapter 5 · NeurIPS 2022 · 2022
RLHF: Training Language Models to Follow Instructions with Human Feedback
Long Ouyang (OpenAI), Jeff Wu (OpenAI)
Abstract
We present InstructGPT, a family of language models fine-tuned with reinforcement learning from human feedback (RLHF) to follow instructions. In human preference evaluations, RLHF improves alignment with user intent beyond what supervised fine-tuning alone achieves.
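The reward-modeling step of RLHF can be illustrated with a small sketch. Labelers rank several responses to the same prompt, and a reward model is trained so that preferred responses score higher, by minimizing −log σ(r(x, y_w) − r(x, y_l)) averaged over all ranked pairs. The function names and the input format (a list of scalar scores ordered from most to least preferred) are hypothetical assumptions; this is plain Python over floats, not a training loop.

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_preference_loss(rewards: list[float]) -> float:
    """Average -log sigmoid(r_better - r_worse) over every ranked pair.

    Hypothetical input format: `rewards` holds reward-model scores for K
    responses to one prompt, ordered from most to least preferred.
    """
    pairs = list(combinations(range(len(rewards)), 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    total = 0.0
    for i, j in pairs:  # i < j, so response i was ranked above response j
        total -= log_sigmoid(rewards[i] - rewards[j])
    return total / len(pairs)
```

When all scores are equal the loss is log 2 for every pair; it falls below that as the reward model scores preferred responses higher.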
Key Contributions
- RLHF methodology
- InstructGPT
- Human preference alignment
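After the reward model is trained, the policy is optimized with PPO against a reward that penalizes drift away from the supervised fine-tuned (SFT) model: r(x, y) − β · log(π_RL(y|x) / π_SFT(y|x)). The sketch below computes that shaped reward for one sampled response, assuming per-token log-probabilities from both models are already available. The function and parameter names are hypothetical, `beta` has no default because its value is a tuning choice, and the pretraining log-likelihood term of the paper's PPO-ptx variant is omitted.

```python
def kl_penalized_reward(
    rm_score: float,
    policy_logprobs: list[float],
    sft_logprobs: list[float],
    beta: float,
) -> float:
    """Reward-model score minus a KL-style penalty for one sampled response.

    Hypothetical helper: inputs are per-token log-probs of the same sampled
    tokens under the RL policy and under the frozen SFT model.
    """
    if len(policy_logprobs) != len(sft_logprobs):
        raise ValueError("log-prob sequences must align token by token")
    # log pi(y|x) is the sum of the per-token log-probs of the response.
    log_ratio = sum(policy_logprobs) - sum(sft_logprobs)
    return rm_score - beta * log_ratio
```

A larger `beta` keeps the policy closer to the SFT model at the cost of chasing the reward model less aggressively; if the policy and SFT model assign identical log-probabilities, the shaped reward equals the raw reward-model score.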
Topics
RLHF · instruction following · alignment · human feedback
Relevance Scores
Long-Horizon Score: 75
Enterprise Score: 85
Completeness: 82
Paper Info
Year: 2022
Venue: NeurIPS 2022
Type: System architecture
Chapter: Ch. 5
Authors: 2
Zone III Analysis
Related Papers
ReAct: Synergizing Reasoning and Acting in Language Mod…
2023 · Ch.1
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Toolformer: Language Models Can Teach Themselves to Use…
2023 · Ch.1