system architecture · Chapter 5 · arXiv · 2024
Mixtral of Experts
Albert Q. Jiang (Mistral AI), Alexandre Sablayrolles (Mistral AI)
Abstract
We introduce Mixtral 8x7B, a Sparse Mixture of Experts language model. Mixtral uses a router to select 2 of 8 expert FFN layers per token, achieving strong performance with reduced inference cost.
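The routing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the router emits one logit per expert, that the two largest logits are kept, and that the kept logits are renormalized with a softmax to weight the selected experts' outputs. The names `top_k_route` and `moe_layer`, and the toy scaling experts, are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k experts with the largest router logits.

    Returns (expert_index, weight) pairs. The weights are a softmax over
    the selected logits only, so they sum to 1.
    """
    # Indices of the k largest logits (ties keep the lower index first).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected logits.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Combine the outputs of the k selected experts for a single token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        # Only the selected experts are evaluated; the rest are skipped,
        # which is where the inference saving comes from.
        for j, value in enumerate(experts[idx](x)):
            out[j] += weight * value
    return out


# Hypothetical stand-ins for 8 expert FFNs: expert i scales its input by i + 1.
toy_experts = [lambda v, s=i + 1: [s * a for a in v] for i in range(8)]
# Hypothetical router logits for one token (in a real model these come from
# a learned linear layer applied to the token's hidden state).
toy_logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

print(top_k_route(toy_logits))
print(moe_layer([1.0, 2.0], toy_logits, toy_experts))
```

Because only 2 of the 8 experts run for each token, the per-token compute depends on the size of two experts rather than all eight, even though all eight contribute to the total parameter count.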
Key Contributions
- Sparse MoE architecture
- Efficient expert routing
- Strong performance at reduced cost
Topics
mixture of experts, efficient inference, model architecture, sparse models
Relevance Scores
Long-Horizon Score: 72
Enterprise Score: 82
Completeness: 74
Paper Info
Year: 2024
Venue: arXiv
Type: system architecture
Chapter: Ch. 5
Authors: 2