Chapter 9 · 2024
Towards Efficient and Reliable LLM Serving: A Real-World Case Study
Zhuohan Li, Lianmin Zheng, Ying Sheng
Abstract
We present a real-world case study of LLM serving infrastructure, analyzing latency, throughput, and reliability challenges in production deployments at scale.
Topics
LLM serving · production deployment · latency · reliability
Relevance Scores
Long-Horizon Score: 76
Enterprise Score: 91
Completeness: 85
Paper Info
Year: 2024
Venue
Type
Chapter: Ch. 9
Authors: 3
Zone III Analysis
Related Papers
Reflexion: Language Agents with Verbal Reinforcement Le…
2023 · Ch.1
Tree of Thoughts: Deliberate Problem Solving with Large…
2023 · Ch.1
Generative Agents: Interactive Simulacra of Human Behav…
2023 · Ch.2
MemGPT: Towards LLMs as Operating Systems
2023 · Ch.2