Tuesday, July 11, 2023

Chiplet ASIC supercomputers for LLMs like GPT-4

[Submitted on 5 Jul 2023]

Title: Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

Abstract: Large language models (LLMs) such as ChatGPT have demonstrated unprecedented capabilities across multiple AI tasks. However, hardware inefficiencies have become a significant factor limiting the democratization of LLMs. We propose Chiplet Cloud, an ASIC supercomputer architecture that optimizes total cost of ownership (TCO) per token for serving generative LLMs. Chiplet Cloud fits all model parameters inside the on-chip SRAMs to eliminate bandwidth limitations, moderates the die size to improve system cost, and leverages software mappings to overcome data communication overhead. We propose a comprehensive design methodology that accurately explores a spectrum of major design trade-offs in the joint hardware-software space and generates a detailed performance-cost analysis for all valid design points. We evaluate Chiplet Cloud on four popular LLMs. Compared to GPUs and TPUs, our architecture achieves up to 94x and 15x improvement in TCO/Token respectively, significantly reducing the cost of realistically serving modern LLMs.
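To make the TCO/Token framing concrete, here is a minimal back-of-envelope sketch of the kind of accounting the abstract alludes to: size the number of SRAM-only chiplets needed to hold every parameter on-chip, then divide amortized capital and energy cost by tokens served. All numbers, helper names, and cost assumptions below are hypothetical placeholders for illustration, not values or methodology from the paper.

import math

def chiplets_needed(num_params: float, bytes_per_param: float,
                    sram_per_chiplet_bytes: float) -> int:
    """Chiplets required to hold every model parameter in on-chip SRAM."""
    return math.ceil(num_params * bytes_per_param / sram_per_chiplet_bytes)

def tco_per_token(num_chiplets: int, chiplet_cost_usd: float,
                  lifetime_years: float, power_per_chiplet_w: float,
                  energy_cost_usd_per_kwh: float,
                  tokens_per_second_system: float) -> float:
    """Amortized capital cost plus energy cost, divided by tokens served."""
    seconds = lifetime_years * 365 * 24 * 3600
    capex = num_chiplets * chiplet_cost_usd
    kwh = (num_chiplets * power_per_chiplet_w / 1000.0) * (seconds / 3600.0)
    opex = kwh * energy_cost_usd_per_kwh
    tokens = tokens_per_second_system * seconds
    return (capex + opex) / tokens

if __name__ == "__main__":
    # Hypothetical 175B-parameter model in FP16, 512 MB of SRAM per chiplet.
    n = chiplets_needed(175e9, 2, 512 * 2**20)
    cost = tco_per_token(
        num_chiplets=n,
        chiplet_cost_usd=40.0,              # assumed per-die cost
        lifetime_years=3.0,
        power_per_chiplet_w=10.0,           # assumed power draw
        energy_cost_usd_per_kwh=0.08,
        tokens_per_second_system=50_000.0,  # assumed aggregate throughput
    )
    print(f"{n} chiplets, ~${cost:.2e} per token")

The paper's actual design methodology sweeps a much larger joint hardware-software space (die size, chiplet count, mapping), but a calculation of this shape is what "TCO/Token" ultimately reduces to.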

Submission history

From: Huwan Peng

[v1] Wed, 5 Jul 2023 21:42:24 UTC (3,499 KB)



from Hacker News https://ift.tt/tpH2N7d
