Research Highlights: SparseGPT: Prune LLMs Accurately in One-Shot

A new research paper shows that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one shot, without any retraining, with minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models.
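
To make the 50% unstructured sparsity target concrete, the sketch below zeroes out the smallest-magnitude weights of a single layer in one shot. This is plain magnitude pruning used only as an illustration of what the sparsity level means; it is not the SparseGPT algorithm, which selects and updates weights using a more sophisticated layer-wise reconstruction step. The function name and tensor sizes are hypothetical.

```python
import torch

def prune_layer_to_sparsity(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are zero.

    Simple magnitude pruning for illustration only; SparseGPT itself uses a
    different, reconstruction-based criterion to choose and adjust weights.
    """
    k = int(weight.numel() * sparsity)                       # number of weights to zero
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values    # k-th smallest magnitude
    mask = weight.abs() > threshold                          # keep only larger weights
    return weight * mask

# Example: prune a random linear layer's weight matrix to ~50% sparsity in one shot
w = torch.randn(4096, 4096)
w_sparse = prune_layer_to_sparsity(w, sparsity=0.5)
print(f"Sparsity: {(w_sparse == 0).float().mean():.2%}")
```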