diff --git a/content/posts/lm_sharding.md b/content/posts/lm_sharding.md
index 0049d7c..1ffb271 100644
--- a/content/posts/lm_sharding.md
+++ b/content/posts/lm_sharding.md
@@ -5,7 +5,7 @@ draft: false
 ShowToc: true
 category: [ai]
 tags: ["llms", "ai", "inference"]
-description: "A guide to fine-tuning GPT-X models with DeepSpeed"
+description: "Techniques to load LLMs on smaller GPUs and enable parallel inference using Hugging Face Accelerate"
 ---
 
 *With the rise of deep learning and the development of increasingly powerful models, pre-trained language models have grown in size. While these models deliver impressive performance in various natural language processing (NLP) tasks, their sheer magnitude poses challenges for inference on resource-constrained devices and large-scale distributed systems. Enter sharding, a technique that divides large models into smaller, more manageable parts, offering an efficient and faster approach to distributed inference.*