From fe6bcfc2ad4d592fcb11beda41481d9ce8cfc28c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?X=CE=BBRI-U5?=
Date: Thu, 14 Dec 2023 13:20:37 +0700
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 32c3718..5749bc4 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# 🚧 pipegoose: Decentralized large-scale 4D parallelism multi-modal pre-training for 🤗 `transformers` in Mixture of Experts
+# 🚧 pipegoose: Large-scale 4D parallelism multi-modal pre-training for 🤗 `transformers` in Mixture of Experts
 [](https://github.com/xrsrke/pipegoose) [![tests](https://github.com/xrsrke/pipegoose/actions/workflows/tests.yaml/badge.svg)](https://github.com/xrsrke/pipegoose/actions/workflows/tests.yaml) [](https://discord.gg/s9ZS9VXZ3p) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [Codecov](https://app.codecov.io/gh/xrsrke/pipegoose) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
@@ -6,7 +6,7 @@
-We're building a library for an end-to-end framework for **training multi-modal MoE in a decentralized way, as proposed by the paper [DiLoCo](https://arxiv.org/abs/2311.08105)**. The core papers that we are replicating are:
+We're building an end-to-end library for **training multi-modal MoE in a decentralized way, as proposed by the paper [DiLoCo](https://arxiv.org/abs/2311.08105)**. The core papers that we are replicating are:
 - DiLoCo: Distributed Low-Communication Training of Language Models [[link]](https://arxiv.org/abs/2311.08105)
 - Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism [[link]](https://arxiv.org/abs/2304.11414)
 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [[link]](https://arxiv.org/abs/2101.03961)