\documentclass{article}
\usepackage{amsmath, amssymb}
\usepackage[left=1in, right=1in, top=1in, bottom=1in]{geometry}
\usepackage{hyperref}
\usepackage[utf8]{inputenc}
\title{Draft: Testing FSDP as a viable alternative to DeepSpeed}
\author{}
\begin{document}
\maketitle
Training large language models (LLMs) at scale is at the forefront of
distributed computing research. As both the data and the models grow larger,
parallelization schemes for each become critical to training efficiency. Among
the available frameworks, DeepSpeed and PyTorch FSDP are leading candidates
for deploying distributed LLM training. It is therefore timely to compare the
performance and scaling efficiency of FSDP and DeepSpeed on ALCF machines.
This effort is important for ALCF in order to better support users who will
run distributed LLM training on ALCF systems, and it will also benefit the
AuroraGPT project.
\section*{Concrete Tasks}
\begin{itemize}
\item Explore the scope of adopting additional parallelization schemes with
      FSDP, such as tensor parallelism and sequence parallelism; this would
      require development work (a minimal FSDP sketch follows this list).
\item Explore compute and communication overheads at scale.
\item Profile both frameworks in detail to identify bottlenecks (a hedged
      profiling sketch is also given below).
\item Identify opportunities to leverage the available system architectures.
\end{itemize}
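As a concrete starting point for the first task, the following is a minimal
sketch of wrapping a model with PyTorch FSDP. The toy model, batch, and launch
setup (one process per GPU, e.g.\ via \texttt{torchrun}, with an NCCL backend
on NVIDIA hardware) are illustrative assumptions only; the backend and
wrapping policy would be tuned for the actual ALCF systems:

\begin{verbatim}
# Minimal FSDP sketch; the toy model and batch stand in for a real LLM.
# Assumes one process per GPU (e.g., launched via torchrun) and NCCL.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(),
                      nn.Linear(4096, 1024)).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")  # synthetic batch
loss = model(x).pow(2).mean()
loss.backward()      # FSDP performs all-gather / reduce-scatter here
optimizer.step()
dist.destroy_process_group()
\end{verbatim}

For the profiling task, a similarly hedged sketch using the built-in
\texttt{torch.profiler} is shown below. Here \texttt{train\_step} and
\texttt{loader} are hypothetical names for whatever training loop is being
measured; the same wrapper would apply around a DeepSpeed engine step:

\begin{verbatim}
# Hedged profiling sketch; train_step and loader are placeholders.
import torch
from torch.profiler import profile, schedule, ProfilerActivity

prof_schedule = schedule(wait=1, warmup=1, active=3)
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             schedule=prof_schedule,
             on_trace_ready=torch.profiler.tensorboard_trace_handler(
                 "./log")) as prof:
    for step, batch in enumerate(loader):
        train_step(batch)
        prof.step()  # advance the wait/warmup/active schedule
\end{verbatim}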
\end{document}