Chapel port of llm.c.
This is an LLM implementation in Chapel based on llm.c. Compared to the reference, this version is:

- more parallel, as it relies on Chapel's parallel constructs like `forall` and `reduce`, which are more natural to use while delivering better performance,
- more succinct, as Chapel's multidimensional arrays are a natural fit for tensor programming for LLMs,
- more user-friendly, as there is no need for manual dynamic memory management as in C.
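To illustrate the style, here is a minimal, hypothetical sketch (not code from this repo) of a parallel matrix-vector product written with a `forall` loop, array slicing, and a `+ reduce` over a multidimensional array:

```chapel
// Hypothetical example: parallel matrix-vector product y = W * x.
// W, x, y, and n are illustrative names, not identifiers from this repo.
config const n = 4;

var W: [1..n, 1..n] real = 1.0;  // 2D weight matrix, initialized to all ones
var x: [1..n] real = 2.0;        // input vector, initialized to all twos
var y: [1..n] real;              // output vector

// Each row's dot product runs as a parallel task; the elementwise
// multiply uses promotion, and `+ reduce` sums the products.
forall i in 1..n do
  y[i] = + reduce (W[i, ..] * x);

writeln(y);
```

Compared to the C original, there is no manual allocation or index arithmetic: slicing (`W[i, ..]`) and promotion express the tensor operation directly.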
This version is based on a relatively early commit of llm.c. We are still in the early stages of adding standalone GPU kernels. The current GPU kernels have been contributed by @ShreyasKhandekar.
This repo contains all the helper files from the original version, so you don't need to clone the original. Refer to that repo's quick start instructions to generate input files.
Very briefly:

```bash
python3 prepro_tinyshakespeare.py
python3 train_gpt2.py
```

will create the input files,

```bash
make train_gpt2
```

will compile the application, and

```bash
./train_gpt2
```

will launch it.
Tested with Chapel version 2.2.0 pre-release (c18eea7692). This code relies on fixes made after 2.1; as such, it is not expected to work with 2.1 or earlier.
We are looking for contributors! There are two main items that can improve this implementation:
- Bring the port up-to-speed with the current upstream version
- Implement more GPU kernels
If you have other ideas or notice problems, please create an issue. If you intend to work on any of the existing issues, please drop a comment expressing your interest so we can avoid duplicated work.