
LLaMA Model Annotated


LLaMA, the latest language model released by Meta AI (Facebook), claims to surpass GPT-3 on most tasks while being considerably smaller.

Because of the model's size, my current hardware cannot support further experiments for now. The model code has been open-sourced, however, so we can still learn the details of the architecture from it. This article walks through the code released on GitHub to understand how the model works.

The model itself is only a set of small refinements to the Transformer; the more valuable work is likely in the data and the training. Even so, reading the code is a good way to review the basic structure of the Transformer and to see how a large model distributes inference across multiple GPUs.

Since the project's source code contains almost no comments, which makes it harder to read, this article also explains the code in detail along the way.

Python Lib Dependencies

The project lists only four dependencies:

  • torch
  • fairscale
  • fire
  • sentencepiece
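
A minimal environment setup might look like the following (a sketch: the repository lists only the package names, so no versions are pinned here):

```sh
pip install torch fairscale fire sentencepiece
```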

torch needs no introduction. fairscale is used for model parallelism across GPUs; it is typically brought in when plain DDP still runs out of memory. I have not tried fairscale myself, so in the source-code walkthrough below I will substitute the corresponding basic torch layers for the fairscale parallel layers. fire is a command-line interface library and is optional. sentencepiece is the tokenizer toolkit and is briefly covered in the tokenizer section.
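
As a minimal sketch of that substitution: the fairscale model-parallel layer from the original repo is shown as a comment, and a plain torch layer of the same shape takes its place (fine for single-GPU reading and experiments, but it drops the tensor-parallel sharding; the concrete sizes below are illustrative, not LLaMA's actual configuration):

```python
import torch.nn as nn

# Original (fairscale, shards the weight across GPUs along the output dimension):
# from fairscale.nn.model_parallel.layers import ColumnParallelLinear
# wq = ColumnParallelLinear(dim, n_heads * head_dim, bias=False, gather_output=False)

# Single-GPU equivalent used in this walkthrough:
dim, n_heads, head_dim = 512, 8, 64   # illustrative sizes only
wq = nn.Linear(dim, n_heads * head_dim, bias=False)
```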

Model Structure

  • 65B

[figure: model structure diagram]
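
For reference, a hedged sketch of the hyperparameter container (the ModelArgs dataclass in the repo's model.py); the 65B values are taken from the LLaMA paper, not read from the released checkpoint's params.json, so treat them as an assumption:

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Defaults mirror the small placeholder values in the repo;
    # vocab_size is filled in later from the sentencepiece tokenizer.
    dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    vocab_size: int = -1
    multiple_of: int = 256   # SwiGLU hidden size is rounded up to a multiple of this
    norm_eps: float = 1e-5
    max_batch_size: int = 32
    max_seq_len: int = 2048

# 65B configuration as reported in the LLaMA paper (assumed):
# 80 layers, 64 attention heads, model dimension 8192.
llama_65b = ModelArgs(dim=8192, n_layers=80, n_heads=64)
```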
