
LLaMA Model Annotated


LLaMA, the latest language model released by Meta AI (Facebook), claims to surpass GPT-3 on most tasks while being considerably smaller.

Because of the model's size, my current hardware cannot support further experiments for now. The model code has been open-sourced, however, so we can still learn the details of the architecture from it. This article walks through the code released on GitHub to understand how the model works.

The model itself is only a set of small refinements to the Transformer; the more valuable work is likely in the data and the training. Even so, reading the code is a good way to review the basic structure of the Transformer and to see how a large model distributes inference across multiple GPUs.

Since the project's source code contains almost no comments, which makes it harder to read, this article also explains the code in detail along the way.

Python Lib Dependencies

The project lists only four dependencies:

  • torch
  • fairscale
  • fire
  • sentencepiece
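
A minimal environment setup might look like the following (a sketch: the repository lists only the package names, so no versions are pinned here):

```sh
pip install torch fairscale fire sentencepiece
```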

torch needs no introduction. fairscale is used for model parallelism across GPUs; it is typically brought in when plain DDP still runs out of memory. I have not tried fairscale myself, so in the source-code walkthrough below I will substitute the corresponding basic torch layers for the fairscale parallel layers. fire is a command-line interface library and is optional. sentencepiece is the tokenizer toolkit and is briefly covered in the tokenizer section.
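
As a minimal sketch of that substitution: the fairscale model-parallel layer from the original repo is shown as a comment, and a plain torch layer of the same shape takes its place (fine for single-GPU reading and experiments, but it drops the tensor-parallel sharding; the concrete sizes below are illustrative, not LLaMA's actual configuration):

```python
import torch.nn as nn

# Original (fairscale, shards the weight across GPUs along the output dimension):
# from fairscale.nn.model_parallel.layers import ColumnParallelLinear
# wq = ColumnParallelLinear(dim, n_heads * head_dim, bias=False, gather_output=False)

# Single-GPU equivalent used in this walkthrough:
dim, n_heads, head_dim = 512, 8, 64   # illustrative sizes only
wq = nn.Linear(dim, n_heads * head_dim, bias=False)
```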

Model Structure

  • 65B

[figure: model structure diagram]
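
For reference, a hedged sketch of the hyperparameter container (the ModelArgs dataclass in the repo's model.py); the 65B values are taken from the LLaMA paper, not read from the released checkpoint's params.json, so treat them as an assumption:

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Defaults mirror the small placeholder values in the repo;
    # vocab_size is filled in later from the sentencepiece tokenizer.
    dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    vocab_size: int = -1
    multiple_of: int = 256   # SwiGLU hidden size is rounded up to a multiple of this
    norm_eps: float = 1e-5
    max_batch_size: int = 32
    max_seq_len: int = 2048

# 65B configuration as reported in the LLaMA paper (assumed):
# 80 layers, 64 attention heads, model dimension 8192.
llama_65b = ModelArgs(dim=8192, n_layers=80, n_heads=64)
```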
