A large language model (LLM), in the context of artificial intelligence and natural language processing, is a machine learning model trained on vast amounts of text data to understand and generate human-like language. These models are designed to process and produce text that is contextually relevant and coherent.
The term "large" in this context usually indicates the size of the neural network architecture used to build the model. Larger models have more parameters, allowing them to capture more intricate patterns and nuances in language. These models are trained on diverse datasets, often containing a substantial portion of the internet's text, to learn grammar, vocabulary, facts, reasoning abilities, and even some degree of common sense.
One of the prominent examples of a large language model is OpenAI's GPT (Generative Pre-trained Transformer) series, such as GPT-3. These models have billions of parameters and exhibit impressive capabilities in tasks like language understanding, text completion, translation, summarization, and more. Large language models have found applications in various fields, including natural language processing, chatbots, content generation, and assisting with complex problem-solving tasks.
The AI Toolbox implementation is based on HuggingFace pipelines, which provide a common interface for LLMs. A pipeline takes some input text (commonly known as the prompt) and returns text generated by the LLM (which is why these models are called generative). A pipeline typically consists of five (or more) basic phases (a minimal usage sketch follows the list below):
- Tokenize the prompt
- Embed the tokens into a real-valued, high-dimensional space
- Run the embeddings through the language model
- Decode the output using a specified technique
- Look up the tokens for the embeddings returned by the model
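In practice these phases are wrapped by a single call. A minimal sketch using the transformers text-generation pipeline (the gpt2 model name is only an illustrative assumption, not necessarily the model used by the AI Toolbox):

```python
# Minimal text-generation pipeline sketch; "gpt2" is an illustrative model choice.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A large language model is"
outputs = generator(prompt, max_new_tokens=30, num_return_sequences=1)

# Each output is a dict containing the prompt plus the generated continuation.
print(outputs[0]["generated_text"])
```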
Tokenization splits the input text into pieces, mostly words or smaller subword units. The tokenizer is typically tied to the selected model.
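A minimal tokenization sketch (again, gpt2 is only an illustrative choice; in practice the tokenizer is loaded to match the selected model):

```python
# Tokenization sketch; the tokenizer must match the model it is used with.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model choice

text = "Tokenization splits text into subword pieces."
tokens = tokenizer.tokenize(text)  # subword strings, e.g. ['Token', 'ization', ...]
ids = tokenizer.encode(text)       # integer ids that are fed to the model

print(tokens)
print(ids)
```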
An LLM returns probabilities for the next token. However, selecting the next token from these probabilities is not straightforward. Common decoding strategies include:
- Greedy search
- Contrastive search
- Beam search
- etc.
Read about these strategies in more detail on the HuggingFace page; a short sketch of a few of them follows.
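The sketch below illustrates some of these strategies via model.generate(); the model name and parameter values are illustrative assumptions only:

```python
# Decoding strategy sketch; model name and parameter values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The next token is chosen by", return_tensors="pt")

# Greedy search: always pick the single most probable next token.
greedy = model.generate(**inputs, max_new_tokens=20)

# Beam search: keep the num_beams most probable partial sequences.
beams = model.generate(**inputs, max_new_tokens=20, num_beams=4)

# Contrastive search: penalty_alpha and top_k trade off confidence and diversity.
contrastive = model.generate(**inputs, max_new_tokens=20, penalty_alpha=0.6, top_k=4)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```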
HuggingFace supports a diverse set of LLMs, see here.
Fine-tuning of LLMs for specific problems (e.g. context-aware question answering) can be divided into three groups:
- Transfer learning the whole network
- Transfer learning the attention network
- Context injection (also known as Retrieval Augmented Generation)
Fine-tuning the whole network is very resource-demanding, so it is generally not recommended. Fine-tuning only the attention network, however, can be performed easily, and the result is moderate in both size and resource demand. For fine-tuning e.g. the Falcon model, please see this article. Fine-tuning by transfer learning requires prompt-response pairs to train on.
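One common way to fine-tune only the attention weights is LoRA via the peft library. The sketch below is a rough outline, not the exact recipe of the referenced article; the base model and the target_modules names are assumptions that depend on the concrete architecture:

```python
# LoRA sketch with the peft library; model name and target_modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")  # assumed base model

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # attention projection (architecture-dependent)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights is trainable
```

The adapted model is then trained on the prompt-response pairs with a regular training loop (e.g. the transformers Trainer).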
Context injection is much easier and more lightweight than transfer learning. It works by injecting context information into the prompt, so the LLM can be used as-is. An example of a context-injected prompt:
Answer the question below using the context provided!
CONTEXT: John Smith was born in 1956. Today is December 12, 2023.
QUESTION: How old is John Smith?
The context can be generated in a variety of ways, see e.g. the Context-Injection Tool.
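A minimal sketch of assembling such a context-injected prompt; the retrieve_context() helper is hypothetical and stands in for any retrieval step (e.g. the Context-Injection Tool or a vector search):

```python
# Context-injection sketch; retrieve_context() is a hypothetical placeholder.
def retrieve_context(question: str) -> str:
    # In practice: vector search, database lookup, the Context-Injection Tool, etc.
    return "John Smith was born in 1956. Today is December 12, 2023."

def build_prompt(question: str) -> str:
    context = retrieve_context(question)
    return (
        "Answer the question below using the context provided!\n"
        f"CONTEXT: {context}\n"
        f"QUESTION: {question}"
    )

print(build_prompt("How old is John Smith?"))
```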
By default, the tool contains a prepared Llama 3 7B model as a deployable service in the file query.ipynb. Llama 3 7B requires at least 16 GB of GPU memory! The inputs for the service:
- token: a predefined token for security reasons (string)
- system_prompt: the system prompt (string)
- user_prompt: the user prompt

The responses:

- resp: the response of the LLM model (string)
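A hypothetical call to the deployed service, assuming it is exposed as an HTTP endpoint that accepts JSON; the URL and request format are assumptions and must be adjusted to the actual deployment:

```python
# Hypothetical client call; the endpoint URL and payload format are assumptions.
import requests

payload = {
    "token": "my-secret-token",               # predefined token for security reasons
    "system_prompt": "You are a helpful assistant.",
    "user_prompt": "How old is John Smith?",
}

response = requests.post("http://localhost:8080/query", json=payload)  # assumed endpoint
print(response.json()["resp"])                # the LLM's response
```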