From 8ab354abbfa3197a1b72619795aea14411c55865 Mon Sep 17 00:00:00 2001
From: Mateusz Charytoniuk
Date: Fri, 5 Jul 2024 17:41:01 +0200
Subject: [PATCH] chore: move the tutorials

---
 infra/README.md                               |   3 +-
 .../tutorial-installing-llamacpp-aws-cuda.md  | 147 --------------
 ...stalling-llamacpp-aws-ec2-image-builder.md | 179 ------------------
 3 files changed, 1 insertion(+), 328 deletions(-)
 delete mode 100644 infra/tutorial-installing-llamacpp-aws-cuda.md
 delete mode 100644 infra/tutorial-installing-llamacpp-aws-ec2-image-builder.md

diff --git a/infra/README.md b/infra/README.md
index b83de03..14c3e4f 100644
--- a/infra/README.md
+++ b/infra/README.md
@@ -4,5 +4,4 @@
 This folder contains example deployment instructions for deploying llama.cpp and

 ## Tutorials

-- [Installing llama.cpp on AWS EC2 CUDA Instance](tutorial-installing-llamacpp-aws-cuda.md)
-- [Installing llama.cpp with AWS EC2 Image Builder](tutorial-installing-llamacpp-aws-ec2-image-builder)
+Tutorials that were here have been moved to [LLMOps Handbook](https://llmops-handbook.distantmagic.com/).

diff --git a/infra/tutorial-installing-llamacpp-aws-cuda.md b/infra/tutorial-installing-llamacpp-aws-cuda.md
deleted file mode 100644
index d3920b6..0000000
--- a/infra/tutorial-installing-llamacpp-aws-cuda.md
+++ /dev/null
@@ -1,147 +0,0 @@
# Installation on AWS EC2 CUDA Instances

This tutorial was tested on a `g4dn.xlarge` instance running the `Ubuntu 22.04` operating system, and the steps below assume that setup.

## Installation Steps

1. Start an EC2 instance of any class with a CUDA-capable GPU.

   If you want to compile llama.cpp on this instance, you will need at least 4 GB of disk space for the CUDA drivers, plus enough space for your LLM of choice; I recommend at least 30 GB. Perform the remaining steps of this tutorial on the instance you started.

2. Install the build dependencies:

   ```shell
   sudo apt update
   ```
   ```shell
   sudo apt install build-essential ccache
   ```

3. Install the CUDA Toolkit (only the Base Installer). Download it and follow the instructions at
   https://developer.nvidia.com/cuda-downloads

   At the time of writing this tutorial, the highest Ubuntu version supported by the installer was 22.04. But do not fear! :) We'll get it to work with some minor workarounds (see the [Potential Errors](#potential-errors) section).

4. Install the NVIDIA drivers:

   ```shell
   sudo apt install nvidia-driver-555
   ```

5. Compile llama.cpp (a quick way to verify the resulting build is shown after this list):

   ```shell
   git clone https://github.com/ggerganov/llama.cpp.git
   ```
   ```shell
   cd llama.cpp
   ```
   ```shell
   GGML_CUDA=1 make -j
   ```

6. Benchmark llama.cpp (optional):

   Follow the official guide if you intend to run the benchmark, but keep using `GGML_CUDA=1 make` to compile llama.cpp (do *not* use `LLAMA_CUBLAS=1`):
   https://github.com/ggerganov/llama.cpp/discussions/4225

   Instead of performing the model quantization yourself, you can download quantized models from Hugging Face. For example, `Mistral Instruct` can be downloaded from https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main
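Before moving on, it is worth a quick smoke test to confirm the build actually offloads work to the GPU. A minimal sketch, assuming you have already downloaded a GGUF model to the instance (the model path here is illustrative):

```shell
# Confirm the driver is loaded and the GPU is visible
nvidia-smi

# Generate a few tokens with all layers offloaded to the GPU (-ngl 99);
# the startup log should mention a CUDA device
./llama-cli -m ./mistral-7b-instruct-v0.2.Q4_K_M.gguf -ngl 99 -p "Hello" -n 16
```

If the log does not mention a CUDA device, revisit the driver and toolkit steps above.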
## Potential Errors

### CUDA Architecture Must Be Explicitly Provided

```
ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly
provided via environment variable CUDA_DOCKER_ARCH, e.g. by running
"export CUDA_DOCKER_ARCH=compute_XX" on Unix-like systems, where XX is the
minimum compute capability that the code needs to run on. A list with compute
capabilities can be found here: https://developer.nvidia.com/cuda-gpus
```

You need to check the mentioned page (https://developer.nvidia.com/cuda-gpus)
and pick the appropriate value for your instance's GPU. `g4dn` instances
use the T4 GPU, which corresponds to `compute_75`.

For example:

```shell
CUDA_DOCKER_ARCH=compute_75 GGML_CUDA=1 make -j
```

### Failed to initialize CUDA

```
ggml_cuda_init: failed to initialize CUDA: unknown error
```

This can sometimes be solved with `sudo modprobe nvidia_uvm`.

You can also create a systemd unit that loads the module on boot (save it, for example, as `/etc/systemd/system/nvidia-uvm.service`, then run `sudo systemctl daemon-reload` and `sudo systemctl enable nvidia-uvm.service`):

```ini
[Unit]
After=nvidia-persistenced.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/modprobe nvidia_uvm

[Install]
WantedBy=multi-user.target
```

### NVCC not found

```
/bin/sh: 1: nvcc: not found
```

You need to add the CUDA path to your shell environment variables.

For example, with Bash and CUDA 12:

```shell
export PATH="/usr/local/cuda-12/bin:$PATH"
```
```shell
export LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH"
```

### cannot find -lcuda

```
/usr/bin/ld: cannot find -lcuda: No such file or directory
```

This means your NVIDIA drivers are not installed. Install the NVIDIA drivers first.

### Cannot communicate with NVIDIA driver

```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

If you have already installed the drivers, reboot the instance.

### Failed to decode the batch

```
failed to decode the batch, n_batch = 0, ret = -1
main: llama_decode() failed
```

There are two potential causes of this issue.

#### Option 1: Install NVIDIA drivers

Make sure you have installed both the CUDA Toolkit and the NVIDIA drivers. If you have, restart the server and try again; most likely, the NVIDIA kernel modules are not loaded.

```shell
sudo reboot
```

#### Option 2: Use different benchmarking parameters

For example, with `Mistral Instruct 7B`, what worked for me is:

```shell
./llama-batched-bench -m ../mistral-7b-instruct-v0.2.Q4_K_M.gguf 2048 2048 512 0 999 128,256,512 128,256 1,2,4,8,16,32
```

diff --git a/infra/tutorial-installing-llamacpp-aws-ec2-image-builder.md b/infra/tutorial-installing-llamacpp-aws-ec2-image-builder.md
deleted file mode 100644
index ab9871e..0000000
--- a/infra/tutorial-installing-llamacpp-aws-ec2-image-builder.md
+++ /dev/null
@@ -1,179 +0,0 @@
# Installing llama.cpp with AWS EC2 Image Builder

This tutorial explains how to install [llama.cpp](https://github.com/ggerganov/llama.cpp) with [AWS EC2 Image Builder](https://aws.amazon.com/image-builder/).

By putting [llama.cpp](https://github.com/ggerganov/llama.cpp) in an EC2 Image Builder pipeline, you can automatically build custom AMIs with [llama.cpp](https://github.com/ggerganov/llama.cpp) pre-installed.

You can also use that AMI as a base and add your foundational model on top of it. That way, you can quickly scale your [llama.cpp](https://github.com/ggerganov/llama.cpp) instance groups up or down.

We will repackage [the base EC2 tutorial](tutorial-installing-llamacpp-aws-cuda.md) as a set of Image Builder Components and a Workflow.

You can complete the tutorial steps either manually or by automating the setup with [Terraform](https://www.terraform.io/)/[OpenTofu](https://opentofu.org/). Terraform source files are linked from their respective tutorial steps.
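Besides the linked Terraform files, the individual steps can also be scripted with the AWS CLI. As a rough sketch of that approach (the component YAML definitions appear in step 2 below; the local file name is illustrative, and flag spellings should be checked against `aws imagebuilder create-component help` for your CLI version):

```shell
# Register a build component from a local YAML definition
aws imagebuilder create-component \
  --name apt_build_essential \
  --semantic-version 1.0.0 \
  --platform Linux \
  --data file://apt_build_essential.yaml
```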
## Installation Steps

1. Create an IAM `imagebuilder` role ([source file](terraform/aws/aws_iam_role_imagebuilder_role.tf)).

   Go to the IAM Dashboard, click "Roles" in the left-hand menu, and select "AWS service" as the trusted entity type. Next, select "EC2" as the use case:

   ![screenshot-01](https://github.com/malzag/paddler/assets/12105347/9c841ee9-0f19-48fc-8386-4b5cb7507a4b)

   Next, attach the following policies:

   - `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role`
   - `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds`
   - `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder`
   - `arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore`

   Name your role (for example, "imagebuilder") and finish creating it. You should end up with permissions and trust relationships looking like this:

   ![screenshot-02](https://github.com/malzag/paddler/assets/12105347/cc6e56f1-91e0-472a-814d-6c9dc0c9ba81)
   ![screenshot-03](https://github.com/malzag/paddler/assets/12105347/97dee654-c146-4e68-b2a2-05a2a433b545)
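   To double-check the result from the command line, the attached policies can be listed with the AWS CLI (assuming you named the role `imagebuilder`):

   ```shell
   # Should print the four managed policy ARNs listed above
   aws iam list-attached-role-policies --role-name imagebuilder
   ```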
2. Create the components.

   We'll need the following four components:

   * llama.cpp build dependencies; this installs `build-essential` and `ccache` ([source file](terraform/aws/aws_imagebuilder_component_apt_build_essential.tf))
   * CUDA Toolkit ([source file](terraform/aws/aws_imagebuilder_component_cuda_toolkit_12.tf))
   * NVIDIA driver ([source file](terraform/aws/aws_imagebuilder_component_apt_nvidia_driver_555.tf))
   * llama.cpp itself ([source file](terraform/aws/aws_imagebuilder_component_llamacpp_gpu_compute_75.tf))

   To create the components via the GUI, navigate to the EC2 Image Builder service on AWS and select "Components" from the menu. We'll add four components that will act as the building blocks of our Image Builder pipeline. You can refer to [the generic EC2 tutorial](tutorial-installing-llamacpp-aws-cuda.md) for more details.

   Click "Create component". Next, for each component:

   - Choose "Build" as the component type
   - Select "Linux" as the image OS
   - Select "Ubuntu 22.04" as the compatible OS version
   - Provide the following component names and contents in YAML format:

   **Component name: apt_build_essential**
   ```yaml
   name: apt_build_essential
   description: "Component to install build essentials on Ubuntu"
   schemaVersion: '1.0'
   phases:
     - name: build
       steps:
         - name: InstallBuildEssential
           action: ExecuteBash
           inputs:
             commands:
               - sudo apt-get update
               - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq build-essential ccache
           onFailure: Abort
           timeoutSeconds: 180
   ```

   **Component name: apt_nvidia_driver_555**
   ```yaml
   name: apt_nvidia_driver_555
   description: "Component to install NVIDIA driver 555 on Ubuntu"
   schemaVersion: '1.0'
   phases:
     - name: build
       steps:
         - name: apt_nvidia_driver_555
           action: ExecuteBash
           inputs:
             commands:
               - sudo apt-get update
               - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq nvidia-driver-555
           onFailure: Abort
           timeoutSeconds: 180
         - name: reboot
           action: Reboot
   ```

   **Component name: cuda_toolkit_12**
   ```yaml
   name: cuda_toolkit_12
   description: "Component to install CUDA Toolkit 12 on Ubuntu"
   schemaVersion: '1.0'
   phases:
     - name: build
       steps:
         - name: apt_cuda_toolkit_12
           action: ExecuteBash
           inputs:
             commands:
               - DEBIAN_FRONTEND=noninteractive sudo apt-get -yq install nvidia-cuda-toolkit
           onFailure: Abort
           timeoutSeconds: 600
         - name: reboot
           action: Reboot
   ```

   **Component name: llamacpp_gpu_compute_75**
   ```yaml
   name: llamacpp_gpu_compute_75
   description: "Component to install and compile llama.cpp with CUDA compute capability 75 on Ubuntu"
   schemaVersion: '1.0'
   phases:
     - name: build
       steps:
         - name: compile
           action: ExecuteBash
           inputs:
             commands:
               - cd /opt
               - git clone https://github.com/ggerganov/llama.cpp.git
               - cd llama.cpp
               - |
                 CUDA_DOCKER_ARCH=compute_75 \
                 LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH" \
                 GGML_CUDA=1 \
                 PATH="/usr/local/cuda-12/bin:$PATH" \
                 make -j
           onFailure: Abort
           timeoutSeconds: 1200
   ```

   Once you're finished, you'll see all the components you created in the list:

   ![screenshot-04](https://github.com/malzag/paddler/assets/12105347/c3d082a8-1971-471a-84a4-b806a14dd899)

3. Add an Infrastructure Configuration ([source file](terraform/aws/aws_imagebuilder_infrastructure_configuration_llamacpp_gpu_compute_75.tf)).

   Next, we'll create a new Infrastructure Configuration. Select it from the left-hand menu and click "Create". You'll need the `g4dn.xlarge` instance type or any other instance type that supports CUDA. Name your configuration, select the IAM role you created in step 1, and select the instance type, for example:

   ![screenshot-05](https://github.com/malzag/paddler/assets/12105347/9f5777b9-721e-4760-884b-e117b2bbc8a3)

4. Add a Distribution Configuration ([source file](terraform/aws/aws_imagebuilder_distribution_configuration_compute_75.tf)).

   Select "Distribution settings" in the left-hand menu to create a Distribution Configuration. It specifies how the resulting AMI should be distributed, for example to which AWS Regions it will be published. Select "Amazon Machine Image", name the configuration, and save:

   ![screenshot-06](https://github.com/malzag/paddler/assets/12105347/1f01e63d-db21-4bb4-906b-df4ea51e43b7)
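   At this point, all the building blocks exist. If you want to sanity-check them before wiring up the pipeline, each resource type can be listed with the AWS CLI (assuming the default profile and region):

   ```shell
   # Each call should include the resources created in steps 2-4
   aws imagebuilder list-components --owner Self
   aws imagebuilder list-infrastructure-configurations
   aws imagebuilder list-distribution-configurations
   ```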
5. Add an Image Pipeline ([source file](terraform/aws/aws_imagebuilder_image_pipeline_llamacpp_gpu_compute_75.tf)).

   Next, we'll add the Image Pipeline. It will use the Components, Infrastructure Configuration, and Distribution Configuration we prepared previously. Select "Image pipelines" from the left-hand menu and click "Create". Name your image pipeline and select the desired build schedule.

   As the second step, create a new recipe. Choose "AMI" and name the recipe:

   ![screenshot-07](https://github.com/malzag/paddler/assets/12105347/1d89b1ca-265b-4195-88e5-a965e124858f)

   Next, select the previously created components:

   ![screenshot-08](https://github.com/malzag/paddler/assets/12105347/c0fef492-dd04-40d6-b3d1-066c7baaf2d3)

6. The next step is to build the image. You should now be able to run the pipeline:

   ![screenshot-09](https://github.com/malzag/paddler/assets/12105347/c1e54bcd-9f8f-44bb-a1e1-e6bde546fbc4)

7. Launch a test EC2 instance.

   When launching an EC2 instance, the llama.cpp image we prepared should be available in the `My AMIs` list:

   ![screenshot-10](https://github.com/malzag/paddler/assets/12105347/7e56bb7e-f458-4b4a-89c2-51dd35e656e9)

## Summary

Check out `infra/terraform/aws` if you run into any issues. Feel free to open an issue if you find a bug in the tutorial or have ideas on how to improve it.

Contributions are always welcome!