The Intel Cloud Optimization Modules (ICOMs) are open-source codebases with codified Intel AI software optimizations and instructions built specifically for each Cloud Service Provider (CSP). The ICOMs are built with production AI developers in mind, leveraging popular AI frameworks within the context of cloud services.
LLMs (Large Language Models) are becoming ubiquitous, but in many cases you don't need the full capability of the latest GPT model, and for a specific task the biggest model may not even perform best. Often, fine-tuning a small LLM on your dataset is sufficient. In this guide, you will learn how to fine-tune a GPT2-small (124M parameter) model on a cluster of CPUs on AWS. The objective here is not to arrive at a ChatGPT-like AI model, but rather to understand how to set up distributed training so that you can fine-tune toward your specific objective. Training here produces a base LLM that can generate words (or tokens), but it will only be suitable for your use case once you train it on your specific task and dataset.
The GPT2-Small model will be trained on the OpenWebText dataset in a distributed setting, using 3rd or 4th Gen. Intel® Xeon® Scalable Processors. The project builds upon the initial codebase of nanoGPT, by Andrej Karpathy.
The nanoGPT implementation utilizes a custom dataloader mechanism that randomly samples from the designated dataset for fine-tuning. Because each node samples independently, the total volume of data sampled grows with the number of nodes in the distributed system. For instance, if you have a 10GB dataset and wish to perform distributed training on half of it (5GB) across 3 nodes, the dataloader can be configured to use 5GB/3 of data per node. The data is sampled randomly on each node and tokenized all at once, saving time by avoiding repeated tokenization for every training batch. Consequently, the implementation does not have a concept of "epochs" but rather relies on steps to pass all of the selected data through the model.
- AWS Prerequisites: Ensure that the correct AWS and hardware prerequisites are in place.
- Install Dependencies: Begin by installing the necessary dependencies and ensure that all required libraries and tools are set up correctly.
- Download the OpenWebText Data and Train on a Single CPU: Obtain the OpenWebText dataset from Hugging Face Hub. Preprocess and format the data appropriately for compatibility with the nanoGPT implementation. Optionally, save both the raw and preprocessed data in an AWS S3 storage bucket. Test the fine-tuning script on a single CPU to understand the basic workflow and to make sure all dependencies are installed correctly.
- Preparing for Distributed Training: Configure the necessary infrastructure, such as EC2 instances and security groups, for distributed training on multiple CPUs.
- Fine-Tuning on Multiple CPUs: Once the distributed training environment is ready, perform fine-tuning on the cluster of Xeon CPUs to train the model quickly.
- Running Inference: Just as a quick inference example, we will run 1 sample through our trained model to generate some new text.
- Cleaning Up AWS Resources: Shut down EC2 instances, delete security groups, and erase S3 storage.
- Follow Up: Register for office hours and much more!
Before proceeding, ensure you have an AWS account and the necessary permissions to launch EC2 instances, create Amazon Machine Images (AMIs), create security groups, and create S3 storage buckets.
We used 3x m6i.4xlarge EC2 instances with Ubuntu 22.04 and 250 GB of storage each. The m6i.4xlarge instances have 16 vCPUs and 64 GB of memory. However, if you have access to 4th Gen. Xeon CPUs (R7iz) on AWS, these have significant additional built-in accelerations for deep learning, like Intel® Advanced Matrix Extensions (AMX). To maximize performance during fine-tuning, we recommend using bfloat16 precision when using 4th Gen. Xeon CPUs.
To get started, you must first launch an EC2 instance and connect to it from a command prompt. You can do so from the AWS console by following the instructions found here.
If you are using a 4th Gen. Xeon CPU, you can verify that you have the AMX instruction set by running:
lscpu | grep amx
and you should see the following flags:
amx_bf16 amx_tile amx_int8
These flags indicate that the AMX instructions are available on your system, which are essential for leveraging mixed precision training and using bfloat16. Please keep in mind that, for now, the AMX instruction set is only supported by 4th Gen. Xeon CPUs.
You are ready to set up the environment for fine-tuning the GPT2-small model.
Update the package manager and install tcmalloc for extra performance
sudo apt update
sudo apt install libgoogle-perftools-dev unzip -y
(Optional) If you wish to upload your dataset and processed files to S3, you can install AWS CLI. First, download and unzip the AWS CLI package:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip
Then, install AWS CLI using the provided script:
sudo ./aws/install
Ensure that the AWS CLI was installed properly by running the following:
aws --version
If you see the version details, that means you have successfully installed AWS CLI. Remember to clean things up after installation:
rm -r awscliv2.zip aws
To configure your AWS CLI with your credentials, you can run:
aws configure
Now let's set up a conda environment for fine-tuning GPT. First, download and install conda based on your operating system. You can find the download instructions here. The current commands for Linux are:
wget https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh
bash ./Anaconda3-2023.07-1-Linux-x86_64.sh
To begin using conda, you have two options: restart the shell or execute the following command:
source ~/.bashrc
Running this command will source the ~/.bashrc file, which has the same effect as restarting the shell. This enables you to access and use conda for managing your Python environments and packages seamlessly.
Once conda is installed, create a virtual environment and activate it:
conda create -n cluster_env python=3.10
conda activate cluster_env
We have now prepared our environment and can move on to downloading data and training our GPT2-small model.
Clone this repo and install its dependencies:
git clone https://github.com/intel/intel-cloud-optimizations-aws
cd intel-cloud-optimizations-aws/distributed-training/nlp/src
pip install -r requirements.txt
In order to run distributed training, you can use the Intel® oneAPI Collective Communications Library (oneCCL). Download the appropriate wheel file and install it using the following commands:
wget https://intel-extension-for-pytorch.s3.amazonaws.com/torch_ccl/cpu/oneccl_bind_pt-1.13.0%2Bcpu-cp310-cp310-linux_x86_64.whl
pip install oneccl_bind_pt-1.13.0+cpu-cp310-cp310-linux_x86_64.whl
And you can delete the wheel file after installation:
rm oneccl_bind_pt-1.13.0+cpu-cp310-cp310-linux_x86_64.whl
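If you want a quick sanity check that the oneCCL bindings are usable before moving on, a minimal standalone smoke test like the one below can help. This is not part of the repo's scripts; the MASTER_ADDR/MASTER_PORT values are placeholders for a single-process check only.

```python
import os

import torch
import torch.distributed as dist

# Importing the bindings registers the "ccl" backend with torch.distributed.
import oneccl_bindings_for_pytorch  # noqa: F401

# Minimal single-process initialization, just to confirm the backend loads.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="ccl", rank=0, world_size=1)

print("torch version:", torch.__version__)
print("backend in use:", dist.get_backend())
dist.destroy_process_group()
```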
Next, you can move on to downloading and processing the full OpenWebText dataset. This is all accomplished with one script.
python data/openwebtext/prepare.py --full
Note: The script can also upload both the raw data and processed files to S3. Ensure that you have prepared an S3 bucket before running this script. You also need to pass the bucket name to the script using the --bucket argument as follows:
python data/openwebtext/prepare.py --full --bucket <bucket_name>
The complete dataset takes up approximately 54GB in the Hugging Face .cache directory and contains about 8 million documents (8,013,769). During the tokenization process, the storage usage might increase to around 120GB. The entire process can take anywhere from 1 to 3 hours, depending on your CPU's performance.
Upon successful completion of the script, two files will be generated:
- train.bin: approximately 17GB (~9B tokens)
- val.bin: around 8.5MB (~4M tokens)
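For context, the preparation step follows the nanoGPT approach of encoding text with the GPT-2 BPE tokenizer (via tiktoken) and writing the token IDs out as a flat array of 16-bit integers, which is why ~9B tokens occupy roughly 17GB. Below is a simplified sketch of that idea, not the repo's exact prepare.py; the tokenize_to_bin helper and the sample documents are illustrative only.

```python
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

def tokenize_to_bin(documents, out_path):
    """Encode an iterable of text documents and append the token IDs to a .bin file."""
    with open(out_path, "ab") as f:
        for doc in documents:
            ids = enc.encode_ordinary(doc)   # BPE-encode, ignoring special tokens
            ids.append(enc.eot_token)        # mark the document boundary
            # GPT-2's vocabulary (50,257 tokens) fits comfortably in uint16.
            np.array(ids, dtype=np.uint16).tofile(f)

# Hypothetical usage with a couple of in-memory documents:
tokenize_to_bin(["Hello world.", "Another document."], "example.bin")
```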
You should be able to run ls -ltrh data/openwebtext/ and see output like the following:
For future use on other systems, you can directly download the processed BIN files from S3 by executing the download.py script. Follow the previous steps, and instead of running prepare.py, execute the download.py script as follows:
python data/openwebtext/download.py --bucket <bucket_name>
To streamline the training process, we will use the Hugging Face Accelerate library. Once you have the processed .bin files, you are ready to generate the training config file by running the following accelerate command:
accelerate config --config_file ./single_config.yaml
When you run the above command, you will be prompted to answer a series of questions to configure the training process. Here's a step-by-step guide on how to proceed:
First, select This machine, as we are not using Amazon SageMaker.
In which compute environment are you running?
Please select a choice using the arrow or number keys, and selecting with enter
➔ This machine
AWS (Amazon SageMaker)
Next, since we are initially running the script on a single machine, select No distributed training.
Which type of machine are you using?
Please select a choice using the arrow or number keys, and selecting with enter
➔ No distributed training
multi-CPU
multi-XPU
multi-GPU
multi-NPU
TPU
You will be prompted to answer a few yes/no questions. Here are the prompts and answers:
Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:yes
Do you want to use Intel PyTorch Extension (IPEX) to speed up training on CPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
At the very end, you will be asked to select mixed precision. Select bf16 on 4th Gen. Xeon CPUs; otherwise, you can select fp16.
Do you wish to use FP16 or BF16 (mixed precision)?
Please select a choice using the arrow or number keys, and selecting with enter
no
➔ fp16
bf16
fp8
This will generate a config file and save it as single_config.yaml in your current working directory.
We are now ready to start fine-tuning the GPT2-small model. To start the fine-tuning process, you can run the main.py script. Instead of running it directly, use the accelerate launch command along with the generated config file: accelerate automatically selects the appropriate number of cores, device, and mixed precision settings based on the configuration file, streamlining the process and optimizing performance. You can begin training with:
accelerate launch --config_file ./single_config.yaml main.py
This command will initiate the fine-tuning process, utilizing the settings specified in the single_config.yaml file.
Note: By default, main.py uses the gpt2_train_cfg.yaml training configuration file:
data_dir: ./data/openwebtext
block_size: 1024
optimizer_config:
  learning_rate: 6e-4
  weight_decay: 1e-1
  beta1: 0.9
  beta2: 0.95
trainer_config:
  device: cpu
  mixed_precision: bf16 # fp32 or bf16.
  eval_interval: 5 # how frequently to perform evaluation
  log_interval: 1 # how frequently to print logs
  eval_iters: 2 # how many iterations to perform during evaluation
  eval_only: False
  batch_size: 32
  max_iters: 10 # total iterations
  model_path: ckpt.pt
  snapshot_path: snapshot.pt
  gradient_accumulation_steps: 2
  grad_clip: 1.0
  decay_lr: True
  warmup_iters: 2
  lr_decay_iters: 10
  max_lr: 6e-4
  min_lr: 6e-5
You can review the file for batch_size, device, max_iters, etc. and make changes as needed. If you prefer to use a different configuration file, you can make one with a new name like new_config.yaml and pass it to main.py using the --config-name flag as follows:
accelerate launch --config_file ./single_config.yaml main.py --config-name new_config.yaml
Note: By default, Accelerate will use the maximum number of physical cores (virtual cores excluded). To control the number of threads for experimental purposes, you can set --num_cpu_threads_per_process to the number of threads you wish to use. For example, if you want to run the script with only 4 threads:
accelerate launch --config_file ./single_config.yaml --num_cpu_threads_per_process 4 main.py
The script will train the model for the specified number of max_iters iterations and perform evaluations at regular eval_interval intervals. If the evaluation score surpasses the previous model's performance, the current model will be saved in the current working directory under the name ckpt.pt. A snapshot of the training progress will also be saved under the name snapshot.pt. You can easily customize these settings by modifying the values in the gpt2_train_cfg.yaml file.
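The exact contents of ckpt.pt and snapshot.pt are defined in main.py, but conceptually the checkpoint/snapshot logic boils down to saving and restoring the model state, optimizer state, and current iteration. A minimal sketch under those assumptions (not the repo's actual implementation):

```python
import torch

def save_snapshot(model, optimizer, iteration, path="snapshot.pt"):
    # Persist everything needed to resume training from this point.
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "iteration": iteration,
        },
        path,
    )

def load_snapshot(model, optimizer, path="snapshot.pt"):
    # Restore model and optimizer, and return the iteration to resume from.
    snapshot = torch.load(path, map_location="cpu")
    model.load_state_dict(snapshot["model_state"])
    optimizer.load_state_dict(snapshot["optimizer_state"])
    return snapshot["iteration"]
```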
We performed 10 iterations of training, successfully completing the process. With a batch size of 32, the model saw a total of 320 samples, and training took approximately 32 minutes to complete.
The total dataset consists of approximately 8 million training samples, which would take a lot longer to train. However, the OpenWebText dataset was not built for a downstream task -- it is meant to replicate the entire training dataset used for the base model of GPT-2. There are many smaller datasets like the Alpaca dataset (with 52K samples) that would be quite feasible on a distributed setup similar to the one described here.
Note: In this fine-tuning process, we have opted not to use the standard PyTorch DataLoader. Instead, we have implemented a get_batch method that returns a batch of random samples from the dataset each time it is called. This implementation has been copied directly from nanoGPT. Due to this specific implementation, we do not have the concept of epochs in the training process and instead use iterations, where each iteration fetches a batch of random samples.
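For intuition, the core of that sampling logic, simplified from nanoGPT, looks roughly like the sketch below; the actual method in this repo may differ in details such as device placement.

```python
import numpy as np
import torch

def get_batch(data_path, batch_size=32, block_size=1024, device="cpu"):
    # Memory-map the tokenized .bin file so the full dataset never has to fit in RAM.
    data = np.memmap(data_path, dtype=np.uint16, mode="r")
    # Choose `batch_size` random starting offsets into the token stream.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    # Inputs are block_size-token windows; targets are the same windows shifted by one token.
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x.to(device), y.to(device)
```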
Next, we need to prepare a new accelerate config for the multi-CPU setup. Before setting up the multi-CPU environment, ensure you have the IP address of your machine handy. To obtain it, run the following command:
hostname -i
With the IP address ready, execute the following command to generate the new accelerate config for the multi-CPU setup:
accelerate config --config_file ./multi_config.yaml
When configuring the multi-CPU setup using accelerate config, you will be prompted with several questions; select the appropriate answers based on your environment. Here's a step-by-step guide on how to proceed:
First, select This machine, as we are not using Amazon SageMaker.
In which compute environment are you running?
Please select a choice using the arrow or number keys, and selecting with enter
➔ This machine
AWS (Amazon SageMaker)
Choose multi-CPU as the type of machine for our setup.
Which type of machine are you using?
Please select a choice using the arrow or number keys, and selecting with enter
No distributed training
➔ multi-CPU
multi-XPU
multi-GPU
multi-NPU
TPU
Next, you can enter the number of instances you will be using. For example, here we have 3 (including the master node).
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Concerning the rank, since we are initially running this from the master node, enter 0. For each machine, you will need to change the rank accordingly.
What is the rank of this machine?
Please select a choice using the arrow or number keys, and selecting with enter
➔ 0
1
2
Next, you will need to provide the private IP address of the machine where you are running the accelerate launch command, which we found earlier with hostname -i.
What is the IP address of the machine that will host the main process?
Next, you can enter the port number to be used for communication. A commonly used port is 29500, but you can choose any available port.
What is the port you will use to communicate with the main process?
You will be prompted with a few more questions. Provide the required information as per your setup.
The prompt How many CPU(s) should be used for distributed training? is actually about CPU sockets. Generally, each machine will have only 1 CPU socket. However, in the case of bare metal instances, you may have 2 CPU sockets per instance. Enter the appropriate number of sockets based on your instance configuration.
After completing the configuration, you will be ready to launch the multi-CPU fine-tuning process. The final output should look something like:
------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
multi-CPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 3
------------------------------------------------------------------------------------------------------------------------------------------
What is the rank of this machine?
0
What is the IP address of the machine that will host the main process? xxx.xxx.xxx.xxx
What is the port you will use to communicate with the main process? 29500
Are all the machines on the same local network? Answer `no` if nodes are on the cloud and/or on different network hosts [YES/no]: no
What rendezvous backend will you use? ('static', 'c10d', ...): static
Do you want to use Intel PyTorch Extension (IPEX) to speed up training on CPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
How many CPU(s) should be used for distributed training? [1]:1
------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
You should now have a new config file named multi_config.yaml in your current working directory. Before creating an AMI from this volume, make sure to delete the snapshot.pt file. If this file exists, the main.py script will resume training from the snapshot, which might not be desired when creating an AMI.
rm snapshot.pt
Now that we have this running in a single system, let's try to run it on multiple systems. To prepare for distributed training and ensure a consistent setup across all systems, follow these steps:
- Create an AMI: Start by creating an Amazon Machine Image (AMI) from the existing instance where you have successfully run the fine-tuning on a single system. This AMI will capture the entire setup, including the dependencies, configurations, codebase, and dataset. To create an AMI, refer to Create a Linux AMI from an instance.
- Security Group: While waiting for the AMI creation, let's continue by creating a security group that enables communication among the member nodes. This security group should be configured to allow inbound and outbound traffic on the necessary ports for effective communication between the master node and the worker nodes.
In the security group configuration, ensure that you have allowed all traffic originating from the security group itself. This setting allows seamless communication between the instances within the security group.
Please refer to the following screenshot as an example:
By setting up the security group in this manner, you ensure that all necessary traffic can flow between the master node and the worker nodes during distributed training.
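If you prefer to script this step instead of using the console, a rough boto3 equivalent of the setup described above might look like the following sketch; the group name and VPC ID are placeholders you would replace with your own values.

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder name and VPC ID -- replace with your own values.
resp = ec2.create_security_group(
    GroupName="distributed-training-sg",
    Description="Allow all traffic between cluster members",
    VpcId="vpc-xxxxxxxx",
)
sg_id = resp["GroupId"]

# Allow all traffic originating from the security group itself,
# so the master and worker nodes can communicate freely.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": sg_id}],
    }],
)
print("Created security group:", sg_id)
```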
- Launch new instances: Use the created AMI to launch new instances, specifying the desired number of instances based on the number of systems you want to use for distributed training. This ensures that all the instances have the same environment and setup. To initiate new EC2 instances, there are two options available: using the AWS console or the AWS CLI. If you have the AWS CLI configured, you can launch instances by executing the following command:
aws ec2 run-instances --image-id ami-xxxxxxxx --count 2 --instance-type m6i.4xlarge --key-name <MyKeyPair> --security-group-ids sg-xxxxxxxx --subnet-id subnet-xxxxxx
replacing the x's with the values associated with your AWS configuration, and replacing <MyKeyPair> with your key pair.
- Passwordless SSH: Set up passwordless SSH from the master node to all the worker nodes. To enable this, configure the master instance's SSH public key to be authorized on all other nodes, ensuring SSH access without prompts between the master and worker nodes. Follow these steps:
- Verify SSH Access: First, check if you can SSH into the other nodes from the master node. Use the private IP address and the appropriate username for each node.
ssh <username>@<ip-address>
Successful SSH connections will indicate that the inbound rules of the security group are correctly set up. In case of any issues, check the network settings.
- Generate SSH Key Pair: On the master node, run the following command to generate an SSH key pair:
ssh-keygen
You will be prompted to enter a passphrase for the key. You can choose to enter a passphrase or leave it blank for no passphrase; for simplicity in this guide, it is recommended to leave it blank. The key pair will be generated and saved in the ~/.ssh directory as two files: ~/.ssh/id_rsa (private key) and ~/.ssh/id_rsa.pub (public key). For security, set appropriate permissions on the private key:
chmod 600 ~/.ssh/id_rsa
- Propagate the Public Key to Remote Systems: To transfer the public key to the remote hosts, use the ssh-copy-id command. If password authentication is currently enabled, this is the easiest way to copy the public key:
ssh-copy-id <username>@<private-ip-address>
This command will copy the public key to the specified remote host. You will have to run this command from the master node to copy the public key to all other nodes.
- Verify Passwordless SSH: After copying the public key to all nodes, verify that you can connect using the key pair:
ssh <username>@<private-ip-address>
If you can successfully log in without entering a password, it means passwordless SSH is set up correctly.
By following the above steps, you will establish passwordless SSH between the master node and all worker nodes, ensuring smooth communication and coordination during distributed training. If you encounter any difficulties, additional information can be found here.
- Next, to continue setting up the cluster, you will need to edit the SSH configuration file located at ~/.ssh/config on the master node. The configuration file should look like this:
Host 10.*.*.*
    StrictHostKeyChecking no

Host node1
    HostName 10.0.xxx.xxx
    User ubuntu

Host node2
    HostName 10.0.xxx.xxx
    User ubuntu
The StrictHostKeyChecking no line disables strict host key checking, allowing the master node to SSH into the worker nodes without prompting for verification. With these settings, you can check your passwordless SSH by executing ssh node1 or ssh node2 to connect to any node without any additional prompts.
Additionally, on the master node, you will create a host file (~/hosts) that includes the names of all the nodes you want to include in the training process, as defined in the SSH configuration above. Use localhost for the master node itself, as you will launch the training script from the master node. The hosts file will look like this:
localhost
node1
node2
This setup will allow you to seamlessly connect to any node in the cluster for distributed training.
Before beginning the fine-tuning process, it is important to update the machine_rank value on each machine. Follow these steps for each worker machine:
- SSH into the worker machine.
- Locate and open the multi_config.yaml file.
- Update the value of the machine_rank variable in the file. Assign the rank to the worker nodes starting from 1:
  - For the master node, set the rank to 0.
  - For the first worker node, set the rank to 1.
  - For the second worker node, set the rank to 2.
  - Continue this pattern for additional worker nodes.
By updating the machine_rank, you ensure that each machine is correctly identified within the distributed training setup. This is crucial for the successful execution of the fine-tuning process.
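If you would rather not edit the file by hand on every node, a small helper script along these lines can set the rank programmatically. This is a sketch, assuming PyYAML is available (it is a dependency of accelerate); run it once on each node with that node's rank.

```python
import sys
import yaml

# Usage: python set_rank.py <rank>   (0 for the master node, 1, 2, ... for workers)
config_path = "multi_config.yaml"
rank = int(sys.argv[1])

with open(config_path) as f:
    cfg = yaml.safe_load(f)

cfg["machine_rank"] = rank

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f)

print(f"machine_rank set to {rank} in {config_path}")
```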
To train PyTorch models in a distributed setting on Intel hardware, we utilize Intel's MPI (Message Passing Interface) implementation. This implementation provides flexible, efficient, and scalable cluster messaging on Intel architecture. The Intel® oneAPI HPC Toolkit includes all the necessary components, including oneccl_bindings_for_pytorch, which is installed alongside the MPI toolset.
To use oneccl_bindings_for_pytorch, you simply need to source the environment by running the following commands:
oneccl_bindings_for_pytorch_path=$(python -c "from oneccl_bindings_for_pytorch import cwd; print(cwd)")
source $oneccl_bindings_for_pytorch_path/env/setvars.sh
This sets up the environment variables required for utilizing oneccl_bindings_for_pytorch and enables distributed training using Intel MPI.
Note: In a distributed setting, mpirun can be used to run any program, not just distributed training. It allows you to execute parallel applications across multiple nodes or machines, leveraging the capabilities of MPI (Message Passing Interface).
Finally, it's time to run the fine-tuning process on the multi-CPU setup. The following command can be used to launch distributed training:
mpirun -f ~/hosts -n 3 -ppn 1 -genv LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc.so" accelerate launch --config_file ./multi_config.yaml --num_cpu_threads_per_process 8 main.py
Some notes on the arguments for mpirun to consider:
- -n: This parameter represents the number of CPUs or nodes. In our case, we specified -n 3 to run on 3 nodes. Typically, it is set to the number of nodes you are using. However, in the case of bare metal instances with 2 CPU sockets per board, you would use 2n to account for the 2 sockets.
- -ppn: The "process per node" parameter determines how many training jobs you want to start on each node. We only want 1 instance of training to run on each node, so we set this to -ppn 1.
- -genv: This argument allows you to set an environment variable that will be applied to all processes. We used it to set the LD_PRELOAD environment variable to use the libtcmalloc performance library.
- num_cpu_threads_per_process: The num_cpu_threads_per_process argument specifies the number of CPU threads that PyTorch will use per process. We set this to 8 threads in our case. When running deep learning tasks, it is best practice to use only the physical cores of your processor (which in our case is 8); see the quick check after this list for one way to confirm the core count.
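To confirm how many physical cores each instance exposes, and therefore a sensible value for --num_cpu_threads_per_process, you can use a quick check like the one below; this is a sketch that assumes the psutil package is installed.

```python
import psutil

# Physical cores only (hyper-threads excluded), which is what you want
# for CPU-bound deep learning workloads.
print("physical cores:", psutil.cpu_count(logical=False))
print("logical cores: ", psutil.cpu_count(logical=True))
```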
Here is what the final output for distributed training would look like.
By adopting distributed training techniques, we witness a remarkable improvement in data processing efficiency. In approximately 29 minutes, we process three times the amount of data as compared to non-distributed training methods. Additionally, we get a lower loss value indicating better model generalization. This substantial speed boost and better generalization is a testament to the immense advantages of leveraging distributed training. Distributed training is of paramount importance in modern machine learning and deep learning scenarios. Its significance lies in the following aspects:
- Faster Training: As demonstrated in the output, distributed training significantly reduces the training time for large datasets. It allows parallel processing across multiple nodes, which accelerates the training process and enables efficient utilization of computing resources.
- Scalability: With distributed training, the model training process can easily scale to handle massive datasets, complex architectures, and larger batch sizes. This scalability is crucial for handling real-world, high-dimensional data.
- Model Generalization: Distributed training enables access to diverse data samples from different nodes, leading to improved model generalization. This, in turn, enhances the model's ability to perform well on unseen data.
Training the model on the Alpaca dataset using a single CPU is estimated to take around 87 hours (approximately 3.5 days). However, the advantage of utilizing a distributed cluster with 3 CPUs becomes apparent, as the training duration can be reduced to less than 30 hours. This represents a significant improvement in efficiency, showing that a distributed setup can facilitate the rapid fine-tuning of large language models (LLMs) on private datasets containing 50K - 100K samples in just a matter of days.
Overall, distributed training is an indispensable technique that empowers data scientists, researchers, and organizations to efficiently tackle complex machine learning tasks and achieve superior results.
One last thing before we sign off. Now that we have trained our model, let's try to generate some text.
python sample.py --ckpt_path=ckpt.pt
The script is designed to generate sample text containing 100 tokens. By default, the input prompt for generating these samples is the phrase It is interesting. However, you also have the option to specify your own prompt by using the --prompt argument as follows:
python sample.py --ckpt_path=ckpt.pt --prompt="This is new prompt "
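For reference, text generation in a script like sample.py boils down to standard autoregressive sampling. Below is a generic sketch of that loop, not the repo's exact implementation; the generate helper here assumes a model whose forward pass returns logits of shape (batch, sequence, vocab), possibly paired with a loss.

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=100, temperature=0.8, top_k=200, block_size=1024):
    """Autoregressively sample tokens from a GPT-style language model."""
    model.eval()
    for _ in range(max_new_tokens):
        # Crop the running context to the model's maximum block size.
        idx_cond = idx[:, -block_size:]
        out = model(idx_cond)
        logits = out[0] if isinstance(out, tuple) else out
        # Take the logits for the last position and apply temperature.
        logits = logits[:, -1, :] / temperature
        if top_k is not None:
            # Mask out everything below the k-th largest logit.
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_token), dim=1)
    return idx
```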
Below is one sample of generated text from the It is interesting input:
Input Prompt: It is interesting
--------------- Generated Text ---------------
It is interesting that the government has a good idea of what the government is doing, and it does it.
I don't think the government has a good idea of what the government is doing.
I don't think the government has a good idea of what the government is doing.
I think the government has a good idea of what the government is doing, and it does it.
The second is that the government is not the government. The government is not the government.
----------------------------------------
This example does illustrate that the language model can generate text, but it is not useful in its current form until fine-tuned on downstream tasks. While there is repetition in the tokens here, this module's primary focus was on the successful distributed training process and leveraging the capabilities of the Intel hardware effectively.
During the fine-tuning process on the cluster of three 3rd Gen. Xeon CPUs, we successfully trained the GPT2-small model. The performance achieved is notable, and we expect roughly 3x this rate when using 4th Gen. Xeon CPUs.
Ensure that you properly remove and clean up all the resources created during the course of following this module. To delete EC2 instances and a security group using the AWS CLI, you can use the following commands:
- Delete EC2 instances:
aws ec2 terminate-instances --instance-ids <instance_id1> <instance_id2> ... <instance_idN>
Replace <instance_id1>, <instance_id2>, ..., <instance_idN> with the actual instance IDs you want to terminate. You can specify multiple instance IDs separated by spaces.
- Delete Security Group:
aws ec2 delete-security-group --group-id <security_group_id>
Replace <security_group_id> with the ID of the security group you want to delete.
Please be cautious when using these commands, as they will permanently delete the specified EC2 instances and security group. Double-check the instance IDs and security group ID to avoid accidental deletions.
- Delete S3 storage: If you saved any data to an S3 bucket, you can delete the data with the following command:
aws s3 rm s3://<bucket>/folder --recursive
where <bucket> is your particular S3 storage bucket.
- Register for Office Hours here for help on your ICOM implementation.
- Learn more about all of our Intel Cloud Optimization Modules here.
- Come chat with us on our Intel DevHub Discord server to keep interacting with fellow developers.
- Stay connected with us on social media: