Clusters in AWS ParallelCluster share similar components: a head-node, compute nodes (typically from the P or Trn EC2 instance families) and one or more shared filesystems (FSx for Lustre). Below you will find a section on the architectures themselves and how to deploy them. After that section, you will be briefed on key elements of these templates (things you want to know to avoid potential mistakes).
To create the cluster, use the command below, replacing CLUSTER_CONFIG_FILE with the path to the cluster configuration file (see next section) and NAME_OF_YOUR_CLUSTER with the name of your cluster (realpotato is a cool name).
pcluster create-cluster --cluster-configuration CLUSTER_CONFIG_FILE --cluster-name NAME_OF_YOUR_CLUSTER --region us-east-1
Refer to the AWS ParallelCluster documentation for the list of all AWS ParallelCluster commands.
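If you create clusters from a script rather than by hand, the command can be wrapped as below. This is a minimal Python sketch, not part of the templates: the helper names and the file name distributed-training-gpu.yaml are hypothetical, and only the flags from the command above are used. It assumes pcluster is installed and on the PATH whenever create_cluster is actually called.

```python
import shlex
import subprocess


def create_cluster_cmd(config_file: str, cluster_name: str, region: str = "us-east-1") -> list:
    """Build the `pcluster create-cluster` argument list shown above."""
    return [
        "pcluster", "create-cluster",
        "--cluster-configuration", config_file,
        "--cluster-name", cluster_name,
        "--region", region,
    ]


def create_cluster(config_file: str, cluster_name: str, region: str = "us-east-1") -> str:
    """Run the command and return its stdout (pcluster v3 prints a JSON response)."""
    result = subprocess.run(
        create_cluster_cmd(config_file, cluster_name, region),
        capture_output=True, text=True, check=True,  # raise if pcluster fails
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it; the config file name is hypothetical.
    print(shlex.join(create_cluster_cmd("distributed-training-gpu.yaml", "realpotato")))
```

Cluster creation is asynchronous; `pcluster describe-cluster --cluster-name NAME_OF_YOUR_CLUSTER --region us-east-1` reports its progress.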
Each reference architecture provides an example cluster for a different use case. The most commonly used architectures are:
- distributed-training-gpu: base template, uses the default AMI with no software installed.
- distributed-training-p4de_custom_ami: base cluster with a custom AMI to install custom software.
- distributed-training-p4de_postinstall_scripts: same as above but uses post-install scripts to install Docker, Pyxis and Enroot.
Alternatively, you can refer to these architectures for more specific use cases:
- distributed-training-p4de_batch-inference-g5_custom_ami: multi-queue template with p4de for training and g5 for inference. It assumes a custom AMI.
- distributed-training-trn1_custom_ami: uses Trainium instances for distributed training. It assumes a custom AMI.
The templates contain placeholder variables that you need to replace before use.
- PLACEHOLDER_CUSTOM_AMI_ID: if using a custom AMI, replace with the custom AMI ID (ami-12356790abcd).
- PLACEHOLDER_PUBLIC_SUBNET: change to the ID of a public subnet to host the head-node (subnet-12356790abcd).
- PLACEHOLDER_PRIVATE_SUBNET: change to the ID of a private subnet to host the compute nodes (subnet-12356790abcd).
- PLACEHOLDER_SSH_KEY: name of the EC2 SSH key pair you'd like to use to connect to the head-node. You can also connect with AWS Systems Manager Session Manager (SSM).
- PLACEHOLDER_CAPACITY_RESERVATION_ID: if using a capacity reservation, put its ID here (cr-12356790abcd).
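Forgetting a placeholder is an easy mistake. The Python sketch below (a hypothetical helper, not shipped with the templates) substitutes values for the placeholders listed above, checks the ID prefixes from the examples (ami-, subnet-, cr-), and refuses to return a config that still contains any PLACEHOLDER_ token.

```python
import re

# Matches any template placeholder token, e.g. PLACEHOLDER_SSH_KEY.
PLACEHOLDER_RE = re.compile(r"PLACEHOLDER_[A-Z_]+")

# ID prefixes expected for the placeholders that hold AWS resource IDs.
EXPECTED_PREFIX = {
    "PLACEHOLDER_CUSTOM_AMI_ID": "ami-",
    "PLACEHOLDER_PUBLIC_SUBNET": "subnet-",
    "PLACEHOLDER_PRIVATE_SUBNET": "subnet-",
    "PLACEHOLDER_CAPACITY_RESERVATION_ID": "cr-",
}


def fill_template(text: str, values: dict) -> str:
    """Substitute placeholder values; raise ValueError on a bad ID or a leftover placeholder."""
    for name, value in values.items():
        prefix = EXPECTED_PREFIX.get(name)
        if prefix and not value.startswith(prefix):
            raise ValueError(f"{name} should start with {prefix!r}, got {value!r}")
    # Replace whole tokens only; placeholders without a value are left as-is.
    filled = PLACEHOLDER_RE.sub(lambda m: values.get(m.group(0), m.group(0)), text)
    leftover = sorted(set(PLACEHOLDER_RE.findall(filled)))
    if leftover:
        raise ValueError("unreplaced placeholders: " + ", ".join(leftover))
    return filled


snippet = "SubnetIds:\n  - PLACEHOLDER_PRIVATE_SUBNET\nKeyName: PLACEHOLDER_SSH_KEY\n"
print(fill_template(snippet, {
    "PLACEHOLDER_PRIVATE_SUBNET": "subnet-12356790abcd",
    "PLACEHOLDER_SSH_KEY": "my-key-pair",  # hypothetical key pair name
}))
```

To use it on a real template, read the file with `pathlib.Path(...).read_text()` and write the result to a new file. If you are not using a custom AMI, remove the CustomAmi line first, otherwise PLACEHOLDER_CUSTOM_AMI_ID is reported as unreplaced.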
Compute is organized as follows:
- Head-node: the login and controller node that users use to submit jobs. It is set to an m5.8xlarge.
- Compute-gpu: the queue (or partition) that runs your ML training jobs. Instances are either p4de.24xlarge or trn1.32xlarge, which are recommended for training, especially for LLMs and other large models. The queue is set to 4 instances by default; change this as needed.
- Inference-gpu: an optional queue for inference workloads, using g5.12xlarge instances.
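To target one of these queues from Slurm, a job script names the partition and the number of nodes. The Python sketch below generates an #SBATCH header for the GPU queues, launching one task per GPU (8 on p4de.24xlarge, 4 on g5.12xlarge). The partition name compute-gpu is an assumption based on the queue name above, so check the real names with sinfo; Trainium jobs are launched differently and are not covered here.

```python
# GPUs per instance for the GPU instance types used by the queues above.
GPUS_PER_INSTANCE = {
    "p4de.24xlarge": 8,  # NVIDIA A100 80GB
    "g5.12xlarge": 4,    # NVIDIA A10G
}


def sbatch_header(partition: str, instance_type: str, nodes: int) -> str:
    """Return #SBATCH lines requesting `nodes` whole instances, one task per GPU."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    tasks_per_node = GPUS_PER_INSTANCE[instance_type]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        "#SBATCH --exclusive",  # do not share nodes with other jobs
    ]
    return "\n".join(lines) + "\n"


def total_gpus(instance_type: str, nodes: int) -> int:
    """Total GPUs (and ranks, at one task per GPU) across `nodes` instances."""
    return GPUS_PER_INSTANCE[instance_type] * nodes


# "compute-gpu" is an assumed partition name; list the real ones with `sinfo`.
print(sbatch_header("compute-gpu", "p4de.24xlarge", nodes=4), end="")
print(total_gpus("p4de.24xlarge", 4))  # 32
```

With the default 4 instances, the training queue therefore offers 4 x 8 = 32 GPUs when it uses p4de.24xlarge.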
Storage comes in 3 flavors:
- Local: head and compute nodes have a 200 GiB EBS volume mounted on /. In addition, the head-node has a 200 GiB EBS volume mounted on /apps. The compute nodes have NVMe drives striped in RAID0 and mounted as /local_scratch.
- Network file storage: the head-node shares /home and /apps with the whole cluster through NFS. These directories are automatically mounted on every instance in the cluster and accessible through the same path. /home is a regular home directory; /apps is a shared directory where applications or shared files can be stored. Note that neither should be used for data-intensive tasks.
- High performance filesystem: an FSx for Lustre filesystem can be accessed from every cluster node on /fsx. This is where users should store their datasets. This filesystem is sized at 4.8 TiB and provides 1.2 GB/s of aggregate throughput. You can modify its size and the throughput provisioned per TiB in the config file, following the service documentation.
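FSx for Lustre aggregate throughput scales linearly with capacity: throughput = storage capacity (TiB) x per-unit throughput (MB/s per TiB). The figures above imply 250 MB/s per TiB (4.8 x 250 = 1,200 MB/s); that value is inferred here, not stated in the templates. In the config file, the two knobs are StorageCapacity and PerUnitStorageThroughput under FsxLustreSettings. The Python sketch below does the arithmetic in both directions.

```python
def aggregate_throughput_gbps(capacity_tib: float, per_unit_mbps_per_tib: float) -> float:
    """Aggregate throughput in GB/s: capacity (TiB) x per-unit throughput (MB/s/TiB)."""
    return capacity_tib * per_unit_mbps_per_tib / 1000  # MB/s -> GB/s


def capacity_for_throughput_tib(target_gbps: float, per_unit_mbps_per_tib: float) -> float:
    """Capacity (TiB) needed to reach `target_gbps` at a given per-unit throughput."""
    return target_gbps * 1000 / per_unit_mbps_per_tib


# 250 MB/s/TiB is inferred from the 4.8 TiB / 1.2 GB/s figures above.
print(aggregate_throughput_gbps(4.8, 250))    # 1.2
print(capacity_for_throughput_tib(2.4, 250))  # 9.6
```

In other words, doubling throughput to 2.4 GB/s at the same per-unit setting requires doubling capacity to 9.6 TiB; raising the per-unit throughput instead avoids paying for storage you do not need. FSx only accepts specific capacity increments and per-unit values, so check the service documentation before picking numbers.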
Applications use Elastic Fabric Adapter (EFA) for distributed training. In addition, instances are placed close to one another through placement groups or with assistance from AWS.
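To confirm that EFA is visible on a node, a common check is libfabric's `fi_info -p efa`, which lists the efa provider when the adapter and its software stack are installed. The Python sketch below wraps that check. It assumes fi_info is on the PATH or in /opt/amazon/efa/bin (where the EFA installer typically places it) and simply returns False elsewhere, for example on a laptop.

```python
import shutil
import subprocess


def efa_listed(fi_info_output: str) -> bool:
    """True if fi_info output contains a 'provider: efa' entry."""
    return any(line.strip() == "provider: efa" for line in fi_info_output.splitlines())


def efa_available() -> bool:
    """Run `fi_info -p efa` on this node; False if fi_info is missing or finds no provider."""
    fi_info = shutil.which("fi_info") or shutil.which("fi_info", path="/opt/amazon/efa/bin")
    if fi_info is None:
        return False
    result = subprocess.run([fi_info, "-p", "efa"], capture_output=True, text=True)
    return result.returncode == 0 and efa_listed(result.stdout)


print("EFA available:", efa_available())
```

Run it on a compute node (for example through srun) rather than on the head-node, since the head-node instance type in these templates is not the one doing the EFA traffic.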
Placement groups are only relevant for distributed training, not inference. You may remove the placement group declaration from the config file if needed, in which case delete these lines:
PlacementGroup:
  Enabled: true
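If you manage configs programmatically, the same change can be applied to the parsed YAML. This Python sketch assumes the config has already been loaded into a dict (for example with PyYAML's yaml.safe_load, not shown) and follows the ParallelCluster layout in which each queue under Scheduling.SlurmQueues carries a Networking.PlacementGroup entry. Passing queue names limits the change, e.g. to an inference queue; the queue names below are hypothetical.

```python
from typing import Iterable, Optional


def drop_placement_groups(config: dict, queue_names: Optional[Iterable[str]] = None) -> list:
    """Remove Networking.PlacementGroup from Slurm queues and return the names changed.

    `config` is a parsed cluster configuration (e.g. from yaml.safe_load).
    If `queue_names` is None, every queue under Scheduling.SlurmQueues is processed.
    """
    wanted = None if queue_names is None else set(queue_names)
    changed = []
    for queue in config.get("Scheduling", {}).get("SlurmQueues", []):
        if wanted is not None and queue.get("Name") not in wanted:
            continue
        networking = queue.get("Networking", {})
        # pop() returns None when the queue has no PlacementGroup entry.
        if networking.pop("PlacementGroup", None) is not None:
            changed.append(queue.get("Name"))
    return changed


# Example: keep the placement group on the training queue, drop it on inference.
cluster = {
    "Scheduling": {
        "SlurmQueues": [
            {"Name": "compute-gpu", "Networking": {"PlacementGroup": {"Enabled": True}}},
            {"Name": "inference-gpu", "Networking": {"PlacementGroup": {"Enabled": True}}},
        ]
    }
}
print(drop_placement_groups(cluster, ["inference-gpu"]))  # ['inference-gpu']
```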
You can choose to use a custom image or post-install scripts to install your application stack.
- Custom images: the image needs to be built before creating the cluster. Custom images are preferred for drivers, kernel modules, or libraries that are used regularly and see little to no updates, and they ensure repeatability. You can use a custom image as follows:
Image:
  Os: alinux2 # system type
  CustomAmi: PLACEHOLDER_CUSTOM_AMI_ID # replace with your custom AMI ID
If not using a custom image, remove the CustomAmi field.
- Post-install scripts: these scripts are executed at instance boot (head and compute nodes). This option is recommended for quick testing but increases instance boot time. You can run post-install scripts through CustomActions for the head node and the compute nodes.
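Both options map to a few fields in the parsed config. The Python sketch below, again operating on a dict loaded from the YAML (PyYAML assumed, not shown), either sets Image.CustomAmi or removes it, and registers a post-install script under CustomActions for the head node and every Slurm queue. It uses the OnNodeConfigured hook, which runs after the node finishes its own setup (OnNodeStart runs earlier in the boot). The helper name and the S3 URI are hypothetical.

```python
from typing import Optional


def configure_software_stack(config: dict, ami_id: Optional[str] = None,
                             post_install_uri: Optional[str] = None) -> dict:
    """Apply the custom-image and/or post-install option to a parsed cluster config (in place)."""
    image = config.setdefault("Image", {})
    if ami_id:
        image["CustomAmi"] = ami_id
    else:
        # No custom image: drop the field so the default AMI for Image.Os is used.
        image.pop("CustomAmi", None)

    if post_install_uri:
        nodes = [config.setdefault("HeadNode", {})]
        nodes += config.get("Scheduling", {}).get("SlurmQueues", [])
        for node in nodes:
            actions = node.setdefault("CustomActions", {})
            # OnNodeConfigured runs once the node has finished its own configuration.
            actions["OnNodeConfigured"] = {"Script": post_install_uri}
    return config


cfg = {
    "Image": {"Os": "alinux2", "CustomAmi": "PLACEHOLDER_CUSTOM_AMI_ID"},
    "Scheduling": {"SlurmQueues": [{"Name": "compute-gpu"}]},
}
configure_software_stack(cfg, post_install_uri="s3://my-bucket/post-install.sh")
print(cfg["Image"])  # {'Os': 'alinux2'}
```

Keeping the two choices in one function makes the trade-off explicit: a custom AMI pays the install cost once at image-build time, while a post-install script pays it on every node boot.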