DAOS Cluster Example

This example Terraform configuration demonstrates how to use the DAOS Terraform Modules to deploy a DAOS cluster consisting of servers and clients.

Pre-deployment

Complete the pre-deployment steps before running this Terraform example.

Quickstart in Cloud Shell

Click the button below to run this example in a Cloud Shell tutorial. The tutorial walks through each of the steps described in this README.md file.

DAOS on GCP Setup

Terraform Files

List of Terraform files in this example

| Filename | Description |
| --- | --- |
| main.tf | Main Terraform configuration file containing resource definitions |
| variables.tf | Variable definitions for variables used in main.tf |
| versions.tf | Provider definitions |
| terraform.tfvars.perf.example | Pre-configured set of variables focused on performance |
| terraform.tfvars.tco.example | Pre-configured set of variables focused on lower total cost of ownership |

Deploy a DAOS Cluster With This Example

The following sections describe how to deploy a DAOS cluster with this example Terraform configuration.

Create a terraform.tfvars file

Before you run any terraform commands you need to create a terraform.tfvars file in the terraform/examples/daos_cluster directory.

The terraform.tfvars file will contain the variable values for the configuration.

To ensure a successful deployment of a DAOS cluster there are two terraform.tfvars.*.example files that you can choose from.

You will need to decide which of these files to copy to terraform.tfvars.

The terraform.tfvars.tco.example file

The terraform.tfvars.tco.example file contains variables for a DAOS cluster deployment with

  • 16 DAOS Client instances
  • 4 DAOS Server instances, each with sixteen 375GB NVMe SSDs

To use the terraform.tfvars.tco.example file run

cp terraform.tfvars.tco.example terraform.tfvars

The terraform.tfvars.perf.example file

The terraform.tfvars.perf.example file contains variables for a DAOS cluster deployment with

  • 16 DAOS Client instances
  • 4 DAOS Server instances, each with four 375GB NVMe SSDs

To use the terraform.tfvars.perf.example file run

cp terraform.tfvars.perf.example terraform.tfvars
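The choice between the two example files can be wrapped in a small helper. The following is a minimal sketch only: the function name use_tfvars_profile is hypothetical and not part of the DAOS repository. It assumes it is run from the terraform/examples/daos_cluster directory and refuses to overwrite an existing terraform.tfvars file.

```shell
# Hypothetical helper (not part of the DAOS repository): copy one of the
# example variable files to terraform.tfvars in the current directory.
# Usage: use_tfvars_profile tco    or    use_tfvars_profile perf
use_tfvars_profile() {
  profile="$1"
  case "${profile}" in
    tco|perf) ;;
    *) echo "usage: use_tfvars_profile tco|perf" >&2; return 2 ;;
  esac
  src="terraform.tfvars.${profile}.example"
  if [ ! -f "${src}" ]; then
    echo "missing ${src}; run this from terraform/examples/daos_cluster" >&2
    return 1
  fi
  if [ -e terraform.tfvars ]; then
    # Keep any values that were already edited by hand.
    echo "terraform.tfvars already exists; not overwriting" >&2
    return 1
  fi
  cp "${src}" terraform.tfvars
}
```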

Update terraform.tfvars with your project id

Now that you have a terraform.tfvars file you need to replace the <project_id> placeholder in the file with your GCP project id.

To update the project id in terraform.tfvars run

PROJECT_ID=$(gcloud config list --format 'value(core.project)')
sed -i "s/<project_id>/${PROJECT_ID}/g" terraform.tfvars
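Before running Terraform it can be useful to confirm that the substitution took effect. The sketch below is an assumption-laden illustration: the function name tfvars_project_id_set is hypothetical, and it only checks that the literal <project_id> placeholder is gone. It does not verify that the substituted value is a valid project id.

```shell
# Hypothetical helper: succeed only if the file exists and no "<project_id>"
# placeholder is left in it. Defaults to terraform.tfvars.
tfvars_project_id_set() {
  file="${1:-terraform.tfvars}"
  if [ ! -f "${file}" ]; then
    echo "${file} not found" >&2
    return 1
  fi
  if grep -q '<project_id>' "${file}"; then
    echo "${file} still contains the <project_id> placeholder" >&2
    return 1
  fi
}

# Example: tfvars_project_id_set terraform.tfvars && echo "project id is set"
```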

Deploy the DAOS cluster

Billing Notification!

Running this example will incur charges in your project.

To avoid surprises, be sure to monitor your costs associated with running this example.

Don't forget to shut down the DAOS cluster with terraform destroy when you are finished.

To deploy the DAOS cluster run

terraform init
terraform plan -out=tfplan
terraform apply tfplan

Perform DAOS administration tasks

After your DAOS cluster has been deployed you can log into the first DAOS client instance to perform administrative tasks.

Log into the first DAOS client instance

Verify that the daos-client and daos-server instances are running

gcloud compute instances list \
  --filter="name ~ daos" \
  --format="value(name,INTERNAL_IP)"

Log into the first client instance

gcloud compute ssh daos-client-0001

Verify that all daos-server instances have joined

sudo dmg system query -v

The State column should display "Joined" for all servers.

Rank UUID                                 Control Address   Fault Domain      State  Reason
---- ----                                 ---------------   ------------      -----  ------
0    0796c576-5651-4e37-aa15-09f333d2d2b8 10.128.0.35:10001 /daos-server-0001 Joined
1    f29f7058-8abb-429f-9fd3-8b13272d7de0 10.128.0.77:10001 /daos-server-0003 Joined
2    09fc0dab-c238-4090-b3f8-da2bd4dce108 10.128.0.81:10001 /daos-server-0002 Joined
3    2cc9140b-fb12-4777-892e-7d190f6dfb0f 10.128.0.30:10001 /daos-server-0004 Joined
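The check on the State column can also be scripted. The following is a minimal sketch, assuming the column layout shown above, where State is the fifth whitespace-separated field on each rank line. The function name count_joined_ranks is hypothetical, and other DAOS versions may print a different layout.

```shell
# Count ranks whose State column reads "Joined" in `dmg system query -v`
# output supplied on stdin. Only lines that start with a numeric rank are
# considered, which skips the header and separator lines.
count_joined_ranks() {
  awk '$1 ~ /^[0-9]+$/ && $5 == "Joined" { n++ } END { print n + 0 }'
}

# Example: sudo dmg system query -v | count_joined_ranks
# The result should equal the number of DAOS server instances deployed.
```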

Create a Pool

Check free NVMe storage.

sudo dmg storage query usage

The output shows 4 servers, each with 1.6TB of free NVMe storage, for a total of 4 × 1.6TB = 6.4TB free.

Hosts            SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----            --------- -------- -------- ---------- --------- ---------
daos-server-0001 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0002 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0003 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
daos-server-0004 48 GB     48 GB    0 %      1.6 TB     1.6 TB    0 %
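The total free NVMe capacity can also be summed from the command output. This is a rough sketch under stated assumptions: the function name total_nvme_free_tb is hypothetical, each size is printed as a number followed by a separate unit (so NVMe-Free is fields 10 and 11), and every host reports its value in TB. Because the displayed values are rounded, treat the result as an estimate.

```shell
# Sum the NVMe-Free column of `dmg storage query usage` output read from
# stdin. Host lines have 13 whitespace-separated fields in the layout shown
# above; header and separator lines have fewer and are skipped.
total_nvme_free_tb() {
  awk 'NF == 13 && $11 == "TB" { sum += $10 } END { printf "%.1fTB\n", sum }'
}

# Example: sudo dmg storage query usage | total_nvme_free_tb
```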

Create one pool that uses the entire 6.4TB.

sudo dmg pool create -z 6.4TB -t 3 --label=pool1

For more information about pools see the DAOS documentation.

Create a Container

Create a container in the pool

daos container create --type=POSIX --properties=rf:0 --label=cont1 pool1

For more information about containers see https://docs.daos.io/latest/overview/storage/#daos-container

Mount the container

Mount the container with dfuse

MOUNT_DIR="${HOME}/daos/cont1"
mkdir -p "${MOUNT_DIR}"
dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${MOUNT_DIR}"
df -h -t fuse.daos
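A script can confirm the mount before writing data. The sketch below is an illustration with assumptions: the function name is_dfuse_mounted is hypothetical, it relies on the Linux mount table in /proc/mounts, and it assumes the filesystem type is reported there as fuse.daos, the same type name used with df above. Mount points containing spaces are not handled.

```shell
# Succeed if DIR appears as a fuse.daos mount in the mount table.
# The optional second argument names the mount table to read; it defaults
# to /proc/mounts and exists so the check can be run against a saved copy.
is_dfuse_mounted() {
  dir="$1"
  mounts="${2:-/proc/mounts}"
  awk -v dir="${dir}" \
    '$2 == dir && $3 == "fuse.daos" { found = 1 } END { exit !found }' \
    "${mounts}"
}

# Example: is_dfuse_mounted "${HOME}/daos/cont1" && echo "container is mounted"
```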

You can now store files in the DAOS container mounted on ${HOME}/daos/cont1.

For more information about DFuse see the DAOS FUSE section of the User Guide.

Use the Storage

The cont1 container is now mounted on ${HOME}/daos/cont1.

Create a 20GiB file which will be stored in the DAOS filesystem.

cd ${HOME}/daos/cont1
time LD_PRELOAD=/usr/lib64/libioil.so \
  dd if=/dev/zero of=./test21G.img bs=1G count=20
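With bs=1G and count=20, dd writes 20 × 1024 × 1024 × 1024 = 21,474,836,480 bytes, which is 20GiB or roughly 21.5GB in decimal units. The following sketch shows one way to confirm the file size after the write. The function names dd_expected_bytes and file_has_size are hypothetical helpers introduced here for illustration.

```shell
# Expected size in bytes of a file written by dd with bs=1G and COUNT blocks.
dd_expected_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Succeed if FILE is exactly BYTES bytes long.
file_has_size() {
  actual="$(wc -c < "$1" | tr -d '[:space:]')"
  [ "${actual}" -eq "$2" ]
}

# Example:
#   file_has_size ./test21G.img "$(dd_expected_bytes 20)" && echo "size ok"
```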

Unmount the container

fusermount -u ${HOME}/daos/cont1

Remove DAOS cluster deployment

To destroy the DAOS cluster run

terraform destroy

This will shut down and delete all DAOS server and client instances.

Terraform Documentation

Documentation for the terraform/examples/daos_cluster Terraform configuration.

Copyright 2022 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

| Name | Version |
| --- | --- |
| terraform | >= 0.14.5 |
| google | >= 3.54.0 |

Providers

No providers.

Modules

| Name | Source | Version |
| --- | --- | --- |
| daos_client | ../../modules/daos_client | n/a |
| daos_server | ../../modules/daos_server | n/a |

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| allow_insecure | Sets the allow_insecure setting in the transport_config section of the daos_*.yml files | bool | false | no |
| client_gvnic | Use Google Virtual NIC (gVNIC) network interface on DAOS clients | bool | false | no |
| client_instance_base_name | MIG instance base names to use | string | "daos-client" | no |
| client_labels | Set of key/value label pairs to assign to daos-client instances | any | {} | no |
| client_machine_type | GCP machine type, e.g. c2-standard-16 | string | "c2-standard-16" | no |
| client_mig_name | MIG name | string | "daos-client" | no |
| client_number_of_instances | Number of DAOS clients to bring up | number | 4 | no |
| client_os_disk_size_gb | OS disk size in GB | number | 20 | no |
| client_os_disk_type | OS disk type, e.g. pd-ssd, pd-standard | string | "pd-ssd" | no |
| client_os_family | OS GCP image family | string | "daos-client-hpc-centos-7" | no |
| client_os_project | OS GCP image project name. Defaults to project_id if null. | string | null | no |
| client_preemptible | Whether the client instances are preemptible | string | false | no |
| client_service_account | Service account to attach to the instance. See https://www.terraform.io/docs/providers/google/r/compute_instance_template.html#service_account. | object({ email = string, scopes = set(string) }) | { "email": null, "scopes": [ "https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring.write", "https://www.googleapis.com/auth/servicecontrol", "https://www.googleapis.com/auth/service.management.readonly", "https://www.googleapis.com/auth/trace.append", "https://www.googleapis.com/auth/cloud-platform" ] } | no |
| client_template_name | MIG template name | string | "daos-client" | no |
| network_name | Name of the GCP network to use | string | "default" | no |
| project_id | The GCP project to use | string | n/a | yes |
| region | The GCP region to create and test resources in | string | n/a | yes |
| server_daos_crt_timeout | crt_timeout | number | 300 | no |
| server_daos_disk_count | Number of local SSDs to use | number | 16 | no |
| server_daos_disk_type | DAOS disk type to use. Currently only local-ssd is supported. | string | "local-ssd" | no |
| server_daos_scm_size | scm_size | number | 200 | no |
| server_gvnic | Use Google Virtual NIC (gVNIC) network interface | bool | false | no |
| server_instance_base_name | MIG instance base names to use | string | "daos-server" | no |
| server_labels | Set of key/value label pairs to assign to daos-server instances | any | {} | no |
| server_machine_type | GCP machine type, e.g. e2-medium | string | "n2-custom-36-215040" | no |
| server_mig_name | MIG name | string | "daos-server" | no |
| server_number_of_instances | Number of DAOS servers to bring up | number | 4 | no |
| server_os_disk_size_gb | OS disk size in GB | number | 20 | no |
| server_os_disk_type | OS disk type, e.g. pd-ssd, pd-standard | string | "pd-ssd" | no |
| server_os_family | OS GCP image family | string | "daos-server-centos-7" | no |
| server_os_project | OS GCP image project name. Defaults to project_id if null. | string | null | no |
| server_pools | List of pools and containers to be created | list(object({ name = string, size = string, tier_ratio = number, user = string, group = string, acls = list(string), properties = map(any), containers = list(object({ name = string, type = string, user = string, group = string, acls = list(string), properties = map(any), user_attributes = map(any) })) })) | [] | no |
| server_preemptible | Whether the server instances are preemptible | string | false | no |
| server_service_account | Service account to attach to the instance. See https://www.terraform.io/docs/providers/google/r/compute_instance_template.html#service_account. | object({ email = string, scopes = set(string) }) | { "email": null, "scopes": [ "https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring.write", "https://www.googleapis.com/auth/servicecontrol", "https://www.googleapis.com/auth/service.management.readonly", "https://www.googleapis.com/auth/trace.append", "https://www.googleapis.com/auth/cloud-platform" ] } | no |
| server_template_name | MIG template name | string | "daos-server" | no |
| subnetwork_name | Name of the GCP sub-network to use | string | "default" | no |
| subnetwork_project | The GCP project where the subnetwork is defined | string | null | no |
| zone | The GCP zone to create and test resources in | string | n/a | yes |
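Only project_id, region and zone are marked as required in the inputs above, so a terraform.tfvars file must assign at least those three. The sketch below lists any that are missing. The function name missing_required_vars is hypothetical, and the check is a plain text match for a "name =" assignment at the start of a line, not an HCL parser.

```shell
# Print the name of each required input (project_id, region, zone) that has
# no assignment in the given tfvars file. Prints nothing when all are set.
missing_required_vars() {
  file="${1:-terraform.tfvars}"
  for name in project_id region zone; do
    grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "${file}" || echo "${name}"
  done
}

# Example: missing_required_vars terraform.tfvars
```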

Outputs

No outputs.