This implements a data ingestor required for efficient, parallel data ingestion to the Avere vFXT, as shown in the following diagram:
The data ingestor includes tools such as the msrsync utility and the parallelcp script. To install a data ingestor VM containing all of these parallel data ingestion tools, we will have the client VM pull and run the data ingestor install script from the Avere vFXT mount, using one of the generic virtual machine clients. Before deploying the client VM, first set up the Avere vFXT with the install file:
-
If you have not already done so, SSH to the controller and mount the Avere vFXT:
-
Run the following commands:
sudo -s
apt-get update
apt-get install nfs-common
mkdir -p /nfs/node0
chown nobody:nogroup /nfs/node0
-
Edit /etc/fstab to add the following lines, using your own vFXT node IP addresses. Add more lines if your cluster has more than three nodes.
10.0.0.12:/msazure /nfs/node0 nfs hard,nointr,proto=tcp,mountproto=tcp,retry=30 0 0
-
To mount all shares, type
mount -a
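The /etc/fstab step above needs one line per vFXT node. As a minimal sketch, the helper below prints such lines from a list of node IP addresses. The function name fstab_lines is hypothetical, the /nfs/nodeN numbering is an assumption extrapolated from /nfs/node0, and 10.0.0.13 and 10.0.0.14 are placeholder addresses; the export path and mount options are copied from the line shown above.

```shell
# Hypothetical helper (not part of the Avere tooling): print one
# /etc/fstab line per vFXT node IP passed as an argument.
# Export path and mount options are copied from the step above;
# the /nfs/nodeN mount point numbering is an assumption.
fstab_lines() {
    i=0
    for ip in "$@"; do
        printf '%s:/msazure /nfs/node%d nfs hard,nointr,proto=tcp,mountproto=tcp,retry=30 0 0\n' "$ip" "$i"
        i=$((i + 1))
    done
}

# Example call; 10.0.0.13 and 10.0.0.14 are placeholder addresses.
fstab_lines 10.0.0.12 10.0.0.13 10.0.0.14
```

Review the printed lines before appending them to /etc/fstab. Each mount point must exist before running mount -a, so every additional /nfs/nodeN directory would need the same mkdir and chown treatment as /nfs/node0.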
-
On the controller, download the data ingestor bootstrap script:
mkdir -p /nfs/node0/bootstrap
cd /nfs/node0/bootstrap
curl --retry 5 --retry-delay 5 -o /nfs/node0/bootstrap/bootstrap.dataingestor.sh https://raw.githubusercontent.com/Azure/Avere/main/src/clientapps/dataingestor/bootstrap.dataingestor.sh
-
From your controller, verify your data ingestor setup by running the following verification script. If the script reports success, you are ready to deploy. Otherwise, fix each error it lists and run it again.
curl -o- https://raw.githubusercontent.com/Azure/Avere/main/src/clientapps/dataingestor/dataingestorVerify.sh | bash
-
Deploy the clients by clicking the "Deploy to Azure" button below, using the following settings:
Click the following links to learn more about the data ingestor tools:
-
msrsync - available from GitHub at https://github.com/jbd/msrsync
-
parallelcp - mentioned in the ingestion guide.
To learn more about parallel ingestion, please refer to the ingestion guide.
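Before deploying, it may help to confirm locally that the bootstrap script from the download step actually landed on the vFXT mount. The sketch below is a hypothetical pre-flight check and does not replace dataingestorVerify.sh; the function name check_bootstrap is an assumption, and the default path is the one used in the download step above.

```shell
# Hypothetical pre-flight check (illustrative only, not part of the
# Avere tooling): succeed if the bootstrap script exists and is non-empty.
check_bootstrap() {
    # Default path is the download target used earlier in this guide.
    path="${1:-/nfs/node0/bootstrap/bootstrap.dataingestor.sh}"
    if [ ! -s "$path" ]; then
        echo "missing or empty: $path" >&2
        return 1
    fi
    echo "found: $path"
}
```

Running check_bootstrap with no arguments on the controller checks the default path; a non-zero exit status suggests the curl download should be repeated.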