Aptos-ETL Infrastructure-as-code

Overview

This repository contains the necessary infrastructure as code to set up Aptos Fullnodes for an ETL (Extract, Transform, Load) project. It provisions and configures a secure environment on Google Cloud Platform (GCP) for efficient data handling and processing.

Structure

  • terraform: Contains the Terraform configuration for provisioning the GCP environment, setting up Aptos Fullnodes, and deploying the ETL applications (aptos-indexing-coordinator and aptos-extracting-transformer).
    • tfvars/prod.tfvars: Contains values for the Terraform variables for the production environment.
    • vars.tf: Defines Terraform variables and local variables.
    • provider.tf: Specifies the Terraform providers, backend bucket, and versions.
    • bq.tf: Manages BigQuery resources.
    • composer.tf: Manages Cloud Composer resources.
    • dataflow.tf: Manages Dataflow resources.
    • gcs.tf: Manages Google Cloud Storage bucket resources.
    • networks.tf: Manages GCP network resources.
    • pubsub.tf: Manages Cloud Pub/Sub resources.
    • sa.tf: Manages Google Cloud IAM (Service Account) resources.
    • helm.tf: Manages Helm charts resources.
    • kubernetes.tf: Manages Kubernetes resources.
  • helm: Contains Helm charts for Aptos Fullnodes and the ETL applications.
    • aptos-fullnodes: Helm chart to deploy Aptos Fullnodes.
    • aptos-indexing-coordinator: Helm chart to deploy the Aptos indexing coordinator.
    • aptos-extractor-transformer: Helm chart to deploy the Aptos extracting transformer.
    • keda: Helm chart to deploy KEDA to autoscale the ETL applications.

Resources and Scripts

Terraform-Provisioned Resources

The Terraform code in the terraform/ directory provisions the following resources:

  1. Virtual Private Cloud (VPC) network and subnets
  2. Firewall rules to ensure secure access
  3. Google Kubernetes Engine (GKE) cluster with:
    • Aptos Fullnodes
    • Aptos Indexing Coordinator
    • Aptos Extractor Transformer
    • KEDA Autoscaler
  4. Cloud NAT for secure outbound internet access
  5. IAM service accounts and roles for secure operations
  6. BigQuery datasets and tables with predefined schemas
  7. Dataflow jobs for data processing
  8. Cloud Composer pipelines for workflow orchestration
  9. Pub/Sub messaging services
  10. GCS buckets for object storage
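
After the stack is applied (see the Setup Guide below), the major resources can be spot-checked from the CLI. A sketch; the region and location values are assumptions and should match the values in your tfvars:

gcloud container clusters list
gcloud dataflow jobs list --region=us-central1
gcloud composer environments list --locations=us-central1
gcloud pubsub topics list
bq ls
gcloud storage ls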

Deployment Environment

This project can be deployed from:

  • A personal laptop
  • A virtual machine (VM)
  • Google Cloud Shell

As long as the prerequisites are met and the Google Cloud SDK is properly configured, Terraform commands can be executed from any of these environments.

Prerequisites

  • Terraform >= 1.5.7
  • Helm >= 3.16.1
  • A Google Cloud Platform account
  • A Google Cloud Storage bucket for storing the Terraform state files
  • Terraform authenticated to Google Cloud Platform:
    gcloud auth application-default login
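
A quick way to confirm the prerequisites are in place (a sketch; the bucket name is a placeholder):

terraform version   # expect 1.5.7 or newer
helm version        # expect 3.16.1 or newer
gcloud auth application-default print-access-token > /dev/null && echo "application-default credentials OK"
gcloud storage ls gs://[BUCKET_NAME]   # the Terraform state bucket, created in Setup step 4 if it does not exist yet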

Setup Guide

1. Setting up Google Cloud SDK

  1. Install the Google Cloud SDK by following the official documentation.
  2. Authenticate with Google Cloud:
    gcloud auth login

2. Creating a New Google Cloud Project

  1. Create a new project:

    gcloud projects create [PROJECT_ID] --name="[PROJECT_NAME]"

    Replace [PROJECT_ID] with a unique ID for your project, and [PROJECT_NAME] with a descriptive name.

  2. Set the newly created project as the active project:

    gcloud config set project [PROJECT_ID]
  3. List your existing billing accounts:

    gcloud billing accounts list

    This command will display a list of billing accounts you have access to, along with their IDs.

  4. If you don't have a billing account or want to create a new one, follow the instructions in the official Google Cloud documentation to create a billing account.

    Note: Creating a new billing account typically requires going through the Google Cloud Console, as it involves setting up payment methods and cannot be done entirely through the CLI.

  5. Once you have a billing account ID, link it to your project:

    gcloud billing projects link [PROJECT_ID] --billing-account=[BILLING_ACCOUNT_ID]

    Replace [BILLING_ACCOUNT_ID] with the ID of the billing account you want to use.
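
For example, with a hypothetical project ID, project name, and billing account ID, the full sequence looks like this:

gcloud projects create aptos-etl-prod --name="Aptos ETL Production"
gcloud config set project aptos-etl-prod
gcloud billing accounts list
gcloud billing projects link aptos-etl-prod --billing-account=000000-AAAAAA-111111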

3. Enabling Required APIs

Enable the necessary APIs for your project using the gcloud CLI:

gcloud services enable \
  dataflow.googleapis.com \
  container.googleapis.com \
  pubsub.googleapis.com \
  composer.googleapis.com \
  bigquery.googleapis.com \
  compute.googleapis.com \
  storage.googleapis.com \
  iam.googleapis.com
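
To confirm the APIs are enabled, you can list the enabled services and filter for the ones above (a quick sketch):

gcloud services list --enabled --project=[PROJECT_ID] | grep -E 'dataflow|container|pubsub|composer|bigquery|compute|storage|iam'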

4. Creating a GCS Bucket for Terraform State

To create a new GCS bucket, you can use the following command:

gcloud storage buckets create gs://BUCKET_NAME --project=PROJECT_ID --location=BUCKET_LOCATION

Replace the following:

  • BUCKET_NAME: A globally unique name for your bucket
  • PROJECT_ID: Your Google Cloud project ID
  • BUCKET_LOCATION: The location for your bucket (e.g., us-central1)

You can also specify additional flags:

  • --uniform-bucket-level-access: Enables uniform bucket-level access
  • --public-access-prevention: Sets public access prevention
  • --default-storage-class: Sets the default storage class (e.g., STANDARD, NEARLINE, COLDLINE, ARCHIVE)

For example:

gcloud storage buckets create gs://my-terraform-state --project=my-project-id --location=us-central1 --uniform-bucket-level-access --public-access-prevention=enforced --default-storage-class=STANDARD

For more detailed information on creating buckets, including available regions and storage classes, refer to the official Google Cloud documentation on creating storage buckets.
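
Optionally, enabling object versioning on the state bucket makes it possible to recover earlier state versions if the state file is ever corrupted (a sketch; not required by this setup):

gcloud storage buckets update gs://my-terraform-state --versioning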

5. Configuring Terraform

Now that you have set up your Google Cloud project and created a bucket for the Terraform state, you need to configure the Terraform files.

  1. Navigate to the terraform directory in the Terraform codebase.

  2. Open the provider.tf file and update the following:

    terraform {
      backend "gcs" {
        bucket = "[BUCKET_NAME]"  # Replace with your bucket name
        prefix = "terraform/state"
      }
    }

    Open the tfvars/prod.tfvars file and update the following:

     project_id = "[PROJECT_ID]"

    Replace [BUCKET_NAME] in provider.tf with the name of the GCS bucket you created, and [PROJECT_ID] in prod.tfvars with your Google Cloud project ID. (An alternative that avoids hard-coding the bucket name is shown after this list.)

  3. Review and adjust other variables in the provider.tf file as needed.
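
Alternatively, Terraform's partial backend configuration lets you omit the bucket from the gcs backend block in provider.tf and supply it at init time instead (a sketch, assuming the bucket attribute is left unset):

terraform init -backend-config="bucket=[BUCKET_NAME]" -backend-config="prefix=terraform/state"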

6. Specify Coordinator and Transformer app versions

Define the image repository and tag for the Coordinator app using the coordinator_image_tag and coordinator_repo variables located in helm.tf. Similarly, define the image repository and tag for the Transformer app using the transformer_image_tag and transformer_repo variables in helm.tf.

You can find the latest application releases at the following repository: https://github.com/bcwt/aptos-etl (this URL is to be changed).
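
To locate these settings quickly, you can search for them from the terraform/ directory (a simple sketch):

grep -n -E 'coordinator_image_tag|coordinator_repo|transformer_image_tag|transformer_repo' helm.tf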

7. Supply values for all the variables

Populate the necessary variables in the tfvars/prod.tfvars file with appropriate values for your environment.

The enabled_network variable controls which network is created.
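
If you want to cross-check that every declared variable has been assigned a value, a quick sketch using GNU grep and Bash process substitution (assumes the declarations live in vars.tf, as described in the Structure section):

# Print variables declared in vars.tf that have no assignment in tfvars/prod.tfvars
# (variables that rely on defaults in vars.tf will also be listed)
comm -23 \
  <(grep -oP 'variable\s+"\K[^"]+' vars.tf | sort) \
  <(grep -oP '^\s*\K\w+(?=\s*=)' tfvars/prod.tfvars | sort)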

8. Initializing and Applying Terraform

  1. Initialize Terraform:

    terraform init
  2. Review the planned changes:

    terraform plan -var-file=tfvars/prod.tfvars
  3. Run helm repo update

    helm repo update

    Note: This step is required because the Helm provider sometimes fails to retrieve the Helm repository from the Internet.

  4. Apply the Terraform configuration:

    terraform apply -var-file=tfvars/prod.tfvars

    Note: Due to a Dataflow API timeout issue with Terraform, the apply might fail with an error message similar to:

    │ Error: Error waiting for job with job ID "2024-10-20_02_04_40-5400230924078250603" to be running: the job with ID "2024-10-20_02_04_40-5400230924078250603" has terminated with state "JOB_STATE_FAILED" instead of expected state "JOB_STATE_RUNNING"

    If this happens, rerun the terraform apply -var-file=tfvars/prod.tfvars command.
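
    If you script the deployment, a minimal retry wrapper for this known failure could look like the sketch below (-auto-approve skips the interactive confirmation prompt, so review the plan beforehand):

    # Retry the apply up to three times to ride out the transient Dataflow job-start timeout
    for attempt in 1 2 3; do
      terraform apply -auto-approve -var-file=tfvars/prod.tfvars && break
      echo "Apply attempt ${attempt} failed; retrying..."
    done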

Project Teardown

  1. Remove the kubernetes_namespace resource from the Terraform state file:
    terraform state list | grep kubernetes_namespace | xargs -n 1 terraform state rm
    Note: This command assumes a Bash-compatible shell.
  2. Remove all resources:
    terraform destroy -var-file=tfvars/prod.tfvars
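
The two teardown steps combined as a single Bash sequence (a sketch; -r is a GNU xargs extension that skips terraform state rm when no kubernetes_namespace entries are found):

cd terraform
terraform state list | grep kubernetes_namespace | xargs -r -n 1 terraform state rm
terraform destroy -var-file=tfvars/prod.tfvars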

Additional Resources