This is the codebase for S3 to BQ migration tool.
Please point the cloud configuration files to respective environment variable:
$export GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcp_config
$export AWS_SECRET_ACCESS_KEY=<secret_key>
$export AWS_ACCESS_KEY_ID=<key_id>
$gcloud iam service-accounts add-iam-policy-binding \
[Project-Number][email protected] \
--member serviceAccount:service-[Project-Number]@cloudcomposer-accounts.iam.gserviceaccount.com \
--role roles/composer.ServiceAgentV2Ext
$gcloud composer environments create <env-name> \
--location <loc> \
$gcloud composer environments storage dags import \
--environment <env-name> --location <loc> \
--source ./core/composer/bq_dag.py
A sample configuration file looks like:
input:
source: s3://s3-to-bq-testing/*
output:
project_id: s3tobq
dataset_name: test_dataset
dataset_location: asia-northeast1
table_name: test_final_table
table_format:
fields:
- { name: "location", type: "STRING"}
- { name: "name", type: "STRING", mode: "REQUIRED"}
- { name: "age", type: "INTEGER"}
intermediate_table_name: "test_table"
SQL: "SELECT * FROM test_dataset.test_table"
To deploy the Cloud Run container:
$cd core
$gcloud builds submit --tag gcr.io/<PROJECT_ID>/<service-name>
$gcloud run deploy <service-name> \
--image gcr.io/<PROJECT_ID>/<service-name> \
--allow-unauthenticated
$gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member=serviceAccount:[PROJECT_NUMBER][email protected] \
--role=roles/eventarc.eventReceiver
$gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member=serviceAccount:service-[PROJECT_NUMBER]@gcp-sa-pubsub.iam.gserviceaccount.com \
--role=roles/iam.serviceAccountTokenCreator
SERVICE_ACCOUNT="$(gsutil kms serviceaccount -p [PROJECT_ID])"
$gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member="serviceAccount:${SERVICE_ACCOUNT}" \
--role='roles/pubsub.publisher'
$gcloud eventarc triggers create <trigger-name> \
--destination-run-service=<service-name> \
--destination-run-region=<loc> \
--event-filters="type=google.cloud.storage.object.v1.finalized" \
--event-filters="bucket=<intermediate-gcs-bucket-name>" \
--service-account=[PROJECT_NUMBER][email protected] \
--location <loc>
- We only support Cloud Composer 2.
- This took currently only supports files with 1024.00K characters, including comments and white space characters.