A Serverless (compute) approach to scraping Instagram feeds. This application runs on Cloud Run and pulls images and captions from selected Instagram users. It stores these in a Cloud SQL database.
Note that storing data in Cloud SQL isn't a truly serverless data solution.
🥧 This looks like a lot of setup, but it should only take about 5 minutes. It's just a bunch of copy-and-paste scripts to run in Cloud Shell. 🍰
You can run these steps from any terminal that has `gcloud` and `docker` installed, but the easiest way is to run all of the following commands in Cloud Shell. You'll need a GitHub.com personal account. Recommended: create a new GCP project before proceeding.
Replace `<your_github_username>` with your account (e.g. `davidstanke`):
export GITHUB_USER=<your_github_username>
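Because the placeholder contains angle brackets, pasting the export line unedited typically produces a shell error instead of setting the variable. A minimal sanity check might look like the sketch below. `check_github_user` is a hypothetical helper that is not part of the repo, and the check assumes GitHub usernames consist only of letters, digits, and hyphens.

```shell
# Hypothetical helper (not part of the repo): checks that a GitHub
# username looks plausible before later commands rely on it.
check_github_user() {
  user="$1"
  case "$user" in
    "")
      echo "GITHUB_USER is empty" >&2; return 1 ;;
    *"<"*|*">"*)
      echo "GITHUB_USER still contains the placeholder" >&2; return 1 ;;
    *[!A-Za-z0-9-]*)
      echo "GITHUB_USER has unexpected characters" >&2; return 1 ;;
  esac
  echo "GITHUB_USER looks OK: $user"
}

# Example using the sample account name from the text above:
check_github_user "davidstanke"
# prints: GITHUB_USER looks OK: davidstanke
```

In practice you would call `check_github_user "$GITHUB_USER"` right after the export and stop if it reports a problem.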
Don't clone this repo directly; instead, click "Use this template" to make a copy. Call it `instapuller`. Then clone your copy of the repo, and add a "staging" branch:
git clone https://github.com/${GITHUB_USER}/instapuller && cd instapuller
git checkout -b staging
git push -u origin staging
Alternative setup: you can use Artifact Registry instead of Container Registry:
- Enable the Artifact Registry API and create a registry.
- Configure Docker (see "setup instructions" in the Artifact Registry UI).
- Replace all instances of `gcr.io/$PROJECT/instapuller` with your `*.pkg.dev` registry path.
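As a rough sketch of that alternative: Artifact Registry image paths take the form `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE`. The repository name `instapuller-repo`, the helper `ar_image_path`, and the project ID `my-project` below are illustrative assumptions, and the region is assumed to match the `us-central1` used elsewhere in this guide.

```shell
# Assumptions: the repository name and region are illustrative;
# choose your own when you create the registry.
AR_REGION="us-central1"
AR_REPO="instapuller-repo"   # hypothetical repository name

# Builds an Artifact Registry image path of the form
# LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE
ar_image_path() {
  printf '%s-docker.pkg.dev/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# One-time setup, left commented so this sketch has no side effects:
# gcloud services enable artifactregistry.googleapis.com
# gcloud artifacts repositories create "$AR_REPO" \
#   --repository-format=docker --location="$AR_REGION"
# gcloud auth configure-docker "${AR_REGION}-docker.pkg.dev"

# Example with a placeholder project ID:
ar_image_path "$AR_REGION" "my-project" "$AR_REPO" "instapuller"
# prints: us-central1-docker.pkg.dev/my-project/instapuller-repo/instapuller
```

The resulting path is what would stand in for `gcr.io/$PROJECT/instapuller` in the build, push, and deploy commands.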
# set some convenience variables
export PROJECT=$(gcloud config list --format 'value(core.project)')
export PROJECT_NUMBER=$(gcloud projects list --filter="$PROJECT" --format="value(PROJECT_NUMBER)")
export GCB_SERVICE_ACCT="${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"
export RUN_SERVICE_ACCT="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
# Enable APIs and grant IAM permissions
gcloud services enable cloudbuild.googleapis.com run.googleapis.com sqladmin.googleapis.com sql-component.googleapis.com
gcloud projects add-iam-policy-binding $PROJECT --member=serviceAccount:$GCB_SERVICE_ACCT --role=roles/run.admin
gcloud iam service-accounts add-iam-policy-binding $RUN_SERVICE_ACCT --member=serviceAccount:$GCB_SERVICE_ACCT --role=roles/iam.serviceAccountUser
# Create CloudSQL databases
export PASSWORD=$(openssl rand -base64 15)
gcloud sql instances create instapuller --zone=us-central1-c --root-password=${PASSWORD}
gcloud sql databases create instapuller-prod --instance=instapuller --charset=utf8mb4
gcloud sql databases create instapuller-staging --instance=instapuller --charset=utf8mb4
# Create initial application container
docker build -t gcr.io/$PROJECT/instapuller .
docker push gcr.io/$PROJECT/instapuller
# Create Cloud Run services
gcloud run deploy instapuller-prod --image=gcr.io/$PROJECT/instapuller --region=us-central1 --platform=managed --allow-unauthenticated --set-env-vars=DB_USER=root,DB_PASS=${PASSWORD},DB_NAME=instapuller-prod,CLOUD_SQL_CONNECTION_NAME=$PROJECT:us-central1:instapuller --set-cloudsql-instances=$PROJECT:us-central1:instapuller
gcloud run deploy instapuller-staging --image=gcr.io/$PROJECT/instapuller --region=us-central1 --platform=managed --allow-unauthenticated --set-env-vars=DB_USER=root,DB_PASS=${PASSWORD},DB_NAME=instapuller-staging,CLOUD_SQL_CONNECTION_NAME=$PROJECT:us-central1:instapuller --set-cloudsql-instances=$PROJECT:us-central1:instapuller
echo -e "======\nHere are the URLs of your Cloud Run services:\n-----\n$(gcloud run services list --platform=managed --format='value(URL)')\n====="
Open both URLs in a browser to verify that they work!
NOTE: the first load may be slow because the application will create the database on first request.
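The `CLOUD_SQL_CONNECTION_NAME` value passed to both services follows the `PROJECT:REGION:INSTANCE` pattern, where the region is the zone given at instance creation without its final letter suffix. A minimal sketch of that derivation follows; both helper names and the project ID `my-project` are hypothetical.

```shell
# Hypothetical helpers illustrating how the connection name is composed.

# Strips the trailing zone suffix: us-central1-c -> us-central1
region_from_zone() {
  printf '%s\n' "${1%-*}"
}

# Joins project, region, and instance as PROJECT:REGION:INSTANCE
sql_connection_name() {
  printf '%s:%s:%s\n' "$1" "$(region_from_zone "$2")" "$3"
}

# Example with a placeholder project ID:
sql_connection_name "my-project" "us-central1-c" "instapuller"
# prints: my-project:us-central1:instapuller
```

To compare against the real value, `gcloud sql instances describe instapuller --format='value(connectionName)'` should report the same string for your project.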
gcloud builds submit --substitutions=_DEPLOY_ENVIRONMENT=staging,SHORT_SHA=$(date +%Y%m%d_%H%M%S)
gcloud builds submit --substitutions=_DEPLOY_ENVIRONMENT=prod,SHORT_SHA=$(date +%Y%m%d_%H%M%S)
Then revisit the application URLs. They should look unchanged.
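In these manual builds, `SHORT_SHA` is supplied by hand, presumably because there is no triggering commit to take it from, so a timestamp stands in for it. A small sketch of generating such a tag once and reusing it is shown below; `build_tag` is a hypothetical helper, and how `cloudbuild.yaml` actually uses `SHORT_SHA` is not shown here.

```shell
# Hypothetical helper: generates a timestamp tag in the same format
# used for SHORT_SHA in the commands above (e.g. 20240131_235959).
build_tag() {
  date +%Y%m%d_%H%M%S
}

# Compute the tag once so the same value can be reused
TAG="$(build_tag)"
echo "Using build tag: $TAG"

# Left commented so the sketch has no side effects:
# gcloud builds submit --substitutions=_DEPLOY_ENVIRONMENT=staging,SHORT_SHA="$TAG"
```

Storing the tag in a variable makes it easier to find the matching build later than calling `date` inline in each command.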
To connect your GitHub repo to Cloud Build, use the Cloud Build Triggers page in the GCP console.
See the docs for "Connecting to source repositories".
- Use the "Cloud Build GitHub App" option and grant access if asked to do so.
- Select your copy of the `instapuller` repo.
- On the "create a push trigger" step, click Skip for now (we'll add the trigger via gcloud).
# On commit to `main`, deploy to prod:
gcloud beta builds triggers create github \
--repo-name=instapuller \
--repo-owner=${GITHUB_USER} \
--branch-pattern="^main$" \
--build-config="cloudbuild.yaml" \
--description="On commit to main, deploy to prod service" \
--substitutions="_DEPLOY_ENVIRONMENT=prod"
# On commit to `staging`, deploy to staging:
gcloud beta builds triggers create github \
--repo-name=instapuller \
--repo-owner=${GITHUB_USER} \
--branch-pattern="^staging$" \
--build-config="cloudbuild.yaml" \
--description="On commit to staging, deploy to staging service" \
--substitutions="_DEPLOY_ENVIRONMENT=staging"
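The two triggers differ only in the branch they watch and the environment they deploy to. As a sketch, a small loop can print both commands for review before you run them. `print_trigger_cmd` is a hypothetical helper, and `GITHUB_USER` is assumed to be set as described earlier (the placeholder is printed if it is not).

```shell
# Hypothetical helper: prints (does not run) the trigger command for a
# given branch and deploy environment.
print_trigger_cmd() {
  branch="$1"; target="$2"
  printf 'gcloud beta builds triggers create github'
  printf ' --repo-name=instapuller'
  printf ' --repo-owner=%s' "${GITHUB_USER:-<your_github_username>}"
  printf ' --branch-pattern="^%s$"' "$branch"
  printf ' --build-config="cloudbuild.yaml"'
  printf ' --description="On commit to %s, deploy to %s service"' "$branch" "$target"
  printf ' --substitutions="_DEPLOY_ENVIRONMENT=%s"\n' "$target"
}

# Each pair is BRANCH:ENVIRONMENT
for pair in main:prod staging:staging; do
  print_trigger_cmd "${pair%%:*}" "${pair##*:}"
done
```

Printing rather than executing keeps the sketch free of side effects; once the output looks right, each printed line can be pasted into the terminal.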
Test it out! Make a commit to branch `staging` and push to GitHub; you should see your changes reflected on your staging service. Merge that branch to `main` and you should see the changes on prod.
Bonus: configure preview environments for each pull request
[TODO: document the GCF functions]