BigQuery remote functions allow users to deploy custom services or libraries, written in languages other than SQL and JavaScript, that are not available as BigQuery user-defined functions. BQ remote functions integrate directly with Cloud Functions and Cloud Run.
This repository contains string-formatting Java code that can be deployed to Cloud Run or Cloud Functions and invoked with SQL queries from BigQuery.
BigQuery sends an HTTP POST request to the endpoint with the input in JSON format and expects the endpoint to return the output in JSON format; in case of failure, the endpoint sends back an error message.
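The request/response contract can be illustrated with a minimal sketch. BigQuery wraps the input rows in a `calls` array and expects a `replies` array back, or an `errorMessage` field on failure. The class and method names below are hypothetical, and the `_test` suffix is assumed from this sample's expected output; the actual repository code and its JSON handling may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Minimal sketch of the BigQuery remote function JSON contract.
 * BigQuery POSTs a body such as
 *   {"requestId": "...", "calls": [["text1"], ["text2"]]}
 * and expects either {"replies": [...]} on success or
 * {"errorMessage": "..."} on failure.
 */
public class StringFormatSketch {

  /** Hypothetical per-row transformation matching this sample's expected output. */
  static String format(String text) {
    return text + "_test";
  }

  /** Builds the success response body: one reply per element of "calls". */
  static String buildReplies(List<String> calls) {
    return calls.stream()
        .map(c -> "\"" + format(c) + "\"")
        .collect(Collectors.joining(",", "{\"replies\":[", "]}"));
  }

  /** Builds the failure response body, which BigQuery surfaces as a query error. */
  static String buildError(String message) {
    return "{\"errorMessage\":\"" + message + "\"}";
  }

  public static void main(String[] args) {
    System.out.println(buildReplies(List.of("text1", "text2", "text3")));
  }
}
```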
- Set environment variables:
  PROJECT_NAME=$(gcloud config get-value project)
  INSTANCE_NAME=string-format
  REGION=us-central1
  JAVA_VERSION=java11
  SERVICE_ENTRY_POINT=com.google.cloud.pso.bqremotefunc.StringFormat
- Clone this git repository in your GCP project and go to the sample directory:
  cd examples/bq-remote-function/string_formatter
- Deploy the code as a 2nd gen Cloud Function (which runs on Cloud Run) using the commands below:
  gcloud functions deploy $INSTANCE_NAME \
    --project=$PROJECT_NAME \
    --gen2 \
    --region=$REGION \
    --runtime=$JAVA_VERSION \
    --entry-point=$SERVICE_ENTRY_POINT \
    --trigger-http
- Copy the HTTPS URL of the service from the Cloud Run UI.
- Create a remote function in BigQuery:
  - Create a connection of type CLOUD_RESOURCE: replace the connection name in the command below and run it in Cloud Shell.
    bq mk --connection \
      --display_name=<connection-name> \
      --connection_type=CLOUD_RESOURCE \
      --project_id=$PROJECT_NAME \
      --location=$REGION \
      <connection-name>
  - Create the remote function in the BigQuery editor with the query below (replace the variables based on your environment):
    CREATE OR REPLACE FUNCTION `<project-id>.<dataset>.<function-name>` (text STRING)
    RETURNS STRING
    REMOTE WITH CONNECTION `<BQ connection name>`
    OPTIONS (endpoint = '<HTTPS endpoint of the Cloud Run service>');
- Use the remote function in a query just like any other user-defined function:
SELECT
  `<project-id>.<dataset>.<function-name>`(col_name)
FROM (
  SELECT *
  FROM UNNEST(['text1', 'text2', 'text3']) AS col_name
);
- Expected output:
  text1_test
  text2_test
  text3_test
Go to Cloud Run in the GCP console, click the instance you created, and select LOGS on the action bar; when the instance is invoked from BigQuery, you will see the printed logs. In parallel, the METRICS section lets you check the request count, container utilization, and billable time.
The cost of both Cloud Run and BigQuery utilization can be estimated with the pricing calculator by entering the CPU, memory, and concurrent request counts.
To tear down, delete the Cloud Run instance and the BigQuery remote function.
BigQuery remote functions do not support payloads larger than 10 MB and accept only certain data types.
For more Cloud Run samples beyond Java, see the main list in the Cloud Run Samples repository.