This notebook solution uses Dataproc Templates for migrating databases from Microsoft SQL Server to BigQuery.
The notebook contains a step-by-step process for a downtime-based migration.
Refer to Setup Vertex AI - PySpark to set up a new Jupyter notebook in Vertex AI.
Once the setup is done, navigate to the /notebooks/mssql2bq folder and open mssql-to-bigquery-notebook.
This notebook is built on top of:
- Vertex AI Jupyter Notebook
- Google Cloud's Dataproc Serverless
- Dataproc Templates, which are maintained in this GitHub project.
- Automatically generates the list of tables from metadata. Alternatively, the user can supply the list of tables.
- Identifies the current primary key column name and partitioned read properties.
- Automatically uses partitioned reads if the table size exceeds a threshold.
- Divides the migration into batches and migrates multiple tables in parallel.
- Allows you to choose the write mode, i.e. append or overwrite.
- The BigQuery load automatically creates the table if it does not exist.
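The batching behavior described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation; the names `make_batches` and `tables` are hypothetical.

```python
# Split the full table list into batches so that at most
# MAX_PARALLELISM migration jobs run at the same time.
MAX_PARALLELISM = 5  # the notebook's default parallelism

def make_batches(table_list, batch_size=MAX_PARALLELISM):
    """Yield successive batches of tables to migrate in parallel."""
    for i in range(0, len(table_list), batch_size):
        yield table_list[i:i + batch_size]

tables = ['dbo.orders', 'dbo.customers', 'dbo.items',
          'dbo.payments', 'dbo.returns', 'dbo.shipments']
# Each inner list below is migrated as one parallel batch.
batches = list(make_batches(tables))
```

With six tables and a batch size of five, the migration runs as two batches: five tables in parallel, then the remaining one.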
The configurations below are required before proceeding further.

- PROJECT: GCP project ID
- REGION: GCP region
- GCS_STAGING_LOCATION: Cloud Storage staging location used by this notebook to store artifacts
- SUBNET: VPC subnet
- JARS: list of JARs. For this notebook, the MSSQL and Spark BigQuery connector JARs are required in addition to the Dataproc Templates JARs
- MAX_PARALLELISM: number of jobs to run in parallel; the default value is 5
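A configuration cell for these values might look like the sketch below; every value is a placeholder to replace with your own, and the JAR paths are illustrative (the notebook documents where to obtain them).

```python
PROJECT = "my-project-id"      # GCP project ID (placeholder)
REGION = "us-central1"         # GCP region (placeholder)
GCS_STAGING_LOCATION = "gs://my-staging-bucket"  # artifact staging location
SUBNET = "projects/my-project-id/regions/us-central1/subnetworks/my-subnet"

# MSSQL JDBC and Spark BigQuery connector JARs, in addition to the
# Dataproc Templates JARs; paths here are illustrative.
JARS = [
    GCS_STAGING_LOCATION + "/jars/mssql-jdbc.jar",
    GCS_STAGING_LOCATION + "/jars/spark-bigquery-with-dependencies.jar",
]

MAX_PARALLELISM = 5  # number of jobs to run in parallel (default 5)
```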
- SQL_SERVER_HOST: SQL Server instance IP address
- SQL_SERVER_PORT: SQL Server instance port
- SQL_SERVER_USERNAME: SQL Server username
- SQL_SERVER_PASSWORD: SQL Server password
- SQL_SERVER_DATABASE: name of the database that you want to migrate
- SQL_SERVER_TABLE_LIST: list of tables you want to migrate, e.g. ['schema.table1','schema.table2']. Otherwise, provide an empty list ([]) to migrate specific schemas or the whole database
- SQL_SERVER_SCHEMA_LIST: list of schemas, e.g. ['schema1','schema2']. Use this if you would like to migrate all tables associated with specific schemas; otherwise, leave it empty ([]). This comes in handy when you would rather migrate all tables from a schema than name each table separately
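The source-side parameters might be set as in this sketch; the connection values are placeholders, and in practice the password should come from a secret store rather than a literal.

```python
SQL_SERVER_HOST = "10.0.0.5"     # SQL Server instance IP address (placeholder)
SQL_SERVER_PORT = "1433"         # SQL Server instance port
SQL_SERVER_USERNAME = "sqlserver"
SQL_SERVER_PASSWORD = "secret"   # placeholder; prefer a secret manager
SQL_SERVER_DATABASE = "mydb"     # database to migrate

# Either name the tables explicitly...
SQL_SERVER_TABLE_LIST = ['dbo.table1', 'dbo.table2']
# ...or leave the table list empty ([]) and select whole schemas instead:
SQL_SERVER_SCHEMA_LIST = []      # e.g. ['dbo', 'sales'] to migrate all their tables
```

Note that tables are given in schema-qualified form (`schema.table`), matching the examples above.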
- BIGQUERY_DATASET: BigQuery target dataset
- BIGQUERY_MODE: mode of operation at the target, append or overwrite
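The target-side parameters can be sketched as below; the dataset name is a placeholder, and the sanity check on the mode is an illustrative addition, not part of the notebook.

```python
BIGQUERY_DATASET = "my_dataset"  # BigQuery target dataset (placeholder)
BIGQUERY_MODE = "append"         # or "overwrite"

# Illustrative guard: only the two supported modes are valid.
assert BIGQUERY_MODE in ("append", "overwrite")
```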
This notebook requires the MSSQL and Spark BigQuery connector JARs. Installation information is present in the notebook.