Jupyter Notebook Solution for migrating MSSQL (SQL Server) to POSTGRES Database using Dataproc Templates
Notebook solution utilizing dataproc templates for migrating databases from MSSQL to POSTGRES. Notebook contains step by step process for a downtime based migration.
Refer Setup Vertex AI - PySpark to setup new Jupyter notebook in vertexAI.
Once the setup is done navigate to /notebooks/mssql2postgresql
folder and open
mssql-to-postgres-notebook.
This notebook is built on top of:
- Vertex AI Jupyter Notebook
- Google Cloud's Dataproc Serverless
- Dataproc Templates which are maintained in this github project.
- Automatically discovers all the SQL Server tables.
- Can automatically generates table schema in Postgresql, corresponding to each table.
- Divides the migration into multiple batches and automatically computes metadata.
- Parallely migrates mutiple SQL Server tables to Postgresql.
- Simple, easy to use and customizable.
Below configurations are required before proceeding further.
PROJECT
: GCP project-idREGION
: GCP regionGCS_STAGING_LOCATION
: Cloud Storage staging location to be used for this notebook to store artifactsSUBNET
: VPC subnetJARS
: list of jars. For this notebook mssql and postgres connector jars are required in addition with the dataproc template jarsMAX_PARALLELISM
: Parameter for number of jobs to run in parallel default value is 2
MSSQL_HOST
: MSSQL instance ip addressMSSQL_PORT
: MSSQL instance portMSSQL_USERNAME
: MSSQL usernameMSSQL_PASSWORD
: MSSQL passwordMSSQL_DATABASE
: name of database that you want to migrateMSSQLTABLE_LIST
: list of tables you want to migrate eg: 'table1','table2' else keep empty for migration whole databaseNUMBER_OF_PARTITIONS
: The maximum number of partitions that can be used for parallelism in table reading and writing. Same value will be used for both input and output jdbc connection. Default set to 10
POSTGRES_HOST
: MSSQL instance ip addressPOSTGRES_PORT
: MSSQL instance portPOSTGRES_USERNAME
: MSSQL usernamePOSTGRES_PASSWORD
: MSSQL passwordPOSTGRES_DATABASE
: name of database that you want to migrate toOUTPUT_MODE
: Output write mode (one of: append,overwrite,ignore,errorifexists)(Defaults to overwrite)BATCH_SIZE
: JDBC output batch size. Default set to 1000
This notebook requires the MSSQL and POSTGRES connector jars. Installation information is present in the notebook