dataproc-templates/notebooks/oracle2spanner at main · GoogleCloudPlatform/dataproc-templates

Jupyter Notebook Solution for migrating Oracle database to Cloud Spanner using Dataproc Templates

A notebook solution that uses Dataproc Templates to migrate databases from Oracle to Cloud Spanner. It walks through the migration step by step.

Refer to Setup Vertex AI - PySpark to set up a new Jupyter notebook in Vertex AI. Once the setup is done, navigate to the /notebooks/oracle2spanner folder and open OracleToSpanner_notebook.

Overview

This notebook is built on top of:

Vertex AI Jupyter Notebook
Google Cloud's Dataproc Serverless
Dataproc Templates, which are maintained in this GitHub project.

Key Benefits

Automatically discovers all the Oracle tables.
Can automatically generate the Cloud Spanner table schema corresponding to each table.
Divides the migration into multiple batches and automatically computes metadata.
Migrates multiple Oracle tables to Cloud Spanner in parallel.
Simple, easy to use and customizable.
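The batching and parallel migration described above can be pictured with the minimal sketch below. It assumes the table list has already been resolved and that `migrate_table` is a hypothetical callable that submits one migration job per table; the helper names are illustrative, and the notebook's actual orchestration may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_batches(tables: List[str], batch_size: int) -> List[List[str]]:
    """Split the table list into consecutive batches of at most batch_size tables."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [tables[i:i + batch_size] for i in range(0, len(tables), batch_size)]


def migrate_all(tables: List[str], max_parallelism: int,
                migrate_table: Callable[[str], str]) -> List[str]:
    """Run migrate_table for every table, at most max_parallelism at a time.

    migrate_table is a hypothetical stand-in for submitting one Dataproc
    Serverless job that copies a single Oracle table to Cloud Spanner.
    """
    results: List[str] = []
    for batch in split_into_batches(tables, max_parallelism):
        # Tables in one batch run concurrently; the next batch starts
        # only after every job in this batch has returned.
        with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
            # pool.map yields results in input order, not completion order.
            results.extend(pool.map(migrate_table, batch))
    return results


if __name__ == "__main__":
    tables = ["EMPLOYEES", "DEPARTMENTS", "JOBS", "LOCATIONS", "REGIONS", "COUNTRIES"]
    print(split_into_batches(tables, 5))
    print(migrate_all(tables, 5, lambda t: f"{t}: done"))
```

With six tables and a parallelism of 5, the sketch runs one batch of five tables followed by a batch containing only `COUNTRIES`. Waiting for a whole batch keeps the logic simple; a pool that starts a new job as soon as any slot frees up would finish sooner when table sizes vary widely.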

Requirements

The following configurations are required before proceeding further.

Common Parameters

PROJECT : GCP project-id
REGION : GCP region
GCS_STAGING_LOCATION : Cloud Storage staging location to be used for this notebook to store artifacts
SUBNET : VPC subnet
JARS : List of jars. For this notebook, the Oracle connector jar is required in addition to the Dataproc Templates jars
MAX_PARALLELISM : Number of jobs to run in parallel; the default value is 5
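A notebook cell that sets the common parameters might look like the sketch below. Every value is a hypothetical placeholder (project, bucket, subnet, and jar paths included), and `check_common_params` is an illustrative helper, not part of the notebook.

```python
from typing import List

# Hypothetical placeholder values: substitute your own project settings.
PROJECT: str = "my-gcp-project"
REGION: str = "us-central1"
GCS_STAGING_LOCATION: str = "gs://my-staging-bucket/oracle2spanner"
SUBNET: str = "projects/my-gcp-project/regions/us-central1/subnetworks/default"
# Oracle JDBC driver jar plus the Dataproc Templates jar(s); paths are illustrative only.
JARS: List[str] = [
    "gs://my-staging-bucket/jars/ojdbc8.jar",
    "gs://my-staging-bucket/jars/dataproc-templates.jar",
]
MAX_PARALLELISM: int = 5  # number of migration jobs allowed to run at once


def check_common_params() -> None:
    """Fail fast on obviously malformed settings before any job is submitted."""
    if not PROJECT or not REGION:
        raise ValueError("PROJECT and REGION must be set")
    if not GCS_STAGING_LOCATION.startswith("gs://"):
        raise ValueError("GCS_STAGING_LOCATION must be a gs:// URI")
    if not JARS or not all(jar.endswith(".jar") for jar in JARS):
        raise ValueError("JARS must be a non-empty list of .jar paths")
    if MAX_PARALLELISM < 1:
        raise ValueError("MAX_PARALLELISM must be at least 1")


check_common_params()
```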

Oracle to Cloud Spanner Parameters

ORACLE_HOST : Oracle instance IP address
ORACLE_PORT : Oracle instance port
ORACLE_USERNAME : Oracle username
ORACLE_PASSWORD : Oracle password
ORACLE_DATABASE : Name of database/service for Oracle connection
ORACLE_TABLE_LIST : List of tables to migrate, e.g. ['table1','table2'], or an empty list to migrate the whole database, e.g. []
SPANNER_OUTPUT_MODE : <Append | Overwrite>
SPANNER_INSTANCE : Cloud Spanner instance name
SPANNER_DATABASE : Cloud Spanner database name
SPANNER_TABLE_PRIMARY_KEYS : Dictionary of the form {"table_name":"primary_key"} for tables that do not have a primary key in Oracle
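The sketch below shows one way these parameters could be filled in and interpreted. All values are hypothetical placeholders, and the helper functions (`oracle_jdbc_url`, `tables_to_migrate`, `primary_key_for`) are illustrative assumptions rather than the notebook's own code. The URL uses the Oracle thin-driver service-name form, and `DISCOVERY_QUERY` assumes tables are discovered from Oracle's `USER_TABLES` view; the notebook may query a different dictionary view.

```python
from typing import Dict, List, Optional

# Hypothetical placeholder values; replace with your own connection details.
ORACLE_HOST = "10.0.0.2"
ORACLE_PORT = "1521"
ORACLE_USERNAME = "migration_user"
ORACLE_PASSWORD = "change-me"  # avoid hard-coding real credentials
ORACLE_DATABASE = "ORCLPDB1"
ORACLE_TABLE_LIST: List[str] = []  # [] means "migrate every table"
SPANNER_OUTPUT_MODE = "Append"     # or "Overwrite"
SPANNER_INSTANCE = "my-spanner-instance"
SPANNER_DATABASE = "my-spanner-db"
SPANNER_TABLE_PRIMARY_KEYS: Dict[str, str] = {"AUDIT_LOG": "LOG_ID"}

# Assumed discovery query; not necessarily what the notebook runs.
DISCOVERY_QUERY = "SELECT table_name FROM user_tables"

if SPANNER_OUTPUT_MODE not in ("Append", "Overwrite"):
    raise ValueError("SPANNER_OUTPUT_MODE must be 'Append' or 'Overwrite'")


def oracle_jdbc_url(host: str, port: str, service: str) -> str:
    """Oracle thin-driver URL that connects by service name."""
    return f"jdbc:oracle:thin:@//{host}:{port}/{service}"


def tables_to_migrate(requested: List[str], discovered: List[str]) -> List[str]:
    """An empty ORACLE_TABLE_LIST falls back to every discovered table."""
    return list(requested) if requested else list(discovered)


def primary_key_for(table: str, oracle_pk: Optional[str],
                    overrides: Dict[str, str]) -> str:
    """Prefer Oracle's own primary key, else the SPANNER_TABLE_PRIMARY_KEYS entry.

    Without either, there is no key column to declare for the Spanner table.
    """
    if oracle_pk:
        return oracle_pk
    if table in overrides:
        return overrides[table]
    raise KeyError(f"{table} has no primary key; add it to SPANNER_TABLE_PRIMARY_KEYS")


if __name__ == "__main__":
    print(oracle_jdbc_url(ORACLE_HOST, ORACLE_PORT, ORACLE_DATABASE))
    print(tables_to_migrate(ORACLE_TABLE_LIST, ["EMPLOYEES", "AUDIT_LOG"]))
    print(primary_key_for("AUDIT_LOG", None, SPANNER_TABLE_PRIMARY_KEYS))
```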

Required JAR files

This notebook requires the Oracle connector jar. Installation instructions are provided in the notebook.

Limitations

Does not work with Cloud Spanner's PostgreSQL interface.
