Releases: GoogleCloudPlatform/dataproc-templates

v1.0.2-beta

24 Oct 22:03
60d7f3d

1.0.2-beta (2024-10-24)

This release includes all changes from the preceding pre-releases: v0.6.0-beta, v1.0.0-beta, and v1.0.1-beta.

v1.0.1-beta

24 Oct 19:57
dd639ff
Pre-release

1.0.1-beta (2024-10-24)

Bug Fixes

  • update build versions for Java and Python (#1005) (33bff2d)

v1.0.0-beta

24 Oct 19:04
760dc5c
Pre-release

1.0.0-beta (2024-10-24)

⚠ BREAKING CHANGES

  • Use spark bigtable connector instead of hbase connector (#1004)
  • Use spark bigtable connector instead of hbase (#989)
  • GCS to BigTable multiple column family ingestion (#958)
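For readers migrating off the HBase connector, the sketch below shows one way a column-to-family mapping for multiple column families might be assembled. It is only an illustration: `build_catalog`, its arguments, the table and column names, and the choice to type every column as `string` are all hypothetical, and the catalog layout (`table`, `rowkey`, `columns` with `cf`/`col`/`type`) reflects the format described in the Spark Bigtable connector's documentation as best understood here, so verify it against the connector version you use.

```python
import json


def build_catalog(table, rowkey_col, families):
    """Build a catalog JSON string (hypothetical helper, not part of the templates).

    families maps a column family name to the list of DataFrame columns
    that should be written into that family.
    """
    # The row key column is mapped to the special "rowkey" family.
    columns = {rowkey_col: {"cf": "rowkey", "col": rowkey_col, "type": "string"}}
    for family, cols in families.items():
        for col in cols:
            if col in columns:
                raise ValueError(f"column {col!r} is mapped more than once")
            # Assumption: every column is stored as a string.
            columns[col] = {"cf": family, "col": col, "type": "string"}
    return json.dumps(
        {"table": {"name": table}, "rowkey": rowkey_col, "columns": columns}
    )


# Example with made-up names: two column families, three data columns.
catalog = build_catalog("my_table", "id", {"cf1": ["name", "email"], "cf2": ["score"]})
print(catalog)
```

A string like this would typically be supplied to the connector as its catalog option; check the template README for the exact parameter the GCS to BigTable template expects.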

Features

  • 965 upgrade dataproc serverless version to 1.2 python (#995) (817ec52)
  • GCS to BigTable multiple column family ingestion (#958) (d92c02e)
  • Initial commit for Spark runtime update to 1.2 (#963) (c53dc88)
  • Mongo To BigQuery java template (#954) (671aa21)
  • Spanner JDBC driver can use postgresql dialect (#957) (4b022fb)
  • Use spark bigtable connector instead of hbase (#989) (98f0caa)
  • Use spark bigtable connector instead of hbase connector (#1004) (3f13e3c)

v0.6.0-beta

05 Sep 20:25
5776b8d
Pre-release

0.6.0-beta (2024-09-05)

Features

  • Add new dataproc templates for Elasticsearch (#927) (7b1f23f)
  • MySqlToSpanner_notebook integration with colab enterprise (#940) (c4338cc)
  • Provide optional parameter service account name while submitting serverless jobs (#936) (7593aae)
  • Run directly from Colab, github and Vertex AI (#937) (3aff47d)
  • Support Partitioning for BigQueryToGCS Template (#855) (0d7152e)

Bug Fixes

  • Update Jenkinsfile added secret manager to cluster integ tests (#925) (5a74e67)

Documentation

  • adding blog for cassandra to bigquery (#935) (3ce420b)
  • adding blog for gcs to mongodb (#923) (6c7ad23)
  • adding blog link to main readme (#941) (3a522a6)
  • provide steps to setup packages for notebook to run in colab (#945) (2be3e41)

v0.5.0-beta

04 Mar 17:29
45160f3

0.5.0-beta (2024-03-04)

Features

  • Add arguments to run MongoToBQ template (f0bfd4c)
  • Add integration test for MongoToBigQuery Python template (906e26f)
  • Add preliminary documentation for MongoToBigQuery template (99146fa)
  • added support for running python templates on an existing dataproc cluster (bed0d62)
  • Built java KafkaToBQDstream template for Dataproc (220cd0e)
  • Built Java KafkaToGCSDstream template for Dataproc (d6c6232)
  • Connection to MongoDB established via the template (8ba9e18)
  • Initialise MongoToBQ Python template test file (1345dfc)
  • MongoToBigQuery template initiated with config to run from main.py (4178ec9)
  • MongoToBQ template writes successfully to destination table (21f44ba)

Bug Fixes

  • Add Usage details of MongoToBQ template to README (3cb21f1)
  • integration test for MongoToBigQuery python (955db75)
  • removed duplicated code for read and write operations (b3e8f98)
  • removed duplicated code for read and write operations (9d5c5f4)

Documentation

  • Add recommended jar versions to run MongoToGCS template (#871) (3d1b729)
  • added instructions for CLUSTER mode in python README (ecb2955)
  • added jenkins status for python cluster integration tests (73967d1)
  • added links in README for KafkaToBQDstream template (bfc076c)
  • added links in README for KafkaToGCSDstream template (8def950)
  • adding blog for cassandra to gcs (5ab3342)
  • adding blog for cassandra to gcs (47a478b)
  • adding blogpost for MongodbToGCS (2960f71)
  • adding updates in a few python templates (1597691)
  • clarify serverless and cluster modes, improve template execution instructions (7a8e529)
  • enhanced instructions for better clarity on serverless and cluster modes (5ae1999)
  • hyperlink blog guide for RedshiftToGCS java template (4dcf0a5)
  • updating GCS to Cloud Storage in Java templates (426b772)
  • updating gcs to cloud storage in remaining python templates (ec9925b)

v0.4.0-beta

07 Aug 15:46
3444c52

0.4.0-beta (2023-08-07)

Features

  • 756 postgre sql to big query (#805) (121bf93)
  • add bq_dataset_region parameter (a9647ad)
  • Add ORACLE_SCHEMA to parameterize scripts (969e314)
  • added init file (3273fe7)
  • Added Parameterized script file (2951e3a)
  • HiveToBigQuery parameterize script (2eec146)
  • include logging (ca1f9be)
  • include logging (2714b7f)
  • include logging (cf0a81c)
  • service account as env variable (9c93710)
  • service account as env variable (57292f9)
  • update nb constants & script name (cb07530)
  • update nb with IS_PARAMETERIZED flag (b4dcd35)
  • update notebook constants (5c9bdc2)
  • update working dir (ea6a52b)
  • update working dir for parameterization (5840c88)
  • Updated notebook to handle parameters (08dd1c4)
  • updated readme for parametrized script (82ff6ad)
  • updated readme for parametrized script (afac2a6)
  • updated run_notebook (c84883f)

Bug Fixes

  • add log_output (09dc88a)
  • black formatting (fcab888)
  • correct pip install cmd (f9c17c5)
  • correct project id variable (5c61b94)
  • correct sqlalchemy execute (14cb3d3)
  • Deal with upgraded google_cloud_pipeline_components.experimental vs v1 (4551fab)
  • Deal with upgraded google_cloud_pipeline_components.experimental vs v1 (2304179)
  • deletes gcs folder created after all tests run (66e3000)
  • except statement (b88758a)
  • Fix merge conflict (deb3a43)
  • Fix merge from main (5a75ae0)
  • Fix merge from main (6ad8fe6)
  • Fix merge from main (2dd4be5)
  • Fix merge from main (fcc8ca6)
  • Fix merge from main (90a65ec)
  • ignored some tests in general template (ae74321)
  • Implement de-duplication (b1c9f8a)
  • implement de-duplication changes (1aeef92)
  • implement de-duplication changes (02c2508)
  • implement get_common_args function (6525dc5)
  • implement get_env_vars in base script (0734fb7)
  • integration test - change custom container image (#810) (8943508)
  • make subnet optional variable (64c0e68)
  • make subnet optional variable (294eea5)
  • modified except block (8871b25)
  • move max_parallelism from common_args (1f88912)
  • Notebook MYSQLTABLE_LIST to MYSQL_TABLE_LIST (5abe159)
  • Notebook MYSQLTABLE_LIST to MYSQL_TABLE_LIST (e22fb72)
  • oracle table list parsing corrected (5f33fb4)
  • oracle table list parsing corrected (7f1e913)
  • raise exception on job failure (e5b1a99)
  • remove common args from script (fc125f2)
  • remove duplicate get_env_var (73bad48)
  • remove log_level from nb_parameters (634d926)
  • remove log_level from nb_parameters (fe49799)
  • remove project id constant (e9173bb)
  • removed duplicate argument (e0c3262)
  • removed pubsub output project parameter (24a9d18)
  • removed pubsub output project parameter (838c23a)
  • removed unused outputProjectID (5f106c4)
  • removing version parameter while running with cluster (d9faf34)
  • renamed init file (aa60431)
  • Resolve conflicting changes (8ab7e0d)
  • small fixes in notebook (155c5c8)

v0.3.0-beta

12 May 16:47
9c2728c

0.3.0-beta (2023-05-12)

Bug Fixes

  • Exclude GRPC jar file from Spanner notebooks (60a27c5)
  • execute validateInput from main (f403fad)
  • improved usage of validateInput for CassandraToBQ & GCSToSpanner (77f54f1) (ae24e62)
  • Patch to handle tags in cloudbuild.yaml (#751) (d757278)
  • updated grpc version, corrected working directory and fixed sqlalchemy breaking changes (3ee71cd)
  • updated unit test cases for CassandraToBQ (c1b2ef7)
  • updated unit test cases for GCSToSpanner (19a052b)
  • updated unit test cases for RedshiftToGCS (038e5e8)

Documentation

  • added CassandratoGCS blogpost link to README files (766b789)
  • added integration test badges (2fd5108)
  • fix JDBC To Spanner property name (fcde248)

v0.2.0-beta

11 Apr 11:21
2a540f5

0.2.0-beta (2023-04-11)

Features

  • Added all supported Spark CSV options to BigQuery and Cassandra README files (33539b0)
  • Added all supported Spark CSV options to GCS reader Python templates (5c38e65)
  • Added all supported Spark CSV options to GCS source README file (0001a40)
  • add integration test for JDBCToJDBC (5f16cde)
  • Add JUnit test for DataplexGCStoBQ template (6d19473)
  • Added automation bots to execute Jenkins build jobs (#712) (806119f)
  • Cater for legacy CSV options in text_to_bigquery.py (517c163)
  • GCStoMongo : fixed typos and variable values in utc (d65faa0)
  • GCStoMongo new template - updated template name in print statement (1ab03d5)

Bug Fixes

  • Added validation for additional input fields to the template (dbaa408)
  • added processing time as template property (57b4d61)
  • container name for GCSTOBIGTABLE (a6f876f)
  • Made timeout value configurable via templateProperty (7dafc64)
  • Modified subscription name (c8dc953)
  • modified timeout property name (87e5aa0)
  • remove sensitive information from logs (4eda3f1)
  • update Dataproc version to 1.1 (2627351)
  • updated the ad-hoc folder path in Jenkins integration-test-python job (fa8eec1)

Documentation

  • #361 Adding medium article link in Readme.md for OracleToBQ notebook (d8dbbf6)
  • Readme: fixed links (85adaa9)

v0.1.0-beta

17 Mar 12:42
ebacb52

0.1.0-beta (2023-03-17)

This is the first beta release of Dataproc Serverless Templates. Below is the list of migration templates now available in Java and Python.

Features

  • CassandraToBigQuery
  • CassandraToGCS
  • DataplexGCStoBQ
  • GCSToBigQuery
  • GCSToBigTable
  • GCSToGCS
  • GCSToJDBC
  • GCSToSpanner
  • HBaseToGCS
  • HiveToBigQuery
  • HiveToGCS
  • JDBCToBigQuery
  • JDBCToGCS
  • JDBCToSpanner
  • KafkaToBQ
  • KafkaToGCS
  • KafkaToPubSub
  • MongoToGCS
  • PubSubToBigQuery
  • PubSubToBigTable
  • PubSubToGCS
  • RedshiftToGCS
  • S3ToBigQuery
  • SnowflakeToGCS
  • SpannerToGCS

Documentation

  • DataplexGCStoBQ (blogpost link)
  • GCSToBigQuery (blogpost link)
  • GCSToBigTable (blogpost link)
  • GCSToGCS (blogpost link)
  • GCSToJDBC (blogpost link)
  • GCSToSpanner (blogpost link)
  • HBaseToGCS (blogpost link)
  • HiveToBigQuery (blogpost link)
  • HiveToGCS (blogpost link)
  • JDBCToBigQuery (blogpost link)
  • JDBCToGCS (blogpost link)
  • KafkaToBQ (blogpost link)
  • KafkaToGCS (blogpost link)
  • PubSubToBigQuery (blogpost link)
  • PubSubToBigTable (blogpost link)
  • PubSubToGCS (blogpost link)
  • S3ToBigQuery (blogpost link)
  • SnowflakeToGCS (blogpost link)
  • SpannerToGCS (blogpost link)
  • TextToBigQuery (blogpost link)
  • MongoToGCS (blogpost link)
  • RedshiftToGCS (blogpost link)
  • JDBCToJDBC (blogpost link)
  • GCSToMongo (blogpost link)
  • BigQueryToGCS (blogpost link)