Releases: GoogleCloudPlatform/dataproc-templates
v1.0.2-beta
1.0.2-beta (2024-10-24)
This release includes all the changes from the earlier pre-releases: v0.6.0-beta, v1.0.0-beta, and v1.0.1-beta.
Bug Fixes
v1.0.1-beta
v1.0.0-beta
1.0.0-beta (2024-10-24)
⚠ BREAKING CHANGES
- Use spark bigtable connector instead of hbase connector (#1004)
- Use spark bigtable connector instead of hbase (#989)
- GCS to BigTable multiple column family ingestion (#958)
Features
- 965 upgrade dataproc serverless version to 1.2 python (#995) (817ec52)
- GCS to BigTable multiple column family ingestion (#958) (d92c02e)
- Initial commit for Spark runtime update to 1.2 (#963) (c53dc88)
- Mongo To BigQuery java template (#954) (671aa21)
- Spanner JDBC driver can use postgresql dialect (#957) (4b022fb)
- Use spark bigtable connector instead of hbase (#989) (98f0caa)
- Use spark bigtable connector instead of hbase connector (#1004) (3f13e3c)
Bug Fixes
v0.6.0-beta
0.6.0-beta (2024-09-05)
Features
- Add new dataproc templates for Elasticsearch (#927) (7b1f23f)
- MySqlToSpanner_notebook integration with colab enterprise (#940) (c4338cc)
- Provide optional parameter service account name while submitting serverless jobs (#936) (7593aae)
- Run directly from Colab, github and Vertex AI (#937) (3aff47d)
- Support Partitioning for BigQueryToGCS Template (#855) (0d7152e)
Bug Fixes
Documentation
v0.5.0-beta
0.5.0-beta (2024-03-04)
Features
- Add arguments to run MongoToBQ template (f0bfd4c)
- Add integration test for MongoToBigQuery Python template (906e26f)
- Add preliminary documentation for MongoToBigQuery template (99146fa)
- added support for running python templates on an existing dataproc cluster (bed0d62)
- Built java KafkaToBQDstream template for Dataproc (220cd0e)
- Built Java KafkaToGCSDstream template for Dataproc (d6c6232)
- Connection to MongoDB established via the template (8ba9e18)
- Initialise MongoToBQ Python template test file (1345dfc)
- MongoToBigQuery template initiated with config to run from main.py (4178ec9)
- MongoToBQ template writes successfully to destination table (21f44ba)
Bug Fixes
- Add Usage details of MongoToBQ template to README (3cb21f1)
- integration test for MongoToBigQuery python (955db75)
- removed duplicated code for read and write operations (b3e8f98)
- removed duplicated code for read and write operations (9d5c5f4)
Documentation
- Add recommended jar versions to run MongoToGCS template (#871) (3d1b729)
- added instructions for CLUSTER mode in python README (ecb2955)
- added jenkins status for python cluster integration tests (73967d1)
- added links in README for KafkaToBQDstream template (bfc076c)
- added links in README for KafkaToGCSDstream template (8def950)
- adding blog for cassandra to gcs (5ab3342)
- adding blog for cassandra to gcs (47a478b)
- adding blogpost for MongodbToGCS (2960f71)
- adding updates in a few python templates (1597691)
- clarify serverless and cluster modes, improve template execution instructions (7a8e529)
- enhanced instructions for better clarity on serverless and cluster modes (5ae1999)
- hyperlink blog guide for RedshiftToGCS java template (4dcf0a5)
- updating GCS to Cloud Storage in Java templates (426b772)
- updating gcs to cloud storage in remaining python templates (ec9925b)
v0.4.0-beta
0.4.0-beta (2023-08-07)
Features
- 756 postgre sql to big query (#805) (121bf93)
- add bq_dataset_region parameter (a9647ad)
- Add ORACLE_SCHEMA to parameterize scripts (969e314)
- added init file (3273fe7)
- Added Parameterized script file (2951e3a)
- HiveToBigQuery parameterize script (2eec146)
- include logging (ca1f9be)
- include logging (2714b7f)
- include logging (cf0a81c)
- service account as env variable (9c93710)
- service account as env variable (57292f9)
- update nb constants & script name (cb07530)
- update nb with IS_PARAMETERIZED flag (b4dcd35)
- update notebook constants (5c9bdc2)
- update working dir (ea6a52b)
- update working dir for parameterization (5840c88)
- Updated notebook to handle parameters (08dd1c4)
- updated readme for parametrized script (82ff6ad)
- updated readme for parametrized script (afac2a6)
- updated run_notebook (c84883f)
Bug Fixes
- add log_output (09dc88a)
- black formatting (fcab888)
- correct pip install cmd (f9c17c5)
- correct project id variable (5c61b94)
- correct sqlalchemy execute (14cb3d3)
- Deal with upgraded google_cloud_pipeline_components.experimental vs v1 (4551fab)
- Deal with upgraded google_cloud_pipeline_components.experimental vs v1 (2304179)
- deletes gcs folder created after all tests run (66e3000)
- except statement (b88758a)
- Fix merge conflict (deb3a43)
- Fix merge from main (5a75ae0)
- Fix merge from main (6ad8fe6)
- Fix merge from main (2dd4be5)
- Fix merge from main (fcc8ca6)
- Fix merge from main (90a65ec)
- ignored some tests in general template (ae74321)
- Implement de-duplication (b1c9f8a)
- implement de-duplication changes (1aeef92)
- implement de-duplication changes (02c2508)
- implement get_common_args function (6525dc5)
- implement get_env_vars in base script (0734fb7)
- integration test - change custom container image (#810) (8943508)
- make subnet optional variable (64c0e68)
- make subnet optional variable (294eea5)
- modified except block (8871b25)
- move max_parallelism from common_args (1f88912)
- Notebook MYSQLTABLE_LIST to MYSQL_TABLE_LIST (5abe159)
- Notebook MYSQLTABLE_LIST to MYSQL_TABLE_LIST (e22fb72)
- oracle table list parsing corrected (5f33fb4)
- oracle table list parsing corrected (7f1e913)
- raise exception on job failure (e5b1a99)
- remove common args from script (fc125f2)
- remove duplicate get_env_var (73bad48)
- remove log_level from nb_parameters (634d926)
- remove log_level from nb_parameters (fe49799)
- remove project id constant (e9173bb)
- removed duplicate argument (e0c3262)
- removed pubsub output project parameter (24a9d18)
- removed pubsub output project parameter (838c23a)
- removed unused outputProjectID (5f106c4)
- removing version parameter while running with cluster (d9faf34)
- renamed init file (aa60431)
- Resolve conflicting changes (8ab7e0d)
- small fixes in notebook ([155c5c8](https://github.com/GoogleCloudPlatfor...
v0.3.0-beta
0.3.0-beta (2023-05-12)
Features
- Added pubsublitetobigtable java template - (656b485) (f187870)
- Notebooks util and Parametrizations (42e0019) (2d11bcc)
- New notebook Oracle to Postgres (#561) (15b7cc6)
Bug Fixes
- Exclude GRPC jar file from Spanner notebooks (60a27c5)
- execute validateInput from main (f403fad)
- improved usage of validateInput for CassandraToBQ & GCSToSpanner (77f54f1) (ae24e62)
- Patch to handle tags in cloudbuild.yaml (#751) (d757278)
- updated grpc version, corrected working directory and fixed sqlalchemy breaking changes (3ee71cd)
- updated unit test cases for CassandraToBQ (c1b2ef7)
- updated unit test cases for GCSToSpanner (19a052b)
- updated unit test cases for RedshiftToGCS (038e5e8)
Documentation
v0.2.0-beta
0.2.0-beta (2023-04-11)
Features
- Added all supported Spark CSV options to BigQuery and Cassandra README files (33539b0)
- Added all supported Spark CSV options to GCS reader Python templates (5c38e65)
- Added all supported Spark CSV options to GCS source README file (0001a40)
- add integration test for JDBCToJDBC (5f16cde)
- Add JUnit test for DataplexGCStoBQ template (6d19473)
- Added automation bots to execute Jenkins build jobs (#712) (806119f)
- Cater for legacy CSV options in text_to_bigquery.py (517c163)
- GCStoMongo : fixed typos and variable values in utc (d65faa0)
- GCStoMongo new template - updated template name in print statement (1ab03d5)
Bug Fixes
- Added validation for additional input fields to the template (dbaa408)
- added processing time as template property (57b4d61)
- container name for GCSTOBIGTABLE (a6f876f)
- Made timeout value configurable via templateProperty (7dafc64)
- Modified subscription name (c8dc953)
- modified timeout property name (87e5aa0)
- remove sensitive information from logs (4eda3f1)
- update Dataproc version to 1.1 (2627351)
- updated the ad-hoc folder path in Jenkins integration-test-python job (fa8eec1)
Documentation
v0.1.0-beta
0.1.0-beta (2023-03-17)
This is the first beta release of Dataproc Serverless Templates. Below is the list of migration templates now available in Java and Python.
Features
- CassandraToBigQuery
- CassandraToGCS
- DataplexGCStoBQ
- GCSToBigQuery
- GCSToBigTable
- GCSToGCS
- GCSToJDBC
- GCSToSpanner
- HBaseToGCS
- HiveToBigQuery
- HiveToGCS
- JDBCToBigQuery
- JDBCToGCS
- JDBCToSpanner
- KafkaToBQ
- KafkaToGCS
- KafkaToPubSub
- MongoToGCS
- PubSubToBigQuery
- PubSubToBigTable
- PubSubToGCS
- RedshiftToGCS
- S3ToBigQuery
- SnowflakeToGCS
- SpannerToGCS
Documentation
- DataplexGCStoBQ (blogpost link)
- GCSToBigQuery (blogpost link)
- GCSToBigTable (blogpost link)
- GCSToGCS (blogpost link)
- GCSToJDBC (blogpost link)
- GCSToSpanner (blogpost link)
- HBaseToGCS (blogpost link)
- HiveToBigQuery (blogpost link)
- HiveToGCS (blogpost link)
- JDBCToBigQuery (blogpost link)
- JDBCToGCS (blogpost link)
- KafkaToBQ (blogpost link)
- KafkaToGCS (blogpost link)
- PubSubToBigQuery (blogpost link)
- PubSubToBigTable (blogpost link)
- PubSubToGCS (blogpost link)
- S3ToBigQuery (blogpost link)
- SnowflakeToGCS (blogpost link)
- SpannerToGCS (blogpost link)
- TextToBigQuery (blogpost link)
- MongoToGCS (blogpost link)
- RedshiftToGCS (blogpost link)
- JDBCToJDBC (blogpost link)
- GCSToMongo (blogpost link)
- BigQueryToGCS (blogpost link)