Schema changes for connecto #324

Open
wants to merge 13 commits into
base: 2.4

bharathsreenivas
Contributor

No description provided.

Ubuntu and others added 13 commits June 3, 2019 02:11
* Add files via upload

* change to resolve tinkerpop dependencies

* New branch for Spark 2.3.0

* Fix typo of PartitionKeyDefinition internal variable in config

* Resolve merge conflict for partitionkeydefinition internal param typo + resolve tinkerpop dependencies + Move to version 1.1.1

* Version bump to 1.1.1

* Bump up Java SDK version to 1.16.0

* Version bump to 1.1.2 + bump bulk executor library version to 1.0.6

* Bump bulk executor library to 1.0.6 + sdk to 1.16.0

* isStreaming=true bug fix

* Create readme.md

* Including Azure Databricks notebooks

Including Azure Databricks notebooks

* Include HTML version of Lambda Speed Layer

* 170 & 171 with remove white spaces

* 170 & 171 with remove white spaces

* Set theme jekyll-theme-minimal

* Add files via upload

* 170 & 171 and fix build

* Create readme2.md

* Delete _config.yml

* Update README.md

* Delete readme2.md

* Adding images and beginning of user guide

* Fixed TOC

* Optimization for bulk import

* Add comments for config params

* Update README.md

* Update README.md

* Update README.md

* Add comments re streaming change feed

* Optimize client retry policy for bulk execution + add exception throw on bulk API master resources exception

* Add exception throw on bulk API master resources exception

* Move to 2.0.0 Bulk Executor library version

* Update README.md

* Fix null value read error

* Add process id to user agent suffix

* Upgrade to 2.0.0 BulkExecutor + Fix null read issue + Add process id to user agent suffix

* Disable setting retry policy to 0 before bulk import

* Allow schema agnostic read from change feed - introduce InferStreamSchema config

* Disable setting retry policy to 0 before bulk import/update

* Increase BulkExecutor initialization retry policy

* Remove redundant user agent suffix set

* Add high bulk executor initialization retry policy + remove gateway mode connection policy override

* Expose write throughput budget config

* Allow schema agnostic read from change feed - introduce InferStreamSchema config

* Expose write throughput budget config

* Add high bulk executor initialization retry policy + remove gateway mode connection policy override

* Moved releases to Maven

* Move to Cosmos DB async SDK 1.0.0

* Moved releases to Maven

* fix Azure#202 to save null value properties as null in CosmosDB instead of omitting them

* fix Azure#202 to save null value properties as null in CosmosDB instead of omitting them

* Debug statements for testing

* Fix structured streaming from CosmosDB - change feed from beginning

* Sample schema using query custom with top

* Delete current tokens and next tokens checkpoint directories to ensure change feed starts from beginning if set

* CosmosDBForeachWriter to push data generated by StreamingQuery to CosmosDB

* Fix structured streaming sink - 'write' on streaming dataframe issue + Set correct default SDK connection policy defaults

* Bump up sync & async Cosmos DB SDK versions

* Refactor code + address PR concerns

* Fix signature issues in CosmosDBRowConverter

* Minor fix to package.scala to remove warning

* bump up jackson-core version to be in parity with java sdk - fix write batch size config parsing

* Remove relocation of org.w3c.dom to fix jackson-binder bug

* Fix structured streaming from CosmosDB - change feed from beginning

* Sample schema using query custom with top

* Delete current tokens and next tokens checkpoint directories to ensure change feed starts from beginning if set

* CosmosDBForeachWriter to push data generated by StreamingQuery to CosmosDB

* Fix structured streaming sink - 'write' on streaming dataframe issue + Set correct default SDK connection policy defaults

* Bump up sync & async Cosmos DB SDK versions

* Refactor code + address PR concerns

* Minor fix to package.scala to remove warning

* Fix signature issues in CosmosDBRowConverter

* bump up jackson-core version to be in parity with java sdk - fix write batch size config parsing

* Remove relocation of org.w3c.dom to fix jackson-binder bug

* Bump Spark connector version to 1.2.0 + Version change of sync CosmosDB Java SDK to 1.16.1

* Remove debug statements

* Update README

* Bump version of Spark connector in UserAgent

* Update connector Maven releases in README

* Update connector Maven release in README

* Bump rxjava version to 1.3.3

* Adding HTML notebooks

* Adding sample for writing stream data

* Deleting

* Add files via upload

* Bump up version to 1.2.1 + Unshade slf4j dependency + Modify Bulk import API exception throwing

* Bump up rx-java dependency to 1.3.3 version

* Unshade SLF4j dependency

* Modify bulk import API exception throw

* Bump up version to 1.2.1

* Add connection request timeout config

* Add connection request timeout config

* Add request timeout config to Async SDK client creation

* Add config for max concurrency per partition key range

* Fix client creation to create only one client per executor process and set the default maxPoolSize to 500

* Fix client creation to create only one client per executor process and set the default maxPoolSize to 500

* Upgrade version to 1.2.2

* upgrade version to 1.2.2

* Update README.md

* Update README.md

* Update README.md

* BulkExecutor dependency version 2.1.0 + Logging updates

* Update connector version in Constants

* Add ResponseContinuationTokenLimitInKb config + change default consistency level to Eventual

* Add ResponseContinuationTokenLimitInKb config + change default consistency level to eventual

* Bump to 1.2.4 version

* Expose application name as a config to be added to user agent in SDK

* BulkExecutor dependency version 2.1.0

* Logging updates

* Add ResponseContinuationTokenLimitInKb config

* Change default consistency level to Eventual

* Expose application name as a config to be added to user agent in SDK

* Bump connector version to 1.2.4

* Quick fix:

* Bump connector to 1.2.5 + Bump Java sync SDK version to 1.16.3

* Bump connector version to 1.2.5 + Bump java sync SDK to 1.16.3

* Add monitoring sample with app insights

* rename monitoring sample readme file

* Add content to readme file for monitoring sample

* Fixes for monitoring sample

* Change style in monitoring sample

* Bump up Java sync SDK version to 1.16.4

* Bump up Spark connector version to 1.2.6

* Bump connector version to 1.2.6 + Bump java sync SDK version to 1.16.4

* Use a map of db accounts to documentclients instead of one fixed client

* Update readme.md

* bump up the connector version to 1.2.7

* Solved problems with unsafe data structures (InternalRow and UnsafeArrayData) used mainly in streaming processes

* The change contains 1 - Add db offer support , 2 - update the bulk executor version, 3 - fix null array schema conversion

* Change the default page size to be 1000

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Delete Read_Stream_Twitter_Data.dbc

* Add files via upload

* Add files via upload

* Fix query folding by wrapping each filter in brackets

* Pattern match not-exhaustive in some cases

* GitHub#265: Moving CosmosDB Spark connector to latest java bulk executor to fix Maven coordinate resolution issues with Guava - Azure#265

* Fixing slf4j dependency to match the one Spark 2.3.0 is using. This caused the next issue in Databricks.

* Also revving version of the spark connector to 1.3.1

* Update README.md

* Update README.md

* Update the Current version constant to 1.3.4

* Update the connector version to 1.3.4

* Update FilterConverter.scala

* Update README.md

Mismatch between readme and documentation. 
https://docs.microsoft.com/en-us/azure/cosmos-db/spark-connector

* Upgrade azure-cosmosdb dependency version to 2.4.0 + Upgrade jackson-databind dependency version to 2.9.8

* Upgrade azure-documentdb dependency version to 2.1.1

* Upgrade connector version to 2.4.0_2.11-1.3.5

* Update README.md

* Update README.md

* Update README.md

* Change Filter push down to handle escaping string literals

* Some reorg/update links

* Add bug fix for host+key to client and add support for nested property folding

* Use a new client with every new driver operation, Update to 1.4.0-SNAPSHOT

* read into graphframes sample

* Add HDFS operation retries

* Bump Java sync SDK to 2.2.3 + bump Cosmos BulkExecutor to 2.5.0

* Fix isNull filter conversion bug

* added support for reading credentials from env var

* fixed style

* made helper method private

* dependency version bump

* Fixes to folding

* Fix isNull filter conversion bug

* added support for reading credentials from env var

* Add logging for streaming, update version to 1.4

* Fix null value query regression