Merge remote-tracking branch 'upstream/main' into do-not-retry-401
Signed-off-by: Tor Hødnebø <[email protected]>
Hodnebo committed Jul 3, 2024
2 parents 2dd2e99 + 93e207e commit cff5fcd
Showing 32 changed files with 271 additions and 266 deletions.
6 changes: 3 additions & 3 deletions .github/pull_request_template.md
@@ -1,7 +1,7 @@
<!-- We welcome contributions. All patches must include a sign-off. Please see CONTRIBUTING.md for details -->


-## What type of PR is this?
+## What type of PR is this?
<!-- Check all that apply, delete what doesn't apply. -->

- [ ] Refactor
@@ -13,8 +13,8 @@

## How is this tested?

-- [ ] Unit tests
-- [ ] E2E Tests
+- [ ] Unit tests
+- [ ] E2E Tests
- [ ] Manually
- [ ] N/A

3 changes: 1 addition & 2 deletions .github/workflows/code-quality-checks.yml
@@ -161,6 +161,5 @@ jobs:
#----------------------------------------------
- name: Mypy
run: |
-cd src # Need to be in the actual databricks/ folder or mypy does the wrong thing.
mkdir .mypy_cache # Workaround for bad error message "error: --install-types failed (no mypy cache directory)"; see https://github.com/python/mypy/issues/10768#issuecomment-2178450153
-poetry run mypy --config-file ../pyproject.toml --install-types --non-interactive --namespace-packages databricks
+poetry run mypy --install-types --non-interactive src
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -61,4 +61,4 @@ jobs:
- name: Build and publish to pypi
uses: JRubics/[email protected]
with:
-pypi_token: ${{ secrets.PROD_PYPI_TOKEN }}
+pypi_token: ${{ secrets.PROD_PYPI_TOKEN }}
2 changes: 1 addition & 1 deletion .gitignore
@@ -207,4 +207,4 @@ build/
.vscode

# don't commit authentication info to source control
-test.env
+test.env
5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -74,7 +74,7 @@ If you set your `user.name` and `user.email` git configs, you can sign your comm
This project uses [Poetry](https://python-poetry.org/) for dependency management, tests, and linting.

1. Clone this repository
-2. Run `poetry install`
+2. Run `poetry install`

### Run tests

@@ -167,5 +167,4 @@ Modify the dependency specification (syntax can be found [here](https://python-p
- `poetry update`
- `rm poetry.lock && poetry install`

-Sometimes `poetry update` can freeze or run forever. Deleting the `poetry.lock` file and calling `poetry install` is guaranteed to update everything but is usually _slower_ than `poetry update` **if `poetry update` works at all**.
-
+Sometimes `poetry update` can freeze or run forever. Deleting the `poetry.lock` file and calling `poetry install` is guaranteed to update everything but is usually _slower_ than `poetry update` **if `poetry update` works at all**.
12 changes: 6 additions & 6 deletions docs/parameters.md
@@ -43,7 +43,7 @@ SELECT * FROM table WHERE field = %(value)s

## Python Syntax

-This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.
+This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `named` paramstyle Usage Example
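The example body is collapsed in this view. As a stand-in, here is a minimal sketch of `named` paramstyle usage; the hostname, HTTP path, and token are placeholders, and the table and column names are illustrative:

```python
from databricks import sql

with sql.connect(
    server_hostname="<hostname>",  # placeholder workspace details
    http_path="<http-path>",
    access_token="<token>",
) as connection:
    with connection.cursor() as cursor:
        # :value is a named marker; the dict supplies its value by name.
        cursor.execute(
            "SELECT * FROM table WHERE field = :value",
            {"value": 42},
        )
        print(cursor.fetchall())
```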

@@ -85,7 +85,7 @@ The result of the above two examples is identical.

Databricks Runtime expects variable markers to use either `named` or `qmark` paramstyles. Historically, this connector used `pyformat`, which Databricks Runtime does not support. So to assist customers transitioning their codebases from `pyformat` to `named`, we can dynamically rewrite the variable markers before sending the query to Databricks. This happens only when `use_inline_params=False`.

-This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead. And users should update their client code to replace `pyformat` markers with `named` markers.
+This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead. And users should update their client code to replace `pyformat` markers with `named` markers.

For example:

Expand All @@ -106,7 +106,7 @@ SELECT field1, field2, :param1 FROM table WHERE field4 = :param2

Under the covers, parameter values are annotated with a valid Databricks SQL type. As shown in the examples above, this connector accepts primitive Python types like `int`, `str`, and `Decimal`. When this happens, the connector infers the corresponding Databricks SQL type (e.g. `INT`, `STRING`, `DECIMAL`) automatically. This means that the parameters passed to `cursor.execute()` are always wrapped in a `TDbsqlParameter` subtype prior to execution.

-Automatic inference is sufficient for most use cases. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.
+Automatic inference is sufficient for most use cases. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.

`TDbsqlParameter` objects must always be passed within a list. Either paramstyle (`:named` or `?`) may be used. However, if your query uses the `named` paramstyle, all `TDbsqlParameter` objects must be provided a `name` when they are constructed.
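A hedged sketch of explicit typing follows, reusing a `cursor` opened as above. It assumes `IntegerParameter` and `StringParameter` are among the `TDbsqlParameter` implementations mentioned here, and the keyword names are an assumption:

```python
from databricks.sql.parameters import IntegerParameter, StringParameter

# Explicitly typed parameters bypass type inference. Because this query
# uses the named paramstyle, each parameter must be given a name.
params = [
    IntegerParameter(value=42, name="p_id"),          # assumed signature
    StringParameter(value="example", name="p_name"),  # assumed signature
]
cursor.execute(
    "SELECT * FROM table WHERE id = :p_id AND name = :p_name",
    params,
)
```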

@@ -158,7 +158,7 @@ Rendering parameters inline is supported on all versions of DBR since these quer

## SQL Syntax

-Variables in your SQL query can look like `%(param)s` or like `%s`.
+Variables in your SQL query can look like `%(param)s` or like `%s`.

#### Example

Expand All @@ -172,7 +172,7 @@ SELECT * FROM table WHERE field = %s

## Python Syntax

-This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.
+This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `pyformat` paramstyle Usage Example
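The example bodies are collapsed in this view. A minimal sketch of `pyformat` usage with inline rendering, under the same placeholder assumptions as the `named` sketch above:

```python
from databricks import sql

with sql.connect(
    server_hostname="<hostname>",
    http_path="<http-path>",
    access_token="<token>",
    use_inline_params=True,
) as connection:
    with connection.cursor() as cursor:
        # %(value)s is rendered inline on the client before the query is sent.
        cursor.execute(
            "SELECT * FROM table WHERE field = %(value)s",
            {"value": 42},
        )
        print(cursor.fetchall())
```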

@@ -210,7 +210,7 @@ with sql.connect(..., use_inline_params=True) as conn:

The result of the above two examples is identical.

-**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.
+**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.

**Note:** This `%s` syntax overlaps with valid SQL syntax around the usage of `LIKE` DML. For example, if your query includes a clause like `WHERE field LIKE '%sequence'`, the parameter inlining function will raise an exception because this string appears to include an inline marker but none is provided. This means that with connector versions below 3.0.0 it has been impossible to execute a query that included both parameters and `LIKE` wildcards. When `use_inline_params=False`, we pass `%s` occurrences along to the database, allowing them to be used as expected in `LIKE` statements.
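A hedged sketch of the native-mode behaviour described in this note, reusing a `cursor` opened with `use_inline_params=False` (table and column names are illustrative):

```python
# %sequence below is a LIKE wildcard, not a parameter marker. In native
# mode it is passed to the database verbatim, so it can coexist with the
# :value parameter in the same query.
cursor.execute(
    "SELECT * FROM table WHERE field LIKE '%sequence' AND field2 = :value",
    {"value": 42},
)
```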

6 changes: 3 additions & 3 deletions examples/README.md
@@ -7,7 +7,7 @@ We provide example scripts so you can see the connector in action for basic usag
- DATABRICKS_TOKEN

Follow the quick start in our [README](../README.md) to install `databricks-sql-connector` and see
-how to find the hostname, http path, and access token. Note that for the OAuth examples below a
+how to find the hostname, http path, and access token. Note that for the OAuth examples below a
personal access token is not needed.


@@ -38,7 +38,7 @@ To run all of these examples you can clone the entire repository to your disk. O
- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In
this example the string `ExamplePartnerTag` will be added to the user agent on every request.
- **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
-- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy 2.0](https://www.sqlalchemy.org/).
+- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy 2.0](https://www.sqlalchemy.org/).
- **`custom_cred_provider.py`** shows how to pass a custom credential provider to bypass connector authentication. Please install databricks-sdk prior to running this example.
- **`v3_retries_query_execute.py`** shows how to enable v3 retries in connector version 2.9.x including how to enable retries for non-default retry cases.
-- **`parameters.py`** shows how to use parameters in native and inline modes.
+- **`parameters.py`** shows how to use parameters in native and inline modes.
2 changes: 1 addition & 1 deletion examples/insert_data.py
@@ -18,4 +18,4 @@
result = cursor.fetchall()

for row in result:
-print(row)
+print(row)
4 changes: 2 additions & 2 deletions examples/persistent_oauth.py
@@ -23,10 +23,10 @@
class SampleOAuthPersistence(OAuthPersistence):
def persist(self, hostname: str, oauth_token: OAuthToken):
"""To be implemented by the end user to persist in the preferred storage medium.
OAuthToken has two properties:
1. OAuthToken.access_token
-2. OAuthToken.refresh_token
+2. OAuthToken.refresh_token
Both should be persisted.
"""
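For context, a hedged sketch of what a complete persistence implementation might look like. It assumes `OAuthToken` can be constructed as `OAuthToken(access_token, refresh_token)` and that `OAuthPersistence` also defines a `read` counterpart; the class itself is hypothetical:

```python
import json
from typing import Optional

from databricks.sql.experimental.oauth_persistence import (
    OAuthPersistence,
    OAuthToken,
)


class JsonFileOAuthPersistence(OAuthPersistence):
    """Hypothetical JSON-file-backed persistence. Tokens are stored in
    plain text, so protect the file or use this only for local testing."""

    def __init__(self, path: str):
        self._path = path

    def persist(self, hostname: str, oauth_token: OAuthToken):
        # Persist both tokens so the session can be resumed later.
        with open(self._path, "w") as f:
            json.dump(
                {
                    "hostname": hostname,
                    "access_token": oauth_token.access_token,
                    "refresh_token": oauth_token.refresh_token,
                },
                f,
            )

    def read(self, hostname: str) -> Optional[OAuthToken]:
        try:
            with open(self._path) as f:
                data = json.load(f)
        except FileNotFoundError:
            return None
        # Assumed constructor order: (access_token, refresh_token).
        return OAuthToken(data["access_token"], data["refresh_token"])
```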
10 changes: 5 additions & 5 deletions examples/query_cancel.py
@@ -19,13 +19,13 @@ def execute_really_long_query():
print("It looks like this query was cancelled.")

exec_thread = threading.Thread(target=execute_really_long_query)

print("\n Beginning to execute long query")
exec_thread.start()

# Make sure the query has started before cancelling
print("\n Waiting 15 seconds before canceling", end="", flush=True)

seconds_waited = 0
while seconds_waited < 15:
seconds_waited += 1
@@ -34,15 +34,15 @@ def execute_really_long_query():

print("\n Cancelling the cursor's operation. This can take a few seconds.")
cursor.cancel()

print("\n Now checking the cursor status:")
exec_thread.join(5)

assert not exec_thread.is_alive()
print("\n The previous command was successfully canceled")

print("\n Now reusing the cursor to run a separate query.")

# We can still execute a new command on the cursor
cursor.execute("SELECT * FROM range(3)")

2 changes: 1 addition & 1 deletion examples/query_execute.py
@@ -10,4 +10,4 @@
result = cursor.fetchall()

for row in result:
-print(row)
+print(row)
24 changes: 12 additions & 12 deletions examples/sqlalchemy.py
@@ -8,7 +8,7 @@
Our dialect implements the majority of SQLAlchemy 2.0's API. Because of the extent of SQLAlchemy's
capabilities it isn't feasible to provide examples of every usage in a single script, so we only
-provide a basic one here. Learn more about usage in README.sqlalchemy.md in this repo.
+provide a basic one here. Learn more about usage in README.sqlalchemy.md in this repo.
"""

# fmt: off
@@ -89,17 +89,17 @@ class SampleObject(Base):

# Output SQL is:
# CREATE TABLE pysql_sqlalchemy_example_table (
-#   bigint_col BIGINT NOT NULL,
-#   string_col STRING,
-#   tinyint_col SMALLINT,
-#   int_col INT,
-#   numeric_col DECIMAL(10, 2),
-#   boolean_col BOOLEAN,
-#   date_col DATE,
-#   datetime_col TIMESTAMP,
-#   datetime_col_ntz TIMESTAMP_NTZ,
-#   time_col STRING,
-#   uuid_col STRING,
+#   bigint_col BIGINT NOT NULL,
+#   string_col STRING,
+#   tinyint_col SMALLINT,
+#   int_col INT,
+#   numeric_col DECIMAL(10, 2),
+#   boolean_col BOOLEAN,
+#   date_col DATE,
+#   datetime_col TIMESTAMP,
+#   datetime_col_ntz TIMESTAMP_NTZ,
+#   time_col STRING,
+#   uuid_col STRING,
# PRIMARY KEY (bigint_col)
# ) USING DELTA

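A trimmed, hedged sketch of a model that would emit DDL like the comment above: two of the columns, plus the connection URL shape this dialect documents (all angle-bracket values are placeholders):

```python
from typing import Optional

from sqlalchemy import BigInteger, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class SampleObject(Base):
    __tablename__ = "pysql_sqlalchemy_example_table"

    # Emits: bigint_col BIGINT NOT NULL ... PRIMARY KEY (bigint_col)
    bigint_col: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    # Emits: string_col STRING
    string_col: Mapped[Optional[str]] = mapped_column(String)


engine = create_engine(
    "databricks://token:<token>@<hostname>?"
    "http_path=<http-path>&catalog=<catalog>&schema=<schema>"
)
Base.metadata.create_all(engine)
```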
2 changes: 1 addition & 1 deletion examples/staging_ingestion.py
@@ -24,7 +24,7 @@
Additionally, the connection can only manipulate files within the cloud storage location of the authenticated user.
-To run this script:
+To run this script:
1. Set the INGESTION_USER constant to the account email address of the authenticated user
2. Set the FILEPATH constant to the path of a file that will be uploaded (this example assumes it's a CSV file)
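For orientation, a hedged sketch of the three commands the script exercises. The `stage://` path shape and the `staging_allowed_local_path` connection argument are assumptions based on the constants described above:

```python
# Hypothetical stand-ins for the INGESTION_USER and FILEPATH constants.
ingestion_user = "some_user@example.com"

# The connection is assumed to be opened with
# staging_allowed_local_path="/local/path" so local file access is permitted.
cursor.execute(
    f"PUT '/local/path/data.csv' INTO 'stage://tmp/{ingestion_user}/data.csv' OVERWRITE"
)
cursor.execute(
    f"GET 'stage://tmp/{ingestion_user}/data.csv' TO '/local/path/data_copy.csv'"
)
cursor.execute(f"REMOVE 'stage://tmp/{ingestion_user}/data.csv'")
```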
4 changes: 2 additions & 2 deletions examples/v3_retries_query_execute.py
@@ -5,7 +5,7 @@
# This flag will be deprecated in databricks-sql-connector~=3.0.0 as it will become the default.
#
# The new retry behaviour is defined in src/databricks/sql/auth/retry.py
-#
+#
# The new retry behaviour allows users to force the connector to automatically retry requests that fail with codes
# that are not retried by default (in most cases only codes 429 and 503 are retried by default). Additional HTTP
# codes to retry are specified as a list passed to `_retry_dangerous_codes`.
@@ -16,7 +16,7 @@
# the SQL gateway / load balancer. So there is no risk that retrying the request would result in a doubled
# (or tripled etc) command execution. These codes are always accompanied by a Retry-After header, which we honour.
#
-# However, if your use-case emits idempotent queries such as SELECT statements, it can be helpful to retry
+# However, if your use-case emits idempotent queries such as SELECT statements, it can be helpful to retry
# for 502 (Bad Gateway) codes etc. In these cases, there is a possibility that the initial command _did_ reach
# Databricks compute and retrying it could result in additional executions. Retrying under these conditions uses
# an exponential back-off since a Retry-After header is not present.
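A hedged sketch of opting in, assuming the `_enable_v3_retries` flag this script demonstrates and placeholder credentials:

```python
from databricks import sql

with sql.connect(
    server_hostname="<hostname>",
    http_path="<http-path>",
    access_token="<token>",
    _enable_v3_retries=True,       # opt in ahead of the 3.0.0 default
    _retry_dangerous_codes=[502],  # extra codes; safe only for idempotent queries
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```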