Upgrade mypy (#406)
* Upgrade mypy

This commit removes the flag (and cd step) from f53aa37, which we added to get mypy to treat namespace packages correctly. That was apparently a bug in mypy, or behavior they decided to change; to get the new behavior, we must upgrade mypy. (This also allows us to remove a couple of `# type: ignore` comments that are no longer needed.)

This commit changes the version of mypy and runs `poetry lock`. It also conforms the whitespace of files in this project to the expectations of various tools and standards (namely: removing trailing whitespace, as expected by git, and enforcing exactly one newline at the end of each file, as expected by unix and GitHub). It also uses https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade the codebase due to a change in mypy behavior (see the sketch that follows the error list below). For similar reasons, it also fixes a few new type (or other) errors:

* "Return type 'Retry' of 'new' incompatible with return type 'DatabricksRetryPolicy' in supertype 'Retry'"
* databricks/sql/auth/retry.py:225: error: object has no attribute update  [attr-defined]
* /test_param_escaper.py:31: DeprecationWarning: invalid escape sequence \) [as it happens, I think it was also wrong for the string not to be raw, because I'm pretty sure it wants all of its backslashed single-quotes to appear literally with the backslashes, which wasn't happening until now]
* ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject [this is a numpy versioning issue, which I fixed by being stricter about the numpy version]
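
To illustrate the `no_implicit_optional` upgrade mentioned above, here is a minimal sketch of the kind of rewrite the tool applies; the function and its names are hypothetical, not from this codebase:

```python
from typing import Optional

# Before: older mypy treated a None default as implicitly Optional, so an
# annotation like `timeout: float = None` type-checked.
# After: newer mypy requires the Optional to be spelled out explicitly.
def fetch_row(row_id: int, timeout: Optional[float] = None) -> None:
    # Hypothetical function, for illustration only.
    effective_timeout = 30.0 if timeout is None else timeout
    print(f"fetching row {row_id} with timeout {effective_timeout}")


fetch_row(1)                # uses the default timeout
fetch_row(2, timeout=5.0)   # explicit timeout
```

For the first error in the list, one general way to resolve this kind of incompatibility is to narrow the override's return annotation to the subclass type that the base class's stubs expect from `Retry.new()`. This is a hedged sketch with an illustrative subclass, not the connector's actual `DatabricksRetryPolicy`:

```python
from typing import Any

from urllib3.util.retry import Retry


class CustomRetryPolicy(Retry):
    """Illustrative subclass; not the real DatabricksRetryPolicy implementation."""

    def new(self, **kwargs: Any) -> "CustomRetryPolicy":
        # Retry.new() constructs a fresh instance of type(self), so returning the
        # subclass type here is true at runtime; the assert narrows it for mypy.
        new_policy = super().new(**kwargs)
        assert isinstance(new_policy, CustomRetryPolicy)
        return new_policy
```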

---------

Signed-off-by: wyattscarpenter <[email protected]>

* Incorporate suggestion.

I decided the most expedient way of dealing with this type error was just adding the type-ignore comment back in, but with an `[attr-defined]` specifier this time. Otherwise I would have to restructure the code, or figure out the proper types for a TypedDict for the dict, and I don't think that's worth it at the moment.
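
For context, a hedged sketch of the trade-off described above; the helper and the dict shape are invented for illustration and are not the connector's actual code:

```python
from typing import TypedDict


def load_settings() -> object:
    """Hypothetical helper whose return annotation is deliberately too loose."""
    return {"retries": 3}


settings = load_settings()
# Expedient route: silence only this specific error code instead of restructuring.
settings.update({"backoff": 2.0})  # type: ignore[attr-defined]


# Alternative route: describe the dict's shape so no ignore is needed at all.
class RetrySettings(TypedDict, total=False):
    retries: int
    backoff: float


typed_settings: RetrySettings = {"retries": 3}
typed_settings["backoff"] = 2.0  # type-checks without an ignore comment
```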

Signed-off-by: wyattscarpenter <[email protected]>

---------

Signed-off-by: wyattscarpenter <[email protected]>
wyattscarpenter authored Jul 3, 2024
1 parent f53aa37 commit 93e207e
Showing 32 changed files with 272 additions and 267 deletions.
6 changes: 3 additions & 3 deletions .github/.github/pull_request_template.md
@@ -1,7 +1,7 @@
<!-- We welcome contributions. All patches must include a sign-off. Please see CONTRIBUTING.md for details -->


## What type of PR is this?
## What type of PR is this?
<!-- Check all that apply, delete what doesn't apply. -->

- [ ] Refactor
@@ -13,8 +13,8 @@

## How is this tested?

- [ ] Unit tests
- [ ] E2E Tests
- [ ] Unit tests
- [ ] E2E Tests
- [ ] Manually
- [ ] N/A

3 changes: 1 addition & 2 deletions .github/workflows/code-quality-checks.yml
@@ -161,6 +161,5 @@ jobs:
#----------------------------------------------
- name: Mypy
run: |
cd src # Need to be in the actual databricks/ folder or mypy does the wrong thing.
mkdir .mypy_cache # Workaround for bad error message "error: --install-types failed (no mypy cache directory)"; see https://github.com/python/mypy/issues/10768#issuecomment-2178450153
poetry run mypy --config-file ../pyproject.toml --install-types --non-interactive --namespace-packages databricks
poetry run mypy --install-types --non-interactive src
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -61,4 +61,4 @@ jobs:
- name: Build and publish to pypi
uses: JRubics/[email protected]
with:
pypi_token: ${{ secrets.PROD_PYPI_TOKEN }}
pypi_token: ${{ secrets.PROD_PYPI_TOKEN }}
2 changes: 1 addition & 1 deletion .gitignore
@@ -207,4 +207,4 @@ build/
.vscode

# don't commit authentication info to source control
test.env
test.env
5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -74,7 +74,7 @@ If you set your `user.name` and `user.email` git configs, you can sign your comm
This project uses [Poetry](https://python-poetry.org/) for dependency management, tests, and linting.

1. Clone this repository
2. Run `poetry install`
2. Run `poetry install`

### Run tests

@@ -167,5 +167,4 @@ Modify the dependency specification (syntax can be found [here](https://python-p
- `poetry update`
- `rm poetry.lock && poetry install`

Sometimes `poetry update` can freeze or run forever. Deleting the `poetry.lock` file and calling `poetry install` is guaranteed to update everything but is usually _slower_ than `poetry update` **if `poetry update` works at all**.

Sometimes `poetry update` can freeze or run forever. Deleting the `poetry.lock` file and calling `poetry install` is guaranteed to update everything but is usually _slower_ than `poetry update` **if `poetry update` works at all**.
12 changes: 6 additions & 6 deletions docs/parameters.md
@@ -43,7 +43,7 @@ SELECT * FROM table WHERE field = %(value)s

## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.
This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `named` paramstyle Usage Example

@@ -85,7 +85,7 @@ The result of the above two examples is identical.

Databricks Runtime expects variable markers to use either the `named` or `qmark` paramstyle. Historically, this connector used `pyformat`, which Databricks Runtime does not support. So, to assist customers transitioning their codebases from `pyformat` to `named`, we can dynamically rewrite the variable markers before sending the query to Databricks. This happens only when `use_inline_params=False`.

This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead. And users should update their client code to replace `pyformat` markers with `named` markers.
This dynamic rewrite will be deprecated in a future release. New queries should be written using the `named` paramstyle instead. And users should update their client code to replace `pyformat` markers with `named` markers.

For example:

@@ -106,7 +106,7 @@

Under the covers, parameter values are annotated with a valid Databricks SQL type. As shown in the examples above, this connector accepts primitive Python types like `int`, `str`, and `Decimal`. When this happens, the connector infers the corresponding Databricks SQL type (e.g. `INT`, `STRING`, `DECIMAL`) automatically. This means that the parameters passed to `cursor.execute()` are always wrapped in a `TDbsqlParameter` subtype prior to execution.

Automatic inference is sufficient for most usages. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.
Automatic inference is sufficient for most usages. But you can bypass the inference by explicitly setting the Databricks SQL type in your client code. All supported Databricks SQL types have `TDbsqlParameter` implementations which you can import from `databricks.sql.parameters`.

`TDbsqlParameter` objects must always be passed within a list. Either paramstyle (`:named` or `?`) may be used. However, if your query uses the `named` paramstyle, all `TDbsqlParameter` objects must be provided a `name` when they are constructed.

@@ -158,7 +158,7 @@ Rendering parameters inline is supported on all versions of DBR since these quer

## SQL Syntax

Variables in your SQL query can look like `%(param)s` or like `%s`.
Variables in your SQL query can look like `%(param)s` or like `%s`.

#### Example

@@ -172,7 +172,7 @@

## Python Syntax

This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.
This connector follows the [PEP-249 interface](https://peps.python.org/pep-0249/#id20). The expected structure of the parameter collection follows the paramstyle of the variables in your query.

### `pyformat` paramstyle Usage Example

Expand Down Expand Up @@ -210,7 +210,7 @@ with sql.connect(..., use_inline_params=True) as conn:

The result of the above two examples is identical.

**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.
**Note**: `%s` is not compliant with PEP-249 and only works due to the specific implementation of our inline renderer.

**Note:** This `%s` syntax overlaps with valid SQL syntax around the usage of `LIKE` DML. For example, if your query includes a clause like `WHERE field LIKE '%sequence'`, the parameter inlining function will raise an exception because this string appears to include an inline marker but none is provided. This means that with connector versions below 3.0.0 it has been impossible to execute a query that included both parameters and `LIKE` wildcards. When `use_inline_params=False`, we pass `%s` occurrences along to the database, allowing them to be used as expected in `LIKE` statements.
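
A brief sketch of the behaviour described in this note, assuming a cursor from a connection opened with `use_inline_params=False` (connection details omitted):

```python
# The literal %s inside the LIKE pattern is passed through to the database
# untouched, while :value is bound as a native parameter.
cursor.execute(
    "SELECT * FROM table WHERE field LIKE '%sequence' AND other_field = :value",
    parameters={"value": 42},
)
```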

6 changes: 3 additions & 3 deletions examples/README.md
@@ -7,7 +7,7 @@ We provide example scripts so you can see the connector in action for basic usag
- DATABRICKS_TOKEN

Follow the quick start in our [README](../README.md) to install `databricks-sql-connector` and see
how to find the hostname, http path, and access token. Note that for the OAuth examples below a
how to find the hostname, http path, and access token. Note that for the OAuth examples below a
personal access token is not needed.


@@ -38,7 +38,7 @@ To run all of these examples you can clone the entire repository to your disk. O
- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In
this example the string `ExamplePartnerTag` will be added to the user agent on every request.
- **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy 2.0](https://www.sqlalchemy.org/).
- **`sqlalchemy.py`** shows a basic example of connecting to Databricks with [SQLAlchemy 2.0](https://www.sqlalchemy.org/).
- **`custom_cred_provider.py`** shows how to pass a custom credential provider to bypass connector authentication. Please install databricks-sdk prior to running this example.
- **`v3_retries_query_execute.py`** shows how to enable v3 retries in connector version 2.9.x including how to enable retries for non-default retry cases.
- **`parameters.py`** shows how to use parameters in native and inline modes.
- **`parameters.py`** shows how to use parameters in native and inline modes.
2 changes: 1 addition & 1 deletion examples/insert_data.py
@@ -18,4 +18,4 @@
result = cursor.fetchall()

for row in result:
print(row)
print(row)
4 changes: 2 additions & 2 deletions examples/persistent_oauth.py
@@ -23,10 +23,10 @@
class SampleOAuthPersistence(OAuthPersistence):
def persist(self, hostname: str, oauth_token: OAuthToken):
"""To be implemented by the end user to persist in the preferred storage medium.
OAuthToken has two properties:
1. OAuthToken.access_token
2. OAuthToken.refresh_token
2. OAuthToken.refresh_token
Both should be persisted.
"""
10 changes: 5 additions & 5 deletions examples/query_cancel.py
@@ -19,13 +19,13 @@ def execute_really_long_query():
print("It looks like this query was cancelled.")

exec_thread = threading.Thread(target=execute_really_long_query)

print("\n Beginning to execute long query")
exec_thread.start()

# Make sure the query has started before cancelling
print("\n Waiting 15 seconds before canceling", end="", flush=True)

seconds_waited = 0
while seconds_waited < 15:
seconds_waited += 1
@@ -34,15 +34,15 @@ def execute_really_long_query():

print("\n Cancelling the cursor's operation. This can take a few seconds.")
cursor.cancel()

print("\n Now checking the cursor status:")
exec_thread.join(5)

assert not exec_thread.is_alive()
print("\n The previous command was successfully canceled")

print("\n Now reusing the cursor to run a separate query.")

# We can still execute a new command on the cursor
cursor.execute("SELECT * FROM range(3)")

2 changes: 1 addition & 1 deletion examples/query_execute.py
@@ -10,4 +10,4 @@
result = cursor.fetchall()

for row in result:
print(row)
print(row)
24 changes: 12 additions & 12 deletions examples/sqlalchemy.py
@@ -8,7 +8,7 @@
Our dialect implements the majority of SQLAlchemy 2.0's API. Because of the extent of SQLAlchemy's
capabilities it isn't feasible to provide examples of every usage in a single script, so we only
provide a basic one here. Learn more about usage in README.sqlalchemy.md in this repo.
provide a basic one here. Learn more about usage in README.sqlalchemy.md in this repo.
"""

# fmt: off
@@ -89,17 +89,17 @@ class SampleObject(Base):

# Output SQL is:
# CREATE TABLE pysql_sqlalchemy_example_table (
# bigint_col BIGINT NOT NULL,
# string_col STRING,
# tinyint_col SMALLINT,
# int_col INT,
# numeric_col DECIMAL(10, 2),
# boolean_col BOOLEAN,
# date_col DATE,
# datetime_col TIMESTAMP,
# datetime_col_ntz TIMESTAMP_NTZ,
# time_col STRING,
# uuid_col STRING,
# bigint_col BIGINT NOT NULL,
# string_col STRING,
# tinyint_col SMALLINT,
# int_col INT,
# numeric_col DECIMAL(10, 2),
# boolean_col BOOLEAN,
# date_col DATE,
# datetime_col TIMESTAMP,
# datetime_col_ntz TIMESTAMP_NTZ,
# time_col STRING,
# uuid_col STRING,
# PRIMARY KEY (bigint_col)
# ) USING DELTA

2 changes: 1 addition & 1 deletion examples/staging_ingestion.py
@@ -24,7 +24,7 @@
Additionally, the connection can only manipulate files within the cloud storage location of the authenticated user.
To run this script:
To run this script:
1. Set the INGESTION_USER constant to the account email address of the authenticated user
2. Set the FILEPATH constant to the path of a file that will be uploaded (this example assumes it's a CSV file)
4 changes: 2 additions & 2 deletions examples/v3_retries_query_execute.py
@@ -5,7 +5,7 @@
# This flag will be deprecated in databricks-sql-connector~=3.0.0 as it will become the default.
#
# The new retry behaviour is defined in src/databricks/sql/auth/retry.py
#
#
# The new retry behaviour allows users to force the connector to automatically retry requests that fail with codes
# that are not retried by default (in most cases only codes 429 and 503 are retried by default). Additional HTTP
# codes to retry are specified as a list passed to `_retry_dangerous_codes`.
@@ -16,7 +16,7 @@
# the SQL gateway / load balancer. So there is no risk that retrying the request would result in a doubled
# (or tripled etc) command execution. These codes are always accompanied by a Retry-After header, which we honour.
#
# However, if your use-case emits idempotent queries such as SELECT statements, it can be helpful to retry
# However, if your use-case emits idempotent queries such as SELECT statements, it can be helpful to retry
# for 502 (Bad Gateway) codes etc. In these cases, there is a possibility that the initial command _did_ reach
# Databricks compute and retrying it could result in additional executions. Retrying under these conditions uses
# an exponential back-off since a Retry-After header is not present.
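
# A minimal, hedged sketch of how these options might be passed to sql.connect
# (connection details are placeholders, `_enable_v3_retries` is assumed here to be
# the opt-in flag for the behaviour described above, and `_retry_dangerous_codes`
# lists the extra HTTP codes to retry):

from databricks import sql

with sql.connect(
    server_hostname="<hostname>",
    http_path="<http-path>",
    access_token="<token>",
    _enable_v3_retries=True,
    _retry_dangerous_codes=[502, 504],
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())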