[ PECO-2065 ] Create the async execution flow for the PySQL Connector #463

jprakash-db · 2024-10-29T18:42:54Z

Description

Implements the flow for asynchronous execution of the execute command.

Functions added

execute_async

This is the main command used to execute the query. Its syntax is otherwise identical to the existing execute function.

get_query_state

This function polls the execution status of the query.

get_async_execution_result

This function fetches the results of a completed query and populates the ResultSet. From that point on, the ResultSet is handled the same way as before.

Testing Details

Added Integration tests

github-actions · 2024-10-29T18:43:07Z

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

src/databricks/sql/client.py

tests/e2e/test_driver.py

jackyhu-db · 2024-11-12T19:06:06Z

src/databricks/sql/client.py

@@ -1119,7 +1161,7 @@ def __init__(
        self._arrow_schema_bytes = execute_response.arrow_schema_bytes
        self._next_row_index = 0

-        if execute_response.arrow_queue:
+        if execute_response.arrow_queue or async_op:


Why does async_op depend on arrow? What if pyarrow is not installed?

It does not depend on arrow. Currently in our codebase all data is named arrow_queue, whether it is an arrow queue or a column queue. So this statement checks whether data is already present or whether it is an async operation, in which case it does nothing.

My point is that ResultSet should not be coupled to async_op, since I do not see anything in it that is related to async. If you want to force the use of arrow_queue, please use force_arrow_queue or a similar parameter instead of async_op in the constructor.
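One way to read the suggestion above, as a minimal sketch and not the connector's actual ResultSet: the constructor takes a neutral flag that says whether to fetch eagerly, and it knows nothing about async execution. The names `ResultSetSketch`, `fetch_on_init` and `fetch_batch` are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional


class ResultSetSketch:
    """Hypothetical stand-in for a result set; not the connector's ResultSet."""

    def __init__(
        self,
        initial_queue: Optional[Iterable[tuple]],
        fetch_batch: Callable[[], Iterable[tuple]],
        fetch_on_init: bool = True,
    ) -> None:
        self._fetch_batch = fetch_batch
        # Rows that arrived together with the execute response, if any.
        self.results: Deque[tuple] = deque(initial_queue or [])
        # Fetch eagerly only when no rows arrived and the caller has not
        # opted out. Why the caller opts out (for example, an async
        # submission) is not this class's concern.
        if not self.results and fetch_on_init:
            self.fill_results_buffer()

    def fill_results_buffer(self) -> None:
        """Pull the next batch of rows through the injected callable."""
        self.results.extend(self._fetch_batch())

    def fetchall(self) -> list:
        """Return and drain every buffered row."""
        rows = list(self.results)
        self.results.clear()
        return rows
```

With this shape, the async path would pass `fetch_on_init=False` and call `fill_results_buffer()` once the query has finished, while the sync path keeps the default.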

jackyhu-db · 2024-11-12T19:16:44Z

src/databricks/sql/client.py

@@ -796,13 +797,15 @@ def execute(
            cursor=self,
            use_cloud_fetch=self.connection.use_cloud_fetch,
            parameters=prepared_params,
+            async_op=async_op,
        )
        self.active_result_set = ResultSet(


The result set is not ready yet when async_op is True, so why do you set this? It should be set in the get_execution_result.

The result set that is returned over here is empty and does not have any data.

I know, but this will make the code confusing and I do not think it is necessary.

I did this to keep the same logical flow for both execute_async and execute. In execute the active_result_set has data, and in execute_async there is no data yet, so it is None. Once data is available the active_result_set will have data again, so logically I felt it made sense.

The result comes from the value returned by execute_command in the sync flow, which means it is not ready until the sync call completes. This is why I said it is confusing: it should be set on completion of the async operation, which is standard practice for most async code (on_complete = (result) => { setResult(result) }).

I get what Jacky meant here. This is about confusing users regarding the interface. We can keep the internal logic the same, but we don't need to keep the interface the same for async and sync. For async, what matters is the operationHandle. We can have different interfaces for the two and reuse the code internally where possible.
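The interface split discussed in this thread can be sketched as follows, under stated assumptions: submission records only an operation handle, and the result set is assigned when the operation has completed. `FakeBackend`, `CursorSketch`, the attribute names and the state strings are hypothetical stand-ins, not the connector's real classes or Thrift states.

```python
from typing import Dict, List, Optional


class FakeBackend:
    """Hypothetical in-memory backend standing in for the Thrift layer."""

    def __init__(self) -> None:
        self._ops: Dict[int, dict] = {}

    def submit(self, query: str) -> int:
        handle = len(self._ops) + 1
        self._ops[handle] = {"state": "RUNNING", "rows": [(query,)]}
        return handle

    def finish(self, handle: int) -> None:
        self._ops[handle]["state"] = "FINISHED"

    def state(self, handle: int) -> str:
        return self._ops[handle]["state"]

    def fetch(self, handle: int) -> List[tuple]:
        return self._ops[handle]["rows"]


class CursorSketch:
    """Hypothetical cursor; shows where the result set gets assigned."""

    def __init__(self, backend: FakeBackend) -> None:
        self._backend = backend
        self.active_op_handle: Optional[int] = None
        self.active_result_set: Optional[List[tuple]] = None

    def execute_async(self, query: str) -> None:
        # Submission only records the handle; no result set exists yet.
        self.active_op_handle = self._backend.submit(query)
        self.active_result_set = None

    def get_query_state(self) -> str:
        if self.active_op_handle is None:
            raise RuntimeError("no query has been submitted")
        return self._backend.state(self.active_op_handle)

    def get_async_execution_result(self) -> List[tuple]:
        # The result set is assigned here, on completion, not at submission.
        if self.get_query_state() != "FINISHED":
            raise RuntimeError("query has not finished")
        self.active_result_set = self._backend.fetch(self.active_op_handle)
        return self.active_result_set
```

The sync path could reuse the same backend calls internally while exposing a different interface, which is the reuse described in the last comment.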

@gopalldb @jackyhu-db I have changed the code, based on these suggestions.

gopalldb · 2024-11-20T09:02:51Z

src/databricks/sql/thrift_backend.py

+
+        resp = self.make_request(self._client.FetchResults, req)
+
+        t_result_set_metadata_resp = resp.resultSetMetadata


Don't we need to check the state of the response?

In client.py we check the result status and then go ahead with fetching.

gopalldb · 2024-11-25T11:20:21Z

src/databricks/sql/client.py

+        """
+
+        Checks for the status of the async executing query and fetches the result if the query is finished
+        Otherwise it will keep polling the status of the query till there is a Not pending state


How is this method async if it is polling for the result? Shouldn't the contract be like this?

1. Call execute_async().

2. Call get_query_state() (do polling if needed).

3. If state=Finished, call get_async_execution_result(). If the result is not ready, it can return an empty result with a pending state.
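Seen from the caller's side, the contract proposed above could look like the sketch below. It is an illustration under assumptions, not the merged implementation: `ScriptedCursor` is a hypothetical fake that replays a list of states, `run_async` is a hypothetical helper, and the state strings "PENDING", "RUNNING" and "FINISHED" are placeholders for whatever the connector reports.

```python
import time
from typing import List


class ScriptedCursor:
    """Hypothetical cursor whose state advances on each poll; illustration only."""

    def __init__(self, states: List[str], rows: List[tuple]) -> None:
        self._states = list(states)
        self._rows = rows
        self.submitted = False

    def execute_async(self, query: str) -> None:
        # Returns immediately; nothing is fetched at submission time.
        self.submitted = True

    def get_query_state(self) -> str:
        # Return the next scripted state and stay on the last one.
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]

    def get_async_execution_result(self) -> List[tuple]:
        return self._rows


def run_async(cursor, query: str, poll_interval: float = 0.01, timeout: float = 1.0):
    """Submit, poll until the query leaves its in-progress states, then fetch."""
    cursor.execute_async(query)               # step 1: submit without blocking
    deadline = time.monotonic() + timeout
    state = cursor.get_query_state()          # step 2: the caller owns the polling
    while state in ("PENDING", "RUNNING"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query still {state} after {timeout} seconds")
        time.sleep(poll_interval)
        state = cursor.get_query_state()
    if state != "FINISHED":
        raise RuntimeError(f"query ended in state {state}")
    return cursor.get_async_execution_result()  # step 3: fetch once finished
```

Keeping the loop in the caller means none of the three cursor methods blocks, and the caller stays free to do other work between polls.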

jackyhu-db · 2024-11-25T17:08:39Z

src/databricks/sql/client.py

+        Denotes whether the execute command will execute the request asynchronously or not
+        By default it is set to False, if set True the execution request will be submitted and the code
+        will be non-blocking. User can later poll and request the result when ready
+


I do not think we need this parameter, as async will have a different interface, exec_async.

Built the basic flow for the async pipeline - testing is remaining

e637408

jprakash-db requested a review from gopalldb October 29, 2024 18:42

jprakash-db self-assigned this Oct 29, 2024

Implemented the flow for the get_execution_result, but the problem of invalid operation handle still persists

a174370

jprakash-db had a problem deploying to azure-prod November 2, 2024 10:53 — with GitHub Actions Failure

Missed adding some files in previous commit

925b2a3

jprakash-db had a problem deploying to azure-prod November 2, 2024 10:55 — with GitHub Actions Failure

Working prototype of execute_async, get_query_state and get_execution_result

756ac17

jprakash-db had a problem deploying to azure-prod November 4, 2024 05:48 — with GitHub Actions Failure

jprakash-db marked this pull request as ready for review November 4, 2024 05:55

jprakash-db requested review from rcypher-databricks, yunbodeng-db, andrefurlan-db, jackyhu-db and benc-db as code owners November 4, 2024 05:55

Added integration tests for execute_async

beffa2f

jprakash-db temporarily deployed to azure-prod November 4, 2024 06:32 — with GitHub Actions Inactive

jackyhu-db reviewed Nov 12, 2024

add docs for functions

8bf4442

jprakash-db temporarily deployed to azure-prod November 17, 2024 11:14 — with GitHub Actions Inactive

jprakash-db requested a review from jackyhu-db November 18, 2024 04:55

gopalldb reviewed Nov 20, 2024

Refractored the async code

b44b298

jprakash-db temporarily deployed to azure-prod November 24, 2024 06:32 — with GitHub Actions Inactive

jprakash-db requested a review from gopalldb November 25, 2024 04:30

gopalldb reviewed Nov 25, 2024

jackyhu-db reviewed Nov 25, 2024

Fixed java doc

69b32e9

jprakash-db had a problem deploying to azure-prod November 25, 2024 17:27 — with GitHub Actions Failure

jackyhu-db approved these changes Nov 25, 2024

jprakash-db temporarily deployed to azure-prod November 25, 2024 19:14 — with GitHub Actions Inactive

Reformatted

0511690

jprakash-db temporarily deployed to azure-prod November 25, 2024 19:43 — with GitHub Actions Inactive

jprakash-db merged commit 328aeb5 into main Nov 26, 2024
9 of 13 checks passed
