Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ PECO-2065 ] Create the async execution flow for the PySQL Connector #463

Merged
merged 9 commits into from
Nov 26, 2024

Conversation

jprakash-db
Copy link
Contributor

@jprakash-db jprakash-db commented Oct 29, 2024

Description

Implementing the flow for the asynchronous execution of the execute command

Functions added

execute_async

This is the main command that will be used to execute the query, otherwise the syntax is identical to the existing execute function

get_query_state

This function is used for the purpose of polling the status of the query, to know what is the status of execution

get_async_execution_result

This function is used to fetch the results that have been completed and then populate the ResultSet. The flow of handling the ResultSet onwards is the same

Testing Details

Added Integration tests

@jprakash-db jprakash-db requested a review from gopalldb October 29, 2024 18:42
@jprakash-db jprakash-db self-assigned this Oct 29, 2024
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

github-actions bot commented Nov 2, 2024

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

github-actions bot commented Nov 2, 2024

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

github-actions bot commented Nov 4, 2024

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

github-actions bot commented Nov 4, 2024

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

tests/e2e/test_driver.py Show resolved Hide resolved
@@ -1119,7 +1161,7 @@ def __init__(
self._arrow_schema_bytes = execute_response.arrow_schema_bytes
self._next_row_index = 0

if execute_response.arrow_queue:
if execute_response.arrow_queue or async_op:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why async_op depends on arrow? What if pyarrow is not installed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not depend on arrow, currently in our codebase all data is named as arrow_queue be it arrow queue or column queue. So in this statement I am checking if data is already present or if it is an async operation don't do anything.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is ResultSet should not couple with async_op as I do not see it has something related to async. If you want to force to use arrow_queue, please use force_arrow_queue or similar parameter instead of async_op in the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this

@@ -796,13 +797,15 @@ def execute(
cursor=self,
use_cloud_fetch=self.connection.use_cloud_fetch,
parameters=prepared_params,
async_op=async_op,
)
self.active_result_set = ResultSet(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result set is not ready yet when async_op is True, why do you set this? It should be set in theget_execution_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result set that is returned over here is empty and does not have any data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but this will make the code confusing and I do not think it is is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to keep the same logical flow for both execute_async and execute. Like in execute the active_result_set has data and in execute_async since there is no data so it is none. Once data is available the active_result_set will again have data, so logically I felt it made sense

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the return result comes from the returned value from execute_command in the sync flow, which means it is not ready until the sync completes, this is why I said it is confusing as it should be set in the completion of the async operation, this is standard practice/ways for most of the async code (on_complete = (result) => { setResult(result) )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it what Jacky meant here. This is about confusing the users regarding the interface. We can keep the logic internally same, but don't need to keep the interface same for async and sync. For async what matters is the operationHandle. We can have different interface for both, but internally can reuse the code if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gopalldb @jackyhu-db I have changed the code, based on these suggestions.

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).


resp = self.make_request(self._client.FetchResults, req)

t_result_set_metadata_resp = resp.resultSetMetadata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to check the state of response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the client.py we check the result status and then go ahead with fetching

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@jprakash-db jprakash-db requested a review from gopalldb November 25, 2024 04:30
"""

Checks for the status of the async executing query and fetches the result if the query is finished
Otherwise it will keep polling the status of the query till there is a Not pending state

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this method async if it is polling for result? Shouldn't be contract like this?

  1. call execute_async()
  2. call get_query_state() (do polling if needed)
  3. if state=Finished, call get_async_execution_result(), if the result is not ready, it can return an empty result with pending state

Denotes whether the execute command will execute the request asynchronously or not
By default it is set to False, if set True the execution request will be submitted and the code
will be non-blocking. User can later poll and request the result when ready

Copy link
Collaborator

@jackyhu-db jackyhu-db Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need this parameter as async will be different interface exec_async

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@jprakash-db jprakash-db merged commit 328aeb5 into main Nov 26, 2024
9 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants