ConnectorX should take less time to execute SQL queries but it is taking more time than the sqlite3 module #251
Hi @deepakpunia20 , ConnectorX mainly targets the scenario of fetching large query results. It speeds up the process by optimizing the client-side execution and saturating both network and machine resources through parallelism. When query execution is the bottleneck (for example, the result size is small as in your case, or the query is very complex), there will be overhead coming from metadata fetching. In ConnectorX, up to three pieces of metadata are fetched before issuing the query to the database:
1. The MIN/MAX values of the partition column (only when partitioning is enabled), used to split the query.
2. The number of rows in the query result (a COUNT query), used to pre-allocate the Pandas destination.
3. The schema of the query result, used to determine the type of each column.
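For reference, a minimal sketch of a partitioned read that triggers metadata fetch 1; the connection URI, table, and partition column here are just placeholders:

```python
import connectorx as cx

# Placeholder connection URI and query -- adjust to your own database.
db_uri = "postgresql://username:password@localhost:5432/mydb"
query = "SELECT * FROM lineitem"

# Splitting the query on a numeric column makes ConnectorX first issue a
# MIN/MAX query on that column (metadata fetch 1 above), then run the
# resulting partitions in parallel.
df = cx.read_sql(db_uri, query, partition_on="l_orderkey", partition_num=4)
```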
In your situation the overhead comes from 2 and 3. In order to avoid the potentially costly COUNT query, we suggest using Arrow as an intermediate destination for ConnectorX and converting it into Pandas with Arrow's to_pandas API. For example:

```python
import connectorx as cx

table = cx.read_sql(db_uri, query, return_type="arrow")
df = table.to_pandas(split_blocks=False, date_as_object=False)
```

Please feel free to give it a try. It may reduce the time a bit. But since the query result in your case is very small, the overhead from 3 may still affect the end-to-end time.
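A quick way to see the difference is to time both paths side by side. The sketch below assumes a db_uri and query like the ones in your snippet (the SQLite path and table name are placeholders) and is only meant as a rough measurement:

```python
import time
import connectorx as cx

db_uri = "sqlite:///home/user/example.db"   # placeholder database path
query = "SELECT * FROM some_table"          # placeholder query

# Default pandas destination: issues the COUNT query (2) plus the schema fetch (3).
t0 = time.perf_counter()
df_pandas = cx.read_sql(db_uri, query)
t1 = time.perf_counter()

# Arrow destination: skips the COUNT query, then converts to pandas.
table = cx.read_sql(db_uri, query, return_type="arrow")
df_arrow = table.to_pandas(split_blocks=False, date_as_object=False)
t2 = time.perf_counter()

print(f"pandas destination : {t1 - t0:.3f}s")
print(f"arrow -> to_pandas : {t2 - t1:.3f}s")
```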
Hi @wangxiaoying ,
ConnectorX should take less time to execute SQL queries, but it is taking more time than the sqlite3 module. Below is the example: