feat[DatabricksConnector]: connector to connect to Databricks on Cloud #580
Conversation
Review Status
Actionable comments generated: 12
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (8)
- examples/from_databricks.py (1 hunks)
- pandasai/connectors/__init__.py (2 hunks)
- pandasai/connectors/base.py (1 hunks)
- pandasai/connectors/databricks.py (1 hunks)
- pandasai/connectors/sql.py (2 hunks)
- tests/connectors/test_databricks.py (1 hunks)
- tests/connectors/test_snowflake.py (1 hunks)
- tests/connectors/test_sql.py (1 hunks)
Files skipped from review due to trivial changes (1)
- pandasai/connectors/__init__.py
Additional comments (Suppressed): 8
examples/from_databricks.py (2)
- 1-27: The code snippet is an example of how to use the new `DatabricksConnector` class. It creates a `DatabricksConnector` instance with a configuration, initializes an `OpenAI` instance, and uses these to create a `SmartDataframe`. The `chat` method is then called on the `SmartDataframe` instance and the response is printed. Please ensure that the token and other sensitive information are not hardcoded in the script but rather fetched from secure sources or environment variables. Hardcoding sensitive information can lead to security vulnerabilities.
- 8-20: Ensure that the host, database, token, port, table, httpPath, and where clause provided in the config dictionary are correct and valid for your Databricks setup. Invalid configurations could lead to connection failures or incorrect data retrieval.
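Hardcoded secrets can be avoided with a small helper that reads the connector settings from the environment. A minimal sketch; the environment variable names below are illustrative, not ones the project defines:

```python
import os

def databricks_config_from_env() -> dict:
    """Build a connector config dict from environment variables.

    The variable names are hypothetical examples; adapt them to your
    deployment's conventions.
    """
    return {
        "host": os.environ.get("DATABRICKS_HOST", ""),
        "port": int(os.environ.get("DATABRICKS_PORT", "443")),
        "token": os.environ.get("DATABRICKS_TOKEN", ""),
        "database": os.environ.get("DATABRICKS_DATABASE", ""),
        "httpPath": os.environ.get("DATABRICKS_HTTP_PATH", ""),
    }
```

The rest of the example script then only needs `DatabricksConnector(config=databricks_config_from_env() | {...})`-style merging for non-secret fields.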
pandasai/connectors/base.py (2)
- 66-75: The `DatabricksConnectorConfig` class is introduced to hold the configuration for a Databricks connection. Ensure that all necessary parameters are included and correctly named according to the Databricks API documentation. Also, consider adding default values where applicable.
- 77-79: No changes have been made to the `BaseConnector` class in this hunk. The class continues to serve as an abstract base class for all connectors.
tests/connectors/test_sql.py (2)
- 76-83: The new implementation of `test_columns_count_property` is more efficient and less error-prone than the previous one. Instead of executing a SQL query to get the number of columns, it now simply calls the `len` function on the columns returned by the `head` method. This change simplifies the code and reduces the risk of SQL injection attacks. However, ensure that the `head` method always returns a DataFrame object with a `columns` attribute.
- 79-81: This mock setup for the DataFrame object and its columns is a good practice for unit testing. It isolates the test from external dependencies and makes it deterministic.
tests/connectors/test_snowflake.py (2)
- 76-83: The new implementation of `test_columns_count_property` is more efficient and less error-prone than the previous one. Instead of executing a SQL query to fetch the column count, it now directly uses the `len` function on the columns returned by the `head` method. This change reduces the complexity of the code and makes it easier to understand and maintain. However, ensure that the `head` method reliably returns a DataFrame with the correct columns.
- 79-81: Mocking the `head` method of the connector to return a DataFrame with two columns is a good way to isolate the test from external dependencies. This ensures that the test only verifies the functionality of the `columns_count` property and does not fail due to issues in the `head` method or the database connection.
tests/connectors/test_databricks.py
Outdated
self.config = DatabricksConnectorConfig(
    dialect="databricks",
    host="ehxzojy-ue47135",
    port=443,
    token="token",
    database="SNOWFLAKE_SAMPLE_DATA",
    httpPath="/sql/1.0/warehouses/1241rsa32",
    table="lineitem",
    where=[["column_name", "=", "value"]],
).dict()
The `DatabricksConnectorConfig` instance is created with hardcoded values. This might not be a good practice for unit tests as it reduces the flexibility and reusability of the test case. Consider using the `setUpClass` method or a fixture to set these values, which can then be overridden in individual test cases if needed.
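A sketch of the suggested refactor, assuming plain `unittest`: shared defaults live in `setUpClass`, and individual tests copy and override only the fields they care about. The field values are placeholders.

```python
import unittest

class TestDatabricksConnector(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Shared baseline config, defined once for the whole test class.
        cls.base_config = {
            "dialect": "databricks",
            "host": "example-host",
            "port": 443,
            "token": "token",
            "table": "lineitem",
        }

    def test_with_override(self):
        # Copy the baseline and override a single field for this test.
        config = {**self.base_config, "table": "orders"}
        self.assertEqual(config["table"], "orders")
        self.assertEqual(config["port"], 443)
```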
tests/connectors/test_databricks.py
Outdated
def test_constructor_and_properties(self):
    # Test constructor and properties
    self.assertEqual(self.connector._config, self.config)
    self.assertEqual(self.connector._engine, self.mock_engine)
    self.assertEqual(self.connector._connection, self.mock_connection)
    self.assertEqual(self.connector._cache_interval, 600)
The test `test_constructor_and_properties` checks the properties of the `DatabricksConnector` object after its creation. However, it does not verify if the `_load_connector_config` method was called during the initialization. This is an important part of the initialization process and should be tested.
self.assertTrue(self.connector._load_connector_config.called)
@ArslanSaleem agree, let's add it to also increase the coverage!
Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.
def test_columns_count_property(self):
    # Test columns_count property
    self.connector._columns_count = None
    mock_df = Mock()
    mock_df.columns = ["Column1", "Column2"]
    self.connector.head = Mock(return_value=mock_df)
    columns_count = self.connector.columns_count
    self.assertEqual(columns_count, 2)
The tests `test_rows_count_property` and `test_columns_count_property` do not check if the `head` method was called when the `_rows_count` and `_columns_count` properties are None. This is an important part of the logic and should be tested.
self.assertTrue(self.connector.head.called)
@ArslanSaleem let's add it!
pandasai/connectors/databricks.py
Outdated
SnowFlake connectors are used to connect to SnowFlake Data Cloud.
"""
The class docstring seems to be copied from the SnowFlake connector and not updated. It should reflect that this is a Databricks connector.
- """
- SnowFlake connectors are used to connect to SnowFlake Data Cloud.
- """
+ """
+ Databricks connectors are used to connect to Databricks SQL Warehouse on Azure, AWS, and GCP.
+ """
pandasai/connectors/databricks.py
Outdated
Initialize the SnowFlake connector with the given configuration.

Args:
    config (ConnectorConfig): The configuration for the SnowFlake connector.
"""
Again, the docstring for the `__init__` method refers to SnowFlake instead of Databricks. This needs to be corrected.
- Initialize the SnowFlake connector with the given configuration.
+ Initialize the Databricks connector with the given configuration.
- config (ConnectorConfig): The configuration for the SnowFlake connector.
+ config (DatabricksConnectorConfig): The configuration for the Databricks connector.
pandasai/connectors/databricks.py
Outdated
if "database" not in config and os.getenv("SNOWFLAKE_DATABASE"):
    config["database"] = os.getenv("SNOWFLAKE_DATABASE")
The environment variable being checked here is for Snowflake, not Databricks. This should be corrected to check for a Databricks-specific environment variable.
- if "database" not in config and os.getenv("SNOWFLAKE_DATABASE"):
- config["database"] = os.getenv("SNOWFLAKE_DATABASE")
+ if "database" not in config and os.getenv("DATABRICKS_DATABASE"):
+ config["database"] = os.getenv("DATABRICKS_DATABASE")
pandasai/connectors/databricks.py
Outdated
Return the string representation of the SnowFlake connector.

Returns:
    str: The string representation of the SnowFlake connector.
"""
The docstring for the `__repr__` method also refers to SnowFlake instead of Databricks. This needs to be corrected.
- Return the string representation of the SnowFlake connector.
+ Return the string representation of the Databricks connector.
- str: The string representation of the SnowFlake connector.
+ str: The string representation of the Databricks connector.
Codecov Report
@@ Coverage Diff @@
## main #580 +/- ##
==========================================
+ Coverage 83.22% 83.89% +0.67%
==========================================
Files 54 55 +1
Lines 2682 2689 +7
==========================================
+ Hits 2232 2256 +24
+ Misses 450 433 -17
... and 3 files with indirect coverage changes
"""
self._engine = create_engine(
    f"{config.dialect}://token:{config.token}@{config.host}:{config.port}?http_path={config.httpPath}"
The connection string is being created without any sanitization or encoding of the input parameters. This could potentially lead to security issues such as SQL injection attacks if the config parameters are user-supplied. Consider using a safer method to construct the connection string, such as URL encoding the parameters or using a library function designed for this purpose.
@coderabbitai, could you make an example?
@gventuri, sure! Here's an example of how you could safely construct the connection string using URL encoding:
from urllib.parse import quote_plus

class DatabricksConnector(SQLConnector):
    # ...
    def _init_connection(self, config: DatabricksConnectorConfig):
        """
        Initialize Database Connection
        Args:
            config (DatabricksConnectorConfig): Configurations to load database
        """
        safe_token = quote_plus(config.token)
        safe_host = quote_plus(config.host)
        # port may be an int, so cast to str before quoting
        safe_port = quote_plus(str(config.port))
        safe_httpPath = quote_plus(config.httpPath)
        self._engine = create_engine(
            f"{config.dialect}://token:{safe_token}@{safe_host}:{safe_port}?http_path={safe_httpPath}"
        )
        self._connection = self._engine.connect()

In this example, I've used Python's built-in `urllib.parse.quote_plus` function to URL encode the input parameters. This will ensure that any special characters in the parameters are properly escaped, preventing potential SQL injection attacks.
pandasai/connectors/databricks.py
Outdated
return (
    f"<{self.__class__.__name__} dialect={self._config.dialect} "
    f"token={self._config.token} "
    f"host={self._config.host} port={self._config.port} "
    f"database={self._config.database} httpPath={str(self._config.httpPath)}"
Exposing sensitive information like tokens in the string representation of the object can be a security risk. If this object is ever logged or printed in a debugging statement, it could expose the token to unauthorized users. Consider removing the token from the string representation.
@ArslanSaleem let's not log the sensitive info!
Great job, only a couple of minor comments!
pandasai/connectors/sql.py
Outdated
query = sql.text(f"SELECT COUNT(*) FROM {self._config.table}")

# Return the number of rows
self._rows_count = self._connection.execute(query).fetchone()[0]
Unfortunately this could create a potential attack. We are using bindparams to validate the table name. Also the suggestion from @coderabbitai might work!
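Since bound parameters only cover values, not identifiers, the table name needs separate handling. A stdlib-only sketch of the idea, using sqlite as a stand-in for the real dialect (SQLAlchemy dialects expose equivalent quoting through their identifier preparers):

```python
import sqlite3

def quote_identifier(name: str) -> str:
    # Double-quote the identifier and escape embedded quotes, so a hostile
    # table name cannot break out of the identifier position.
    return '"' + name.replace('"', '""') + '"'

def safe_row_count(connection: sqlite3.Connection, table_name: str) -> int:
    # Identifiers cannot be bind parameters, so quote them instead.
    query = f"SELECT COUNT(*) FROM {quote_identifier(table_name)}"
    return connection.execute(query).fetchone()[0]
```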
pandasai/connectors/databricks.py
Outdated
Databricks connectors are used to connect to Databricks Data Cloud.
"""

def __init__(self, config: DatabricksConnectorConfig):
This also accepts a dictionary, `Union[DatabricksConnectorConfig, dict]`. In that case, we need to convert it to a `DatabricksConnectorConfig`.
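A minimal sketch of that conversion, with a placeholder standing in for the real `DatabricksConnectorConfig` class:

```python
from typing import Union

class DatabricksConnectorConfig:
    # Placeholder for the real pydantic-style config class.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def normalize_config(
    config: Union[DatabricksConnectorConfig, dict]
) -> DatabricksConnectorConfig:
    # Accept either form and always hand back the typed config object.
    if isinstance(config, dict):
        return DatabricksConnectorConfig(**config)
    return config
```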
"""
self._engine = create_engine(
    f"{config.dialect}://token:{config.token}@{config.host}:{config.port}?http_path={config.httpPath}"
@coderabbitai, could you make an example?
tests/connectors/test_databricks.py
Outdated
host="ehxzojy-ue47135",
port=443,
token="token",
database="SNOWFLAKE_SAMPLE_DATA",
Let's replace with DATABRICKS
tests/connectors/test_databricks.py
Outdated
def test_constructor_and_properties(self):
    # Test constructor and properties
    self.assertEqual(self.connector._config, self.config)
    self.assertEqual(self.connector._engine, self.mock_engine)
    self.assertEqual(self.connector._connection, self.mock_connection)
    self.assertEqual(self.connector._cache_interval, 600)
@ArslanSaleem agree, let's add it to also increase the coverage!
tests/connectors/test_databricks.py
Outdated
expected_repr = (
    "<DatabricksConnector dialect=databricks token=token "
    "host=ehxzojy-ue47135 port=443 "
    "database=SNOWFLAKE_SAMPLE_DATA httpPath=/sql/1.0/warehouses/1241rsa32"
Leftover, let's use DATABRICKS
def test_columns_count_property(self):
    # Test columns_count property
    self.connector._columns_count = None
    mock_df = Mock()
    mock_df.columns = ["Column1", "Column2"]
    self.connector.head = Mock(return_value=mock_df)
    columns_count = self.connector.columns_count
    self.assertEqual(columns_count, 2)
@ArslanSaleem let's add it!
def test_column_hash_property(self):
    # Test column_hash property
    mock_df = Mock()
    mock_df.columns = ["Column1", "Column2"]
    self.connector.head = Mock(return_value=mock_df)
    column_hash = self.connector.column_hash
    self.assertIsNotNone(column_hash)
@ArslanSaleem let's verify the correctness of it (maybe with a hardcoded value)!
@ArslanSaleem also, let's add it to the documentation (connectors.md)!
Review Status
Actionable comments generated: 10
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (8)
- examples/from_databricks.py (1 hunks)
- pandasai/connectors/base.py (3 hunks)
- pandasai/connectors/databricks.py (1 hunks)
- pandasai/connectors/snowflake.py (4 hunks)
- pandasai/connectors/sql.py (8 hunks)
- tests/connectors/test_databricks.py (1 hunks)
- tests/connectors/test_snowflake.py (4 hunks)
- tests/connectors/test_sql.py (4 hunks)
Files skipped from review due to trivial changes (1)
- tests/connectors/test_databricks.py
Additional comments (Suppressed): 17
pandasai/connectors/base.py (3)
- 67-76: The new `DatabricksConnectorConfig` class extends `SQLBaseConnectorConfig` and adds specific properties for a Databricks connection such as `host`, `port`, `token`, and `httpPath`. Ensure that these properties are correctly used in the `DatabricksConnector` class and that they match the expected configuration parameters for a Databricks connection.
- 110-126: The `_populate_config_from_env` method populates the configuration dictionary with values from environment variables if they do not exist in the config. This is a good practice as it allows for flexibility in configuration management and adheres to the twelve-factor app methodology. However, ensure that sensitive data like tokens or passwords are securely managed when using environment variables.
- 128-130: The `_init_connection` method signature has been changed to include a `config` parameter of type `BaseConnectorConfig`. Make sure all calls to this method throughout the codebase have been updated to match the new signature.
pandasai/connectors/snowflake.py (4)
- 2-9: The import statement for `os` has been removed. If the environment variables are still being used elsewhere in the code, this could cause a NameError.
- 18-18: The function signature of `__init__` has changed from accepting a `SnowFlakeConnectorConfig` to accepting a `Union[SQLConnectorConfig, dict]`. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
- 24-38: The method of populating the configuration from environment variables has been refactored. Instead of checking and assigning each variable individually, a dictionary of environment variable names is created and passed to a `_populate_config_from_env` method. This change improves the maintainability and readability of the code by reducing redundancy and complexity.
- 85-91: The `__repr__` method no longer includes the username and password in its output. This is a good security practice as it prevents sensitive information from being accidentally logged or displayed.
tests/connectors/test_snowflake.py (1)
- 9-13: The patch for `pandasai.connectors.sql.sql` has been removed in the new hunk. If this is intentional, ensure that all dependencies on this mock within the test cases have been properly updated or removed.
pandasai/connectors/databricks.py (1)
- 49-50: As previously discussed, the connection string is being created without any sanitization or encoding of the input parameters. This could potentially lead to security issues such as SQL injection attacks if the config parameters are user-supplied. Consider using a safer method to construct the connection string, such as URL encoding the parameters or using a library function designed for this purpose.
from urllib.parse import quote_plus

safe_token = quote_plus(config.token)
safe_host = quote_plus(config.host)
safe_port = quote_plus(str(config.port))
safe_httpPath = quote_plus(config.httpPath)
self._engine = create_engine(
    f"{config.dialect}://token:{safe_token}@{safe_host}:{safe_port}?http_path={safe_httpPath}"
)
pandasai/connectors/sql.py (4)
- 12-14: The import of `sql` from `sqlalchemy` has been removed. Ensure that this does not affect any other parts of the codebase where `sql` might be used.
- 95-101: The `__repr__` method no longer includes the username and password in its output. This is a good practice for security reasons as it prevents sensitive information from being logged or displayed inadvertently.
- 389-390: The `__init__` methods of the `MySQLConnector` and `PostgreSQLConnector` classes now accept a dictionary in addition to an instance of `SQLConnectorConfig`. This change provides more flexibility in how the configuration can be provided. However, ensure that all calls to these constructors throughout the codebase have been updated to match the new signatures.
- 399-407: The environment variable loading logic has been refactored into a separate `_populate_config_from_env` method. This change improves modularity and reduces code duplication. However, ensure that the `_populate_config_from_env` method correctly handles cases where the environment variables are not set or are set to invalid values.
tests/connectors/test_sql.py (4)
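A sketch of what such an environment-fallback helper might look like, written as a free function; the helper name and the mapping shape are assumptions for illustration, and the real method's signature may differ:

```python
import os

def populate_config_from_env(config: dict, env_aliases: dict) -> dict:
    """Fill missing config keys from environment variables.

    env_aliases maps config keys to environment variable names. Keys
    already present in config are never overwritten, and unset or empty
    environment variables are skipped.
    """
    for key, env_name in env_aliases.items():
        value = os.getenv(env_name)
        if key not in config and value:
            config[key] = value
    return config
```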
- 9-13: The patch for "pandasai.connectors.sql.sql" has been removed in the new hunk. If this is intentional and the SQL module is no longer used in the setup, then this change is fine. However, if the SQL module is still being used in the setup or elsewhere in the tests, this could cause issues. Please verify that the removal of this patch does not affect the functionality of the tests.
- 33-46: The constructor test now includes checks for `_load_connector_config` and `_init_connection` methods to be called during the object creation. This is a good practice as it ensures that these important methods are indeed invoked when an instance of `SQLConnector` is created. No changes needed here.
- 48-54: The expected representation string in the `__repr__` method test has changed. The username and password fields have been removed from the expected string. If this is due to a change in the `__repr__` method of the `SQLConnector` class where it no longer includes the username and password in its representation, then this change is fine. However, please verify that this change aligns with the current implementation of the `__repr__` method.
- 82-89: The test for the `columns_count` property has been modified. Instead of mocking a return value for `self.mock_connection.execute.return_value.fetchone`, it now mocks a return value for `self.connector.head`. This change seems to reflect a modification in the `columns_count` property implementation, where it now relies on the `head` method instead of directly executing a SQL query. Please ensure that this change aligns with the current implementation of the `columns_count` property.
@@ -3,6 +3,7 @@
"""

from abc import ABC, abstractmethod
import os
The `os` module is imported but not used in the provided hunk. If it's not used elsewhere in the file, consider removing this import to keep the code clean and efficient.
- import os
@patch("pandasai.connectors.SnowFlakeConnector._load_connector_config")
@patch("pandasai.connectors.SnowFlakeConnector._init_connection")
def test_constructor_and_properties(
    self, mock_load_connector_config, mock_init_connection
):
    # Test constructor and properties
    self.assertEqual(self.connector._config, self.config)
    self.assertEqual(self.connector._engine, self.mock_engine)
    self.assertEqual(self.connector._connection, self.mock_connection)
    self.assertEqual(self.connector._cache_interval, 600)
    SnowFlakeConnector(self.config)
    mock_load_connector_config.assert_called()
    mock_init_connection.assert_called()
The constructor of `SnowFlakeConnector` is now being tested with two additional mocked methods: `_load_connector_config` and `_init_connection`. Ensure these methods are not required to be tested separately as their functionality is now covered under the constructor test.
def test_columns_count_property(self):
    # Test columns_count property
    self.connector._columns_count = None
    self.mock_connection.execute.return_value.fetchone.return_value = (
        8,
    )  # Sample columns count
    mock_df = Mock()
    mock_df.columns = ["Column1", "Column2"]
    self.connector.head = Mock(return_value=mock_df)
    columns_count = self.connector.columns_count
    self.assertEqual(columns_count, 8)
    self.assertEqual(columns_count, 2)
The method to test the `columns_count` property has been changed. Previously, it was directly fetching the count from the database, but now it's using the `head` method to get a dataframe and then counting the columns. This change might affect the performance if the table has a large number of rows. Consider reverting back to the old method if performance is a concern.
self.connector.head = Mock(return_value=mock_df)
column_hash = self.connector.column_hash
self.assertIsNotNone(column_hash)
self.assertEqual(
    column_hash,
    "ea6a80582b83e1511f8be83412b13e7b86d20c45b96fcf9731f3b99dc3b568aa",
)
The expected value of `column_hash` is hardcoded in the test. This could lead to test failures if the implementation of generating the hash changes. Consider mocking the method used to generate the hash or dynamically generating the expected hash based on the mock data.
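One way to avoid the hardcoded literal is to compute the expectation from the mock data in the test itself. A sketch, assuming the hash is derived from the column names; the real connector's scheme may join or encode them differently, so mirror its actual implementation rather than this guess:

```python
import hashlib

def expected_column_hash(columns) -> str:
    # Hypothetical stand-in for the connector's hashing scheme; the real
    # implementation may use a different separator or encoding.
    joined = "|".join(columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```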
return (
    f"<{self.__class__.__name__} dialect={self._config.dialect} "
    f"host={self._config.host} port={self._config.port} "
    f"database={self._config.database} httpPath={str(self._config.httpPath)}"
)
The `__repr__` method is revealing potentially sensitive information such as the host and port. It's generally not a good practice to expose such details. Consider removing these from the representation.
- return (
- f"<{self.__class__.__name__} dialect={self._config.dialect} "
- f"host={self._config.host} port={self._config.port} "
- f"database={self._config.database} httpPath={str(self._config.httpPath)}"
- )
+ return f"<{self.__class__.__name__} database={self._config.database}>"
)

# Run a SQL query to get the number of rows
query = sql.text(
    "SELECT COUNT(*) FROM information_schema.columns "
    "WHERE table_name = :table_name"
).bindparams(table_name=self._config.table)
query = select(text("COUNT(*)")).select_from(text(self._config.table))

# Return the number of rows
self._rows_count = self._connection.execute(query).fetchone()[0]
The SQL query to get the number of rows has been simplified. The new query directly counts the rows in the table, which should be more efficient than the previous approach of counting columns in the `information_schema.columns` table. However, ensure that the user executing this query has sufficient permissions on the target table.
    f"{self._config.dialect}"
)

# Run a SQL query to get the number of columns
query = sql.text(
    "SELECT COUNT(*) FROM information_schema.columns "
    f"WHERE table_name = '{self._config.table}'"
)

# Return the number of columns
self._columns_count = self._connection.execute(query).fetchone()[0]
self._columns_count = len(self.head().columns)
return self._columns_count
The `columns_count` method now uses the `len` function on the columns returned by the `head` method instead of running a SQL query. This change could improve performance by reducing the number of database queries. However, make sure that the `head` method always returns the correct and complete set of columns.
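The caching pattern under review reduces to this sketch (a stand-in class, not the project's connector): compute the count once from `head()` and reuse it on later accesses.

```python
class ColumnsCountSketch:
    """Illustrates computing columns_count once from head() and caching it."""

    def __init__(self, head_fn):
        self._head = head_fn          # callable returning a DataFrame-like object
        self._columns_count = None    # cache, filled on first access

    @property
    def columns_count(self):
        if self._columns_count is None:
            self._columns_count = len(self._head().columns)
        return self._columns_count
```

This is also exactly the shape the tests above exercise: mock `head`, read the property twice, and assert `head` was only called once.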
Review Status
Actionable comments generated: 0
Files selected for processing (1)
- pandasai/connectors/snowflake.py (4 hunks)
Additional comments (Suppressed): 4
pandasai/connectors/snowflake.py (4)
- 2-7: The import statement for the `os` module has been removed. If the environment variables are being used elsewhere in the code, this could cause a NameError.
- 18-18: The function signature of the `__init__` method has been changed to accept either a `SnowFlakeConnectorConfig` instance or a dictionary as the configuration parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
- 24-38: The way environment variables are loaded into the config has been refactored. This is a good change as it reduces redundancy and makes the code more maintainable. However, ensure that the `_populate_config_from_env` method correctly populates the config from environment variables.
- 85-91: The `__repr__` method no longer includes the username and password in its output. This is a good security practice as it prevents sensitive information from being logged or displayed inadvertently.
Summary by CodeRabbit
- New `DatabricksConnector` to connect with Databricks Data Cloud, expanding the range of supported data sources.
- Refactored `SnowFlakeConnector` and `SQLConnector` classes for better configuration handling and security. Passwords are no longer displayed in string representations.
- New tests for the `DatabricksConnector` class ensuring its functionality and behavior.
- Reworked `rows_count` and `columns_count` methods for improved readability and performance.