
feat[DatabricksConnector]: connector to connect to Databricks on Cloud #580

Merged (5 commits, Sep 21, 2023)

Conversation

ArslanSaleem
Collaborator

@ArslanSaleem ArslanSaleem commented Sep 20, 2023

Summary by CodeRabbit

  • New Feature: Introduced DatabricksConnector to connect with Databricks Data Cloud, expanding the range of supported data sources.
  • Refactor: Updated SnowFlakeConnector and SQLConnector classes for better configuration handling and security. Passwords are no longer displayed in string representations.
  • Test: Added comprehensive unit tests for the new DatabricksConnector class ensuring its functionality and behavior.
  • Refactor: Simplified SQL queries in rows_count and columns_count methods for improved readability and performance.
  • New Feature: Added support for environment variables in connector configurations, providing more flexibility in setting up connections.
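The environment-variable support described in the last bullet can be sketched as a small helper; the function name, the key-to-variable mapping, and the `DATABRICKS_*` variable names below are illustrative assumptions, not the library's actual API.

```python
import os

def populate_config_from_env(config: dict, env_vars: dict) -> dict:
    """Hypothetical sketch of env-var config population: env_vars maps
    config keys to environment variable names, and a key is filled only
    when it is missing from config and the variable is set."""
    for key, env_name in env_vars.items():
        if key not in config and os.getenv(env_name):
            config[key] = os.getenv(env_name)
    return config

# Example: the token comes from the environment, the host is explicit.
os.environ["DATABRICKS_TOKEN"] = "dapi-example"  # demo value only
config = populate_config_from_env(
    {"host": "adb-1234.azuredatabricks.net"},
    {"host": "DATABRICKS_HOST", "token": "DATABRICKS_TOKEN"},
)
```

Explicit values always win; the environment only fills the gaps.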

@coderabbitai
Contributor

coderabbitai bot commented Sep 20, 2023

Walkthrough

This update introduces the DatabricksConnector to the PandasAI library, enabling users to connect to Databricks Data Cloud. It also refactors existing connectors and enhances test coverage. The changes improve code maintainability, security by excluding sensitive data from string representations, and performance by optimizing SQL queries.

Changes

File(s) and summary:
  • examples/from_databricks.py: Demonstrates usage of PandasAI with the new DatabricksConnector.
  • pandasai/connectors/.../base.py, pandasai/connectors/.../databricks.py: Introduces DatabricksConnectorConfig and DatabricksConnector classes for Databricks support.
  • pandasai/connectors/.../sql.py, pandasai/connectors/.../snowflake.py: Refactors SQL and Snowflake connectors, optimizes SQL queries, and improves security in string representations.
  • tests/connectors/.../test_databricks.py: Adds unit tests for DatabricksConnector.
  • tests/connectors/.../test_snowflake.py, tests/connectors/.../test_sql.py: Enhances test coverage for SQL and Snowflake connectors.

🐇💻

Code hopping through the fields,

Connectors grow, our project yields.

Databricks joins the merry band,

With SQL queries close at hand.

Tests ensure we're on the right track,

As we code, there's no looking back!

So here's to progress, swift and bright,

Coding rabbits, take flight! 🚀🌙



Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 12

Files that changed from the base of the PR and between 7738feb and 70c71ed.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (8)
  • examples/from_databricks.py (1 hunks)
  • pandasai/connectors/__init__.py (2 hunks)
  • pandasai/connectors/base.py (1 hunks)
  • pandasai/connectors/databricks.py (1 hunks)
  • pandasai/connectors/sql.py (2 hunks)
  • tests/connectors/test_databricks.py (1 hunks)
  • tests/connectors/test_snowflake.py (1 hunks)
  • tests/connectors/test_sql.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • pandasai/connectors/__init__.py
Additional comments (Suppressed): 8
examples/from_databricks.py (2)
  • 1-27: The code snippet is an example of how to use the new DatabricksConnector class. It creates a DatabricksConnector instance with a configuration, initializes an OpenAI instance, and uses these to create a SmartDataframe. The chat method is then called on the SmartDataframe instance and the response is printed.

Please ensure that the token and other sensitive information are not hardcoded in the script but rather fetched from secure sources or environment variables. Hardcoding sensitive information can lead to security vulnerabilities.

  • 8-20: Ensure that the host, database, token, port, table, httpPath, and where clause provided in the config dictionary are correct and valid for your Databricks setup. Invalid configurations could lead to connection failures or incorrect data retrieval.
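Following the reviewer's advice above about not hardcoding the token, a minimal stdlib-only sketch of fetching it from an environment variable; the variable name DATABRICKS_TOKEN and the helper function are assumptions for illustration, not the example script's actual code.

```python
import os

def get_databricks_token() -> str:
    """Illustrative helper: read the token from the environment rather
    than hardcoding it in the example script. The variable name
    DATABRICKS_TOKEN is an assumption, not an established convention
    of this codebase."""
    token = os.environ.get("DATABRICKS_TOKEN")
    if not token:
        raise RuntimeError("Set DATABRICKS_TOKEN before running the example")
    return token

os.environ["DATABRICKS_TOKEN"] = "dapi-demo"  # demo value only
token = get_databricks_token()
```

Failing fast with a clear message is preferable to passing an empty token into the connection string.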
pandasai/connectors/base.py (2)
  • 66-75: The DatabricksConnectorConfig class is introduced to hold the configuration for a Databricks connection. Ensure that all necessary parameters are included and correctly named according to the Databricks API documentation. Also, consider adding default values where applicable.

  • 77-79: No changes have been made to the BaseConnector class in this hunk. The class continues to serve as an abstract base class for all connectors.

tests/connectors/test_sql.py (2)
  • 76-83: The new implementation of test_columns_count_property is more efficient and less error-prone than the previous one. Instead of executing a SQL query to get the number of columns, it now simply calls the len function on the columns returned by the head method. This change simplifies the code and reduces the risk of SQL injection attacks. However, ensure that the head method always returns a DataFrame object with a columns attribute.

  • 79-81: This mock setup for the DataFrame object and its columns is a good practice for unit testing. It isolates the test from external dependencies and makes it deterministic.

tests/connectors/test_snowflake.py (2)
  • 76-83: The new implementation of test_columns_count_property is more efficient and less error-prone than the previous one. Instead of executing a SQL query to fetch the column count, it now directly uses the len function on the columns returned by the head method. This change reduces the complexity of the code and makes it easier to understand and maintain. However, ensure that the head method reliably returns a DataFrame with the correct columns.

  • 79-81: Mocking the head method of the connector to return a DataFrame with two columns is a good way to isolate the test from external dependencies. This ensures that the test only verifies the functionality of the columns_count property and does not fail due to issues in the head method or the database connection.

Comment on lines 19 to 28
self.config = DatabricksConnectorConfig(
dialect="databricks",
host="ehxzojy-ue47135",
port=443,
token="token",
database="SNOWFLAKE_SAMPLE_DATA",
httpPath="/sql/1.0/warehouses/1241rsa32",
table="lineitem",
where=[["column_name", "=", "value"]],
).dict()
Contributor

The DatabricksConnectorConfig instance is created with hardcoded values. This might not be a good practice for unit tests as it reduces flexibility and reusability of the test case. Consider using setUpClass method or a fixture to set these values, which can then be overridden in individual test cases if needed.
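The setUpClass suggestion could look roughly like this; the class name, fields, and values are illustrative, not the PR's actual test code.

```python
import unittest

class TestDatabricksConnectorConfig(unittest.TestCase):
    """Sketch of the setUpClass fixture suggested above."""

    @classmethod
    def setUpClass(cls):
        # Shared defaults that individual tests may override.
        cls.base_config = {
            "dialect": "databricks",
            "host": "example-host",
            "port": 443,
            "token": "token",
        }

    def test_override_host(self):
        # Override a single value without repeating the whole config.
        config = {**self.base_config, "host": "other-host"}
        self.assertEqual(config["host"], "other-host")
        self.assertEqual(config["port"], 443)

# Run the suite programmatically so the sketch stays self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestDatabricksConnectorConfig
)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```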

Comment on lines 33 to 39
def test_constructor_and_properties(self):
# Test constructor and properties
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._engine, self.mock_engine)
self.assertEqual(self.connector._connection, self.mock_connection)
self.assertEqual(self.connector._cache_interval, 600)

Contributor

@coderabbitai coderabbitai bot Sep 20, 2023

The test test_constructor_and_properties checks the properties of the DatabricksConnector object after its creation. However, it does not verify if the _load_connector_config method was called during the initialization. This is an important part of the initialization process and should be tested.

self.assertTrue(self.connector._load_connector_config.called)

Collaborator

@ArslanSaleem agree, let's add it to also increase the coverage!

Contributor

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

tests/connectors/test_databricks.py (two resolved review comments, one outdated)
Comment on lines +76 to +83
def test_columns_count_property(self):
# Test columns_count property
self.connector._columns_count = None
mock_df = Mock()
mock_df.columns = ["Column1", "Column2"]
self.connector.head = Mock(return_value=mock_df)
columns_count = self.connector.columns_count
self.assertEqual(columns_count, 2)
Contributor

@coderabbitai coderabbitai bot Sep 20, 2023

The tests test_rows_count_property and test_columns_count_property do not check if the head method was called when the _rows_count and _columns_count properties are None. This is an important part of the logic and should be tested.

self.assertTrue(self.connector.head.called)

Collaborator

@ArslanSaleem let's add it!

Contributor

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

pandasai/connectors/sql.py (resolved review comment)
Comment on lines 15 to 16
SnowFlake connectors are used to connect to SnowFlake Data Cloud.
"""
Contributor

The class docstring seems to be copied from the SnowFlake connector and not updated. It should reflect that this is a Databricks connector.

-    """
-    SnowFlake connectors are used to connect to SnowFlake Data Cloud.
-    """
+    """
+    Databricks connectors are used to connect to Databricks SQL Warehouse on Azure, AWS, and GCP.
+    """

Comment on lines 20 to 24
Initialize the SnowFlake connector with the given configuration.

Args:
config (ConnectorConfig): The configuration for the SnowFlake connector.
"""
Contributor

Again, the docstring for the __init__ method refers to SnowFlake instead of Databricks. This needs to be corrected.

-        Initialize the SnowFlake connector with the given configuration.
+        Initialize the Databricks connector with the given configuration.

-            config (ConnectorConfig): The configuration for the SnowFlake connector.
+            config (DatabricksConnectorConfig): The configuration for the Databricks connector.

Comment on lines 29 to 30
if "database" not in config and os.getenv("SNOWFLAKE_DATABASE"):
config["database"] = os.getenv("SNOWFLAKE_DATABASE")
Contributor

The environment variable being checked here is for Snowflake, not Databricks. This should be corrected to check for a Databricks-specific environment variable.

-        if "database" not in config and os.getenv("SNOWFLAKE_DATABASE"):
-            config["database"] = os.getenv("SNOWFLAKE_DATABASE")
+        if "database" not in config and os.getenv("DATABRICKS_DATABASE"):
+            config["database"] = os.getenv("DATABRICKS_DATABASE")

Comment on lines 59 to 63
Return the string representation of the SnowFlake connector.

Returns:
str: The string representation of the SnowFlake connector.
"""
Contributor

The docstring for the __repr__ method also refers to SnowFlake instead of Databricks. This needs to be corrected.

-        Return the string representation of the SnowFlake connector.
+        Return the string representation of the Databricks connector.

-        str: The string representation of the SnowFlake connector.
+        str: The string representation of the Databricks connector.

@codecov-commenter

codecov-commenter commented Sep 20, 2023

Codecov Report

Merging #580 (637348a) into main (9d55463) will increase coverage by 0.67%.
Report is 4 commits behind head on main.
The diff coverage is 84.44%.


@@            Coverage Diff             @@
##             main     #580      +/-   ##
==========================================
+ Coverage   83.22%   83.89%   +0.67%     
==========================================
  Files          54       55       +1     
  Lines        2682     2689       +7     
==========================================
+ Hits         2232     2256      +24     
+ Misses        450      433      -17     
Files Changed Coverage Δ
pandasai/connectors/sql.py 64.81% <45.45%> (+4.92%) ⬆️
pandasai/connectors/base.py 87.05% <90.90%> (+0.57%) ⬆️
pandasai/connectors/__init__.py 100.00% <100.00%> (ø)
pandasai/connectors/databricks.py 100.00% <100.00%> (ø)
pandasai/connectors/snowflake.py 96.15% <100.00%> (+15.59%) ⬆️

... and 3 files with indirect coverage changes


Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 3

Files that changed from the base of the PR and between 70c71ed and 358c612.
Files selected for processing (1)
  • pandasai/connectors/databricks.py (1 hunks)

pandasai/connectors/databricks.py (outdated review comment, resolved)

"""
self._engine = create_engine(
f"{config.dialect}://token:{config.token}@{config.host}:{config.port}?http_path={config.httpPath}"
Contributor

@coderabbitai coderabbitai bot Sep 20, 2023

The connection string is being created without any sanitization or encoding of the input parameters. This could potentially lead to security issues such as SQL injection attacks if the config parameters are user-supplied. Consider using a safer method to construct the connection string, such as URL encoding the parameters or using a library function designed for this purpose.

Collaborator

@coderabbitai, could you make an example?

Contributor

@gventuri, sure! Here's an example of how you could safely construct the connection string using URL encoding:

from urllib.parse import quote_plus

class DatabricksConnector(SQLConnector):
    # ...

    def _init_connection(self, config: DatabricksConnectorConfig):
        """
        Initialize Database Connection

        Args:
            config (DatabricksConnectorConfig): Configurations to load database

        """
        safe_token = quote_plus(config.token)
        safe_host = quote_plus(config.host)
        safe_port = quote_plus(str(config.port))
        safe_httpPath = quote_plus(config.httpPath)

        self._engine = create_engine(
            f"{config.dialect}://token:{safe_token}@{safe_host}:{safe_port}?http_path={safe_httpPath}"
        )

        self._connection = self._engine.connect()

In this example, I've used Python's built-in urllib.parse.quote_plus function to URL encode the input parameters. This will ensure that any special characters in the parameters are properly escaped, preventing potential SQL injection attacks.
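A quick standalone check of quote_plus shows why the encoding matters for tokens containing URL metacharacters (the token value here is made up):

```python
from urllib.parse import quote_plus

# '+', '/', and '@' would otherwise be parsed as URL structure inside
# the connection string; quote_plus percent-encodes them.
encoded = quote_plus("dapi+abc/def@ghi")
print(encoded)  # dapi%2Babc%2Fdef%40ghi
```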

Comment on lines 64 to 68
return (
f"<{self.__class__.__name__} dialect={self._config.dialect} "
f"token={self._config.token} "
f"host={self._config.host} port={self._config.port} "
f"database={self._config.database} httpPath={str(self._config.httpPath)}"
Contributor

@coderabbitai coderabbitai bot Sep 20, 2023

Exposing sensitive information like tokens in the string representation of the object can be a security risk. If this object is ever logged or printed in a debugging statement, it could expose the token to unauthorized users. Consider removing the token from the string representation.

Collaborator

@ArslanSaleem let's not log the sensitive info!

Contributor

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Collaborator

@gventuri gventuri left a comment

Great job, only a couple of minor comments!

Comment on lines 282 to 285
query = sql.text(f"SELECT COUNT(*) FROM {self._config.table}")

# Return the number of rows
self._rows_count = self._connection.execute(query).fetchone()[0]
Collaborator

Unfortunately this could create a potential attack. We are using bindparams to validate the table name. Also the suggestion from @coderabbitai might work!
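Since bind parameters generally cannot stand in for identifiers such as table names, one complementary guard (a sketch under that assumption, not the PR's implementation) is to validate the identifier against a strict allow-list pattern before interpolating it:

```python
import re

# Strict allow-list pattern for a bare SQL identifier.
IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_count_query(table: str) -> str:
    """Build a COUNT(*) query only when the table name matches a strict
    identifier pattern; anything else is rejected outright."""
    if not IDENTIFIER_RE.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    return f"SELECT COUNT(*) FROM {table}"

query = safe_count_query("lineitem")

try:
    safe_count_query("lineitem; DROP TABLE users")
    rejected = False
except ValueError:
    rejected = True
```

The pattern is deliberately narrow; schema-qualified or quoted names would need their own handling.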

examples/from_databricks.py (resolved review comment)
Databricks connectors are used to connect to Databricks Data Cloud.
"""

def __init__(self, config: DatabricksConnectorConfig):
Collaborator

This should also accept a dictionary (Union[DatabricksConnectorConfig, dict]); when a dict is passed, we need to convert it to a DatabricksConnectorConfig.
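The dict-or-config normalization requested here might be sketched as follows; a plain dataclass stands in for the real DatabricksConnectorConfig (a pydantic model), so this is only a shape sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class DatabricksConfig:
    """Stand-in for DatabricksConnectorConfig; field set is illustrative."""
    dialect: str
    host: str
    port: int
    token: str

def normalize_config(config: Union[DatabricksConfig, dict]) -> DatabricksConfig:
    # Accept either the typed config or a plain dict, as the review asks.
    if isinstance(config, dict):
        return DatabricksConfig(**config)
    return config

cfg = normalize_config(
    {"dialect": "databricks", "host": "example-host", "port": 443, "token": "t"}
)
```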

pandasai/connectors/databricks.py (outdated review comment, resolved)

"""
self._engine = create_engine(
f"{config.dialect}://token:{config.token}@{config.host}:{config.port}?http_path={config.httpPath}"
Collaborator

@coderabbitai, could you make an example?

host="ehxzojy-ue47135",
port=443,
token="token",
database="SNOWFLAKE_SAMPLE_DATA",
Collaborator

Let's replace with DATABRICKS

Comment on lines 33 to 39
def test_constructor_and_properties(self):
# Test constructor and properties
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._engine, self.mock_engine)
self.assertEqual(self.connector._connection, self.mock_connection)
self.assertEqual(self.connector._cache_interval, 600)

Collaborator

@ArslanSaleem agree, let's add it to also increase the coverage!

expected_repr = (
"<DatabricksConnector dialect=databricks token=token "
"host=ehxzojy-ue47135 port=443 "
"database=SNOWFLAKE_SAMPLE_DATA httpPath=/sql/1.0/warehouses/1241rsa32"
Collaborator

Leftover, let's use DATABRICKS

Comment on lines +76 to +83
def test_columns_count_property(self):
# Test columns_count property
self.connector._columns_count = None
mock_df = Mock()
mock_df.columns = ["Column1", "Column2"]
self.connector.head = Mock(return_value=mock_df)
columns_count = self.connector.columns_count
self.assertEqual(columns_count, 2)
Collaborator

@ArslanSaleem let's add it!

Comment on lines +85 to +91
def test_column_hash_property(self):
# Test column_hash property
mock_df = Mock()
mock_df.columns = ["Column1", "Column2"]
self.connector.head = Mock(return_value=mock_df)
column_hash = self.connector.column_hash
self.assertIsNotNone(column_hash)
Collaborator

@ArslanSaleem let's also test its correctness (maybe against a hardcoded value)!

@gventuri
Collaborator

@ArslanSaleem also, let's add it to the documentation (connectors.md)!

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 10

Files that changed from the base of the PR and between 358c612 and 6ef5416.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (8)
  • examples/from_databricks.py (1 hunks)
  • pandasai/connectors/base.py (3 hunks)
  • pandasai/connectors/databricks.py (1 hunks)
  • pandasai/connectors/snowflake.py (4 hunks)
  • pandasai/connectors/sql.py (8 hunks)
  • tests/connectors/test_databricks.py (1 hunks)
  • tests/connectors/test_snowflake.py (4 hunks)
  • tests/connectors/test_sql.py (4 hunks)
Files skipped from review due to trivial changes (1)
  • tests/connectors/test_databricks.py
Additional comments (Suppressed): 17
pandasai/connectors/base.py (3)
  • 67-76: The new DatabricksConnectorConfig class extends SQLBaseConnectorConfig and adds specific properties for Databricks connection such as host, port, token, and httpPath. Ensure that these properties are correctly used in the DatabricksConnector class and that they match the expected configuration parameters for a Databricks connection.

  • 110-126: The _populate_config_from_env method populates the configuration dictionary with values from environment variables if they do not exist in the config. This is a good practice as it allows for flexibility in configuration management and adheres to the twelve-factor app methodology. However, ensure that sensitive data like tokens or passwords are securely managed when using environment variables.

  • 128-130: The _init_connection method signature has been changed to include a config parameter of type BaseConnectorConfig. Make sure all calls to this method throughout the codebase have been updated to match the new signature.

pandasai/connectors/snowflake.py (4)
  • 2-9: The import statement for os has been removed. If the environment variables are still being used elsewhere in the code, this could cause a NameError.

  • 18-18: The function signature of __init__ has changed from accepting a SnowFlakeConnectorConfig to accepting a Union[SQLConnectorConfig, dict]. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 24-38: The method of populating the configuration from environment variables has been refactored. Instead of checking and assigning each variable individually, a dictionary of environment variable names is created and passed to a _populate_config_from_env method. This change improves the maintainability and readability of the code by reducing redundancy and complexity.

  • 85-91: The __repr__ method no longer includes the username and password in its output. This is a good security practice as it prevents sensitive information from being accidentally logged or displayed.
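The bullet above notes that __repr__ now omits credentials; a toy class (not the PR's code) illustrating that pattern:

```python
class Connector:
    """Toy class showing the pattern: the secret is stored for use but
    deliberately left out of the string representation."""

    def __init__(self, host: str, database: str, password: str):
        self.host = host
        self.database = database
        self._password = password  # needed to connect, never printed

    def __repr__(self) -> str:
        return (
            f"<{self.__class__.__name__} host={self.host} "
            f"database={self.database}>"
        )

conn = Connector("example-host", "sample_db", "s3cret")
rep = repr(conn)
```

Logging `conn` anywhere now leaks the host and database names at most, never the password.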

tests/connectors/test_snowflake.py (1)
  • 9-13: The patch for pandasai.connectors.sql.sql has been removed in the new hunk. If this is intentional, ensure that all dependencies on this mock within the test cases have been properly updated or removed.
pandasai/connectors/databricks.py (1)
  • 49-50: As previously discussed, the connection string is being created without any sanitization or encoding of the input parameters. This could potentially lead to security issues such as SQL injection attacks if the config parameters are user-supplied. Consider using a safer method to construct the connection string, such as URL encoding the parameters or using a library function designed for this purpose.
from urllib.parse import quote_plus

safe_token = quote_plus(config.token)
safe_host = quote_plus(config.host)
safe_port = quote_plus(str(config.port))
safe_httpPath = quote_plus(config.httpPath)

self._engine = create_engine(
    f"{config.dialect}://token:{safe_token}@{safe_host}:{safe_port}?http_path={safe_httpPath}"
)
pandasai/connectors/sql.py (4)
  • 12-14: The import of sql from sqlalchemy has been removed. Ensure that this does not affect any other parts of the codebase where sql might be used.

  • 95-101: The __repr__ method no longer includes the username and password in its output. This is a good practice for security reasons as it prevents sensitive information from being logged or displayed inadvertently.

  • 389-390: The __init__ methods of the MySQLConnector and PostgreSQLConnector classes now accept a dictionary in addition to an instance of SQLConnectorConfig. This change provides more flexibility in how the configuration can be provided. However, ensure that all calls to these constructors throughout the codebase have been updated to match the new signatures.

  • 399-407: The environment variable loading logic has been refactored into a separate _populate_config_from_env method. This change improves modularity and reduces code duplication. However, ensure that the _populate_config_from_env method correctly handles cases where the environment variables are not set or are set to invalid values.

tests/connectors/test_sql.py (4)
  • 9-13: The patch for "pandasai.connectors.sql.sql" has been removed in the new hunk. If this is intentional and the SQL module is no longer used in the setup, then this change is fine. However, if the SQL module is still being used in the setup or elsewhere in the tests, this could cause issues. Please verify that the removal of this patch does not affect the functionality of the tests.

  • 33-46: The constructor test now includes checks for _load_connector_config and _init_connection methods to be called during the object creation. This is a good practice as it ensures that these important methods are indeed invoked when an instance of SQLConnector is created. No changes needed here.

  • 48-54: The expected representation string in the __repr__ method test has changed. The username and password fields have been removed from the expected string. If this is due to a change in the __repr__ method of the SQLConnector class where it no longer includes the username and password in its representation, then this change is fine. However, please verify that this change aligns with the current implementation of the __repr__ method.

  • 82-89: The test for the columns_count property has been modified. Instead of mocking a return value for self.mock_connection.execute.return_value.fetchone, it now mocks a return value for self.connector.head. This change seems to reflect a modification in the columns_count property implementation, where it now relies on the head method instead of directly executing a SQL query. Please ensure that this change aligns with the current implementation of the columns_count property.

examples/from_databricks.py (two resolved review comments)
@@ -3,6 +3,7 @@
"""

from abc import ABC, abstractmethod
import os
Contributor

The os module is imported but not used in the provided hunk. If it's not used elsewhere in the file, consider removing this import to keep the code clean and efficient.

- import os

Comment on lines +33 to +45
@patch("pandasai.connectors.SnowFlakeConnector._load_connector_config")
@patch("pandasai.connectors.SnowFlakeConnector._init_connection")
def test_constructor_and_properties(
self, mock_load_connector_config, mock_init_connection
):
# Test constructor and properties
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._engine, self.mock_engine)
self.assertEqual(self.connector._connection, self.mock_connection)
self.assertEqual(self.connector._cache_interval, 600)
SnowFlakeConnector(self.config)
mock_load_connector_config.assert_called()
mock_init_connection.assert_called()
Contributor

The constructor of SnowFlakeConnector is now being tested with two additional mocked methods: _load_connector_config and _init_connection. Ensure these methods are not required to be tested separately as their functionality is now covered under the constructor test.

Comment on lines 82 to +89
     def test_columns_count_property(self):
         # Test columns_count property
         self.connector._columns_count = None
-        self.mock_connection.execute.return_value.fetchone.return_value = (
-            8,
-        )  # Sample columns count
+        mock_df = Mock()
+        mock_df.columns = ["Column1", "Column2"]
+        self.connector.head = Mock(return_value=mock_df)
         columns_count = self.connector.columns_count
-        self.assertEqual(columns_count, 8)
+        self.assertEqual(columns_count, 2)
Contributor

The method to test columns_count property has been changed. Previously, it was directly fetching the count from the database, but now it's using the head method to get a dataframe and then counting the columns. This change might affect the performance if the table has a large number of rows. Consider reverting back to the old method if performance is a concern.

Comment on lines 95 to +101
         self.connector.head = Mock(return_value=mock_df)
         column_hash = self.connector.column_hash
-        self.assertIsNotNone(column_hash)
+        self.assertEqual(
+            column_hash,
+            "ea6a80582b83e1511f8be83412b13e7b86d20c45b96fcf9731f3b99dc3b568aa",
+        )
Contributor

The expected value of column_hash is hardcoded in the test. This could lead to test failures if the implementation of generating the hash changes. Consider mocking the method used to generate the hash or dynamically generating the expected hash based on the mock data.
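One way to avoid the hardcoded digest, assuming hypothetically that column_hash is a SHA-256 hex digest over the joined column names (the real implementation may join or hash differently, so a test would mirror the connector's actual logic), is to compute the expectation from the same mock data:

```python
import hashlib

columns = ["Column1", "Column2"]

# Hypothetical reconstruction of the hashing step: join the column
# names and take a SHA-256 hex digest. In a real test, replicate the
# connector's exact join and hash so the expectation tracks the code.
expected_hash = hashlib.sha256(",".join(columns).encode()).hexdigest()
```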

pandasai/connectors/databricks.py (resolved review comment)
Comment on lines +61 to +65
return (
f"<{self.__class__.__name__} dialect={self._config.dialect} "
f"host={self._config.host} port={self._config.port} "
f"database={self._config.database} httpPath={str(self._config.httpPath)}"
)

The __repr__ method is revealing potentially sensitive information such as the host and port. It's generally not a good practice to expose such details. Consider removing these from the representation.

-        return (
-            f"<{self.__class__.__name__} dialect={self._config.dialect} "
-            f"host={self._config.host} port={self._config.port} "
-            f"database={self._config.database} httpPath={str(self._config.httpPath)}"
-        )
+        return f"<{self.__class__.__name__} database={self._config.database}>"

Comment on lines 279 to 285
)

# Run a SQL query to get the number of rows
query = sql.text(
"SELECT COUNT(*) FROM information_schema.columns "
"WHERE table_name = :table_name"
).bindparams(table_name=self._config.table)
query = select(text("COUNT(*)")).select_from(text(self._config.table))

# Return the number of rows
self._rows_count = self._connection.execute(query).fetchone()[0]

The SQL query to get the number of rows has been simplified. The new query counts rows directly on the target table; the previous query counted entries in information_schema.columns, which returns the number of columns rather than rows, so this is a correctness fix as well as a simplification. However, ensure that the user executing this query has sufficient permissions on the target table.
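The simplified pattern — COUNT(*) straight on the target table instead of consulting information_schema — can be illustrated with a plain DB-API sketch. sqlite3 stands in for the Databricks connection here, and the table name is made up for the example.

```python
import sqlite3

# In-memory database with a small illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

# Direct COUNT(*) on the target table — one query, no catalog lookup.
rows_count = conn.execute("SELECT COUNT(*) FROM loans").fetchone()[0]
print(rows_count)  # 3
```

In the connector the same shape is expressed through SQLAlchemy's select(text("COUNT(*)")), but the executed SQL is equivalent.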

Comment on lines 304 to 308
f"{self._config.dialect}"
)

# Run a SQL query to get the number of columns
query = sql.text(
"SELECT COUNT(*) FROM information_schema.columns "
f"WHERE table_name = '{self._config.table}'"
)

# Return the number of columns
self._columns_count = self._connection.execute(query).fetchone()[0]
self._columns_count = len(self.head().columns)
return self._columns_count

The columns_count method now uses the len function on the columns returned by the head method instead of running a SQL query. This change could improve performance by reducing the number of database queries. However, make sure that the head method always returns the correct and complete set of columns.
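The idea behind len(self.head().columns) — fetch a tiny sample and count columns from its metadata instead of querying information_schema — can be sketched with stdlib sqlite3 standing in for the real connection; the table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER, amount REAL, status TEXT)")

# head-like sample: LIMIT keeps the query cheap even on large tables.
cur = conn.execute("SELECT * FROM loans LIMIT 5")

# cursor.description carries one entry per column, rows or no rows.
columns_count = len(cur.description)
print(columns_count)  # 3
```

Note the column metadata is available even when the table is empty, so the count does not depend on the sampled rows.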

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Files that changed from the base of the PR and between commits 6ef5416 and 637348a.
Files selected for processing (1)
  • pandasai/connectors/snowflake.py (4 hunks)
Additional comments (Suppressed): 4
pandasai/connectors/snowflake.py (4)
  • 2-7: The import statement for os module has been removed. If the environment variables are being used elsewhere in the code, this could cause a NameError.

  • 18-18: The function signature of __init__ method has been changed to accept either a SnowFlakeConnectorConfig instance or a dictionary as the configuration parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 24-38: The way environment variables are loaded into the config has been refactored. This is a good change as it reduces redundancy and makes the code more maintainable. However, ensure that the _populate_config_from_env method correctly populates the config from environment variables.

  • 85-91: The __repr__ method no longer includes the username and password in its output. This is a good security practice as it prevents sensitive information from being logged or displayed inadvertently.

@gventuri gventuri changed the title Feat[DatabricksConnector]: Connector to connect to Databricks on Cloud feat[DatabricksConnector]: connector to connect to Databricks on Cloud Sep 21, 2023
@gventuri gventuri merged commit 325fed6 into main Sep 21, 2023
9 checks passed
3 participants