fix(sql): allow paramerized query through sql sanitization #1576

ArslanSaleem · 2025-01-31T18:23:53Z

Important

Enable parameterized queries and suppress SQLAlchemy warnings, with updated tests for query safety and import handling.

Behavior:
- Allow parameterized queries by replacing '%s' with a placeholder in is_sql_query_safe() in sql_sanitizer.py.
- Suppress SQLAlchemy warnings in load_from_mysql(), load_from_postgres(), and load_from_cockroachdb() in pandasai_sql/__init__.py.
- Raise ImportError in execute_query() in sql_loader.py if connector is missing.
Tests:
- Add test_safe_query_with_query_params() in test_sql_sanitizer.py to verify parameterized query safety.
- Add test_mysql_malicious_with_no_import() in test_sql_loader.py to check handling of missing imports.
- Update other tests in test_sql_loader.py to cover new behavior.

^{This description was created by}^{for 8b29e2c. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to 51dac82 in 56 seconds

More details

Looked at 133 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 6 drafted comments based on config settings.

1. pandasai/helpers/sql_sanitizer.py:62

Draft comment:
The placeholder ___PLACEHOLDER___ used for parameterized queries might not be unique enough and could conflict with actual query content. Consider using a more unique pattern or a UUID to ensure no conflicts.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:

The placeholder is only used internally within this function. 2. It's only used for parsing, not query execution. 3. The chance of this exact string appearing in a legitimate SELECT query is extremely low. 4. Even if it did appear, it would only affect the parsing step, not the actual query execution. 5. The current implementation is simple and effective for its purpose.
The comment raises a theoretical edge case that could occur if someone had this exact string in their query. We should consider if this could cause false positives in the safety check.
The risk is extremely low, and even if the unlikely collision occurred, it would only potentially cause a false negative (marking a safe query as unsafe), not a security vulnerability.
The comment raises a theoretical concern that has minimal real-world impact and doesn't require action. The current implementation is sufficient for its purpose.

2. extensions/connectors/sql/pandasai_sql/__init__.py:22

Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment author misunderstood the context. While pandas' read_sql does use SQLAlchemy internally, the code is not directly using SQLAlchemy for database connections. The TODO comment is actually referring to a future migration to use SQLAlchemy directly instead of pymysql/psycopg2. The comment author's confusion makes their feedback incorrect.
Am I certain about the intention behind the TODO comment? Could there be another SQLAlchemy-related issue I'm not aware of?
While I can't be 100% certain of the intention, the code clearly shows direct database driver usage rather than SQLAlchemy, making the comment author's interpretation incorrect.
The comment should be deleted as it's based on a misunderstanding of how SQLAlchemy is currently being used in the code.

3. extensions/connectors/sql/pandasai_sql/__init__.py:41

Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context. (Also applicable to load_from_cockroachdb)
Reason this comment was not posted:
Marked as duplicate.

4. extensions/connectors/sql/pandasai_sql/__init__.py:60

Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context.
Reason this comment was not posted:
Marked as duplicate.

5. extensions/connectors/sql/pandasai_sql/__init__.py:22

Draft comment:
The comment regarding the suppression of warnings is not grammatically correct. Consider rephrasing it for clarity:

# TODO - This can be removed later when SqlAlchemy is used

Reason this comment was not posted:
Confidence changes required: 50%
The comment in the code regarding the suppression of warnings is not grammatically correct and could be improved for clarity.

6. tests/unit_tests/helpers/test_sql_sanitizer.py:87

Draft comment:
The comment for the test case test_safe_query_with_query_params should be updated to reflect the purpose of the test more clearly:

assert is_sql_query_safe(query)  # Test query with parameters is considered safe

Reason this comment was not posted:
Confidence changes required: 50%
The test case test_safe_query_with_query_params is correctly added to test the new functionality of handling query parameters. However, the comment should be updated to reflect the purpose of the test more clearly.

Workflow ID: wflow_291HVouRuaS9bDlK

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-01-31T18:24:58Z

pandasai/data_loader/sql_loader.py

+
+        except ModuleNotFoundError as e:
+            raise ImportError(
+                f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`"


The error message in the ModuleNotFoundError exception handling is not grammatically correct. Consider rephrasing it for clarity:

Suggested change

f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`"

f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library, e.g., `pip install pandasai_sql[{source_type}]`."

codecov · 2025-01-31T18:25:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.76%. Comparing base (4ca228f) to head (8b29e2c).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1576      +/-   ##
==========================================
+ Coverage   82.73%   82.76%   +0.02%     
==========================================
  Files          66       66              
  Lines        2410     2414       +4     
==========================================
+ Hits         1994     1998       +4     
  Misses        416      416

Flag	Coverage Δ
unittests	`82.76% <100.00%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ellipsis-dev

👍 Looks good to me! Incremental review on 8b29e2c in 49 seconds

More details

Looked at 32 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. tests/unit_tests/data_loader/test_sql_loader.py:221

Draft comment:
The mock_sql_query.return_value should be False to simulate a malicious query, as the test name suggests.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
Despite the test name containing "malicious", this test appears to be focused on testing ImportError handling, not malicious query detection. The test expects an ImportError to be raised regardless of whether the query is malicious or not. Setting mock_sql_query.return_value to True or False doesn't affect the test's core purpose since the ImportError happens first.
The test name is misleading since it's not actually testing malicious query behavior. The comment author may have been misled by the test name.
The test's behavior is correct as changed - it's testing module import failure, not query safety, so the mock_sql_query value is not relevant to the test's purpose.
The comment should be deleted as it misunderstands the purpose of the test case, which is to verify ImportError handling rather than malicious query detection.

2. pandasai/data_loader/sql_loader.py:54

Draft comment:
The error message was missing a comma and a period. The correction made in the PR is appropriate.
Reason this comment was not posted:
Comment did not seem useful.

3. tests/unit_tests/data_loader/test_sql_loader.py:213

Draft comment:
Consider adding unit tests to cover the changes made for parameterized queries in the execute_query method.
Reason this comment was not posted:
Comment was not on a valid diff hunk.

Workflow ID: wflow_FjVETGQgJRjNaQRI

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ArslanSaleem added 2 commits January 31, 2025 19:17

fix(sql): sanitize to support regression query

324b2ea

Merge branch 'main' into fix/sql_integration_regressions

51dac82

ArslanSaleem requested a review from gventuri January 31, 2025 18:23

ellipsis-dev bot reviewed Jan 31, 2025

View reviewed changes

fix(sql): update comment and test case

8b29e2c

ellipsis-dev bot reviewed Jan 31, 2025

View reviewed changes

gventuri merged commit f667367 into main Jan 31, 2025
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sql): allow paramerized query through sql sanitization #1576

fix(sql): allow paramerized query through sql sanitization #1576

ArslanSaleem commented Jan 31, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot Jan 31, 2025

codecov bot commented Jan 31, 2025 •

edited

Loading

ellipsis-dev bot left a comment

	f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`"
	f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library, e.g., `pip install pandasai_sql[{source_type}]`."

fix(sql): allow paramerized query through sql sanitization #1576

fix(sql): allow paramerized query through sql sanitization #1576

Conversation

ArslanSaleem commented Jan 31, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot Jan 31, 2025

Choose a reason for hiding this comment

codecov bot commented Jan 31, 2025 • edited Loading

Codecov Report

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ArslanSaleem commented Jan 31, 2025 •

edited by ellipsis-dev bot

Loading

codecov bot commented Jan 31, 2025 •

edited

Loading