-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql): allow paramerized query through sql sanitization #1576
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 51dac82 in 56 seconds
More details
- Looked at
133
lines of code in5
files - Skipped
0
files when reviewing. - Skipped posting
6
drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:62
- Draft comment:
The placeholder___PLACEHOLDER___
used for parameterized queries might not be unique enough and could conflict with actual query content. Consider using a more unique pattern or a UUID to ensure no conflicts. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
- The placeholder is only used internally within this function. 2. It's only used for parsing, not query execution. 3. The chance of this exact string appearing in a legitimate SELECT query is extremely low. 4. Even if it did appear, it would only affect the parsing step, not the actual query execution. 5. The current implementation is simple and effective for its purpose.
The comment raises a theoretical edge case that could occur if someone had this exact string in their query. We should consider if this could cause false positives in the safety check.
The risk is extremely low, and even if the unlikely collision occurred, it would only potentially cause a false negative (marking a safe query as unsafe), not a security vulnerability.
The comment raises a theoretical concern that has minimal real-world impact and doesn't require action. The current implementation is sufficient for its purpose.
2. extensions/connectors/sql/pandasai_sql/__init__.py:22
- Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment author misunderstood the context. While pandas' read_sql does use SQLAlchemy internally, the code is not directly using SQLAlchemy for database connections. The TODO comment is actually referring to a future migration to use SQLAlchemy directly instead of pymysql/psycopg2. The comment author's confusion makes their feedback incorrect.
Am I certain about the intention behind the TODO comment? Could there be another SQLAlchemy-related issue I'm not aware of?
While I can't be 100% certain of the intention, the code clearly shows direct database driver usage rather than SQLAlchemy, making the comment author's interpretation incorrect.
The comment should be deleted as it's based on a misunderstanding of how SQLAlchemy is currently being used in the code.
3. extensions/connectors/sql/pandasai_sql/__init__.py:41
- Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context. (Also applicable toload_from_cockroachdb
) - Reason this comment was not posted:
Marked as duplicate.
4. extensions/connectors/sql/pandasai_sql/__init__.py:60
- Draft comment:
The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context. - Reason this comment was not posted:
Marked as duplicate.
5. extensions/connectors/sql/pandasai_sql/__init__.py:22
- Draft comment:
The comment regarding the suppression of warnings is not grammatically correct. Consider rephrasing it for clarity:
# TODO - This can be removed later when SqlAlchemy is used
- Reason this comment was not posted:
Confidence changes required:50%
The comment in the code regarding the suppression of warnings is not grammatically correct and could be improved for clarity.
6. tests/unit_tests/helpers/test_sql_sanitizer.py:87
- Draft comment:
The comment for the test casetest_safe_query_with_query_params
should be updated to reflect the purpose of the test more clearly:
assert is_sql_query_safe(query) # Test query with parameters is considered safe
- Reason this comment was not posted:
Confidence changes required:50%
The test casetest_safe_query_with_query_params
is correctly added to test the new functionality of handling query parameters. However, the comment should be updated to reflect the purpose of the test more clearly.
Workflow ID: wflow_291HVouRuaS9bDlK
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
pandasai/data_loader/sql_loader.py
Outdated
|
||
except ModuleNotFoundError as e: | ||
raise ImportError( | ||
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message in the ModuleNotFoundError
exception handling is not grammatically correct. Consider rephrasing it for clarity:
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`" | |
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library, e.g., `pip install pandasai_sql[{source_type}]`." |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1576 +/- ##
==========================================
+ Coverage 82.73% 82.76% +0.02%
==========================================
Files 66 66
Lines 2410 2414 +4
==========================================
+ Hits 1994 1998 +4
Misses 416 416
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 8b29e2c in 49 seconds
More details
- Looked at
32
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_sql_loader.py:221
- Draft comment:
Themock_sql_query.return_value
should beFalse
to simulate a malicious query, as the test name suggests. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
Despite the test name containing "malicious", this test appears to be focused on testing ImportError handling, not malicious query detection. The test expects an ImportError to be raised regardless of whether the query is malicious or not. Setting mock_sql_query.return_value to True or False doesn't affect the test's core purpose since the ImportError happens first.
The test name is misleading since it's not actually testing malicious query behavior. The comment author may have been misled by the test name.
The test's behavior is correct as changed - it's testing module import failure, not query safety, so the mock_sql_query value is not relevant to the test's purpose.
The comment should be deleted as it misunderstands the purpose of the test case, which is to verify ImportError handling rather than malicious query detection.
2. pandasai/data_loader/sql_loader.py:54
- Draft comment:
The error message was missing a comma and a period. The correction made in the PR is appropriate. - Reason this comment was not posted:
Comment did not seem useful.
3. tests/unit_tests/data_loader/test_sql_loader.py:213
- Draft comment:
Consider adding unit tests to cover the changes made for parameterized queries in theexecute_query
method. - Reason this comment was not posted:
Comment was not on a valid diff hunk.
Workflow ID: wflow_FjVETGQgJRjNaQRI
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Enable parameterized queries and suppress SQLAlchemy warnings, with updated tests for query safety and import handling.
is_sql_query_safe()
insql_sanitizer.py
.load_from_mysql()
,load_from_postgres()
, andload_from_cockroachdb()
inpandasai_sql/__init__.py
.ImportError
inexecute_query()
insql_loader.py
if connector is missing.test_safe_query_with_query_params()
intest_sql_sanitizer.py
to verify parameterized query safety.test_mysql_malicious_with_no_import()
intest_sql_loader.py
to check handling of missing imports.test_sql_loader.py
to cover new behavior.This description was created by for 8b29e2c. It will automatically update as commits are pushed.