Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql): allow paramerized query through sql sanitization #1576

Merged
merged 3 commits into from
Jan 31, 2025

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Jan 31, 2025

Important

Enable parameterized queries and suppress SQLAlchemy warnings, with updated tests for query safety and import handling.

  • Behavior:
    • Allow parameterized queries by replacing '%s' with a placeholder in is_sql_query_safe() in sql_sanitizer.py.
    • Suppress SQLAlchemy warnings in load_from_mysql(), load_from_postgres(), and load_from_cockroachdb() in pandasai_sql/__init__.py.
    • Raise ImportError in execute_query() in sql_loader.py if connector is missing.
  • Tests:
    • Add test_safe_query_with_query_params() in test_sql_sanitizer.py to verify parameterized query safety.
    • Add test_mysql_malicious_with_no_import() in test_sql_loader.py to check handling of missing imports.
    • Update other tests in test_sql_loader.py to cover new behavior.

This description was created by Ellipsis for 8b29e2c. It will automatically update as commits are pushed.

@ArslanSaleem ArslanSaleem requested a review from gventuri January 31, 2025 18:23
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 51dac82 in 56 seconds

More details
  • Looked at 133 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:62
  • Draft comment:
    The placeholder ___PLACEHOLDER___ used for parameterized queries might not be unique enough and could conflict with actual query content. Consider using a more unique pattern or a UUID to ensure no conflicts.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
  1. The placeholder is only used internally within this function. 2. It's only used for parsing, not query execution. 3. The chance of this exact string appearing in a legitimate SELECT query is extremely low. 4. Even if it did appear, it would only affect the parsing step, not the actual query execution. 5. The current implementation is simple and effective for its purpose.
    The comment raises a theoretical edge case that could occur if someone had this exact string in their query. We should consider if this could cause false positives in the safety check.
    The risk is extremely low, and even if the unlikely collision occurred, it would only potentially cause a false negative (marking a safe query as unsafe), not a security vulnerability.
    The comment raises a theoretical concern that has minimal real-world impact and doesn't require action. The current implementation is sufficient for its purpose.
2. extensions/connectors/sql/pandasai_sql/__init__.py:22
  • Draft comment:
    The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment author misunderstood the context. While pandas' read_sql does use SQLAlchemy internally, the code is not directly using SQLAlchemy for database connections. The TODO comment is actually referring to a future migration to use SQLAlchemy directly instead of pymysql/psycopg2. The comment author's confusion makes their feedback incorrect.
    Am I certain about the intention behind the TODO comment? Could there be another SQLAlchemy-related issue I'm not aware of?
    While I can't be 100% certain of the intention, the code clearly shows direct database driver usage rather than SQLAlchemy, making the comment author's interpretation incorrect.
    The comment should be deleted as it's based on a misunderstanding of how SQLAlchemy is currently being used in the code.
3. extensions/connectors/sql/pandasai_sql/__init__.py:41
  • Draft comment:
    The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context. (Also applicable to load_from_cockroachdb)
  • Reason this comment was not posted:
    Marked as duplicate.
4. extensions/connectors/sql/pandasai_sql/__init__.py:60
  • Draft comment:
    The comment mentions that the warning suppression can be removed when SQLAlchemy is used, but SQLAlchemy is already being used here. Please clarify or update the comment to reflect the correct context.
  • Reason this comment was not posted:
    Marked as duplicate.
5. extensions/connectors/sql/pandasai_sql/__init__.py:22
  • Draft comment:
    The comment regarding the suppression of warnings is not grammatically correct. Consider rephrasing it for clarity:
# TODO - This can be removed later when SqlAlchemy is used
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The comment in the code regarding the suppression of warnings is not grammatically correct and could be improved for clarity.
6. tests/unit_tests/helpers/test_sql_sanitizer.py:87
  • Draft comment:
    The comment for the test case test_safe_query_with_query_params should be updated to reflect the purpose of the test more clearly:
assert is_sql_query_safe(query)  # Test query with parameters is considered safe
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test case test_safe_query_with_query_params is correctly added to test the new functionality of handling query parameters. However, the comment should be updated to reflect the purpose of the test more clearly.

Workflow ID: wflow_291HVouRuaS9bDlK


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.


except ModuleNotFoundError as e:
raise ImportError(
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message in the ModuleNotFoundError exception handling is not grammatically correct. Consider rephrasing it for clarity:

Suggested change
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library.e.g. `pip install pandasai_sql[{source_type}]`"
f"{source_type.capitalize()} connector not found. Please install the pandasai_sql[{source_type}] library, e.g., `pip install pandasai_sql[{source_type}]`."

Copy link

codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.76%. Comparing base (4ca228f) to head (8b29e2c).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1576      +/-   ##
==========================================
+ Coverage   82.73%   82.76%   +0.02%     
==========================================
  Files          66       66              
  Lines        2410     2414       +4     
==========================================
+ Hits         1994     1998       +4     
  Misses        416      416              
Flag Coverage Δ
unittests 82.76% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 8b29e2c in 49 seconds

More details
  • Looked at 32 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_sql_loader.py:221
  • Draft comment:
    The mock_sql_query.return_value should be False to simulate a malicious query, as the test name suggests.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    Despite the test name containing "malicious", this test appears to be focused on testing ImportError handling, not malicious query detection. The test expects an ImportError to be raised regardless of whether the query is malicious or not. Setting mock_sql_query.return_value to True or False doesn't affect the test's core purpose since the ImportError happens first.
    The test name is misleading since it's not actually testing malicious query behavior. The comment author may have been misled by the test name.
    The test's behavior is correct as changed - it's testing module import failure, not query safety, so the mock_sql_query value is not relevant to the test's purpose.
    The comment should be deleted as it misunderstands the purpose of the test case, which is to verify ImportError handling rather than malicious query detection.
2. pandasai/data_loader/sql_loader.py:54
  • Draft comment:
    The error message was missing a comma and a period. The correction made in the PR is appropriate.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. tests/unit_tests/data_loader/test_sql_loader.py:213
  • Draft comment:
    Consider adding unit tests to cover the changes made for parameterized queries in the execute_query method.
  • Reason this comment was not posted:
    Comment was not on a valid diff hunk.

Workflow ID: wflow_FjVETGQgJRjNaQRI


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit f667367 into main Jan 31, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants