feat[DatabricksConnector]: connector to connect to Databricks on Cloud #580
The class docstring seems to be copied from the SnowFlake connector and not updated. It should reflect that this is a Databricks connector.
This also accepts a dictionary (`Union[DatabricksConnectorConfig, dict]`). In that case, we need to convert it to a `DatabricksConnectorConfig`.
Again, the docstring for the `__init__` method refers to SnowFlake instead of Databricks. This needs to be corrected.
The environment variable being checked here is for Snowflake, not Databricks. This should be corrected to check for a Databricks-specific environment variable.
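For illustration, the kind of Databricks-specific lookup meant here could look like the sketch below; the variable name `DATABRICKS_TOKEN` and the helper function are assumptions, not the connector's actual API.

```python
import os


def resolve_databricks_token(config: dict) -> str:
    """Return the token from the config, falling back to the environment."""
    # DATABRICKS_TOKEN is a hypothetical variable name; use whichever
    # Databricks-specific variable the connector documents.
    token = config.get("token") or os.getenv("DATABRICKS_TOKEN")
    if not token:
        raise ValueError(
            "Databricks token missing: set config['token'] or DATABRICKS_TOKEN."
        )
    return token
```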
The connection string is being created without any sanitization or encoding of the input parameters. This could potentially lead to security issues such as SQL injection attacks if the config parameters are user-supplied. Consider using a safer method to construct the connection string, such as URL encoding the parameters or using a library function designed for this purpose.
@coderabbitai, could you make an example?
@gventuri, sure! Here's an example of how you could safely construct the connection string using URL encoding:
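A minimal sketch, assuming the connector builds a SQLAlchemy URL from config fields named `host`, `port`, `token`, `http_path`, and `database` (adjust to the actual `DatabricksConnectorConfig`):

```python
from urllib.parse import quote_plus

from sqlalchemy import create_engine


def build_databricks_engine(host, port, token, http_path, database):
    # URL-encode every user-supplied value so that special characters
    # cannot break out of the connection string.
    connection_string = (
        f"databricks://token:{quote_plus(token)}@{quote_plus(host)}:{port}"
        f"?http_path={quote_plus(http_path)}"
        f"&catalog={quote_plus(database)}&schema={quote_plus(database)}"
    )
    return create_engine(connection_string)
```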
In this example, I've used Python's built-in `urllib.parse.quote_plus` function to URL encode the input parameters. This will ensure that any special characters in the parameters are properly escaped, preventing potential SQL injection attacks.
The docstring for the `__repr__` method also refers to SnowFlake instead of Databricks. This needs to be corrected.
Exposing sensitive information like tokens in the string representation of the object can be a security risk. If this object is ever logged or printed in a debugging statement, it could expose the token to unauthorized users. Consider removing the token from the string representation.
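For example, a redacted representation could look like the sketch below; the field names on `self.config` are assumptions about `DatabricksConnectorConfig`.

```python
def __repr__(self) -> str:
    # Deliberately omit the token so it never ends up in logs or tracebacks.
    return (
        f"<DatabricksConnector host={self.config.host} "
        f"port={self.config.port} database={self.config.database} "
        f"table={self.config.table}>"
    )
```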
@ArslanSaleem let's not log the sensitive info!
The new hunk simplifies the SQL query for row count, which is a good change. However, it assumes that the table name in the configuration does not contain any special characters or reserved words that might need to be escaped. This could lead to SQL injection vulnerabilities if the table name is user-supplied and not properly sanitized. Consider using bind parameters as in the old hunk to mitigate this risk.
Unfortunately this could open up a potential attack vector. We are using bindparams to validate the table name. Also, the suggestion from @coderabbitai might work!
@gventuri: I'm glad you agree with the suggestion. Using bind parameters is indeed a good practice to prevent SQL injection attacks. It ensures that the table name is properly escaped, preventing malicious input from being executed as part of the SQL query. Here's how you can modify your code:
This way, SQLAlchemy will handle the proper escaping of the table name, making your code safer.
@ArslanSaleem: You're correct. My apologies for the confusion. SQLAlchemy's `bindparams` is indeed used to bind parameters, not table names or column names. To safely include a table name in a query, we need to use SQLAlchemy's `Table` construct, which provides proper quoting around identifiers. Here's how you can modify your code:
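A minimal sketch, assuming SQLAlchemy 1.4+, with the engine available as `self._engine` and the table name in `self.config.table`:

```python
from sqlalchemy import MetaData, Table, func, select


def get_row_count(self):
    # Reflecting the table lets SQLAlchemy quote the identifier itself,
    # so a hostile table name cannot be injected into the SQL text.
    table = Table(self.config.table, MetaData(), autoload_with=self._engine)
    query = select(func.count()).select_from(table)
    with self._engine.connect() as connection:
        return connection.execute(query).scalar()
```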
This way, SQLAlchemy will handle the proper quoting of the table name, making your code safer.
The `columns_count` method now uses the `len` function on the columns returned by the `head` method instead of running a SQL query. This change could improve performance by reducing the number of database queries. However, make sure that the `head` method always returns the correct and complete set of columns.
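For reference, the pattern being described is roughly the following; the exact method and attribute names are assumptions about the connector.

```python
def columns_count(self) -> int:
    # Counts columns on the DataFrame returned by head(), which only works
    # if head() reflects the full table schema.
    return len(self.head().columns)
```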