Releases · databrickslabs/lsql
v0.4.3
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The `actions/checkout` dependency has been updated from version 4.1.2 to 4.1.3 in the `update-main-version.yml` file. The new version adds a check that verifies the git version before attempting to disable `sparse-checkout`, and adds an SSH user parameter to improve functionality and compatibility. The release notes and CHANGELOG.md file provide detailed information on the specific changes and improvements, and the pull request includes a detailed commit history with links to the corresponding issues and pull requests on GitHub for transparency. You can review and merge the pull request to update the `actions/checkout` dependency in your project.
- Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new `asDict` method to the `Row` class in the `databricks.labs.lsql.core` module to maintain compatibility with PySpark. The method returns a dictionary representation of the `Row` object, with keys corresponding to column names and values corresponding to the values in each column; it simply delegates to the existing `as_dict` method, so the two behave identically. Additionally, the `fetch` function in `backends.py` has been modified to return `pyspark.sql` `Row` objects when using `self._spark.sql(sql).collect()`. This change is temporary and marked with a `TODO` comment indicating that it will be addressed in the future. Error-handling code has also been added to the `fetch` function to ensure it operates as expected. The optional `recursive` argument of `asDict`, when set to `True`, is meant to enable recursive conversion of nested `Row` objects to nested dictionaries; this behavior is not currently implemented, and `recursive` defaults to `False`.
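A minimal usage sketch of the PySpark-compatible method, assuming `Row` supports keyword construction as shown; the column names and values are placeholders:

```python
from databricks.labs.lsql.core import Row

# Row columns are accessible as attributes, and asDict() mirrors
# pyspark.sql.Row.asDict() by delegating to as_dict().
row = Row(first="a", second=1)
assert row.asDict() == {"first": "a", "second": 1}
assert row.asDict() == row.as_dict()
```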
Dependency updates:
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97).
Contributors: @dependabot[bot], @bishwajit-db
v0.4.2
- Added more `NotFound` error type (#94). In the latest update, the `core.py` file in the `databricks/labs/lsql` package has undergone enhancements to its error handling. The `_raise_if_needed` function has been modified to raise a `NotFound` error when the error message includes the phrase "does not exist". This update enables the system to categorize specific SQL query errors as `NotFound` errors, thereby improving the overall error handling and reporting capabilities. This change was a collaborative effort, as indicated by the co-authored-by statement in the commit.
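A sketch of how a caller might rely on the new behavior, assuming `NotFound` is the exception type from `databricks.sdk.errors` and that `StatementExecutionBackend` is constructed from a workspace client and a warehouse id; the constructor arguments and table name are illustrative:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound

from databricks.labs.lsql.backends import StatementExecutionBackend

backend = StatementExecutionBackend(WorkspaceClient(), "<warehouse-id>")
try:
    # A query against a missing table should now surface as NotFound
    # rather than a generic error, per the "does not exist" matching above.
    rows = list(backend.fetch("SELECT * FROM main.default.missing_table"))
except NotFound as err:
    print(f"table is missing: {err}")
```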
Contributors: @nkvuong
v0.4.1
- Fixing overwrite integration tests (#92). A new enhancement has been implemented for the `overwrite` feature's integration tests, addressing a concern with write operations. Two new variables, `catalog` and `schema`, have been incorporated using the `env_or_skip` function. These variables are utilized in the `save_table` method, which is now invoked twice with the same table, once with the `append` and once with the `overwrite` option. The data in the table is retrieved and checked for accuracy after each call, employing the updated `Row` class with revised field names `first` and `second`, formerly `name` and `id`. This modification ensures the proper operation of the `overwrite` feature during integration tests and resolves any related issues. The commit message `Fixing overwrite integration tests` signifies this change.
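A hedged sketch of what such an integration test could look like; the `env_or_skip` and `sql_backend` fixtures, the environment variable names, and the exact `save_table` signature are assumptions for illustration:

```python
import dataclasses


@dataclasses.dataclass
class Foo:
    first: str
    second: int


def test_overwrite(env_or_skip, sql_backend):
    # Skip the test unless a target catalog and schema are configured.
    catalog = env_or_skip("TEST_CATALOG")
    schema = env_or_skip("TEST_SCHEMA")
    full_name = f"{catalog}.{schema}.foo"

    # First call appends rows, second call overwrites them.
    sql_backend.save_table(full_name, [Foo("a", 1), Foo("b", 2)], Foo, mode="append")
    sql_backend.save_table(full_name, [Foo("c", 3)], Foo, mode="overwrite")

    # After the overwrite only the last batch should remain.
    rows = list(sql_backend.fetch(f"SELECT * FROM {full_name}"))
    assert [(row.first, row.second) for row in rows] == [("c", 3)]
```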
Contributors: @william-conti
v0.4.0
- Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional `catalog` and `schema` parameters to the `execute` and `fetch` methods in the `SqlBackend` abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the `SparkSqlBackend` and `DatabricksSqlBackend` classes. The new parameters control the catalog and schema used by the `SparkSession` instance in the `SparkSqlBackend` class and the `SqlClient` instance in the `DatabricksSqlBackend` class. This enhancement enables better functionality in multi-catalog and multi-schema environments, and the change comes with unit tests and integration tests to ensure proper functionality. For example, with a `SparkSqlBackend` instance `spark_backend`, you can execute a SQL statement in a specific catalog and schema with the following code: `spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema")`. The `fetch` method can be used with the new parameters in the same way.
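A short sketch of the new parameters on both methods, assuming `backend` is any `SqlBackend` implementation (such as the `SparkSqlBackend` mentioned above); the table, catalog, and schema names are placeholders:

```python
from databricks.labs.lsql.backends import SqlBackend


def count_rows(backend: SqlBackend) -> int:
    # Both execute() and fetch() accept the optional catalog/schema parameters.
    backend.execute(
        "CREATE TABLE IF NOT EXISTS my_table (id INT)",
        catalog="my_catalog",
        schema="my_schema",
    )
    rows = backend.fetch(
        "SELECT COUNT(*) AS cnt FROM my_table",
        catalog="my_catalog",
        schema="my_schema",
    )
    return next(iter(rows)).cnt
```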
Contributors: @FastLee
v0.3.1
- Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, `downstreams.yml`, which automates unit testing for downstream projects upon changes made to the upstream project. The workflow runs on pull requests, merge groups, and pushes to the main branch, and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and accepts downstream projects using the databrickslabs/sandbox/downstreams action. The job matrix includes two downstream projects, `ucx` and `remorph`, and uses the build cache to speed up the pip install step. This feature ensures that changes to the upstream project do not break compatibility with downstream projects, maintaining a stable and reliable library for software engineers.
- Fixed `Builder` object has no attribute `sdk_config` error (#86). In this release, we've resolved a `Builder object has no attribute sdk_config` error that occurred when initializing a Spark session using the `DatabricksSession.builder` method. The issue was caused by using dot notation to access an `sdk_config` attribute, which does not exist; this has been updated to the correct `sdkConfig` method. The change enables successful creation of the Spark session, preventing the error from recurring. The `DatabricksSession` class and its methods, such as `getOrCreate`, continue to be used for interacting with Databricks clusters and workspaces, while the `WorkspaceClient` class manages Databricks resources within a workspace.
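A minimal sketch of the corrected initialization pattern, assuming databricks-connect's `DatabricksSession` and the databricks-sdk `Config` class; credential resolution is environment-dependent:

```python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

# Config() picks up credentials from the environment or ~/.databrickscfg.
config = Config()

# builder is an attribute (not a call), and the SDK config is passed via sdkConfig().
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
print(spark.sql("SELECT 1").collect())
```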
Dependency updates:
- Bump codecov/codecov-action from 1 to 4 (#84).
- Bump actions/setup-python from 4 to 5 (#83).
- Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
- Bump softprops/action-gh-release from 1 to 2 (#80).
Contributors: @dependabot[bot], @nfx, @bishwajit-db, @william-conti
v0.3.0
- Added support for `save_table(..., mode="overwrite")` to `StatementExecutionBackend` (#74). In this release, we've added support for overwriting a table when saving data using the `save_table` method of the `StatementExecutionBackend`. Previously, attempting to use the `overwrite` mode would raise a `NotImplementedError`. Now, when this mode is specified, the existing data in the table is deleted and replaced with the new data being written: the method first truncates the table by running a `TRUNCATE TABLE` SQL command through the `execute` method, and then inserts the new rows. A new integration test, `test_overwrite`, has been added to the `test_deployment.py` file, along with two new unit test cases, `test_statement_execution_backend_save_table_overwrite_empty_table` and `test_mock_backend_overwrite`, to verify the new functionality. Note that the method signature has been updated to give the `mode` parameter a default value of `append`; this does not affect existing behavior and only provides a more convenient default for users of the method.
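A sketch of saving rows with the new mode, assuming `save_table(full_name, rows, klass, mode=...)` takes a dataclass type describing the rows; the constructor arguments, table name, and field names are illustrative:

```python
import dataclasses

from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend


@dataclasses.dataclass
class Foo:
    name: str
    id: int


backend = StatementExecutionBackend(WorkspaceClient(), "<warehouse-id>")
rows = [Foo("a", 1), Foo("b", 2)]

# mode="overwrite" truncates the table before inserting; the default is "append".
backend.save_table("main.default.foo", rows, Foo, mode="overwrite")
```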
Contributors: @william-conti
v0.2.5
- Fixed PyPI badge (#72). In this release, we have implemented a fix to the PyPI badge in the README file of our open-source library. The PyPI badge displays the version of the package and serves as a quick reference for users. This fix ensures the accuracy and proper functioning of the badge, without involving any changes to the functionality or methods within the project. Software engineers can be assured that this update is limited to the README file, specifically the PyPI badge, and will not affect the overall functionality of the library.
- Fixed `no-cheat` check (#71). In this release, we have made improvements to the `no-cheat` verification process for new code. Previously, the check for disabling the linter was prone to false positives when the string '# pylint: disable' appeared for reasons other than disabling the linter. The updated code now includes an additional filter to exclude the string `CHEAT` from the search, and the number of characters in the output is counted using the `wc -c` command. If the count is not zero, the script will terminate with an error message. This change enhances the accuracy of the `no-cheat` check, ensuring that the linter is being used correctly and that all new code meets our quality standards.
- Removed upper bound on `sqlglot` dependency (#70). In this update, we have removed the upper bound on the `sqlglot` dependency version in the project's `pyproject.toml` file. Previously, the version constraint required `sqlglot` to be at least 22.3.1 but less than 22.5.0. With this modification, there will be no upper limit, enabling the project to utilize any version greater than or equal to 22.3.1. This change provides the project with the flexibility to take advantage of future bug fixes, performance improvements, and new features available in newer `sqlglot` package versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.
Contributors: @nfx
v0.2.4
- Fixed `Builder` object is not callable error (#67). In this release, we have made an enhancement to the `Backends` class in the `databricks/labs/lsql/backends.py` file. The `DatabricksSession.builder()` method call in the `__init__` method has been changed to `DatabricksSession.builder`. This update uses the `builder` attribute to create a new instance of `DatabricksSession` without calling it like a function. The `sdk_config` method is then used to configure the instance with the required settings. Finally, the `getOrCreate` method is utilized to obtain a `SparkSession` object, which is then passed as a parameter to the parent class constructor. This modification simplifies the code and eliminates the error caused by treating the `builder` attribute as a callable object, giving software engineers a more streamlined and error-free codebase when working with the library.
- Prevent silencing of `pylint` (#65). In this release, we have introduced a new job, `no-lint-disabled`, to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing `pylint` checks have been added, by filtering the differences between the current branch and the main branch for new lines of code and then checking whether any of those new lines contain a `pylint` disable comment. If any such lines are found, the job fails and prints a message indicating the offending lines of code, thereby ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
- Updated `_SparkBackend.fetch()` to return iterator instead of list (#62). In this release, the `fetch()` method of the `_SparkBackend` class has been updated to return an iterator instead of a list, which can reduce memory usage and improve performance, as the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling capabilities. The `test_runtime_backend_fetch()` unit test has been updated to reflect this change. Users of the `fetch()` method should be aware that it now returns an iterator and must be consumed to obtain the desired data, as shown in the sketch below. Thorough testing is recommended to ensure that the updated method still meets the needs of the application.
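A minimal consumption sketch, assuming `backend` is a Spark-based `SqlBackend` whose `fetch()` now yields rows lazily; the query and column name are placeholders:

```python
from databricks.labs.lsql.backends import SqlBackend


def count_large_values(backend: SqlBackend) -> int:
    total = 0
    # fetch() returns an iterator, so rows are processed one at a time; wrap it
    # in list(...) only if all rows are needed in memory at once.
    for row in backend.fetch("SELECT value FROM my_table"):
        if row.value > 100:
            total += 1
    return total
```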
Contributors: @nfx, @qziyuan, @bishwajit-db
v0.2.3
- Added support for common parameters in StatementExecutionBackend (#59). The `StatementExecutionBackend` class in the `databricks.labs.lsql` package's `backends.py` file now supports passing common parameters through keyword arguments (kwargs). This enhancement allows for greater customization and flexibility in the backend's operation, as the kwargs are passed to the `StatementExecutionExt` constructor. This change empowers users to control the behavior of the backend, making it more adaptable to various use cases. The key modification in this commit is the addition of the `**kwargs` parameter in the constructor signature and passing it to `StatementExecutionExt`, with no changes made to any methods within the class.
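A sketch of the pass-through pattern; which keyword arguments are actually honored depends entirely on the `StatementExecutionExt` constructor, so the example below only hints at where such a hypothetical parameter would go:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()
# Extra keyword arguments given here are forwarded verbatim to StatementExecutionExt,
# e.g. StatementExecutionBackend(ws, "<warehouse-id>", some_common_parameter=...),
# where `some_common_parameter` is a hypothetical placeholder, not a documented flag.
backend = StatementExecutionBackend(ws, "<warehouse-id>")
```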
Contributors: @bishwajit-db
v0.2.2
- Updating packages. In this update, the dependencies specified in the pyproject.toml file have been updated to more recent versions. The outdated packages "databricks-labs-blueprint~=0.4.0" and "databricks-sdk~=0.21.0" have been replaced with "databricks-labs-blueprint>=0.4.2" and "databricks-sdk>=0.22.0", respectively. These updates are expected to bring new features and bug fixes to the software. The `sqlglot` dependency remains unchanged, with the same version requirement range of "sqlglot>=22.3.1,<22.5.0". These updates ensure that the software will function as intended, while also taking advantage of the enhancements provided by the more recent versions of the packages.
Contributors: @william-conti