- Changes to work with Databricks SDK v0.38.0 (#350). In this release, we have upgraded the Databricks SDK from version 0.37.0 to version 0.38.0 to ensure compatibility with the latest SDK and to address several issues. The update includes changes that make the code compatible with the new SDK version, removing the need for `.as_dict()` method calls when creating or updating dashboards and using a `sdk_dashboard` variable for interacting with the Databricks workspace. We also updated the dependencies to require the `databricks-labs-blueprint[yaml]` package at version 0.4.2 or higher and the `sqlglot` package at version 22.3.1 or higher. The `test_core.py` file has been updated to address multiple issues (#349 to #332) related to the Databricks SDK, and the `test_dashboards.py` file has been revised to work with the new SDK version. These changes improve integration with Databricks' Lakeview dashboards, simplify the code, and ensure compatibility with the latest SDK version, resolving issues #349 to #332.
- Specify the minimum required version of `databricks-sdk` as 0.37.0 (#331). In this release, we have raised the minimum required version of the `databricks-sdk` package from 0.29.0 to 0.37.0 in the `pyproject.toml` file to ensure compatibility with the latest version. This change was made necessary by the updates in issue #320. To accommodate any patch release of `databricks-sdk` with a major and minor version of 0.37, we have updated the dependency constraint to use the `~=` operator, resolving issue #330. These changes are intended to enhance the compatibility and stability of our software.
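As an illustration, the compatible-release constraint in `pyproject.toml` looks roughly like this (a minimal sketch; surrounding sections are elided):

```toml
[project]
dependencies = [
    # ~=0.37.0 accepts any 0.37.x patch release, but not 0.38.0
    "databricks-sdk~=0.37.0",
]
```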
- Added nightly tests run at 4:45am UTC (#318). A new nightly workflow has been added to the codebase, designed to automate a series of jobs every day at 4:45am UTC on the `larger` environment. The workflow includes permissions for writing id-tokens, accessing issues, and reading contents and pull-requests. It checks out the code with a full fetch-depth, installs Python 3.10, and uses hatch 1.9.4. The key step in this workflow is the execution of nightly tests using the `databrickslabs/sandbox/acceptance` action, which creates issues if necessary. The workflow utilizes several secrets, including VAULT_URI, GITHUB_TOKEN, ARM_CLIENT_ID, and ARM_TENANT_ID, and sets the TEST_NIGHTLY environment variable to true. Additionally, the workflow is part of a concurrency group called `single-acceptance-job-per-repo`, ensuring that only one acceptance job runs at a time per repository.
- Bump codecov/codecov-action from 4 to 5 (#319). In this version update, the Codecov GitHub Action has been upgraded from 4 to 5, bringing improved functionality and new features. The new version utilizes the Codecov Wrapper to encapsulate the CLI, enabling faster updates. Additionally, an opt-out feature has been introduced for tokens in public repositories, allowing contributors and other members to upload coverage reports without requiring access to the Codecov token. The upgrade also includes changes to the arguments: `file` is now deprecated and replaced with `files`, and `plugin` is deprecated and replaced with `plugins`. New arguments have been added, including `binary`, `gcov_args`, `gcov_executable`, `gcov_ignore`, `gcov_include`, `report_type`, `skip_validation`, and `swift_project`. Comprehensive documentation on these changes can be found in the release notes and changelog.
- Fixed `RuntimeBackend` exception handling (#328). In this release, we have made significant improvements to the exception handling in the `RuntimeBackend` component, addressing issues reported in tickets #328, #327, #326, and #325. We have updated the `execute` and `fetch` methods to handle exceptions more gracefully and changed the exception handling from catching `Exception` to catching `BaseException` for more comprehensive error handling. Additionally, we have updated the `pyproject.toml` file to use a newer version of the `databricks-labs-pytester` package (0.2.1 to 0.5.0), which may have contributed to the resolution of these issues. Furthermore, the `test_backends.py` file has been updated to improve the readability and user-friendliness of the test output for the functions testing whether a `NotFound`, `BadRequest`, or `Unknown` exception is raised when executing and fetching statements. The `test_runtime_backend_use_statements` function has also been updated to print `PASSED` or `FAILED` instead of returning those values. These changes enhance the robustness of the exception handling mechanism in the `RuntimeBackend` class and update the related unit tests.
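For callers, the practical effect is that failures surface as the SDK's typed exceptions. A minimal sketch of what the updated tests exercise (the import path is an assumption, and `RuntimeBackend` only works inside a Databricks runtime):

```python
from databricks.sdk.errors import BadRequest, NotFound, Unknown
from databricks.labs.lsql.backends import RuntimeBackend  # assumed import path

backend = RuntimeBackend()
try:
    backend.execute("SELECT * FROM catalog.schema.missing_table")
    print("FAILED")  # expected an exception for a missing table
except NotFound:
    print("PASSED")
except (BadRequest, Unknown):
    print("FAILED")
```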
Dependency updates:
- Bump codecov/codecov-action from 4 to 5 (#319).
- Added `escape_name` function to escape individual SQL names and `escape_full_name` function to escape dot-separated full names (#316). Two new functions, `escape_name` and `escape_full_name`, have been added to the `databricks.labs.lsql.escapes` module for escaping SQL names. The `escape_name` function takes a single name as input and returns it enclosed in backticks, while `escape_full_name` handles dot-separated full names by escaping each individual component. These functions have been ported from the `databrickslabs/ucx` repository and are designed to provide a consistent way to escape names and full names in SQL statements, improving the robustness of the system by preventing issues caused by unescaped special characters in SQL names. The test suite includes various cases, including single names, full names with different combinations of escaped and unescaped components, and special characters, with a specific focus on the scenario where a column name contains a period.
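A minimal usage sketch of the two functions (outputs follow the behavior described above):

```python
from databricks.labs.lsql.escapes import escape_name, escape_full_name

# A single name is wrapped in backticks.
print(escape_name("2weird-name"))                 # `2weird-name`

# A dot-separated full name has each component escaped individually.
print(escape_full_name("main.default.my-table"))  # `main`.`default`.`my-table`
```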
- Bump actions/checkout from 4.2.0 to 4.2.1 (#304). In this pull request, the `actions/checkout` dependency is updated from version 4.2.0 to 4.2.1 in the `.github/workflows/release.yml` file. This update includes a new feature where `refs/*` are checked out by commit if provided, falling back to the ref; this change was contributed by `@orhantoy`. It improves the flexibility of the action, allowing users to specify a commit or branch for checkout. The pull request also introduces a new contributor, `@Jcambass`, who added a workflow file for publishing releases to an immutable action package. The commits for this release include changes to prepare for the 4.2.1 release, add a workflow file for publishing releases, and check out other `refs/*` by commit if provided, falling back to the ref. This pull request has been reviewed and approved by Dependabot.
- Bump actions/checkout from 4.2.1 to 4.2.2 (#310). This is a pull request to update the `actions/checkout` dependency from version 4.2.1 to 4.2.2, which includes improvements to the `url-helper.ts` file that now utilize well-known environment variables, as well as expanded unit test coverage for the `isGhes` function. The `actions/checkout` action is commonly used in GitHub Actions workflows for checking out a repository at a specific commit or branch. The changes in this update are internal to the `actions/checkout` action and should not affect the functionality of the project utilizing this action. The pull request also includes details on the commits and the compatibility score for the upgrade, and reviewers can manage and merge the request using Dependabot commands once the changes have been verified.
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.0` to `0.3.1`. This update includes previously tagged commits, bug fixes for git-related libraries, and resolution of the `unsupported protocol scheme` error. The README has been updated with more information on using the `databricks labs sandbox` command, and the installation instructions have been improved. Additionally, there have been dependency updates for the `go-git` libraries and `golang.org/x/crypto` in the `/go-libs` and `/runtime-packages` directories. New commits in this release allow larger logs from acceptance tests and implement experimental OIDC refresh functionality. Ignore conditions have been applied to prevent conflicts with previous versions of the dependency. This update is recommended for users who want to take advantage of the latest bug fixes and improvements.
- Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.1` to `0.4.2`. This update includes bug fixes, dependency updates, and additional go-git libraries. Specifically, the `Run integration tests` job in the GitHub Actions workflow has been updated to use the new version of the `databrickslabs/sandbox/acceptance` Docker image. The updated version also includes install instructions, usage instructions in the README, and a modification to provide more git-related libraries. Additionally, there were several updates to dependencies, including `golang.org/x/crypto` from version `0.16.0` to `0.17.0`. Dependabot, a tool that manages dependencies in GitHub projects, is responsible for the update and provides instructions for resolving any conflicts or merging the changes into the project. This update is intended to improve the functionality and reliability of the `databrickslabs/sandbox` dependency.
- Deprecate `Row.as_dict()` (#309). In this release, we are introducing a deprecation warning for the `as_dict()` method in the `Row` class, which will be removed in favor of the `asDict()` method. This change aims to maintain consistency with Spark's `Row` behavior and to prevent subtle bugs when switching between different backends. The deprecation warning is implemented using Python's warnings mechanism, including the new annotation in Python 3.13 for static code analysis. The existing functionality of fetching values from the database through `StatementExecutionExt` remains unchanged. We recommend that clients update their code to use `.asDict()` instead of `.as_dict()` to avoid any disruptions. A new test case, `test_row_as_dict_deprecated()`, has been added to verify the deprecation warning for `Row.as_dict()`.
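A before/after sketch of the migration (the backend setup is illustrative; `ws` is a `WorkspaceClient` and `warehouse_id` a SQL warehouse ID):

```python
from databricks.labs.lsql.backends import StatementExecutionBackend  # assumed import path

backend = StatementExecutionBackend(ws, warehouse_id)
for row in backend.fetch("SELECT 1 AS one, 2 AS two"):
    data = row.asDict()     # preferred: consistent with Spark's Row
    legacy = row.as_dict()  # deprecated: now emits a DeprecationWarning
```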
- Minor improvements for `.save_table(mode="overwrite")` (#298). In this release, the `.save_table()` method has been improved, particularly when using the `overwrite` mode. If no rows are supplied, the table is now truncated, ensuring consistency with the mock backend behavior. This change has been optimized for SQL-based backends, which now perform the truncation as part of the insert for the first batch. Type hints on the abstract method have been updated to match the concrete implementations. Unit tests and integration tests have been updated to cover the new functionality, and new methods have been added to test the truncation behavior in overwrite mode. These improvements enhance the consistency and efficiency of the `.save_table()` method when using `overwrite` mode across different backends.
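A short sketch of the new truncation behavior (the `Foo` dataclass and table name are illustrative assumptions; `backend` is any `SqlBackend`):

```python
from dataclasses import dataclass

@dataclass
class Foo:
    first: str
    second: bool

# Overwriting with rows replaces the table contents.
backend.save_table("catalog.schema.foo", [Foo("a", True)], Foo, mode="overwrite")

# New behavior: overwriting with no rows truncates the table.
backend.save_table("catalog.schema.foo", [], Foo, mode="overwrite")
```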
- Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305). In this release, we have updated the requirement for the `databrickslabs/sandbox` package to version `acceptance/v0.3.0` in the `downstreams.yml` file. This update is necessary to use the latest version of the package, which includes several bug fixes and dependency updates. The `databrickslabs/sandbox` package is used in the acceptance tests, which are run as part of the CI/CD pipeline. It provides a set of tools and utilities for developing and testing code in a sandbox environment. The changelog for this version includes the addition of install instructions, more git-related libraries, and a modification of the README to include information about how to use it with the `databricks labs sandbox` command. Specifically, the version of the `databrickslabs/sandbox` package used in the `acceptance` job has been updated from `acceptance/v0.1.4` to `acceptance/v0.3.0`, allowing the integration tests to run against the latest version of the package. The ignore conditions for this PR ensure that Dependabot will resolve any conflicts that may arise, and it can be manually triggered with the `@dependabot rebase` command.
Dependency updates:
- Bump actions/checkout from 4.2.0 to 4.2.1 (#304).
- Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305).
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307).
- Bump actions/checkout from 4.2.1 to 4.2.2 (#310).
- Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315).
- Bump actions/checkout from 4.1.7 to 4.2.0 (#295). In the version 4.2.0 release of the `actions/checkout` library, the team has added `Ref` and `Commit` outputs, which provide the ref and commit that were checked out, respectively. The update also includes dependency updates to `braces`, `minor-npm-dependencies`, `docker/build-push-action`, and `docker/login-action`, all of which were automatically resolved by Dependabot. These updates improve compatibility and stability for users of the library. This release is a result of contributions from new team members @yasonk and @lucacome. Users can find a detailed commit history, pull requests, and release notes in the associated links. The team strongly encourages all users to upgrade to this new version to access the latest features and improvements.
- Set catalog on `SchemaDeployer` to overwrite the default `hive_metastore` (#296). In this release, the default catalog for `SchemaDeployer` has been changed from `hive_metastore` to a user-defined catalog, allowing for more flexibility in deploying resources to different catalogs. A new dependency, `databricks-labs-pytester`, has been added with a version constraint of `>=0.2.1`, which may indicate the introduction of new testing functionality. The `SchemaDeployer` class has been updated to accept a `catalog` parameter, and the tests for deploying and deleting schemas, tables, and views have been updated to reflect these changes. The `test_deploys_schema`, `test_deploys_dataclass`, and `test_deploys_view` tests have been updated to accept an `inventory_catalog` parameter, and the `caplog` fixture is used to capture log messages and assert that they contain the expected messages. Additionally, a new test function, `test_statement_execution_backend_overwrites_table`, has been added to the `tests/integration/test_backends.py` file to test the functionality of the `StatementExecutionBackend` class in overwriting a table in the database and retrieving the correct data. Issue #294 has been resolved, and progress has been made on issue #278, but issue #280 has been marked as technical debt and issue #287 is required for the CI to pass.
Dependency updates:
- Bump actions/checkout from 4.1.7 to 4.2.0 (#295).
- Added method to detect rows are written to the `MockBackend` (#292). In this commit, the `MockBackend` class in the `backends.py` file has been updated with a new method, `has_rows_written_for`, which allows for differentiation between a table that has never been written to and one with zero rows. This method checks if a specific table has been written to by iterating over the table stubs in the `_save_table` attribute and returning `True` if the given full name matches any of the stub full names. Additionally, the class has been supplemented with the `rows_written_for` method, which takes a table name and mode as input and returns a list of rows written to that table in the given mode. Furthermore, several new test cases have been added to test the functionality of the `MockBackend` class, including checking that the `has_rows_written_for` method correctly identifies when there are no rows written, when there are zero rows written, and when rows are written after the first and second write operations. These changes improve the overall testing coverage of the project and aid in testing the functionality of the `MockBackend` class. The new methods are accompanied by documentation strings that explain their purpose and functionality.
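A minimal sketch of how the two methods differ in a test (the `Foo` dataclass is an illustrative assumption):

```python
from dataclasses import dataclass

from databricks.labs.lsql.backends import MockBackend

@dataclass
class Foo:
    first: str

backend = MockBackend()
assert not backend.has_rows_written_for("x.y.z")  # never written to

backend.save_table("x.y.z", [], Foo, mode="append")
assert backend.has_rows_written_for("x.y.z")      # written to, with zero rows
assert backend.rows_written_for("x.y.z", "append") == []
```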
- Added filter spec implementation (#276). In this commit, a new `FilterHandler` class has been introduced to handle filter files with the suffix `.filter.json`, which can parse the filter specification in the header of the filter file and validate the filter columns and types. The commit also adds support for three types of filters: `DATE_RANGE_PICKER`, `MULTI_SELECT`, and `DROPDOWN`, which can be linked with multiple visualization widgets. Additionally, a `FilterTile` class has been added to the `Tile` class hierarchy, which represents a filter tile in the dashboard and includes methods to validate the tile, create widgets, and generate filter encodings and queries. The `DashboardMetadata` class has been updated to include a new method, `get_datasets()`, to retrieve the datasets for the dashboard. These changes enhance the functionality of the dashboard by adding support for filtering data using various filter types and linking them with multiple visualization widgets, improving the customization and interactivity of the dashboard and making it more user-friendly and efficient.
- Bugfix: `MockBackend` wasn't mocking `savetable` properly when the mode is `append` (#289). This release includes a bugfix and enhancements for the `MockBackend` component, which is used to mock the `SQLBackend`. The `.savetable()` method failed to function as expected in `append` mode, writing all rows to the same table instead of accumulating them. This bug has been addressed, ensuring that rows accumulate correctly in `append` mode. Additionally, a new test function, `test_mock_backend_save_table_overwrite()`, has been added to demonstrate the corrected behavior of `overwrite` mode, showing that it now replaces only the existing rows for the given table while preserving other tables' contents. The type signature for `.save_table()` has been updated, restricting the `mode` parameter to accept only two string literals: `"append"` and `"overwrite"`. The `MockBackend` behavior has been updated accordingly, and rows are now filtered to exclude any `None` or `NULL` values prior to saving. These improvements to the `MockBackend` functionality and test suite increase reliability when using the `MockBackend` as a testing backend for the system.
- Changed filter spec to use YML instead of JSON (#290). In this release, the filter specification files have been converted from JSON to YAML format, providing a more human-readable format for the filter specifications. The schema for the filter file includes flags for column, columns, type, title, description, order, and id, with the type flag taking one of the values DROPDOWN, MULTI_SELECT, or DATE_RANGE_PICKER. This change impacts the `FilterHandler`, the `is_filter` method, and the `_from_dashboard_folder` method, as well as the relevant parts of the documentation. Additionally, the parsing methods have been updated to use `yaml.safe_load` instead of `json.loads`, and the `is_filter` method now checks for the `.filter.yml` suffix. A new file, `00_0_date.filter.yml`, has been added to the `tests/integration/dashboards/filter_spec_basic` directory, containing a sample date filter definition. Furthermore, various tests have been added to validate filter specifications, such as checking for an invalid type and for both `column` and `columns` keys being present. These updates aim to enhance readability, maintainability, and ease of use for filter configuration; an illustrative filter file is sketched below.
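A hypothetical `.filter.yml` along the lines described above (all values are invented for illustration):

```yaml
# 00_0_date.filter.yml -- a date-range filter linked to every widget
# that exposes the column
column: order_date
type: DATE_RANGE_PICKER
title: Order date
description: Restricts the linked widgets to the selected date range.
order: 0
id: order-date-filter
```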
- Increase testing of generic types storage (#282). A new commit enhances the testing of generic types storage by expanding the test suite to include a list of structs, ensuring more comprehensive testing of the system. The `Foo` struct has been renamed to `Nested` for clarity, and two new structs, `NestedWithDict` and `Nesting`, have been added. The `Nesting` struct contains a `Nested` object, while `NestedWithDict` includes a string and an optional dictionary of strings. A new test case demonstrates appending complex types to a table by creating and saving a table with two rows, each containing a `Nesting` struct. The test then fetches the data and asserts that the expected number of rows is returned, ensuring the proper functioning of the storage system with complex data types.
- Minor Changes to avoid redundancy in code and follow code patterns (#279). In this release, we have made significant improvements to the `dashboards.py` file to make the code more concise, maintainable, and in line with the standard library's recommended usage. The `export_to_zipped_csv` method has undergone major changes, including the removal of the `BytesIO` module import and the use of `StringIO` for handling strings as files. The method no longer creates a separate ZIP file for the CSV files, instead using the provided `export_path`. Additionally, the method skips tiles that don't contain queries. We have also introduced a new method, `dataclass_transform`, which transforms a given dataclass into a new one with specific attributes and behavior. This method creates a new dataclass with a custom metaclass and adds a new method, `to_dict()`, which converts instances of the new dataclass to dictionaries. These changes promote code reusability and reduce redundancy in the codebase, making it easier for software engineers to work with.
- New example with bar chart in dashboards-as-code (#281). A new example of a dashboard featuring a bar chart has been added to the `dashboards-as-code` feature, using the existing metadata-overrides feature to support the new widget type without bloating the `TileMetadata` structure. An integration test was added to demonstrate the creation of a bar chart, and the resulting dashboard can be seen in the attached screenshot. Additionally, a new SQL file has been added for the `Product Sales` dashboard, showcasing sales data for different product categories. This approach can potentially be used to support other widget types such as Bar, Pivot, Area, etc. The team is encouraged to provide feedback on this proposed solution.
- Added Functionality to export any dashboards-as-code into CSV (#269). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports the dashboard queries to CSV files in the specified ZIP archive by iterating through the tiles and fetching the dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to CSV files.
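A short usage sketch (the paths, folder layout, and `from_path` constructor are assumptions; `sql_backend` is any `SqlBackend` able to run the dashboard's queries):

```python
from pathlib import Path

from databricks.labs.lsql.dashboards import DashboardMetadata

metadata = DashboardMetadata.from_path(Path("dashboards/product_sales"))
metadata.export_to_zipped_csv(sql_backend, Path("export/product_sales.zip"))
```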
- Added support for generic types in `SqlBackend` (#272). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` implementations such as the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
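A sketch of the kind of dataclass that can now be stored (field names are invented; `backend` is any `SqlBackend`):

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class Policy:
    ts: datetime.datetime
    tags: dict[str, str] = field(default_factory=dict)

@dataclass
class Nested:
    name: str
    policy: Policy                  # nested dataclass
    aliases: list[str] = field(default_factory=list)
    description: str | None = None  # optional field

rows = [Nested("a", Policy(datetime.datetime.now(), {"env": "dev"}))]
backend.save_table("catalog.schema.nested", rows, Nested)
```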
- Added documentation for exclude flag (#265). A new `exclude` flag has been added to the configuration file for our lab tool, allowing users to specify a path to exclude from formatting during lab execution. This release also includes corrections to grammatical errors in the descriptions of the existing flags related to catalog and database settings, such as updating `seperated` to "separate". Additionally, the flag descriptions for `publish` and `open-browser` have been updated for clarity: `publish` now clearly controls whether the dashboard is published after creation, while `open-browser` controls whether the dashboard is opened in a web browser. These changes are aimed at improving the user experience and ease of use of our lab tool.
- Fixed dataclass field type in _row_to_sql (#266). In this release, we have addressed an issue related to #257 by fixing the dataclass field type in the `_row_to_sql` method of the `backends.py` file. Additionally, we have made updates to the `_schema_for` method to use a new `_field_type` class method. This change resolves a rare problem where `field.type` is a string instead of a type and ensures compatibility with a pull request from an external repository (databrickslabs/ucx#2526). The new `_field_type` method attempts to load the type from `__builtins__` if it's a string and logs a warning if it fails. The `_row_to_sql` method now consistently uses the `_field_type` method to get the field type. This ensures that the library functions seamlessly and consistently, avoiding potential issues in the future.
- Make hatch a prerequisite (#259). In this commit, Eric Vergnaud has introduced a change that makes the installation of `hatch` version 1.9.4 a prerequisite for the project, to avoid errors related to `pip` command recognition. The Makefile has been updated to handle the installation of hatch automatically, and the `hatch env create` command is now used instead of `pip install hatch==1.7.0`. This change ensures that the development environment is consistent and reliable by maintaining the correct version of hatch and automatically handling its installation. Additionally, the `.venv/bin/python` and `dev` targets have been updated accordingly to reflect these changes. This commit also formats all files using the `make dev fmt` command, which helps maintain consistent code formatting throughout the project.
- add support for exclusions in `fmt` command (#263). In this release, we have added support for exclusions to the `fmt` command in the `databricks/labs/lsql/cli.py` module. This feature allows users to specify a list of directories or files to exclude while formatting SQL files, which is particularly useful when verifying SQL notebooks in ucx. The `fmt` command now accepts a new optional parameter, `exclude`, which accepts an iterable of strings that specify the relative paths to exclude. Any `sql_file` that is a descendant of any `exclusion` is skipped during formatting. The exclusions are implemented by converting the relative paths into `Path` objects. This change addresses the issue where single-line comments are converted into inlined comments, causing misinterpretation. The added unit test is manually verified, and this pull request fixes issue #261. This feature was authored and co-authored by Eric Vergnaud.
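A minimal sketch of the descendant check described above (names are illustrative, not the actual implementation):

```python
from pathlib import Path

def is_excluded(sql_file: Path, exclusions: list[Path]) -> bool:
    """Return True when sql_file is an exclusion or sits under one."""
    return any(sql_file == e or e in sql_file.parents for e in exclusions)

exclusions = [Path("notebooks/generated")]
assert is_excluded(Path("notebooks/generated/q1.sql"), exclusions)
assert not is_excluded(Path("queries/q1.sql"), exclusions)
```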
- Fixed dataclass field types (#257). This PR introduces a workaround for a Python bug affecting the `dataclasses.fields()` function, which sometimes returns field types as string type names instead of types. This can cause the ORM to malfunction. The workaround involves checking if the returned `f.type` is a string and, if so, converting it to a type by looking it up in the `__builtins__` dictionary. This change is global and affects the `_schema_for` function in the `backends.py` file, which is responsible for creating a schema for a given dataclass, taking into account any necessary type conversions. This change ensures consistent and accurate type handling in the face of the Python bug, improving the reliability of our ORM.
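A sketch of the workaround (simplified from the description above; the logging detail is an assumption):

```python
import builtins
import dataclasses
import logging

logger = logging.getLogger(__name__)

def _field_type(field: dataclasses.Field):
    # dataclasses.fields() may report a type as its string name
    # (e.g. "int" instead of int); resolve it against builtins.
    if isinstance(field.type, str):
        resolved = getattr(builtins, field.type, None)
        if resolved is None:
            logger.warning(f"Could not resolve field {field.name}: {field.type}")
        return resolved
    return field.type
```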
- Fixed missing EOL when formatting SQL files (#260). In this release, we have addressed an issue related to the inconsistent addition of end-of-line (EOL) characters in formatted SQL files. The `QueryTile.format()` method has been updated to ensure that an EOL character is always added, except when the input query already ends with a newline. This change enhances the reliability of the SQL formatting functionality, making the output format more predictable and improving the overall user experience. The new implementation is demonstrated in the `test_query_format_preserves_eol()` test case, and existing test cases have been updated to check for the presence of EOL characters, further ensuring consistent and correct formatting.
- Fixed normalize case input in cli (#258). In this release, we have updated the `fmt` command in the `cli.py` file to allow users to specify whether to normalize the case of SQL files when formatting. The `normalize_case` parameter now defaults to the string `"true"` and is checked against the `STRING_AFFIRMATIVES` list to determine whether to normalize the case of SQL files. Additionally, we have introduced a new optional `normalize_case` parameter in the `format` method of the `dashboards.py` file in the Databricks CLI, which normalizes the identifiers in the query to lower case when set to `True`. We have also added support for a new `normalize_case` parameter in the `QueryTile.format()` method, which prevents the automatic normalization of string input to uppercase when set to `False`. This change allows for more flexibility in handling string input and ensures that the input string is preserved as-is. These updates improve the functionality and usability of the open-source library, giving users more control over the formatting and handling of string input.
- Added design for filter file (#251). A new feature has been added to enable the creation of filters for multiple widgets in a dashboard using a `.filter.json` file. This file allows users to specify the columns to be filtered, the filter type, title, description, order, and a unique ID for each filter. Both the `column` and `columns` flags are supported, with the former taking a single string and the latter taking a list of strings. The filter type can be set to a drop-down menu or another type as desired. The `.filter.json` file schema also supports optional `title` and `description` strings, as well as `order` and `id` flags. An example of a `.filter.json` file is provided in the commit message and sketched below. Additionally, the `dashboard.yml` file documentation has been updated to include information on how to use the new `.filter.json` file.
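A hypothetical `.filter.json` matching that schema (all values are invented for illustration):

```json
{
  "columns": ["region", "country"],
  "type": "DROPDOWN",
  "title": "Region",
  "description": "Filters the linked widgets by region and country.",
  "order": 1,
  "id": "region-filter"
}
```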
- adding normalize-case option to databricks labs lsql fmt cmd (#254). In this open-source library release, the `databricks labs lsql` tool's `fmt` command now supports a new flag, `normalize-case`. This flag allows users to control the normalization of query text to lowercase, providing more flexibility when formatting SQL queries. By default, query text is still normalized to lowercase, but users can now prevent this behavior by setting the `normalize-case` flag to `False`. This change addresses an issue where some queries are case-sensitive, such as those using map field keys in UCX dashboards. Additionally, a new parameter, `normalize_case`, has been added to the `format` method in the `dashboards.py` file, with updated method documentation. A new test function, `test_query_formats_no_normalize()`, has also been included to ensure consistent formatter behavior.
- Removed deploy_dashboard method (#240). In this release, the `deploy_dashboard` method has been removed from the `dashboards.py` file, and the legacy deployment method has been deprecated. The `deploy_dashboard` method was previously used to deploy a dashboard to a workspace, but it has been replaced with the `create` method of the `lakeview` attribute of the `WorkspaceClient` object. Additionally, the `test_dashboards_creates_dashboard_via_legacy_method` method has been removed. A new test has been added to ensure that the `deploy_dashboard` method is no longer being used, utilizing the `deprecated_call` function from pytest to verify that calling the method raises a deprecation warning. This change simplifies the code and improves the overall design of the system, resolving issue #232. The `_with_better_names` method and the `create_dashboard` method remain unchanged.
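A minimal sketch of such a test (the fixtures and the `deploy_dashboard` arguments are assumptions):

```python
import pytest

from databricks.labs.lsql.dashboards import Dashboards

def test_deploy_dashboard_is_deprecated(ws, dashboard):
    dashboards = Dashboards(ws)
    with pytest.deprecated_call():
        dashboards.deploy_dashboard(dashboard)
```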
- Skip test that fails due to insufficient permission to create schema (#248). A new test function, `test_dashboards_creates_dashboard_with_replace_database`, has been added to the open-source library, but it is currently marked to be skipped due to missing permissions to create a schema. This function creates an instance of the `Dashboards` class with the `ws` parameter, creates a dashboard using the `make_dashboard` fixture, and performs various actions using the created dashboard as well as fixtures such as `tmp_path` and `sql_backend`. This test function aims to ensure that the `Dashboards` class functions as expected when creating a dashboard with a replaced database. Once the necessary permissions for creating a schema are acquired, this test function can be enabled for further testing and validation.
- Updates to use the Databricks Python sdk 0.30.0 (#247). In this release, we have updated the project to use Databricks Python SDK version 0.30.0. This update includes changes to the `execute` and `fetch_value` functions, which now use the new `StatementResponse` type instead of `ExecuteStatementResponse`. A conditional import statement has been added to maintain compatibility with both Databricks SDK version 0.30.0 and below. The `execute` function now raises a `TimeoutError` when the specified timeout is greater than 50 seconds and the statement execution hasn't finished. Additionally, the `fetch_value` function has been updated to handle the case when the `execute` function returns `None`. The unit test file `test_backends.py` has also been updated to reflect these changes, with multiple test functions now using the `StatementResponse` class instead of `ExecuteStatementResponse`. These changes improve the system's compatibility with the latest version of the Databricks SDK, ensuring that the core functionality of the SDK continues to work as expected.
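The compatibility shim is presumably along these lines (a sketch; the exact module layout inside the SDK may differ):

```python
try:
    # Databricks SDK >= 0.30.0
    from databricks.sdk.service.sql import StatementResponse
except ImportError:
    # Older SDK versions used a different name for the same payload.
    from databricks.sdk.service.sql import (
        ExecuteStatementResponse as StatementResponse,
    )
```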
- Fixed missing widget name suffixes (#243). In this release, we have addressed an issue related to missing widget name suffixes (#243) by adding a `_widget` suffix to the name of the widget object in the `dashboards.py` file. This change ensures consistency between the widget name and the id of the query, facilitating user understanding of the relationship between the two. A new method, `_get_query_widget_spec`, has also been added, although its specific functionality requires further investigation. Additionally, the unit tests in the `tests/unit/test_dashboards.py` file have been updated to check for the presence of the `_widget` suffix in widget names, ensuring that the tests accurately reflect the desired behavior. These changes improve the consistency of dashboard widget naming, thus benefiting software engineers utilizing or extending the project's widget-ordering functionalities.
- Fixed dataset/widget name uniqueness requirement that was preventing dashboards being deployed (#241). A fix has been implemented to address a uniqueness-requirement issue with dataset/widget names that was preventing dashboard deployment. A new `widget` instance is now created with a unique name, generated by appending `_widget` to the metadata ID, in the `get_layouts` method. This ensures that multiple widgets with the same ID but different content can exist in a single dashboard, thereby meeting the name-uniqueness requirement. In the `save_to_folder` method, the widget name is modified by removing the `_widget` suffix before writing the textbox specification to a markdown file, maintaining consistency between the widget ID and the file name. These changes are localized to the `get_layouts` and `save_to_folder` methods, and no new methods have been added. The existing functionality related to the creation, validation, and saving of dashboard layouts remains unaltered.
- Added publish flag to `Dashboards.create_dashboard` (#233). In this release, we have added a `publish` flag to the `Dashboards.create_dashboard` method, allowing users to publish the dashboard upon creation, thereby resolving issue #219. This flag is included in the `labs.yml` file with a description of its functionality. Additionally, the `no-open` flag's description has been updated to specify that it prevents the dashboard from opening in the browser after creation. The `create_dashboard` function in the `cli.py` and `dashboards.py` files has been updated to include the new `publish` flag, allowing for more flexibility in how users create and manage their dashboards. The `Dashboards.create_dashboard` method now calls the `WorkspaceClient.lakeview.publish` method when the `publish` flag is set to `True`, which publishes the created dashboard. This behavior is covered in the updated tests for the method.
- Fixed boolean cli flags (#235). In this release, we have improved the handling of command-line interface (CLI) flags in the `databricks labs` command. Specifically, we have addressed the limitation that pure boolean flags are not supported. Now, when using boolean flags, the user confirms them with `y` or `yes`. We have modified the `create_dashboard` command to accept string inputs for the `publish` and `no_open` flags, which are then converted to boolean values for internal use. Additionally, we have introduced a new `open-browser` flag, which opens the dashboard in the browser after creation when set to `y` or `yes`. These changes have been tested manually to ensure correct behavior. This improvement provides a more flexible input experience and better handling of boolean flags in the CLI command for software engineers using the open-source library.
- Fixed format breaks widget (#238). In this release, we've made significant changes to the `dashboards.py` file in the `databricks/labs/lsql` directory to address formatting breaks in the widget that could occur when a Call to Action (CTA) is present in a query. These changes include the addition of new class variables, `_SQL_DIALECT` and `_DIALECT`, and their integration into existing methods such as `_parse_header`, `validate`, `format`, `_get_abstract_syntax_tree`, and `replace_catalog_and_database_in_query`. Furthermore, we have developed new methods for creating and deleting schemas and for getting the current test purge time. We have also implemented new integration tests to demonstrate the fix for the formatting issue and added new test cases for the query handler's header-splitting functionality, query formatting, and CTE handling. These enhancements improve the library's handling of SQL queries and query tiles in the context of dashboard creation, ensuring proper parsing, formatting, and metadata extraction for a wide range of query scenarios.
- Fixed replace database when catalog or database is None (#237). In this release, we have addressed an issue where system tables disappeared in ucx dashboards when replacing the placeholder database. To rectify this, we have developed a new method, `replace_catalog_and_database_in_query`, supporting the `replace_database` function in the `dashboards.py` file. This method checks whether the catalog or database in a query matches the one to be replaced and, only then, replaces it with the new one, ensuring that system tables are not lost during the replacement process. Additionally, we have introduced new unit tests in `test_dashboards.py` to verify that queries are correctly transformed when replacing the database or catalog in the query. These tests cover various scenarios, using two parametrized test functions, to ensure the correct functioning of the feature. This change provides a more robust and reliable dashboard display when replacing the placeholder database in the system.
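A simplified sketch of such a rewrite using `sqlglot` (an illustration of the idea, not the project's exact implementation):

```python
import sqlglot
from sqlglot import exp

def replace_database_in_query(query: str, database: str, replace: str) -> str:
    tree = sqlglot.parse_one(query, dialect="databricks")
    for table in tree.find_all(exp.Table):
        # Only rename tables in the placeholder database; tables in other
        # databases, such as system tables, are left untouched.
        if table.db == replace:
            table.set("db", exp.to_identifier(database))
    return tree.sql(dialect="databricks")

print(replace_database_in_query(
    "SELECT * FROM inventory.objects JOIN system.access.audit USING (id)",
    "ucx", "inventory",
))  # SELECT * FROM ucx.objects JOIN system.access.audit USING (id)
```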
- Fixed dashboard deployment/creation (#230). The recent changes to our open-source library address issues related to dashboard deployment and creation, enhancing their reliability and consistency. The `deploy_dashboard` function has been deprecated in favor of the more accurate `create_dashboard` function, which now includes a `publish` flag. A `validate` method has been added to the `Tile`, `MarkdownTile`, and `QueryTile` classes to raise an error if the dashboard is invalid. The `test_dashboards.py` file has been updated to reflect these changes. These enhancements address issues #222 and #229, and partially resolve #220. The commit includes an image of a dashboard created through the deprecated `deploy_dashboard` method. These improvements ensure better dashboard creation, validation, and deployment, while maintaining backward compatibility through the deprecation of `deploy_dashboard`.
- Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224). In version 3.0.0 of sigstore/gh-action-sigstore-python, several changes, additions, and removals have been implemented. Notably, certain settings such as fulcio-url, rekor-url, ctfe, and rekor-root-pubkey have been removed. Additionally, the output settings signature, certificate, and bundle have also been removed. The inputs are now parsed according to POSIX shell lexing rules for better consistency. The release-signing-artifacts setting no longer causes a hard error when used under the incorrect event. Furthermore, various deprecations present in sigstore-python's 2.x series have been resolved. The default suffix has been changed from .sigstore to .sigstore.json, in line with Sigstore's client specification. The release-signing-artifacts setting now defaults to true. This version also includes several bug fixes and improvements to support CI runners that use PEP 668 to constrain global package prefixes.
- Use default factory to create `Tile._position` (#226). In this change, the creation of the default value for the `_position` field in various classes, including `Tile`, `MarkdownTile`, `TableTile`, and `CounterTile`, has been updated. Previously, a new `Position` object was explicitly created as the default value. With this update, the `default_factory` argument of the `dataclasses.field` function is now used to create a new `Position` object. This change is made in anticipation of the Python 3.11 release, which modifies the field-default mutability check. By using the `default_factory` approach, we ensure that a new `Position` object is generated for each instance creation, rather than reusing a single default instance. This guarantees the immutability of default values and aligns with best practices for forward compatibility with future Python versions. It is important to note that this modification does not affect the functionality of the classes but enhances their initialization process.
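The pattern looks like this (a minimal sketch with a simplified `Position`):

```python
from dataclasses import dataclass, field

@dataclass
class Position:
    x: int = 0
    y: int = 0

@dataclass
class Tile:
    # Before: _position: Position = Position()
    # After: a fresh Position is created for every Tile instance,
    # instead of one shared default object.
    _position: Position = field(default_factory=Position)
```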
Dependency updates:
- Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224).
- Added `databricks labs lsql fmt` command (#221). The commit introduces a new command, `databricks labs lsql fmt`, to the open-source library, which formats SQL files in a given folder. This command can be used without authentication and accepts a `folder` flag, which specifies the directory containing the SQL files to format. The change also updates the `labs.yml` file and includes a new method, `format`, in the `QueryTile` class, which formats SQL queries using the `sqlglot` library. This commit enhances the functionality of the CLI for SQL file formatting and improves the readability and consistency of SQL files, making it easier for developers to understand and maintain the code. Additionally, the commit includes changes to various SQL files to demonstrate the improved formatting, such as converting SQL keywords to uppercase, adding appropriate spacing around keywords and operators, and aligning column names in the `VALUES` clause. The purpose of this change is to ensure that the formatting method works correctly and does not introduce any issues in the existing functionality.
- Added method to dashboards to get dashboard url (#211). In this release, we have added a new method, `get_url`, to the `lakeview_dashboards` object in the Lakeview dashboards library. This method uses the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and the dashboard ID, but this new method ensures that the URL is obtained correctly, even if the format changes in the future. Additionally, a new unit test has been added for a method that gets the dashboard URL using the workspace client. This new functionality allows users to easily retrieve the URL for a dashboard using its ID and the workspace client.
- Extend replace database in query (#210). This commit extends the database replacement functionality in the `DashboardMetadata` class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new `replace_database` method in the `DashboardMetadata` class, which replaces the catalog and/or database in the query based on the provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The `create_dashboard` function has also been updated to use the new method for replacing the database and catalog. Additionally, the `TileMetadata` update method has been replaced with a new merge method, and the `QueryTile` and `Tile` classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
- Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull-request comments (#201). The `TileMetadata` dataclass now includes methods for updating and comparing tile metadata, and the `DashboardMetadata` class has been removed, with its functionality incorporated into the `Dashboards` class. The `Dashboards` class now generates tiles, datasets, and layouts for dashboards using the provided `query_transformer`. The code's readability and maintainability have been further enhanced by replacing the use of the `copy` module with `dataclasses.replace` for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.
- Added Command Execution backend which uses Command Execution API on a cluster (#95). In this release, the Databricks Labs lsql library has been updated with a new Command Execution backend that uses the Command Execution API. A new `CommandExecutionBackend` class has been implemented, which initializes a `CommandExecutor` instance taking a cluster ID, a workspace client, and a language as parameters. The `execute` method runs SQL commands on the specified cluster, and the `fetch` method returns the query result as an iterator of `Row` objects. The existing `StatementExecutionBackend` class has been updated to inherit from a new abstract base class called `ExecutionBackend`, which includes a `save_table` method for saving data to tables and is meant to be a common base class for both the Statement and Command Execution backends. The `StatementExecutionBackend` class has also been updated to use the new `ExecutionBackend` abstract class, and its constructor now accepts a `max_records_per_batch` parameter. The `execute` and `fetch` methods have been updated to use the new `_only_n_bytes` method for logging truncated SQL statements. Additionally, the `CommandExecutionBackend` class has `execute`, `fetch`, and `save_table` methods to execute commands on a cluster and save the results to tables in the Databricks workspace.
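A usage sketch under the stated assumptions (the import path and constructor argument order mirror the description above but are not guaranteed):

```python
from databricks.sdk import WorkspaceClient

from databricks.labs.lsql.backends import CommandExecutionBackend  # assumed path

ws = WorkspaceClient()
backend = CommandExecutionBackend(ws, "0123-456789-abcdef0")  # cluster ID
backend.execute("CREATE TABLE IF NOT EXISTS samples.acme.events (id INT)")
for row in backend.fetch("SELECT id FROM samples.acme.events LIMIT 10"):
    print(row)
```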
- Added basic integration with Lakeview Dashboards (#66). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the `databricks-labs-blueprint` dependency to version 0.4.2 with the `[yaml]` extra, allowing for additional functionality related to handling YAML files. A new file, `dashboards.py`, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new `__init__.py` file under the `src/databricks/labs/lsql/lakeview` directory imports all classes and functions from the `model.py` module, providing a foundation for further development and customization. The release also introduces a new file, `model.py`, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, `model.py.tmpl`, used for handling JSON data during integration with Lakeview Dashboards. A new file, `polymorphism.py`, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a `.gitignore` file has been added to the `tests/integration` directory as part of the initial steps toward adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the `test_dashboards.py` file in the `tests/integration` directory contains a function, `test_load_dashboard(ws)`, which uses the `Dashboards` class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
- Added dashboard-as-code functionality (#201). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new `create-dashboard` command for creating unpublished dashboards. The functionality is available in the `lsql` lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The `dashboard.yml` file is used to define top-level metadata for the dashboard. This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
- Automate opening integration test dashboard in debug mode (#167). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the standard-library `webbrowser` module and the `is_in_debug` function from `databricks.labs.blueprint.entrypoint`, and adding a check in the `create` function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and the dashboard ID, and then opened in a web browser using `webbrowser.open`. This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
- Automatically tile widgets (#109). In this release, we've introduced an automatic widget-tiling feature for the dashboard creation process in our open-source library. The `Dashboards` class now includes a new class variable, `_maximum_dashboard_width`, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The `create_dashboard` method has been updated to accept a new `self` parameter, turning it into an instance method. A new `_get_position` method has been introduced to calculate and return the next available position for placing a widget, and a `_get_width_and_height` method has been added to return the width and height for a widget specification, initially handling `CounterSpec` instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and the available space, as well as the expected width and height of each widget.
- Bump actions/checkout from 4.1.3 to 4.1.6 (#102). In the latest release, the `actions/checkout` GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of `github/codeql-action` from 2 to 3, `actions/setup-node` from 1 to 4, and `actions/upload-artifact` from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling `extensions.worktreeConfig` when disabling sparse-checkout was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
- Bump actions/checkout from 4.1.6 to 4.1.7 (#151). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
- Create a dashboard with a counter from a single query (#107). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a `Dashboards` class that provides methods for getting, saving, and deploying dashboards. A new method, `create_dashboard`, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset. The `deploy_dashboard` method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the `test_dashboards.py` file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
- Create text widget from markdown file (#142). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue #1
- Design document for dashboards-as-code (#105). The latest release introduces "Dashboards as Code," a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks include `.sql`, `.md`, and `dashboard.yml` files, with `.sql` files defining queries and determining tile order, and `dashboard.yml` specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. The tile order can be determined by the SQL file order, the `tiles` order in `dashboard.yml`, or SQL file metadata. This project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, `dashboard.yml`, and SQL query content. The command-line interface is utilized for dashboard generation from configuration files.
- Ensure propagation of `lsql` version into `User-Agent` header when it is used as library (#206). In this release, the `pyproject.toml` file has been updated to ensure that the correct version of the `lsql` library is propagated into the `User-Agent` header when it is used as a library, improving attribution. The `databricks-sdk` version has been updated from `0.22.0` to `0.29.0`, and the `__init__.py` file of the `lsql` library has been modified to add the `with_user_agent_extra` function from the `databricks.sdk.core` package for correct attribution. The `backends.py` file has also been updated with improved type handling in the `_row_to_sql` and `save_table` functions for accurate SQL insertion and handling of user-defined classes. Additionally, a test has been added to ensure that the `lsql` version is correctly propagated in the `User-Agent` header when used as a library. These changes offer improved functionality and accurate type handling, making it easier for developers to identify the library version when it is used in other projects.
- Fixed counter encodings (#143). In this release, we have improved the encoding of counters in the lsql dashboard by modifying the `create_dashboard` function in the `dashboards.py` file. Previously, the counter field encoding was hardcoded as `count`, but it has been changed to dynamically determine the first field name of the given fields, since counters are expected to have only one field. Additionally, a new integration test has been added to the `tests/integration/test_dashboards.py` file to ensure that the dashboard deployment functionality correctly handles SQL queries that do not perform a count. A new test for the `Dashboards` class has also been added to check that counter field encoding names are created as expected; the `WorkspaceClient` is mocked and not called in this test. These changes enhance the accuracy of counter encoding and improve the overall functionality and reliability of the lsql dashboard.
- Fixed non-existing reference and typo in the documentation (#104). In this release, we've made improvements to the documentation of our open-source library, specifically addressing issue #104. The changes include fixing a non-existent reference and a typo in the `Library size comparison` section of the `comparison.md` document. This section provides guidance for selecting a library based on factors like library size, unified authentication, and compatibility with various Databricks warehouses and SQL Python APIs. The updates clarify the required dependency size for simple applications and scripts, and offer more detailed information about each library option. We've also added a new subsection titled `Detailed comparison` to provide a more comprehensive overview of each library's features. These changes are intended to help software engineers better understand which library is best suited for their specific needs; for applications that transfer large amounts of data serialized in the Apache Arrow format and need low result-fetching latency, we recommend the Databricks SQL Connector for Python.
- Fixed parsing message (#146). In this release, the warning message logged during the creation of a dashboard when a `ParseError` occurs has been updated to provide clearer and more detailed information about the parsing error. The new message includes the specific query being parsed and the exact parsing error, enabling developers to quickly identify the cause of parsing issues. This change ensures that engineers can efficiently diagnose and address parsing errors, improving the overall development and debugging experience with a more informative log format: "Parsing {query}: {error}".
- Improve dashboard as code (#108). The `Dashboards` class in the `dashboards.py` file has been updated to improve functionality and usability, with changes such as the addition of a type variable `T` for type checking and more descriptive method names. The `save_to_folder` method now accepts and returns a `Dashboard` object, and also formats SQL code before saving it to file; a new static method `create_dashboard` has been added; and two new methods, `_with_better_names` and `_replace_names`, have been added for improved readability. The `get_dashboard` method now returns a `Dashboard` object instead of a dictionary. These changes enhance the functionality and readability of the codebase and provide more user-friendly methods for interacting with the `Dashboards` class. The project structure has also been reorganized: the `queries/counter.sql` file has been moved to `dashboards/one_counter/counter.sql` in the `tests/integration` directory. Furthermore, several tests for the `Dashboards` class have been introduced in the `databricks.labs.lsql.dashboards` module, covering saving SQL and YML files to a specified folder, creating a dataset and a counter widget for each query, deploying dashboards with a given display name or dashboard ID, and the behavior of the `save_to_folder` and `deploy_dashboard` methods. Lastly, the commit removes the `test_load_dashboard` function and updates the `test_dashboard_creates_one_dataset_per_query` and `test_dashboard_creates_one_counter_widget_per_query` functions to use the updated `Dashboard` class. A new `replace_recursively` function is introduced to replace specific fields in a dataclass recursively (a sketch follows this entry). A new test function, `test_dashboards_deploys_exported_dashboard_definition`, reads a dashboard definition from a JSON file, deploys it, and checks that it is successfully deployed using the `Dashboards` class. Another new test function, `test_dashboard_deploys_dashboard_the_same_as_created_dashboard`, compares the original and deployed dashboards to ensure they are identical.
- Infer fields from a query (#111). The `Dashboards` class in the `dashboards.py` file has been updated with the addition of a new method, `_get_fields`, which accepts a SQL query as input and returns a list of `Field` objects, using the `sqlglot` library to parse the query and extract the necessary information. The `create_dashboard` method has been modified to call this new method when creating `Query` objects for each dataset. If a `ParseError` occurs, a warning is logged and iteration continues. This allows fields to be populated automatically when creating a new dashboard, eliminating the need for manual specification. Additionally, new tests have been added for invalid queries and for checking that the fields in a query have the expected names. These tests, `test_dashboards_skips_invalid_query` and `test_dashboards_gets_fields_with_expected_names`, utilize the `caplog` fixture and create temporary query files to verify functionality. Existing functionality related to creating dashboards remains unchanged.
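A rough sketch of field inference with `sqlglot`; the real `_get_fields` returns `Field` objects rather than plain names, so this is illustrative only:

```python
import logging

import sqlglot
from sqlglot.errors import ParseError

logger = logging.getLogger(__name__)

def get_field_names(query: str) -> list[str]:
    """Infer the output column names of a SELECT statement."""
    try:
        expression = sqlglot.parse_one(query)
    except ParseError as e:
        logger.warning(f"Parsing {query}: {e}")  # matches the log format noted above
        return []
    return expression.named_selects

assert get_field_names("SELECT COUNT(*) AS count FROM inventory.foo") == ["count"]
```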
- Make constant all caps (#140). In this release, the project's `dashboards.py` file has been updated to improve code readability and maintainability. The constant `_maximum_dashboard_width` has been renamed to all caps, becoming `_MAXIMUM_DASHBOARD_WIDTH`. This modification affects the `Dashboards` class and its methods, particularly `_get_fields` and `_get_position`; the `_get_position` method has been revised to use the new all-caps constant. This change ensures better visibility of constants within the code, addressing issue #140. Note that this modification only impacts the `dashboards.py` file and does not affect any other functionality.
- Read display name from `dashboard.yml` (#144). In this release, we have introduced a new `DashboardMetadata` dataclass that reads the display name of a dashboard from a `dashboard.yml` file located in the dashboard's directory. If the `dashboard.yml` file is absent, the folder name is used as the display name. This change improves the readability and maintainability of the dashboard configuration by explicitly defining the display name and reducing the need to specify widget information in multiple places. We have also added a new fixture called `make_dashboard` for creating and cleaning up lakeview dashboards in the test suite; the fixture handles creation and deletion of the dashboard and provides an option to set a custom display name. Additionally, we have added and modified several unit tests to ensure the proper handling of the `DashboardMetadata` class and the dashboard creation process, including tests for missing, present, or incorrect `display_name` keys in the YAML file. The `dashboards.deploy_dashboard()` function has been updated to handle cases where only `dashboard_id` is provided.
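A minimal sketch of the fallback behavior described above; the class name comes from this entry, while the constructor-from-folder helper and the YAML key are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

import yaml

@dataclass
class DashboardMetadata:
    display_name: str

    @classmethod
    def from_path(cls, folder: Path) -> "DashboardMetadata":  # helper name is hypothetical
        dashboard_yml = folder / "dashboard.yml"
        if not dashboard_yml.exists():
            return cls(display_name=folder.name)  # fall back to the folder name
        raw = yaml.safe_load(dashboard_yml.read_text()) or {}
        return cls(display_name=raw.get("display_name", folder.name))
```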
- Set widget id in query header (#154). In this release, we've made significant improvements to widget metadata handling. We've introduced a new `WidgetMetadata` class that replaces the previous `WidgetMetadata` dataclass, now featuring a `path` attribute, a `spec_type` property, and optional parameters for `order`, `width`, `height`, and `_id`. The `_get_widgets` method has been updated to accept an iterable of `WidgetMetadata` objects, and both the `_get_layouts` and `_get_widgets` methods now sort widgets using the order field. A new class method, `WidgetMetadata.from_path`, handles parsing widget metadata from a file path, replacing the removed `_get_width_and_height` method. Additionally, the `WidgetMetadata` class is now used in the `deploy_dashboard` method, and the test suite for the `dashboards` module has been enhanced with updated `test_widget_metadata_replaces_width_and_height` and `test_widget_metadata_replaces_attribute` functions, as well as new tests for specific scenarios. Issue #154 has been addressed by setting the widget id in the query header, and these changes improve flexibility and ease of use for dashboard development.
- Use order key in query header if defined (#149). In this release, we've introduced a new feature that uses an order key in the query header if defined, enhancing flexibility and control over the dashboard creation process. The `WidgetMetadata` dataclass now includes an optional `order` parameter of type `int`, and the `_get_arguments_parser()` method accepts an `--order` flag of type `int`. The `replace_from_arguments()` method has been updated to support the new `order` parameter, with a default value of `self.order`. The `create_dashboard()` method now implements a new `_get_datasets()` method to retrieve datasets from the dashboard folder and introduces a `_get_widgets()` method, which accepts a list of files, iterates over them, and yields tuples containing widgets and their corresponding metadata, including the order. These improvements ensure the correct order of widgets in the dashboard creation process. Additionally, a new test case has been added to verify the correct behavior of dashboard deployment with a specified order key in the query header. This feature resolves issue #148.
- Use widget width and height defined in query header (#147). In this release, the handling of metadata in SQL files has been updated to read the file's header, instead of only the first line, for improved readability and flexibility. This change includes a new `WidgetMetadata` class for defining the width and height of a widget in a dashboard, as well as new methods for parsing the widget metadata from a provided path. The documentation has been updated to cover the supported widget arguments `-w` or `--width` and `-h` or `--height`, and issue #114 is resolved by adding a test for deploying a dashboard with a big widget using a new function, `test_dashboard_deploys_dashboard_with_big_widget`. Additionally, new test cases have been added for creating dashboards with custom-sized widgets based on query header width and height values, improving functionality and error handling. A sketch of how such a header can be parsed follows.
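As noted above, a sketch of header parsing: a hypothetical first line of a `.sql` file carrying widget arguments, parsed with `argparse` (the library's real parser may differ):

```python
import argparse

def parse_widget_header(header: str) -> argparse.Namespace:
    """Parse widget arguments from a SQL file header such as
    `-- --width 3 --height 6 --order 1`."""
    parser = argparse.ArgumentParser(add_help=False)  # frees -h for --height
    parser.add_argument("-w", "--width", type=int, default=0)
    parser.add_argument("-h", "--height", type=int, default=0)
    parser.add_argument("--order", type=int, default=None)
    return parser.parse_args(header.removeprefix("--").split())

metadata = parse_widget_header("-- --width 3 --height 6 --order 1")
assert (metadata.width, metadata.height, metadata.order) == (3, 6, 1)
```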
Dependency updates:
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The `actions/checkout` dependency has been updated from version 4.1.2 to 4.1.3 in the `update-main-version.yml` file. This new version includes a check to verify the git version before attempting to disable `sparse-checkout`, and adds an SSH user parameter to improve functionality and compatibility. The release notes and CHANGELOG.md file provide detailed information on the specific changes and improvements. The pull request also includes a detailed commit history and links to corresponding issues and pull requests on GitHub for transparency. You can review and merge the pull request to update the `actions/checkout` dependency in your project.
- Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new method, `asDict`, to the `Row` class in the `databricks.labs.lsql.core` module to maintain compatibility with PySpark. This method returns a dictionary representation of the `Row` object, with keys corresponding to column names and values corresponding to the values in each column. Additionally, we have modified the `fetch` function in the `backends.py` file to return `pyspark.sql` `Row` objects when using `self._spark.sql(sql).collect()`; this change is temporary and marked with a `TODO` comment indicating that it will be addressed in the future. We have also added error handling code to the `fetch` function to ensure it operates as expected. The `asDict` method simply delegates to the existing `as_dict` method, so their behavior is identical. The optional `recursive` argument of `asDict`, when set to `True`, would enable recursive conversion of nested `Row` objects to nested dictionaries; however, this behavior is not currently implemented, and `recursive` defaults to `False`.
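In outline, the shim is simple delegation; a sketch (the column bookkeeping here is illustrative, not the actual `Row` internals):

```python
class Row(tuple):
    """Sketch of the PySpark-compatibility shim."""

    __columns__: list  # illustrative: column names stored alongside the values

    def as_dict(self) -> dict:
        return dict(zip(self.__columns__, self))

    def asDict(self, recursive: bool = False) -> dict:
        # Mirrors pyspark.sql.Row.asDict(); `recursive` is accepted but ignored
        return self.as_dict()
```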
Dependency updates:
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97).
- Added more `NotFound` error type (#94). In the latest update, the `core.py` file in the `databricks/labs/lsql` package has undergone enhancements to its error handling. The `_raise_if_needed` function has been modified to raise a `NotFound` error when the error message includes the phrase "does not exist". This update enables the system to categorize specific SQL query errors as `NotFound` errors, improving overall error handling and reporting. This change was a collaborative effort, as indicated by the co-authored-by statement in the commit.
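A minimal sketch of the added mapping, assuming the message-matching shape described above (the real `_raise_if_needed` inspects a richer error object):

```python
from databricks.sdk.errors import NotFound

def raise_if_needed(message: str) -> None:
    """Sketch: map 'does not exist' SQL errors to the SDK's NotFound."""
    if "does not exist" in message:
        raise NotFound(message)
```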
- Fixing overwrite integration tests (#92). A new enhancement has been implemented for the `overwrite` feature's integration tests, addressing a concern with write operations. Two new variables, `catalog` and `schema`, have been incorporated using the `env_or_skip` function. These variables are used in the `save_table` method, which is now invoked twice with the same table, once with the `append` and once with the `overwrite` option. The data in the table is retrieved and checked for accuracy after each call, employing the updated `Row` class with revised field names `first` and `second`, formerly `name` and `id`. This modification ensures the proper operation of the `overwrite` feature during integration tests and resolves any related issues.
- Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional `catalog` and `schema` parameters to the `execute` and `fetch` methods in the `SqlBackend` abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the `SparkSqlBackend` and `DatabricksSqlBackend` classes. The new parameters control the catalog and schema used by the `SparkSession` instance in the `SparkSqlBackend` class and the `SqlClient` instance in the `DatabricksSqlBackend` class, enabling better functionality in multi-catalog and multi-schema environments. The change ships with unit tests and integration tests to ensure proper functionality. For example, with a `SparkSqlBackend` instance `spark_backend`, you can execute a SQL statement in a specific catalog and schema with the following code: `spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema")`. The `fetch` method can be used with the new parameters in the same way, as shown below.
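For completeness, the equivalent `fetch` call, reusing the `spark_backend` instance and names from the example above:

```python
# fetch() yields Row objects; the catalog/schema parameters scope the query
for row in spark_backend.fetch(
    "SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"
):
    print(row)
```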
- Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, downstreams.yml, which automates unit testing for downstream projects upon changes made to the upstream project. The workflow runs on pull requests, merge groups, and pushes to the main branch and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and accepts downstream projects using the databrickslabs/sandbox/downstreams action. The job matrix includes two downstream projects, ucx and remorph, and uses the build cache to speed up the pip install step. This feature ensures that changes to the upstream project do not break compatibility with downstream projects, maintaining a stable and reliable library for software engineers.
- Fixed `Builder` object has no attribute `sdk_config` error (#86). In this release, we've resolved a `Builder` object has no attribute `sdk_config` error that occurred when initializing a Spark session via the `DatabricksSession.builder` method. The issue was caused by accessing the configuration through a non-existent `sdk_config` attribute; the call has been updated to the correct `sdkConfig` method. This change enables successful creation of the Spark session, preventing the error from recurring. The `DatabricksSession` class and its methods, such as `getOrCreate`, continue to be used for interacting with Databricks clusters and workspaces, while the `WorkspaceClient` class manages Databricks resources within a workspace.
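A sketch of the corrected initialization, assuming `databricks-connect`-style session construction:

```python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config()  # picks up standard Databricks authentication
# `builder` is an attribute, and the config is passed via sdkConfig(), not sdk_config
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
```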
Dependency updates:
- Bump codecov/codecov-action from 1 to 4 (#84).
- Bump actions/setup-python from 4 to 5 (#83).
- Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
- Bump softprops/action-gh-release from 1 to 2 (#80).
- Added support for `save_table(..., mode="overwrite")` to `StatementExecutionBackend` (#74). In this release, we've added support for overwriting a table when saving data using the `save_table` method in the `StatementExecutionBackend`. Previously, attempting to use the `overwrite` mode would raise a `NotImplementedError`. Now, when this mode is specified, the method first truncates the table before inserting the new rows; the truncation is done by running a `TRUNCATE TABLE` SQL command through the `execute` method. Additionally, we've added a new integration test, `test_overwrite`, to the `test_deployment.py` file to verify the new `overwrite` mode, along with two new test cases, `test_statement_execution_backend_save_table_overwrite_empty_table` and `test_mock_backend_overwrite`. Note that the method signature has been updated to give the `mode` parameter a default value of `append`; this does not change existing behavior and only provides a more convenient default for users of the method.
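In use, the two modes look like this; the table name and the `Foo` schema are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Foo:
    first: str
    second: int

rows = [Foo("a", 1), Foo("b", 2)]

# backend: a StatementExecutionBackend instance
backend.save_table("main.default.foo", rows, Foo)                    # append (the default)
backend.save_table("main.default.foo", rows, Foo, mode="overwrite")  # truncate, then insert
```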
- Fixed PyPI badge (#72). In this release, we have implemented a fix to the PyPI badge in the README file of our open-source library. The PyPI badge displays the version of the package and serves as a quick reference for users. This fix ensures the accuracy and proper functioning of the badge, without involving any changes to the functionality or methods within the project. Software engineers can be assured that this update is limited to the README file, specifically the PyPI badge, and will not affect the overall functionality of the library.
- Fixed `no-cheat` check (#71). In this release, we have made improvements to the `no-cheat` verification for new code. Previously, the check for disabling the linter was prone to false positives when the string `# pylint: disable` appeared for reasons other than disabling the linter. The updated check now excludes lines containing the string `CHEAT` from the search and counts the characters in the remaining output using the `wc -c` command; if the count is not zero, the script terminates with an error message. This change enhances the accuracy of the `no-cheat` check, ensuring that the linter is being used correctly and that all new code meets our quality standards.
- Removed upper bound on `sqlglot` dependency (#70). In this update, we have removed the upper bound on the `sqlglot` dependency version in the project's `pyproject.toml` file. Previously, the constraint required `sqlglot` to be at least 22.3.1 but less than 22.5.0; now any version greater than or equal to 22.3.1 is accepted. This change gives the project the flexibility to take advantage of future bug fixes, performance improvements, and new features in newer `sqlglot` versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.
- Fixed `Builder` object is not callable error (#67). In this release, we have made an enhancement to the `Backends` class in the `databricks/labs/lsql/backends.py` file. The `DatabricksSession.builder()` method call in the `__init__` method has been changed to `DatabricksSession.builder`, using the `builder` attribute to create a new `DatabricksSession` instance without calling it like a function. The `sdk_config` method is then used to configure the instance with the required settings, and finally the `getOrCreate` method is used to obtain a `SparkSession` object, which is passed as a parameter to the parent class constructor. This modification simplifies the code and eliminates the error caused by treating the `builder` attribute as a callable object.
- Prevent silencing of `pylint` (#65). In this release, we have introduced a new job, `no-lint-disabled`, to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing `pylint` checks have been added, by filtering the differences between the current branch and the main branch for new lines of code and checking whether any of them contain a `pylint` disable comment. If any such lines are found, the job fails and prints a message indicating the offending lines of code, ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
- Updated `_SparkBackend.fetch()` to return iterator instead of list (#62). In this release, the `fetch()` method of the `_SparkBackend` class has been updated to return an iterator instead of a list, which can reduce memory usage and improve performance, as the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling capabilities. The `test_runtime_backend_fetch()` unit test has been updated to reflect this change. Users of the `fetch()` method should be aware that it now returns an iterator that must be consumed to obtain the results, as sketched below; thorough testing is recommended to ensure that the updated method still meets the needs of the application.
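A short sketch of consuming the iterator (`backend` is a hypothetical backend instance):

```python
rows = backend.fetch("SELECT * FROM inventory.foo")  # an iterator, not a list
first = next(rows, None)   # consume lazily, one row at a time...
remaining = list(rows)     # ...or materialize whatever is left
```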
- Added support for common parameters in StatementExecutionBackend (#59). The `StatementExecutionBackend` class in the `databricks.labs.lsql` package's `backends.py` file now supports passing common parameters through keyword arguments (kwargs), which are forwarded to the `StatementExecutionExt` constructor. This allows greater customization and flexibility in the backend's operation and makes it adaptable to more use cases. The key modification is the addition of the `**kwargs` parameter to the constructor signature and passing it on to `StatementExecutionExt`; no methods within the class were changed.
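A sketch of the passthrough; the keyword argument shown is illustrative, since the accepted names are whatever `StatementExecutionExt` takes:

```python
from databricks.sdk import WorkspaceClient

from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()
# Extra keyword arguments are forwarded verbatim to StatementExecutionExt;
# the warehouse id and `byte_limit` below are placeholders, not verified names.
backend = StatementExecutionBackend(ws, warehouse_id="abc123", byte_limit=100_000_000)
```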
- Updating packages. In this update, the dependencies specified in the `pyproject.toml` file have been updated to more recent versions. The outdated packages `databricks-labs-blueprint~=0.4.0` and `databricks-sdk~=0.21.0` have been replaced with `databricks-labs-blueprint>=0.4.2` and `databricks-sdk>=0.22.0`, respectively. These updates are expected to bring new features and bug fixes to the software. The `sqlglot` dependency remains unchanged, with the same version requirement range of `sqlglot>=22.3.1,<22.5.0`.
- Fixed row converter to properly handle nullable values (#53). In this release, the row converter in the `databricks.labs.lsql.core` module has been updated to handle nullable values correctly. A new method has been added to `StatementExecutionExt` to manage nullable values during SQL statement execution, and the `Row` class has been modified to accommodate nullable values, improving the robustness and flexibility of SQL execution when dealing with various data types, including null values. These enhancements increase the overall reliability of the system, making it more production-ready.
- Improved integration test coverage (#52). In this release, the project's integration test coverage has been significantly improved through several changes. A new function, `make_random()`, has been added to the `conftest.py` file to generate a random string of fixed length, aiding in the creation of more meaningful and readable random strings for integration tests (a sketch follows this entry). A new file, `test_deployment.py`, has been introduced, containing a test function for deploying a database schema and verifying successful data retrieval via a view. The `test_integration.py` file has been renamed to `test_core.py`, with updates to the `test_fetch_one` function to test the `fetch_one` method using a SQL query with an aliased value. Additionally, a new `Foo` dataclass has been added to the `tests/integration/views/__init__.py` file, supporting integration test coverage. Lastly, a new SQL query has been added to the integration test suite, located in the `some.sql` file, which retrieves data from a table named `foo` in the `inventory` schema. These changes enhance the overall stability, reliability, and coverage of the project's integration tests. Note: the changes to the `.gitignore` file and the improvements to the `StatementExecutionBackend` class in the `backends.py` file are not included in this summary, as they were described in the opening statement.
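A sketch of such a fixture helper; the charset and default length are illustrative:

```python
import random
import string

def make_random(k: int = 16) -> str:
    """Generate a fixed-length random string for readable test resource names."""
    return "".join(random.choices(string.ascii_uppercase, k=k))
```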
- Rely on `hatch` being present on the build machine (#54). In this release, we have made significant changes to how we manage our build process and toolchain configuration. We no longer manually install `hatch` 1.7.0 on the build machine; instead, we rely on its presence and add it to the list of required tools in the toolchain configuration. The command to create a virtual environment using `hatch` has been added, and the `pre_setup` section no longer installs `hatch`, assuming its availability. We have also updated the `hatch` package version from 1.7.0 to 1.9.4, which may include bug fixes, performance improvements, or new features, and may affect existing functionality that relies on `hatch`. The `pyproject.toml` file has been modified to update the `fmt` and `verify` sections, with `ruff check . --fix` replacing `ruff . --fix` and the removal of `black --check .` and `isort . --check-only`. A new configuration for `isort` specifies the `databricks.labs.blueprint` package as a known first-party package, enabling more precise management of imports related to that package. These changes simplify the build process and ensure that the project uses a more recent version of `hatch` for packaging and distributing Python projects.
- Updated sqlglot requirement from ~=22.3.1 to >=22.3.1,<22.5.0 (#51). In this release, we have updated the version constraint for the `sqlglot` package in our project's `pyproject.toml` file. Previously, the constraint was `~=22.3.1`, allowing any version with the same major and minor numbers but a different patch number. The constraint is now `>=22.3.1,<22.5.0`, which lets the project pick up bug fixes and improvements from newer `sqlglot` releases up to, but not including, 22.5.0, while still preventing inadvertent use of any breaking changes introduced in 22.5.0 or later. This modification allows us to take advantage of the latest features and improvements in `sqlglot` while maintaining compatibility and stability in our project.
Dependency updates:
- Updated sqlglot requirement from ~=22.3.1 to >=22.3.1,<22.5.0 (#51).
- Added `MockBackend.rows("col1", "col2")[(...), (...)]` helper (#49). In this release, we have added a new helper method, `MockBackend.rows("col1", "col2")[(...), (...)]`, to simplify testing with `MockBackend`. This method allows rows to be created with a more concise syntax: it takes the column names and a list of value tuples, one per row, and returns a list of `Row` objects with the specified columns and values. Additionally, an `__eq__` method has been introduced to check whether two rows are equal by converting them to dictionaries via the existing `as_dict` method and comparing those. The `__contains__` method has also been modified to improve the behavior of the `in` keyword when used with rows, ensuring columns can be checked for membership in the row in a more intuitive and predictable manner. These changes make it easier to test and work with `MockBackend`, improving the overall quality and maintainability of the project.
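In a test, the helper reads like this; the column and value names are illustrative:

```python
from databricks.labs.lsql.backends import MockBackend

# Build a list of Row objects with columns `first` and `second` in one expression
expected = MockBackend.rows("first", "second")[
    ("a", 1),
    ("b", 2),
]
assert expected[0].first == "a"
assert "first" in expected[0]  # __contains__ checks column membership in a row
```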
- Updated project metadata (#46). In this release, the project metadata has been updated to reflect changes in the library's capabilities and dependencies. The project now supports lightweight SQL statement execution using the Databricks SDK for Python, setting it apart from other solutions. The library size comparison in the documentation has been updated, reflecting an increase in the compressed and uncompressed size of Databricks Labs LightSQL, as well as the addition of a new direct dependency, `sqlglot`. The project's dependencies and URLs in the `pyproject.toml` file have also been updated, including a version update for `databricks-labs-blueprint` and the removal of a specific range for `PyYAML`.
Dependency updates:
- Updated sqlglot requirement from ~=22.2.1 to ~=22.3.1 (#43).
- Ported `StatementExecutionExt` from UCX (#31).
Initial commit