# Version changelog

## 0.14.1

  • Changes to work with Databricks SDK v0.38.0 (#350). In this release, we have upgraded the Databricks SDK from version 0.37.0 to version 0.38.0 to ensure compatibility with the latest SDK and to address several issues. The update includes changes to make the code compatible with the new SDK version, removing the need for .as_dict() method calls when creating or updating dashboards and utilizing an sdk_dashboard variable for interacting with the Databricks workspace. We also updated the dependencies to "databricks-labs-blueprint[yaml]" package version greater than or equal to 0.4.2 and sqlglot package version greater than or equal to 22.3.1. The test_core.py file has been updated to address multiple issues (#349 to #332) related to the Databricks SDK, and the test_dashboards.py file has been revised to work with the new SDK version. These changes improve integration with Databricks Lakeview dashboards, simplify the code, and ensure compatibility with the latest SDK version, resolving issues #349 to #332.
  • Specify the minimum required version of databricks-sdk as 0.37.0 (#331). In this release, we have updated the minimum required version of the databricks-sdk package to 0.37.0 from 0.29.0 in the pyproject.toml file to ensure compatibility with the latest version. This change was made necessary due to updates made in issue #320. To accommodate any patch release of databricks-sdk with a major and minor version of 0.37, we have updated the dependency constraint to use the ~= operator, resolving issue #330. These changes are intended to enhance the compatibility and stability of our software.

## 0.14.0

  • Added nightly tests run at 4:45am UTC (#318). A new nightly workflow has been added to the codebase, designed to automate a series of jobs every day at 4:45am UTC on the larger environment. The workflow includes permissions for writing id-tokens, accessing issues, reading contents and pull-requests. It checks out the code with a full fetch-depth, installs Python 3.10, and uses hatch 1.9.4. The key step in this workflow is the execution of nightly tests using the databrickslabs/sandbox/acceptance action, which creates issues if necessary. The workflow utilizes several secrets, including VAULT_URI, GITHUB_TOKEN, ARM_CLIENT_ID, and ARM_TENANT_ID, and sets the TEST_NIGHTLY environment variable to true. Additionally, the workflow is part of a concurrency group called "single-acceptance-job-per-repo", ensuring that only one acceptance job runs at a time per repository.
  • Bump codecov/codecov-action from 4 to 5 (#319). In this version update, the Codecov GitHub Action has been upgraded from 4 to 5, bringing improved functionality and new features. This new version utilizes the Codecov Wrapper to encapsulate the CLI, enabling faster updates. Additionally, an opt-out feature has been introduced for tokens in public repositories, allowing contributors and other members to upload coverage reports without requiring access to the Codecov token. The upgrade also includes changes to the arguments: file is now deprecated and replaced with files, and plugin is deprecated and replaced with plugins. New arguments have been added, including binary, gcov_args, gcov_executable, gcov_ignore, gcov_include, report_type, skip_validation, and swift_project. Comprehensive documentation on these changes can be found in the release notes and changelog.
  • Fixed RuntimeBackend exception handling (#328). In this release, we have made significant improvements to the exception handling in the RuntimeBackend component, addressing issues reported in tickets #328, #327, #326, and #325. We have updated the execute and fetch methods to handle exceptions more gracefully and changed exception handling from catching Exception to catching BaseException for more comprehensive error handling. Additionally, we have updated the pyproject.toml file to use a newer version of the databricks-labs-pytester package (0.2.1 to 0.5.0) which may have contributed to the resolution of these issues. Furthermore, the test_backends.py file has been updated to improve the readability and user-friendliness of the test output for the functions testing if a NotFound, BadRequest, or Unknown exception is raised when executing and fetching statements. The test_runtime_backend_use_statements function has also been updated to print PASSED or FAILED instead of returning those values. These changes enhance the robustness of the exception handling mechanism in the RuntimeBackend class and update related unit tests.

Dependency updates:

  • Bump codecov/codecov-action from 4 to 5 (#319).

## 0.13.0

  • Added escape_name function to escape individual SQL names and escape_full_name function to escape dot-separated full names (#316). Two new functions, escape_name and escape_full_name, have been added to the databricks.labs.lsql.escapes module for escaping SQL names. The escape_name function takes a single name as input and returns it enclosed in backticks, while escape_full_name handles dot-separated full names by escaping each individual component. These functions have been ported from the databrickslabs/ucx repository and provide a consistent way to escape names and full names in SQL statements, improving robustness by preventing issues caused by unescaped special characters in SQL names. The test suite covers single names, full names with different combinations of escaped and unescaped components, and special characters, with a specific focus on the scenario where the column name contains a period. A sketch of this escaping behavior appears after this list.
  • Bump actions/checkout from 4.2.0 to 4.2.1 (#304). In this pull request, the actions/checkout dependency is updated from version 4.2.0 to 4.2.1 in the .github/workflows/release.yml file. This update includes a new feature, contributed by @orhantoy, where refs/* are checked out by commit if provided, falling back to the ref. This change improves the flexibility of the action, allowing users to specify a commit or branch for checkout. The pull request also introduces a new contributor, @Jcambass, who added a workflow file for publishing releases to an immutable action package. The commits for this release include changes to prepare for the 4.2.1 release, add a workflow file for publishing releases, and check out other refs/* by commit if provided, falling back to ref. This pull request has been reviewed and approved by Dependabot.
  • Bump actions/checkout from 4.2.1 to 4.2.2 (#310). This is a pull request to update the actions/checkout dependency from version 4.2.1 to 4.2.2, which includes improvements to the url-helper.ts file that now utilize well-known environment variables and expanded unit test coverage for the isGhes function. The actions/checkout action is commonly used in GitHub Actions workflows for checking out a repository at a specific commit or branch. The changes in this update are internal to the actions/checkout action and should not affect the functionality of the project utilizing this action. The pull request also includes details on the commits and compatibility score for the upgrade, and reviewers can manage and merge the request using Dependabot commands once the changes have been verified.
  • Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307). In this release, the databrickslabs/sandbox dependency has been updated from version acceptance/v0.3.0 to 0.3.1. This update includes previously tagged commits, bug fixes for git-related libraries, and resolution of the unsupported protocol scheme error. The README has been updated with more information on using the databricks labs sandbox command, and installation instructions have been improved. Additionally, there have been dependency updates for go-git libraries and golang.org/x/crypto in the /go-libs and /runtime-packages directories. New commits in this release allow larger logs from acceptance tests and implement experimental OIDC refresh functionality. Ignore conditions have been applied to prevent conflicts with previous versions of the dependency. This update is recommended for users who want to take advantage of the latest bug fixes and improvements.
  • Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315). In this release, the databrickslabs/sandbox dependency has been updated from version acceptance/v0.3.1 to 0.4.2. This update includes bug fixes, dependency updates, and additional go-git libraries. Specifically, the Run integration tests job in the GitHub Actions workflow has been updated to use the new version of the databrickslabs/sandbox/acceptance Docker image. The updated version also includes install instructions, usage instructions in the README, and a modification to provide more git-related libraries. Additionally, there were several updates to dependencies, including golang.org/x/crypto version 0.16.0 to 0.17.0. Dependabot, a tool that manages dependencies in GitHub projects, is responsible for the update and provides instructions for resolving any conflicts or merging the changes into the project. This update is intended to improve the functionality and reliability of the databrickslabs/sandbox dependency.
  • Deprecate Row.as_dict() (#309). In this release, we are introducing a deprecation warning for the as_dict() method in the Row class, which will be removed in favor of the asDict() method. This change maintains consistency with Spark's Row behavior and prevents subtle bugs when switching between different backends. The deprecation warning is implemented using Python's warnings mechanism, including the @deprecated annotation introduced in Python 3.13 (PEP 702) for static code analysis. The existing functionality of fetching values from the database through StatementExecutionExt remains unchanged. We recommend that clients update their code to use .asDict() instead of .as_dict() to avoid any disruptions. A new test case test_row_as_dict_deprecated() has been added to verify the deprecation warning for Row.as_dict(). A sketch of the deprecation pattern appears after this list.
  • Minor improvements for .save_table(mode="overwrite") (#298). In this release, the .save_table() method has been improved, particularly when using the overwrite mode. If no rows are supplied, the table will now be truncated, ensuring consistency with the mock backend behavior. This change has been optimized for SQL-based backends, which now perform truncation as part of the insert for the first batch. Type hints on the abstract method have been updated to match the concrete implementations. Unit tests and integration tests have been updated to cover the new functionality, and new methods have been added to test the truncation behavior in overwrite mode. These improvements enhance the consistency and efficiency of the .save_table() method when using overwrite mode across different backends.
  • Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305). In this release, we have updated the requirement for the databrickslabs/sandbox package to version acceptance/v0.3.0 in the downstreams.yml file. This update is necessary to use the latest version of the package, which includes several bug fixes and dependency updates. The databrickslabs/sandbox package is used in the acceptance tests, which are run as part of the CI/CD pipeline. It provides a set of tools and utilities for developing and testing code in a sandbox environment. The changelog for this version includes the addition of install instructions, more git-related libraries, and the modification of the README to include information about how to use it with the databricks labs sandbox command. Specifically, the version of the databrickslabs/sandbox package used in the acceptance job has been updated from acceptance/v0.1.4 to acceptance/v0.3.0, allowing the integration tests to be run using the latest version of the package. The ignore conditions for this PR ensure that Dependabot will resolve any conflicts that may arise and can be manually triggered with the @dependabot rebase command.
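
The following is a minimal sketch of the escaping behavior described in the escape_name/escape_full_name entry above; it illustrates standard SQL backtick quoting and is not the library's exact implementation:

```python
def escape_name(name: str) -> str:
    # Wrap a single SQL name in backticks, doubling any embedded backtick.
    return "`" + name.replace("`", "``") + "`"


def escape_full_name(full_name: str) -> str:
    # Escape each dot-separated component individually. A dot inside a
    # component is indistinguishable from a separator here, which is why
    # the test suite pays special attention to names containing periods.
    return ".".join(escape_name(part) for part in full_name.split("."))


print(escape_full_name("catalog.schema.table"))  # `catalog`.`schema`.`table`
```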
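
Similarly, a sketch of the Row.as_dict() deprecation pattern, using Python's standard warnings mechanism; the real Row class carries more behavior, and the __columns__ attribute is a simplifying assumption:

```python
import warnings


class Row(tuple):
    """Simplified stand-in for the lsql Row class."""

    __columns__: tuple[str, ...] = ()

    def asDict(self) -> dict:
        # Spark-compatible spelling; this is the supported method.
        return dict(zip(self.__columns__, self))

    def as_dict(self) -> dict:
        warnings.warn(
            "as_dict() is deprecated, use asDict() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.asDict()


class UserRow(Row):
    __columns__ = ("name", "id")


# Emits a DeprecationWarning while still returning the expected mapping.
assert UserRow(("alice", 1)).as_dict() == {"name": "alice", "id": 1}
```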

Dependency updates:

  • Bump actions/checkout from 4.2.0 to 4.2.1 (#304).
  • Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305).
  • Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307).
  • Bump actions/checkout from 4.2.1 to 4.2.2 (#310).
  • Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315).

## 0.12.1

  • Bump actions/checkout from 4.1.7 to 4.2.0 (#295). In this version 4.2.0 release of the actions/checkout library, the team has added Ref and Commit outputs, which provide the ref and commit that were checked out, respectively. The update also includes dependency updates to braces, minor-npm-dependencies, docker/build-push-action, and docker/login-action, all of which were automatically resolved by Dependabot. These updates improve compatibility and stability for users of the library. This release is a result of contributions from new team members @yasonk and @lucacome. Users can find a detailed commit history, pull requests, and release notes in the associated links. The team strongly encourages all users to upgrade to this new version to access the latest features and improvements.
  • Set catalog on SchemaDeployer to overwrite the default hive_metastore (#296). In this release, the default catalog for SchemaDeployer has been changed from hive_metastore to a user-defined catalog, allowing for more flexibility in deploying resources to different catalogs. A new dependency, databricks-labs-pytester, has been added with a version constraint of >=0.2.1, which may indicate the introduction of new testing functionality. The SchemaDeployer class has been updated to accept a catalog parameter, and the tests for deploying and deleting schemas, tables, and views have been updated to reflect these changes. The test_deploys_schema, test_deploys_dataclass, and test_deploys_view tests have been updated to accept an inventory_catalog parameter, and the caplog fixture is used to capture log messages and assert that they contain the expected messages. Additionally, a new test function test_statement_execution_backend_overwrites_table has been added to the tests/integration/test_backends.py file to test the functionality of the StatementExecutionBackend class in overwriting a table in the database and retrieving the correct data. Issue #294 has been resolved, and progress has been made on issue #278, but issue #280 has been marked as technical debt and issue #287 is required for the CI to pass.

Dependency updates:

  • Bump actions/checkout from 4.1.7 to 4.2.0 (#295).

## 0.12.0

  • Added method to detect rows are written to the MockBackend (#292). In this commit, the MockBackend class in the 'backends.py' file has been updated with a new method, 'has_rows_written_for', which allows for differentiation between a table that has never been written to and one with zero rows. This method checks if a specific table has been written to by iterating over the table stubs in the _save_table attribute and returning True if the given full name matches any of the stub full names. Additionally, the class has been supplemented with the rows_written_for method, which takes a table name and mode as input and returns a list of rows written to that table in the given mode. Furthermore, several new test cases have been added to test the functionality of the MockBackend class, including checking if the has_rows_written_for method correctly identifies when there are no rows written, when there are zero rows written, and when rows are written after the first and second write operations. These changes improve the overall testing coverage of the project and aid in testing the functionality of the MockBackend class. The new methods are accompanied by documentation strings that explain their purpose and functionality.
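
A hedged usage sketch of the distinction described above, assuming the MockBackend API as summarized in this entry:

```python
from dataclasses import dataclass

from databricks.labs.lsql.backends import MockBackend


@dataclass
class Event:
    name: str


backend = MockBackend()
assert not backend.has_rows_written_for("catalog.schema.events")  # never written to

backend.save_table("catalog.schema.events", [], Event, mode="append")
assert backend.has_rows_written_for("catalog.schema.events")  # written to, zero rows
assert backend.rows_written_for("catalog.schema.events", "append") == []
```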

## 0.11.0

  • Added filter spec implementation (#276). In this commit, a new FilterHandler class has been introduced to handle filter files with the suffix .filter.json, which can parse filter specifications in the header of the filter file and validate the filter columns and types. The commit also adds support for three types of filters: DATE_RANGE_PICKER, MULTI_SELECT, and DROPDOWN, which can be linked with multiple visualization widgets. Additionally, a FilterTile class has been added to the Tile class, which represents a filter tile in the dashboard and includes methods to validate the tile, create widgets, and generate filter encodings and queries. The DashboardMetadata class has been updated to include a new method get_datasets() to retrieve the datasets for the dashboard. These changes enhance the functionality of the dashboard by adding support for filtering data using various filter types and linking them with multiple visualization widgets, improving the customization and interactivity of the dashboard, and making it more user-friendly and efficient.
  • Bugfix: MockBackend wasn't mocking savetable properly when the mode is append (#289). This release includes a bugfix and enhancements for the MockBackend component, which is used to mock the SqlBackend. The .save_table() method failed to function as expected in append mode, writing all rows to the same table instead of accumulating them. This bug has been addressed, ensuring that rows accumulate correctly in append mode. Additionally, a new test function, test_mock_backend_save_table_overwrite(), has been added to demonstrate the corrected behavior of overwrite mode, showing that it now replaces only the existing rows for the given table while preserving other tables' contents. The type signature for .save_table() has been updated, restricting the mode parameter to accept only two string literals: "append" and "overwrite". The MockBackend behavior has been updated accordingly, and rows are now filtered to exclude any None or NULL values prior to saving. These improvements increase reliability when using the MockBackend as a testing backend for the system.
  • Changed filter spec to use YML instead of JSON (#290). In this release, the filter specification files have been converted from JSON to YAML format, providing a more human-readable format for the filter specifications. The schema for the filter file includes flags for column, columns, type, title, description, order, and id, with the type flag taking on values of DROPDOWN, MULTI_SELECT, or DATE_RANGE_PICKER. This change impacts the FilterHandler, is_filter method, and _from_dashboard_folder method, as well as relevant parts of the documentation. Additionally, the parsing methods have been updated to use yaml.safe_load instead of json.loads, and the is_filter method now checks for the .filter.yml suffix. A new file, '00_0_date.filter.yml', has been added to the 'tests/integration/dashboards/filter_spec_basic' directory, containing a sample date filter definition. Furthermore, various tests have been added to validate filter specifications, such as checking for an invalid type and for both column and columns keys being present. These updates enhance readability, maintainability, and ease of use for filter configuration. A hypothetical filter file in the new format appears after this list.
  • Increase testing of generic types storage (#282). A new commit enhances the testing of generic types storage by expanding the test suite to include a list of structs, ensuring more comprehensive testing of the system. The Foo struct has been renamed to Nested for clarity, and two new structs, NestedWithDict and Nesting, have been added. The Nesting struct contains a Nested object, while NestedWithDict includes a string and an optional dictionary of strings. A new test case demonstrates appending complex types to a table by creating and saving a table with two rows, each containing a Nesting struct. The test then fetches the data and asserts the expected number of rows are returned, ensuring the proper functioning of the storage system with complex data types.
  • Minor Changes to avoid redundancy in code and follow code patterns (#279). In this release, we have made significant improvements to the dashboards.py file to make the code more concise, maintainable, and in line with the standard library's recommended usage. The export_to_zipped_csv method has undergone major changes, including the removal of the BytesIO module import and the use of StringIO for handling strings as files. The method no longer creates a separate ZIP file for the CSV files, instead using the provided export_path. Additionally, the method skips tiles that don't contain queries. We have also introduced a new method, dataclass_transform, which transforms a given dataclass into a new one with specific attributes and behavior. This method creates a new dataclass with a custom metaclass and adds a new method, to_dict(), which converts the instances of the new dataclass to dictionaries. These changes promote code reusability and reduce redundancy in the codebase, making it easier for software engineers to work with.
  • New example with bar chart in dashboards-as-code (#281). A new example of a dashboard featuring a bar chart has been added to the dashboards-as-code feature using the existing metadata overrides feature to support the new widget type, without bloating the TileMetadata structure. An integration test was added to demonstrate the creation of a bar chart, and the resulting dashboard can be seen in the attached screenshot. Additionally, a new SQL file has been added for the Product Sales dashboard, showcasing sales data for different product categories. This approach can potentially be used to support other widget types such as Bar, Pivot, Area, etc. The team is encouraged to provide feedback on this proposed solution.
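
A hypothetical .filter.yml along the lines of the schema described in the filter-spec entry above (field values are illustrative):

```yaml
# Either `column` (one name) or `columns` (a list) may be given, not both.
column: closed_at
type: DATE_RANGE_PICKER   # or DROPDOWN, MULTI_SELECT
title: Closed date
description: Restricts the linked widgets to the selected date range
order: 0
id: date_filter
```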

## 0.10.0

  • Added Functionality to export any dashboards-as-code into CSV (#269). The DashboardMetadata class now includes a new method, export_to_zipped_csv, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts sql_backend and export_path as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, test_dashboards_export_to_zipped_csv, has been added to verify the correct export of dashboard data to a CSV file.
  • Added support for generic types in SqlBackend (#272). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the SqlBackend of the StatementExecutionBackend class. The new functionality is demonstrated in the test_supports_complex_types unit test, which creates a Nested dataclass containing various complex data types, such as nested dataclasses, datetime objects, dict, list, and optional fields. This enhancement is achieved by updating the save_table method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new StructInference class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
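
A sketch of saving a rich dataclass through a backend, assuming save_table infers the DDL schema from the dataclass as described above (the warehouse ID is a placeholder):

```python
import datetime
from dataclasses import dataclass, field

from databricks.sdk import WorkspaceClient

from databricks.labs.lsql.backends import StatementExecutionBackend


@dataclass
class Nested:
    name: str
    created: datetime.datetime


@dataclass
class Nesting:
    nested: Nested
    tags: list[str] = field(default_factory=list)
    labels: dict[str, str] | None = None


ws = WorkspaceClient()  # relies on ambient Databricks authentication
backend = StatementExecutionBackend(ws, "<warehouse-id>")
rows = [Nesting(Nested("a", datetime.datetime.now()), ["x"])]
backend.save_table("main.demo.nestings", rows, Nesting)
```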

## 0.9.3

  • Added documentation for exclude flag (#265). A new exclude flag has been added to the configuration file for our lab tool, allowing users to specify a path to exclude from formatting during lab execution. This release also corrects grammatical errors in the descriptions of existing flags related to catalog and database settings, such as updating "seperated" to "separated". Additionally, the flag descriptions for publish and open-browser have been clarified: publish now clearly controls whether the dashboard is published after creation, while open-browser controls whether the dashboard is opened in a web browser. These changes improve the user experience and ease of use of the lab tool.
  • Fixed dataclass field type in _row_to_sql (#266). In this release, we have addressed an issue related to #257 by fixing the dataclass field type in the _row_to_sql method of the backends.py file. Additionally, we have made updates to the _schema_for method to use a new _field_type class method. This change resolves a rare problem where the field.type is a string instead of a type and ensures compatibility with a pull request from an external repository (databrickslabs/ucx#2526). The new _field_type method attempts to load the type from __builtins__ if it's a string and logs a warning if it fails. The _row_to_sql method now consistently uses the _field_type method to get the field type. This ensures that the library functions seamlessly and consistently, avoiding any potential issues in the future.
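
The underlying Python quirk and the lookup workaround can be sketched as follows, using the builtins module (equivalent to the __builtins__ lookup described above):

```python
import builtins
import dataclasses


@dataclasses.dataclass
class Record:
    name: "str"  # a stringified annotation makes field.type the string "str", not the type


field = dataclasses.fields(Record)[0]
field_type = getattr(builtins, field.type) if isinstance(field.type, str) else field.type
assert field_type is str
```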

## 0.9.2

  • Make hatch a prerequisite (#259). In this commit, Eric Vergnaud has introduced a change to make the installation of hatch version 1.9.4 a prerequisite for the project to avoid errors related to pip command recognition. The Makefile has been updated to handle the installation of hatch automatically, and the hatch env create command is now used instead of pip install hatch==1.7.0. This change ensures that the development environment is consistent and reliable by maintaining the correct version of hatch and automatically handling its installation. Additionally, the .venv/bin/python and dev targets have been updated accordingly to reflect these changes. This commit also formats all files using the make dev fmt command, which helps maintain consistent code formatting throughout the project.
  • add support for exclusions in fmt command (#263). In this release, we have added support for exclusions to the fmt command in the 'databricks/labs/lsql/cli.py' module. This feature allows users to specify a list of directories or files to exclude while formatting SQL files, which is particularly useful when verifying SQL notebooks in ucx. The fmt command now accepts a new optional parameter 'exclude', which accepts an iterable of strings that specify the relative paths to exclude. Any sql_file that is a descendant of any exclusion is skipped during formatting. The exclusions are implemented by converting the relative paths into Path objects. This change addresses the issue where single line comments are converted into inlined comments, causing misinterpretation. The added unit test is manually verified, and this pull request fixes issue #261. This feature was authored and co-authored by Eric Vergnaud.
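
The descendant check described above amounts to something like the following (names are illustrative; Path.is_relative_to requires Python 3.9+):

```python
from pathlib import Path


def is_excluded(sql_file: Path, exclusions: list[Path]) -> bool:
    # A file is skipped when it is an exclusion itself or any descendant of one.
    return any(sql_file.is_relative_to(excluded) for excluded in exclusions)


assert is_excluded(Path("src/queries/ucx/notebook.sql"), [Path("src/queries/ucx")])
assert not is_excluded(Path("src/queries/main.sql"), [Path("src/queries/ucx")])
```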

## 0.9.1

  • Fixed dataclass field types (#257). This PR introduces a workaround to a Python bug affecting the dataclasses.fields() function, which sometimes returns field types as string type names instead of types. This can cause the ORM to malfunction. The workaround involves checking if the returned f.type is a string, and if so, converting it to a type by looking it up in the __builtins__ dictionary. This change is global and affects the _schema_for function in the backends.py file, which is responsible for creating a schema for a given dataclass, taking into account any necessary type conversions. This change ensures consistent and accurate type handling in the face of the Python bug, improving the reliability of our ORM.
  • Fixed missing EOL when formatting SQL files (#260). In this release, we have addressed an issue with the inconsistent addition of end-of-line (EOL) characters in formatted SQL files. The QueryTile.format() method has been updated to ensure that an EOL character is always added, except when the input query already ends with a newline. This change makes the output format more predictable and improves the overall user experience. The new behavior is demonstrated in the test_query_format_preserves_eol() test case, and existing test cases have been updated to check for the presence of EOL characters, further ensuring consistent and correct formatting. A sketch of this rule appears after this list.
  • Fixed normalize case input in cli (#258). In this release, we have updated the fmt command in the cli.py file to allow users to specify whether they want to normalize the case of SQL files when formatting. The normalize_case parameter now defaults to the string "true" and checks if it is in the STRING_AFFIRMATIVES list to determine whether to normalize the case of SQL files. Additionally, we have introduced a new optional normalize_case parameter in the format method of the dashboards.py file in the Databricks CLI, which normalizes the identifiers in the query to lower case when set to True. We have also added support for a new normalize_case parameter in the QueryTile.format() method, which prevents the automatic normalization of string input to uppercase when set to False. This change allows for more flexibility in handling string input and ensures that the input string is preserved as-is. These updates improve the functionality and usability of the open-source library, providing more control to users over formatting and handling of string input.
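
A sketch of the EOL rule from the QueryTile.format() fix above:

```python
def ensure_trailing_newline(formatted: str) -> str:
    # Always end the formatted query with a newline, without doubling one
    # that is already there.
    return formatted if formatted.endswith("\n") else formatted + "\n"


assert ensure_trailing_newline("SELECT 1") == "SELECT 1\n"
assert ensure_trailing_newline("SELECT 1\n") == "SELECT 1\n"
```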

## 0.9.0

  • Added design for filter file (#251). A new feature has been added to enable the creation of filters for multiple widgets in a dashboard using a .filter.json file. This file allows users to specify columns to be filtered, the filter type, title, description, order, and a unique ID for each filter. Both the column and columns flags are supported, with the former taking a single string and the latter taking a list of strings. The filter type can be set to a drop-down menu or another type as desired. The .filter.json file schema also supports optional title and description strings, as well as order and ID flags. An example of a .filter.json file is provided in the commit message. Additionally, the dashboard.yml file documentation has been updated to include information on how to use the new .filter.json file.
  • adding normalize-case option to databricks labs lsql fmt cmd (#254). In this open-source library release, the databricks labs lsql tool's fmt command now supports a new flag, normalize-case. This flag allows users to control the normalization of query text to lowercase, providing more flexibility when formatting SQL queries. By default, query text is still normalized to lowercase, but users can now prevent this behavior by setting the normalize-case flag to False. This change addresses an issue where some queries are case sensitive, such as those using map field keys in UCX dashboards. Additionally, a new parameter normalize_case has been added to the format method in the dashboards.py file, with updated method documentation. A new test function, test_query_formats_no_normalize(), has also been included to ensure consistent formatter behavior.

## 0.8.0

  • Removed deploy_dashboard method (#240). In this release, the deploy_dashboard method has been removed from the dashboards.py file and the legacy deployment method has been deprecated. The deploy_dashboard method was previously used to deploy a dashboard to a workspace, but it has been replaced with the create method of the lakeview attribute of the WorkspaceClient object. Additionally, the test_dashboards_creates_dashboard_via_legacy_method method has been removed. A new test has been added to ensure that the deploy_dashboard method is no longer being used, utilizing the deprecated_call function from pytest to verify that calling the method raises a deprecation warning. This change simplifies the code and improves the overall design of the system, resolving issue #232. The _with_better_names method and create_dashboard method remain unchanged.
  • Skip test that fails due to insufficient permission to create schema (#248). A new test function, test_dashboards_creates_dashboard_with_replace_database, has been added to the open-source library, but it is currently marked to be skipped due to missing permissions to create a schema. This function creates an instance of the Dashboards class with the ws parameter, creates a dashboard using the make_dashboard function, and performs various actions using the created dashboard, as well as functions such as tmp_path and sql_backend. This test function aims to ensure that the Dashboards class functions as expected when creating a dashboard with a replaced database. Once the necessary permissions for creating a schema are acquired, this test function can be enabled for further testing and validation.
  • Updates to use the Databricks Python sdk 0.30.0 (#247). In this release, we have updated the project to use Databricks Python SDK version 0.30.0. This update includes changes to the execute and fetch_value functions, which now use the new StatementResponse type instead of ExecuteStatementResponse. A conditional import statement has been added to maintain compatibility with both Databricks SDK versions 0.30.0 and below. The execute function now raises TimeoutError when the specified timeout is greater than 50 seconds and the statement execution hasn't finished. Additionally, the fetch_value function has been updated to handle the case when the execute function returns None. The unit test file test_backends.py has also been updated to reflect these changes, with multiple test functions now using the StatementResponse class instead of ExecuteStatementResponse. These changes improve the system's compatibility with the latest version of the Databricks SDK, ensuring that the core functionality of the SDK continues to work as expected.
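
The conditional import mentioned above presumably follows the usual compatibility pattern; the module path is an assumption based on the SDK's sql service:

```python
try:
    # Databricks SDK >= 0.30.0 renamed the response type.
    from databricks.sdk.service.sql import StatementResponse
except ImportError:
    from databricks.sdk.service.sql import (
        ExecuteStatementResponse as StatementResponse,
    )
```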

## 0.7.5

  • Fixed missing widget name suffixes (#243). In this release, we have addressed an issue related to missing widget name suffixes (#243) by adding a _widget suffix to the name of the widget object in the dashboards.py file. This change ensures consistency between the widget name and the id of the query, facilitating user understanding of the relationship between the two. A new method, _get_query_widget_spec, has also been added, although its specific functionality requires further investigation. Additionally, the unit tests in the tests/unit/test_dashboards.py file have been updated to check for the presence of the _widget suffix in widget names, ensuring that the tests accurately reflect the desired behavior. These changes improve the consistency of dashboard widget naming, thus benefiting software engineers utilizing or extending the project's widget-ordering functionalities.

## 0.7.4

  • Fixed dataset/widget name uniqueness requirement that was preventing dashboards being deployed (#241). A fix has been implemented to address a uniqueness requirement issue with the dataset/widget name that was preventing dashboard deployment. A new widget instance is now created with a unique name, generated by appending _widget to the metadata ID, in the get_layouts method. This ensures that multiple widgets with the same ID but different content can exist in a single dashboard, thereby meeting the name uniqueness requirement. In the save_to_folder method, the widget name is modified by removing the _widget suffix before writing the textbox specification to a markdown file, maintaining consistency between the widget ID and file name. These changes are localized to the get_layouts and save_to_folder methods, and no new methods have been added. The existing functionality related to the creation, validation, and saving of dashboard layouts remains unaltered.

## 0.7.3

  • Added publish flag to Dashboards.create_dashboard (#233). In this release, we have added a publish flag to the Dashboards.create_dashboard method, allowing users to publish the dashboard upon creation, thereby resolving issue #219. This flag is included in the labs.yml file with a description of its functionality. Additionally, the no-open flag's description has been updated to specify that it prevents the dashboard from opening in the browser after creation. The create_dashboard function in the cli.py and dashboards.py files has been updated to include the new publish flag, allowing for more flexibility in how users create and manage their dashboards. The Dashboards.create_dashboard method now calls the WorkspaceClient.lakeview.publish method when the publish flag is set to True, which publishes the created dashboard. This behavior is covered in the updated tests for the method.
  • Fixed boolean cli flags (#235). In this release, we have improved the handling of command-line interface (CLI) flags in the databricks labs command. Specifically, we have addressed the limitation that pure boolean flags are not supported. Now, when using boolean flags, the user will be prompted to confirm with y or 'yes'. We have modified the create_dashboard command to accept string inputs for the publish and no_open flags, which are then converted to boolean values for internal use. Additionally, we have introduced a new open-browser flag, which opens the dashboard in the browser after creation when set to y or 'yes'. These changes have been tested manually to ensure correct behavior. This improvement provides a more flexible input experience and better handling of boolean flags in the CLI for software engineers using the open-source library.
  • Fixed format breaks widget (#238). In this release, we've made significant changes to the 'databricks/labs/lsql' directory's 'dashboards.py' file to address formatting breaks in the widget that could occur with Call to Action (CTA) presence in a query. These changes include the addition of new class variables, including _SQL_DIALECT and _DIALECT, and their integration into existing methods such as _parse_header, validate, format, _get_abstract_syntax_tree, and replace_catalog_and_database_in_query. Furthermore, we have developed new methods for creating and deleting schemas and getting the current test purge time. We have also implemented new integration tests to demonstrate the fix for the formatting issue and added new test cases for the query handler's header-splitting functionality, query formatting, and CTE handling. These enhancements improve the library's handling of SQL queries and query tiles in the context of dashboard creation, ensuring proper parsing, formatting, and metadata extraction for a wide range of query scenarios.
  • Fixed replace database when catalog or database is None (#237). In this release, we have addressed an issue where system tables disappeared in ucx dashboards when replacing the placeholder database. To rectify this, we have developed a new method, replace_catalog_and_database_in_query, in the dashboards.py file's replace_database function. This method checks if the catalog or database in a query match the ones to be replaced and replaces them with new ones, ensuring that system tables are not lost during the replacement process. Additionally, we have introduced new unit tests in test_dashboards.py to verify that queries are correctly transformed when replacing the database or catalog in the query. These tests include various scenarios, using two parametrized test functions, to ensure the correct functioning of the feature. This change provides a more robust and reliable dashboard display when replacing the placeholder database in the system.
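
A hedged sketch of that replacement logic using sqlglot (the function name and signature are illustrative, not the library's exact API):

```python
import sqlglot
from sqlglot import expressions as exp


def replace_database(query: str, placeholder: str, database: str) -> str:
    # Only tables whose database matches the placeholder are rewritten, so
    # fully qualified system tables pass through untouched.
    tree = sqlglot.parse_one(query, read="databricks")
    for table in tree.find_all(exp.Table):
        if table.db == placeholder:
            table.set("db", exp.to_identifier(database))
    return tree.sql(dialect="databricks")


print(replace_database(
    "SELECT * FROM inventory.jobs JOIN system.access.audit USING (job_id)",
    "inventory",
    "ucx",
))  # inventory.jobs becomes ucx.jobs; system.access.audit is preserved
```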

## 0.7.2

  • Fixed dashboard deployment/creation (#230). The recent changes to our open-source library address issues related to dashboard deployment and creation, enhancing their reliability and consistency. The deploy_dashboard function has been deprecated in favor of the more accurate create_dashboard function, which now includes a publish flag. A validate method has been added to the Tile, MarkdownTile, and QueryTile classes to raise an error if the dashboard is invalid. The test_dashboards.py file has been updated to reflect these changes. These enhancements address issues #222, #229, and partially resolve #220. The commit includes an image of a dashboard created through the deprecated deploy_dashboard method. These improvements ensure better dashboard creation, validation, and deployment, while also maintaining backward compatibility through the deprecation of deploy_dashboard.

## 0.7.1

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224). In version 3.0.0 of sigstore/gh-action-sigstore-python, several changes, additions, and removals have been implemented. Notably, certain settings such as fulcio-url, rekor-url, ctfe, and rekor-root-pubkey have been removed. Additionally, the output settings signature, certificate, and bundle have also been removed. The inputs are now parsed according to POSIX shell lexing rules for better consistency. The release-signing-artifacts setting no longer causes a hard error when used under the incorrect event. Furthermore, various deprecations present in sigstore-python's 2.x series have been resolved. The default suffix has been changed from .sigstore to .sigstore.json, in line with Sigstore's client specification. The release-signing-artifacts setting now defaults to true. This version also includes several bug fixes and improvements to support CI runners that use PEP 668 to constrain global package prefixes.
  • Use default factory to create Tile._position (#226). In this change, the default value creation for the _position field in various classes including Tile, MarkdownTile, TableTile, and CounterTile has been updated. Previously, a new Position object was explicitly created for the default value. With this update, the default_factory argument of the dataclasses.field function is now used to create a new Position object. This change is made in anticipation of the Python 3.11 release, which modifies the field default mutability check behavior. By utilizing the default_factory approach, we ensure that a new Position object is generated during each instance creation, rather than reusing a single default instance. This guarantees the immutability of default values and aligns with best practices for forward-compatibility with future Python versions. It is important to note that this modification does not affect the functionality of the classes but enhances their initialization process.
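
The change boils down to the standard default_factory idiom:

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    x: int = 0
    y: int = 0


@dataclass
class Tile:
    # default_factory builds a fresh Position per instance; a plain
    # `_position: Position = Position()` default would be rejected by the
    # stricter mutability check in newer Python versions.
    _position: Position = field(default_factory=Position)
```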

Dependency updates:

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224).

## 0.7.0

  • Added databricks labs lsql fmt command (#221). The commit introduces a new command, databricks labs lsql fmt, to the open-source library, which formats SQL files in a given folder using the Databricks SDK. This command can be used without authentication and accepts a folder flag, which specifies the directory containing SQL files to format. The change also updates the labs.yml file and includes a new method, format, in the QueryTile class, which formats SQL queries using the sqlglot library. This commit enhances the functionality of the CLI for SQL file formatting and improves the readability and consistency of SQL files, making it easier for developers to understand and maintain the code. Additionally, the commit includes changes to various SQL files to demonstrate the improved formatting, such as converting SQL keywords to uppercase, adding appropriate spacing around keywords and operators, and aligning column names in the VALUES clause. The purpose of this change is to ensure that the formatting method works correctly and does not introduce any issues in the existing functionality.
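
Formatting a query with sqlglot, which the fmt command uses under the hood per the entry above, looks roughly like this (the options shown are illustrative, not the command's exact configuration):

```python
import sqlglot

pretty = sqlglot.transpile(
    "select a,b from t where a=1",
    read="databricks",
    pretty=True,
)[0]
print(pretty)
```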

## 0.6.0

  • Added method to dashboards to get dashboard url (#211). In this release, we have added a new method get_url to the lakeview_dashboards object in the lsql library. This method utilizes the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and dashboard ID, but this new method ensures that the URL is obtained correctly, even if the format changes in the future. Additionally, a new unit test has been added for a method that gets the dashboard URL using the workspace client. This new functionality allows users to easily retrieve the URL for a dashboard using its ID and the workspace client.
  • Extend replace database in query (#210). This commit extends the database replacement functionality in the DashboardMetadata class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new replace_database method in the DashboardMetadata class, which replaces the catalog and/or database in the query based on provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The create_dashboard function has also been updated to use the new method for replacing the database and catalog. Additionally, the TileMetadata update method has been replaced with a new merge method, and the QueryTile and Tile classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
  • Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments (#201). The TileMetadata dataclass now includes methods for updating and comparing tile metadata, and the DashboardMetadata class has been removed and its functionality incorporated into the Dashboards class. The Dashboards class now generates tiles, datasets, and layouts for dashboards using the provided query_transformer. The code's readability and maintainability have been further enhanced by replacing the use of the copy module with dataclasses.replace for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.

## 0.5.0

  • Added Command Execution backend which uses Command Execution API on a cluster (#95). In this release, the databricks-labs-lsql library has been updated with a new Command Execution backend that utilizes the Command Execution API. A new CommandExecutionBackend class has been implemented, which initializes a CommandExecutor instance taking a cluster ID, workspace client, and language as parameters. The execute method runs SQL commands on the specified cluster, and the fetch method returns the query result as an iterator of Row objects. The existing StatementExecutionBackend class has been updated to inherit from a new abstract base class called ExecutionBackend, which includes a save_table method for saving data to tables and is meant to be a common base class for both Statement and Command Execution backends. The StatementExecutionBackend class has also been updated to use the new ExecutionBackend abstract class, and its constructor now accepts a max_records_per_batch parameter. The execute and fetch methods have been updated to use the new _only_n_bytes method for logging truncated SQL statements. Additionally, the CommandExecutionBackend class provides execute, fetch, and save_table methods to execute commands on a cluster and save the results to tables in the Databricks workspace. A usage sketch for both backends appears after this list.
  • Added basic integration with Lakeview Dashboards (#66). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the databricks-labs-blueprint dependency to version 0.4.2 with the [yaml] extra, allowing for additional functionality related to handling YAML files. A new file, dashboards.py, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new __init__.py file under the src/databricks/labs/lsql/lakeview directory imports all classes and functions from the model.py module, providing a foundation for further development and customization. The release also introduces a new file, model.py, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, model.py.tmpl, used for handling JSON data during integration with Lakeview Dashboards. A new file, polymorphism.py, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a .gitignore file has been added to the tests/integration directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the test_dashboards.py file in the tests/integration directory contains a function, test_load_dashboard(ws), which uses the Dashboards class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
  • Added dashboard-as-code functionality (#201). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new create-dashboard command for creating unpublished dashboards. The functionality is available in the lsql lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The dashboard.yml file is used to define top-level metadata for the dashboard. This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
  • Automate opening integration test dashboard in debug mode (#167). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the webbrowser and is_in_debug modules from "databricks.labs.blueprint.entrypoint", and adding a check in the create function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID, and then opened in a web browser using "webbrowser.open". This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
  • Automatically tile widgets (#109). In this release, we've introduced an automatic widget tiling feature for the dashboard creation process in our open-source library. The Dashboards class now includes a new class variable, _maximum_dashboard_width, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The create_dashboard method has been updated to accept a new self parameter, turning it into an instance method. A new _get_position method has been introduced to calculate and return the next available position for placing a widget, and a _get_width_and_height method has been added to return the width and height for a widget specification, initially handling CounterSpec instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and available space, as well as the expected width and height for each widget.
  • Bump actions/checkout from 4.1.3 to 4.1.6 (#102). In the latest release, the 'actions/checkout' GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of github/codeql-action from 2 to 3, actions/setup-node from 1 to 4, and actions/upload-artifact from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling extensions.worktreeConfig when disabling sparse-checkout was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#151). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
  • Create a dashboard with a counter from a single query (#107). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a Dashboards class that provides methods for getting, saving, and deploying dashboards. A new method, create_dashboard, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset. The deploy_dashboard method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the test_dashboards.py file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
  • Create text widget from markdown file (#142). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue #1
  • Design document for dashboards-as-code (#105). The latest release introduces 'Dashboards as Code,' a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks include .sql, .md, and dashboard.yml files, with .sql defining queries and determining tile order, and dashboard.yml specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. The tile order can be determined by SQL file order, tiles order in dashboard.yml, or SQL file metadata. This project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, dashboard.yml, and SQL query content. The command-line interface is utilized for dashboard generation from configuration files.
  • Ensure propagation of lsql version into User-Agent header when it is used as library (#206). In this release, the pyproject.toml file has been updated to ensure that the correct version of the lsql library is propagated into the User-Agent header when used as a library, improving attribution. The databricks-sdk version has been updated from 0.22.0 to 0.29.0, and the __init__.py file of the lsql library has been modified to add the with_user_agent_extra function from the databricks.sdk.core package for correct attribution. The backends.py file has also been updated with improved type handling in the _row_to_sql and save_table functions for accurate SQL insertion and handling of user-defined classes. Additionally, a test has been added to ensure that the lsql version is correctly propagated in the User-Agent header when used as a library. These changes offer improved functionality and accurate type handling, making it easier for developers to identify the library version when used in other projects.
  • Fixed counter encodings (#143). In this release, we have improved the encoding of counters in the lsql dashboard by modifying the create_dashboard function in the dashboards.py file. Previously, the counter field encoding was hardcoded as "count," but has been changed to dynamically determine the first field name of the given fields, ensuring that counters are expected to have only one field. Additionally, a new integration test has been added to the tests/integration/test_dashboards.py file to ensure that the dashboard deployment functionality correctly handles SQL queries that do not perform a count. A new test for the Dashboards class has also been added to check that counter field encoding names are created as expected. The WorkspaceClient is mocked and not called in this test. These changes enhance the accuracy of counter encoding and improve the overall functionality and reliability of the lsql dashboard.
  • Fixed non-existing reference and typo in the documentation (#104). In this release, we've made improvements to the documentation of our open-source library, specifically addressing issue #104. The changes include fixing a non-existent reference and a typo in the Library size comparison section of the comparison.md document. This section provides guidance for selecting a library based on factors like library size, unified authentication, and compatibility with various Databricks warehouses and SQL Python APIs. The updates clarify the required dependency size for simple applications and scripts, and offer more detailed information about each library option. We've also added a new subsection titled Detailed comparison to provide a more comprehensive overview of each library's features. These changes are intended to help software engineers choose the library best suited to their specific needs; for applications that transfer large amounts of data serialized in Apache Arrow format and require low result-fetching latency, we recommend the Databricks SQL Connector for Python.
  • Fixed parsing message (#146). In this release, the warning message logged during the creation of a dashboard when a ParseError occurs has been updated to provide clearer and more detailed information about the parsing error. The new error message now includes the specific query being parsed and the exact parsing error, enabling developers to quickly identify the cause of parsing issues. This change ensures that engineers can efficiently diagnose and address parsing errors, improving the overall development and debugging experience with a more informative log format: "Parsing {query}: {error}".
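
A minimal sketch of the described logging behavior; the function name is illustrative:

```python
import logging

import sqlglot

logger = logging.getLogger(__name__)


def try_parse(query: str):
    try:
        return sqlglot.parse_one(query)
    except sqlglot.ParseError as error:
        # Matches the "Parsing {query}: {error}" format described above.
        logger.warning(f"Parsing {query}: {error}")
        return None
```
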
  • Improve dashboard as code (#108). The Dashboards class in the dashboards.py file has been updated to improve functionality and usability, with changes such as the addition of a type variable T for type checking and more descriptive names for methods. The save_to_folder method now accepts a Dashboard object and returns a Dashboard object, and a new static method create_dashboard has been added. Additionally, two new methods _with_better_names and _replace_names have been added for improved readability. The get_dashboard method now returns a Dashboard object instead of a dictionary. The save_to_folder method now also formats SQL code before saving it to file. These changes aim to enhance the functionality and readability of the codebase and provide more user-friendly methods for interacting with the Dashboards class. In addition to the changes in the Dashboards class, there have been updates to the organization of the project structure. The queries/counter.sql file has been moved to dashboards/one_counter/counter.sql in the tests/integration directory, enhancing the organization of the project. Furthermore, several tests for the Dashboards class have been introduced in the databricks.labs.lsql.dashboards module, demonstrating various functionalities of the class and ensuring that it functions as intended. The tests cover saving SQL and YML files to a specified folder, creating a dataset and a counter widget for each query, deploying dashboards with a given display name or dashboard ID, and testing the behavior of the save_to_folder and deploy_dashboard methods. Lastly, the commit removes the test_load_dashboard function and updates the test_dashboard_creates_one_dataset_per_query and test_dashboard_creates_one_counter_widget_per_query functions to use the updated Dashboard class. A new replace_recursively function is introduced to replace specific fields in a dataclass recursively. A new test function test_dashboards_deploys_exported_dashboard_definition has been added, which reads a dashboard definition from a JSON file, deploys it, and checks if it's successfully deployed using the Dashboards class. A new test function test_dashboard_deploys_dashboard_the_same_as_created_dashboard has also been added, which compares the original and deployed dashboards to ensure they are identical.
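
A sketch of the round trip these methods enable; the workspace path and local folder are placeholders, and exact signatures are assumptions based on this entry:

```python
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.dashboards import Dashboards

ws = WorkspaceClient()
dashboards = Dashboards(ws)

# Fetch an existing dashboard definition as a Dashboard object, then write
# it back out as formatted SQL and YAML files for version control.
dashboard = dashboards.get_dashboard("/Workspace/path/to/dashboard")  # placeholder
dashboards.save_to_folder(dashboard, Path("exported"))
```
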
  • Infer fields from a query (#111). The Dashboards class in the dashboards.py file has been updated with the addition of a new method, _get_fields, which accepts a SQL query as input and returns a list of Field objects using the sqlglot library to parse the query and extract the necessary information. The create_dashboard method has been modified to call this new function when creating Query objects for each dataset. If a ParseError occurs, a warning is logged and iteration continues. This allows for the automatic population of fields when creating a new dashboard, eliminating the need for manual specification. Additionally, new tests have been added for invalid queries and for checking if the fields in a query have the expected names. These tests include test_dashboards_skips_invalid_query and test_dashboards_gets_fields_with_expected_names, which utilize the caplog fixture and create temporary query files to verify functionality. Existing functionality related to creating dashboards remains unchanged.
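
A condensed sketch of the sqlglot-based inference; the real _get_fields returns Field objects rather than plain names:

```python
import sqlglot


def field_names(query: str) -> list[str]:
    # Take the alias-or-name of each projected column in the SELECT,
    # mirroring the field inference described above.
    parsed = sqlglot.parse_one(query)
    return [expression.alias_or_name for expression in parsed.expressions]


assert field_names("SELECT COUNT(*) AS amount FROM inventory") == ["amount"]
```
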
  • Make constant all caps (#140). In this release, the project's dashboards.py file has been updated to improve code readability and maintainability. The constant _maximum_dashboard_width has been renamed to all caps, becoming _MAXIMUM_DASHBOARD_WIDTH. This modification affects the Dashboards class and its methods, particularly _get_fields and _get_position. The _get_position method has been revised to use the new all-caps constant. This change ensures better visibility of constants within the code, addressing issue #140. It only impacts the dashboards.py file and does not affect any other functionality.
  • Read display name from dashboard.yml (#144). In this release, we have introduced a new DashboardMetadata dataclass that reads the display name of a dashboard from a dashboard.yml file located in the dashboard's directory. If the dashboard.yml file is absent, the folder name will be used as the display name. This change improves the readability and maintainability of the dashboard configuration by explicitly defining the display name and reducing the need to specify widget information in multiple places. We have also added a new fixture called make_dashboard for creating and cleaning up lakeview dashboards in the test suite. The fixture handles creation and deletion of the dashboard and provides an option to set a custom display name. Additionally, we have added and modified several unit tests to ensure the proper handling of the DashboardMetadata class and the dashboard creation process, including tests for missing, present, or incorrect display_name keys in the YAML file. The dashboards.deploy_dashboard() function has been updated to handle cases where only dashboard_id is provided.
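
A sketch of the described fallback behavior, not the actual DashboardMetadata implementation:

```python
from pathlib import Path

import yaml


def display_name(folder: Path) -> str:
    # Use display_name from dashboard.yml when present, otherwise fall back
    # to the folder name, as described above.
    path = folder / "dashboard.yml"
    if not path.exists():
        return folder.name
    metadata = yaml.safe_load(path.read_text()) or {}
    return metadata.get("display_name", folder.name)
```
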
  • Set widget id in query header (#154). In this release, we've made significant improvements to widget metadata handling in our open-source library. The previous WidgetMetadata dataclass has been replaced by a richer class of the same name, now featuring a path attribute, a spec_type property, and optional parameters for order, width, height, and _id. The _get_widgets method has been updated to accept an Iterable of WidgetMetadata objects, and both _get_layouts and _get_widgets methods now sort widgets using the order field. A new class method, WidgetMetadata.from_path, handles parsing widget metadata from a file path, replacing the removed _get_width_and_height method. Additionally, the WidgetMetadata class is now used in the deploy_dashboard method, and the test suite for the dashboards module has been enhanced with updated test_widget_metadata_replaces_width_and_height and test_widget_metadata_replaces_attribute functions, as well as new tests for specific scenarios. Issue #154 has been addressed by setting the widget id in the query header, and the aforementioned changes improve flexibility and ease of use for dashboard development.
  • Use order key in query header if defined (#149). In this release, we've introduced a new feature to use an order key in the query header if defined, enhancing the flexibility and control over the dashboard creation process. The WidgetMetadata dataclass now includes an optional order parameter of type int, and the _get_arguments_parser() method accepts the --order flag with type int. The replace_from_arguments() method has been updated to support the new order parameter, with a default value of self.order. The create_dashboard() method now implements a new _get_datasets() method to retrieve datasets from the dashboard folder and introduces a _get_widgets() method, which accepts a list of files, iterates over them, and yields tuples containing widgets and their corresponding metadata, including the order. These improvements enable the use of an order key in query headers, ensuring the correct order of widgets in the dashboard creation process. Additionally, a new test case has been added to verify the correct behavior of the dashboard deployment with a specified order key in the query header. This feature resolves issue #148.
  • Use widget width and height defined in query header (#147). In this release, the handling of metadata in SQL files has been updated to utilize the header of the file, instead of the first line, for improved readability and flexibility. This change includes a new WidgetMetadata class for defining the width and height of a widget in a dashboard, as well as new methods for parsing the widget metadata from a provided path. The release also includes updates to the documentation to cover the supported widget arguments -w or --width and -h or --height, and resolves issue #114 by adding a test for deploying a dashboard with a big widget using a new function test_dashboard_deploys_dashboard_with_big_widget. Additionally, new test cases have been added for creating dashboards with custom-sized widgets based on query header width and height values, improving functionality and error handling.
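
A sketch of how such header flags could be parsed with argparse; add_help=False is an assumption to free the -h short flag for --height, and the real parser may be configured differently:

```python
import argparse


def widget_arguments_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--order", type=int)
    parser.add_argument("-w", "--width", type=int)
    parser.add_argument("-h", "--height", type=int)
    return parser


# e.g. a query header containing "-w 6 -h 3 --order 1"
args = widget_arguments_parser().parse_args(["-w", "6", "-h", "3", "--order", "1"])
assert (args.width, args.height, args.order) == (6, 3, 1)
```
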

Dependency updates:

  • Bump actions/checkout from 4.1.3 to 4.1.6 (#102).
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#151).

0.4.3

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The actions/checkout dependency has been updated from version 4.1.2 to 4.1.3 in the update-main-version.yml file. This new version adds a check that verifies the git version before attempting to disable sparse-checkout, and adds an SSH user parameter to improve functionality and compatibility. The upstream release notes and changelog provide detailed information on the specific changes and improvements.
  • Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new method asDict to the Row class in the databricks.labs.lsql.core module to maintain compatibility with PySpark. This method returns a dictionary representation of the Row object, with keys corresponding to column names and values corresponding to the values in each column; it simply delegates to the existing as_dict method, so the two behave identically. The optional recursive argument to asDict, which in PySpark enables recursive conversion of nested Row objects to nested dictionaries, is accepted but not currently implemented and defaults to False. Additionally, we have modified the fetch function in the backends.py file to return Row objects of pyspark.sql when using self._spark.sql(sql).collect(). This change is temporary and marked with a TODO comment, indicating that it will be addressed in the future. We have also added error handling code in the fetch function to ensure the function operates as expected.
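
A sketch of the compatibility guarantee, assuming Row accepts keyword arguments the way PySpark's Row does:

```python
from databricks.labs.lsql.core import Row

row = Row(first="a", second=2)

# asDict delegates to as_dict, so both return the same mapping of column
# names to values.
assert row.asDict() == row.as_dict() == {"first": "a", "second": 2}
```
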

Dependency updates:

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97).

0.4.2

  • Added more NotFound error type (#94). In the latest update, the core.py file in the databricks/labs/lsql package has undergone enhancements to the error handling functionality. The _raise_if_needed function has been modified to raise a NotFound error when the error message includes the phrase "does not exist". This update enables the system to categorize specific SQL query errors as NotFound error messages, thereby improving the overall error handling and reporting capabilities. This change was a collaborative effort, as indicated by the co-authored-by statement in the commit.
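
A simplified sketch of the described mapping onto the SDK's NotFound error type; the function name is illustrative:

```python
from databricks.sdk.errors import NotFound


def raise_if_needed(message: str) -> None:
    # Categorize SQL errors whose message mentions a missing object as
    # NotFound, as described above (simplified).
    if "does not exist" in message:
        raise NotFound(message)
```
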

0.4.1

  • Fixing overwrite integration tests (#92). A new enhancement has been implemented for the overwrite feature's integration tests, addressing a concern with write operations. Two new variables, catalog and schema, have been incorporated using the env_or_skip function. These variables are utilized in the save_table method, which is now invoked twice with the same table, once with the append and once with the overwrite option. The data in the table is retrieved and checked for accuracy after each call, employing the updated Row class with revised field names first and second, formerly name and id. This modification ensures the proper operation of the overwrite feature during integration tests and resolves any related issues.
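
A sketch of the test's shape under assumed names; the warehouse id, catalog, schema, and Foo dataclass below are stand-ins:

```python
from dataclasses import dataclass

from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend


@dataclass
class Foo:
    first: str
    second: int


ws = WorkspaceClient()
backend = StatementExecutionBackend(ws, "warehouse-id")  # placeholder id
catalog, schema = "main", "default"  # stand-ins for env_or_skip values

# Write the same rows twice: appending, then overwriting; after the second
# call the table should contain only the overwritten data.
backend.save_table(f"{catalog}.{schema}.foo", [Foo("xyz", 1)], Foo, mode="append")
backend.save_table(f"{catalog}.{schema}.foo", [Foo("xyz", 1)], Foo, mode="overwrite")
rows = list(backend.fetch(f"SELECT * FROM {catalog}.{schema}.foo"))
assert len(rows) == 1 and rows[0].first == "xyz"
```
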

0.4.0

  • Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional catalog and schema parameters to the execute and fetch methods in the SqlBackend abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the SparkSqlBackend and DatabricksSqlBackend classes. The new parameters control the catalog and schema used by the SparkSession instance in the SparkSqlBackend class and the SqlClient instance in the DatabricksSqlBackend class. This enhancement enables better functionality in multi-catalog and multi-schema environments. Additionally, this change comes with unit tests and integration tests to ensure proper functionality. The new parameters can be used when calling the execute and fetch methods. For example, with a SparkSqlBackend instance spark_backend, you can execute a SQL statement in a specific catalog and schema with the following code: spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"). Similarly, the fetch method can also be used with the new parameters.
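
Continuing the entry's spark_backend example, a hedged sketch showing both methods with the new optional parameters:

```python
# Run DDL in a specific catalog and schema, then fetch from the same scope.
spark_backend.execute(
    "CREATE TABLE IF NOT EXISTS my_table (x INT)",
    catalog="my_catalog",
    schema="my_schema",
)
for row in spark_backend.fetch(
    "SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"
):
    print(row)
```
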

0.3.1

  • Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, downstreams.yml, which automates unit testing for downstream projects upon changes made to the upstream project. The workflow runs on pull requests, merge groups, and pushes to the main branch and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and accepts downstream projects using the databrickslabs/sandbox/downstreams action. The job matrix includes two downstream projects, ucx and remorph, and uses the build cache to speed up the pip install step. This feature ensures that changes to the upstream project do not break compatibility with downstream projects, maintaining a stable and reliable library for software engineers.
  • Fixed Builder object has no attribute sdk_config error (#86). In this release, we've resolved a Builder object has no attribute sdk_config error that occurred when initializing a Spark session using the DatabricksSession.builder method. The issue was caused by referencing the builder's configuration setter under the snake_case name sdk_config; the correct camelCase name is sdkConfig. This change enables successful creation of the Spark session, preventing the error from recurring. The DatabricksSession class and its methods, such as getOrCreate, continue to be used for interacting with Databricks clusters and workspaces, while the WorkspaceClient class manages Databricks resources within a workspace.
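
A sketch of the corrected session construction, assuming databricks-connect provides DatabricksSession; note both the attribute-style builder and the camelCase sdkConfig setter:

```python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config()  # resolves credentials from the environment

# builder is an attribute (not a call), and the config setter is camelCase.
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
```
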

Dependency updates:

  • Bump codecov/codecov-action from 1 to 4 (#84).
  • Bump actions/setup-python from 4 to 5 (#83).
  • Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
  • Bump softprops/action-gh-release from 1 to 2 (#80).

0.3.0

  • Added support for save_table(..., mode="overwrite") to StatementExecutionBackend (#74). In this release, we've added support for overwriting a table when saving data using the save_table method in the StatementExecutionBackend. Previously, attempting to use the overwrite mode would raise a NotImplementedError. Now, when this mode is specified, the method first truncates the table before inserting the new rows. The truncation is done using the execute method to run a TRUNCATE TABLE SQL command. Additionally, we've added a new integration test, test_overwrite, to the test_deployment.py file to verify the new overwrite mode functionality. A new option, mode="overwrite", has been added to the save_table method, allowing for the existing data in the table to be deleted and replaced with the new data being written. We've also added two new test cases, test_statement_execution_backend_save_table_overwrite_empty_table and test_mock_backend_overwrite, to verify the new functionality. It's important to note that the method signature has been updated to include a default value for the mode parameter, setting it to append by default. This change does not affect the functionality and only provides a more convenient default behavior for users of the method.
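
A simplified sketch of the overwrite path, not the actual implementation; the row-to-SQL helper below is hypothetical:

```python
def save_table(self, full_name, rows, klass, mode="append"):
    # Truncating first makes mode="overwrite" replace existing data;
    # appends skip straight to the inserts.
    if mode == "overwrite":
        self.execute(f"TRUNCATE TABLE {full_name}")
    for statement in self._insert_statements(full_name, rows, klass):  # hypothetical
        self.execute(statement)
```
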

0.2.5

  • Fixed PyPI badge (#72). In this release, we have implemented a fix to the PyPI badge in the README file of our open-source library. The PyPI badge displays the version of the package and serves as a quick reference for users. This fix ensures the accuracy and proper functioning of the badge, without involving any changes to the functionality or methods within the project. Software engineers can be assured that this update is limited to the README file, specifically the PyPI badge, and will not affect the overall functionality of the library.
  • Fixed no-cheat check (#71). In this release, we have made improvements to the no-cheat verification process for new code. Previously, the check for disabling the linter was prone to false positives when the string '# pylint: disable' appeared for reasons other than disabling the linter. The updated code now includes an additional filter to exclude the string CHEAT from the search, and the number of characters in the output is counted using the wc -c command. If the count is not zero, the script will terminate with an error message. This change enhances the accuracy of the no-cheat check, ensuring that the linter is being used correctly and that all new code meets our quality standards.
  • Removed upper bound on sqlglot dependency (#70). In this update, we have removed the upper bound on the sqlglot dependency version in the project's pyproject.toml file. Previously, the version constraint required sqlglot to be at least 22.3.1 but less than 22.5.0. With this modification, there will be no upper limit, enabling the project to utilize any version greater than or equal to 22.3.1. This change provides the project with the flexibility to take advantage of future bug fixes, performance improvements, and new features available in newer sqlglot package versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.

0.2.4

  • Fixed Builder object is not callable error (#67). In this release, we have made an enhancement to the Backends class in the databricks/labs/lsql/backends.py file. The DatabricksSession.builder() method call in the __init__ method has been changed to DatabricksSession.builder. This update uses the builder attribute to create a new instance of DatabricksSession without calling it like a function. The sdk_config method is then used to configure the instance with the required settings. Finally, the getOrCreate method is utilized to obtain a SparkSession object, which is then passed as a parameter to the parent class constructor. This modification simplifies the code and eliminates the error caused by treating the builder attribute as a callable object. Software engineers may benefit from this change by having a more streamlined and error-free codebase when working with the open-source library.
  • Prevent silencing of pylint (#65). In this release, we have introduced a new job, "no-lint-disabled", to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing pylint checks have been added, by filtering the differences between the current branch and the main branch for new lines of code, and then checking if any of those new lines contain a pylint disable comment. If any such lines are found, the job will fail and print a message indicating the offending lines of code, thereby ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
  • Updated _SparkBackend.fetch() to return iterator instead of list (#62). In this release, the fetch() method of the _SparkBackend class has been updated to return an iterator instead of a list, which can result in reduced memory usage and improved performance, as the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling capabilities. The test_runtime_backend_fetch() unit test has been updated to reflect this change, and users of the fetch() method should be aware that it now returns an iterator and must be consumed to obtain the desired data. Thorough testing is recommended to ensure that the updated method still meets the needs of the application.
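
A short usage note, assuming an existing backend instance: the result is now lazy and must be consumed to obtain the rows.

```python
rows = backend.fetch("SELECT * FROM inventory")  # returns a lazy iterator

# Materialize explicitly (or loop over it); an exhausted iterator yields
# nothing on a second pass.
materialized = list(rows)
```
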

0.2.3

  • Added support for common parameters in StatementExecutionBackend (#59). The StatementExecutionBackend class in the databricks.labs.lsql package's backends.py file now supports the passing of common parameters through keyword arguments (kwargs). This enhancement allows for greater customization and flexibility in the backend's operation, as the kwargs are passed to the StatementExecutionExt constructor. This change empowers users to control the behavior of the backend, making it more adaptable to various use cases. The key modification in this commit is the addition of the **kwargs parameter in the constructor signature and passing it to StatementExecutionExt, with no changes made to any methods within the class.
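
A hedged sketch of the pass-through: keyword arguments given to the backend are forwarded verbatim to StatementExecutionExt; the example keyword below is hypothetical, for illustration only:

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

ws = WorkspaceClient()
# some_option is a hypothetical stand-in for any parameter accepted by
# StatementExecutionExt's constructor.
backend = StatementExecutionBackend(ws, "warehouse-id", some_option="value")
```
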

0.2.2

  • Updating packages. In this update, the dependencies specified in the pyproject.toml file have been updated to more recent versions. The outdated packages "databricks-labs-blueprint~=0.4.0" and "databricks-sdk~=0.21.0" have been replaced with "databricks-labs-blueprint>=0.4.2" and "databricks-sdk>=0.22.0", respectively. These updates are expected to bring new features and bug fixes to the software. The dependency sqlglot remains unchanged, with the same version requirement range of "sqlglot>=22.3.1,<22.5.0". These updates ensure that the software will function as intended, while also taking advantage of the enhancements provided by the more recent versions of the packages.

0.2.1

  • Fixed row converter to properly handle nullable values (#53). In this release, the row converter in the databricks.labs.lsql.core module has been updated to handle nullable values correctly. The StatementExecutionExt class now manages the handling of nullable values during SQL statement execution. The Row class has also been modified to include nullable values, improving the robustness and flexibility of SQL execution in dealing with various data types, including null values. These enhancements increase the overall reliability of the system, making it more production-ready.
  • Improved integration test coverage (#52). In this release, the project's integration test coverage has been significantly improved through several changes. A new function, make_random(), has been added to the conftest.py file to generate a random string of fixed length, aiding in the creation of more meaningful and readable random strings for integration tests. A new file, test_deployment.py, has been introduced, containing a test function for deploying a database schema and verifying successful data retrieval via a view. The test_integration.py file has been renamed to test_core.py, with updates to the test_fetch_one function to test the fetch_one method using a SQL query with an aliased value. Additionally, a new Foo dataclass has been added to the tests/integration/views/__init__.py file, supporting integration test coverage. Lastly, a new SQL query has been added to the integration test suite, located in the some.sql file, which retrieves data from a table named foo in the inventory schema. These changes aim to enhance the overall stability, reliability, and coverage of the project's integration tests.
  • Rely on hatch being present on the build machine (#54). In this release, we have made significant changes to how we manage our build process and toolchain configuration. We have removed the need to manually install hatch version 1.7.0 in the build machine, and instead, rely on its presence, adding it to the list of required tools in the toolchain configuration. The command to create a virtual environment using hatch has also been added, and the pre_setup section no longer includes installing hatch, assuming its availability. We have also updated the hatch package version from 1.7.0 to 1.9.4, which may include bug fixes, performance improvements, or new features. This change may impact the behavior of any existing functionality that relies on hatch. The pyproject.toml file has been modified to update the fmt and verify sections, with ruff check . --fix replacing ruff . --fix and the removal of black --check . and isort . --check-only. A new configuration for isort has also been added to specify the databricks.labs.blueprint package as a known first-party package, enabling more precise management of imports related to that package. These changes simplify the build process and ensure that the project is using a more recent version of the hatch package for packaging and distributing Python projects.
  • Updated sqlglot requirement from ~=22.3.1 to >=22.3.1,<22.5.0 (#51). In this release, we have updated the version constraint for the sqlglot package in our project's pyproject.toml file. Previously, we had set the constraint to ~=22.3.1, allowing for any version with the same major and minor numbers but different patch numbers. With this update, we have changed the constraint to >=22.3.1,<22.5.0. This change enables our project to utilize bug fixes and improvements made in the latest patch versions of sqlglot, while still preventing it from inadvertently using any breaking changes introduced in version 22.5.0 or later versions. This modification allows us to take advantage of the latest features and improvements in sqlglot while maintaining compatibility and stability in our project.

Dependency updates:

  • Updated sqlglot requirement from ~=22.3.1 to >=22.3.1,<22.5.0 (#51).

0.2.0

  • Added MockBackend.rows("col1", "col2")[(...), (...)] helper (#49). In this release, we have added a new helper method MockBackend.rows("col1", "col2")[(...), (...)] to simplify testing with MockBackend. This method allows for the creation of rows using a more concise syntax, taking in the column names and a list of values to be used for each column, and returning a list of Row objects with the specified columns and values. Additionally, a __eq__ method has been introduced to check if two rows are equal by converting the rows to dictionaries using the existing as_dict method and comparing them. The __contains__ method has also been modified to improve the behavior of the in keyword when used with rows, ensuring columns can be checked for membership in the row in a more intuitive and predictable manner. These changes make it easier to test and work with MockBackend, improving overall quality and maintainability of the project.
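
The helper in action, a minimal sketch:

```python
from databricks.labs.lsql.backends import MockBackend

rows = MockBackend.rows("col1", "col2")[
    ("a", 1),
    ("b", 2),
]

# Each entry becomes a Row with the named columns; equality compares by
# dictionary representation, as described above.
assert rows[0].as_dict() == {"col1": "a", "col2": 1}
```
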

0.1.1

  • Updated project metadata (#46). In this release, the project metadata has been updated to reflect changes in the library's capabilities and dependencies. The project now supports lightweight SQL statement execution using the Databricks SDK for Python, setting it apart from other solutions. The library size comparison in the documentation has been updated, reflecting an increase in the compressed and uncompressed size of Databricks Labs LightSQL, as well as the addition of a new direct dependency, SQLglot. The project's dependencies and URLs in the pyproject.toml file have also been updated, including a version update for databricks-labs-blueprint and the removal of a specific range for PyYAML.

Dependency updates:

  • Updated sqlglot requirement from ~=22.2.1 to ~=22.3.1 (#43).

0.1.0

  • Ported StatementExecutionExt from UCX (#31).

0.0.0

Initial commit