24 Dec 18:02

david-leifker

d88e6c9

v0.15.0rc4 Pre-release

What's Changed

feat(structuredProperties) Add new settings aspect plus graphql changes for structured props by @chriscollins3456 in #12052
fix(ingest/tableau): project_path_pattern use in _is_denied_project by @sid-acryl in #12010
feat: Enrich superset ingestion by @hwmarkcheng in #11688
fix(ui) Add backwards compatibility to the UI for old policy filters by @chriscollins3456 in #12017
feat(structuredProps) Add frontend for managing structured props and filtering by them by @chriscollins3456 in #12097
feat(ui) Add full support for structured properties on assets by @chriscollins3456 in #12100
docs(champions): Update directory of DH Champions by @maggiehays in #12089
feat(ingest/snowflake): ingest secure, dynamic, hybrid table metadata by @mayurinehate in #12094
feat(spark):OpenLineage 1.25.0 by @Jorricks in #12041
fix(ingest): always resolve platform for browse path v2 by @mayurinehate in #12045
fix(ingest/sdk): report recipe correctly by @anshbansal in #12101
feat(cli): add --workers arg in delete command by @anshbansal in #12102
fix(ingest/snowflake): handle dots in snowflake table names by @hsheth2 in #12105
fix(ingest/tableau): apply page_size regardless of object count by @sid-acryl in #12026
docs(ingest/snowflake): update permissions for dynamic tables by @mayurinehate in #12074
fix(ingestion/lookml): resolve CLL issue caused by column name casing. by @sid-acryl in #11876
feat(glossary): support multiple ownership types by @kevinkarchacryl in #12050
feat(datahub-client): additionally generates java8 artefacts by @sgomezvillamor in #12106
fix(ui): dereference errors by @anshbansal in #12034
feat(openapi-v3): add minimal timeseries aspect support by @david-leifker in #12096
feat(forms) Clean up form prompts on structured property deletion by @chriscollins3456 in #12053
fix(datahub-client): adds missing archiveAppendix to artifactid when publishing by @sgomezvillamor in #12112
chore(deps): bump nanoid from 3.3.6 to 3.3.8 in /datahub-web-react by @dependabot in #12086
chore(deps): bump nanoid from 3.3.7 to 3.3.8 in /docs-website by @dependabot in #12114
feat(structuredProperties): add hide property and show as badge validators by @chriscollins3456 in #12099
fix(ingest/snowflake): further improve dot handling by @hsheth2 in #12110
feat(ingest): improve query fingerprinting by @hsheth2 in #12104
docs(ingest): add docs on the SQL parser by @hsheth2 in #12103
fix(ui): dereference issues by @anshbansal in #12109
fix(datahub-client): avoid parallel execution of publish and publish-java8 by @sgomezvillamor in #12120
fix(ingestion/dremio): Ignore filtered containers in schema allowdeny pattern by @acrylJonny in #11959
fix(ingest/kafka-connect): update connection test url, handle api failures by @mayurinehate in #12082
fix(ingest/dagster): Fix Dagster build by @treff7es in #12121
fix(ingest/snowflake): improve warn message by @anshbansal in #12125
fix(dataproduct): creator is assigned as owner by @anshbansal in #12127
fix(mysql): index gap lock deadlock by @david-leifker in #12119
feat(ingest): additional limits on ingestProposalBatch by @hsheth2 in #12130
refactor(ingest): cleanup structured properties validation by @hsheth2 in #12115
config(docker-profiles): clean-up by @david-leifker in #12051
build(gradle): version change (Gradle and shadow plugin) by @dejan2609 in #11999
feat(airflow): add DatahubRestHook.make_graph method by @hsheth2 in #12116
tests(datahub-client): new tests for the AvroSchemaConverter by @sgomezvillamor in #12087
feat(ingest/snowflake): secure view lineage without owner permissions by @mayurinehate in #12123
chore(dep): exclude end of life dependency by @deepgarg-visa in #12007
chore(version): bump kafka version by @chakru-r in #12136
build(ci): fix vercel setup script by @chakru-r in #12143
feat(ingest/airflow): Add way to disable Airflow plugin without a restart by @treff7es in #12098
fix(ingestion/tableau): honor the key projectNameWithin in pagination by @sid-acryl in #12107
fix(ingest/datahub): Use server side cursor instead of local one by @treff7es in #12129
feat(ingestion/tableau): verify role assignment to user in test_connection. by @sid-acryl in #12042
docs(ingest): fix sink recipe to correct config parameter by @kousiknandy in #12132
feat(ui) Add finishing touches to the structured props feature by @chriscollins3456 in #12111
feat(ingest/sqlite): Support sqlite < 3.24.0 by @asikowitz in #12137
feat(cli): added cli option for ingestion source by @kevinkarchacryl in #11980
fix(patch): Add Finegrained Lineage patch support for DatajobInputOutput (#4749) by @treff7es in #12146
fix(ingest/s3): incorrectly parsing path in s3_uri by @eagle-25 in #12135
feat(ingest/datahub): report progress on db ingestion by @hsheth2 in #12117
build(ingest/sqlglot): Bump pin to support snowflake CREATE ... WITH TAG by @asikowitz in #12003
fix(frontend): fix typo datahub-frontend logback.xml by @deepgarg-visa in #12134
feat(git): add subdir support to GitReference by @hsheth2 in #12131
fix(ui) Fix nesting logic in properties tab by @chriscollins3456 in #12151
fix(ingest/snowflake): improve lineage parse failure logging by @hsheth2 in #12153
fix(ingest/pulsar): handle Avro schema with missing namespace or name by @Alice-608 in #12058
fix(cli/properties): allow structured properties without a graph instance by @hsheth2 in #12144
fix(ingest/gc): more logging, error handling, explicit flag by @anshbansal in #12124
fix(ingest/kafka): update dependency, tests by @mayurinehate in #12159
feat(api): authorization extended for soft-delete and suspend by @david-leifker in #12158
fix(env) Fix forms hook env var default config by @chriscollins3456 in #12155
feat(ingest/mlflow): Support configurable base_external_url by @asikowitz in #12167
fix(cli/properties): fix data type validation by @hsheth2 in #12170
fix(pgsql): Postgres doesn't support UNION select with FOR UPDATE by @david-leifker in #12169
refactor(ingest/kafka-connect): define interface for new connector impl by @mayurinehate in #12149
feat(ingest): add looker meta extractor support in sql parsing by @sagar-salvi-apptware in #12062
feat(ingest/iceberg): Improve iceberg connector by @skrydal in #12163
feat(python): split out temp wheel builds by @hsheth2 in #12157
d...

Contributors

sgomezvillamor, kousiknandy, and 24 other contributors

11 Dec 17:20

RyanHolstien

v0.15.0rc3

b091e46

What's Changed

fix(ingest): ensure sentry is initialized with graph tags by @hsheth2 in #11949
fix(ingest): more error handling by @anshbansal in #11969
feat(datahub-gc): add truncation days param by @david-leifker in #11967
docs(release): Update v_0_3_7.md by @david-leifker in #11937
fix(ci): fix build-and-test by @david-leifker in #11974
refactor(ingest/powerbi): organize code within the module based on responsibilities by @sid-acryl in #11924
fix(schematron): fix for jdk8 by @david-leifker in #11975
fix(automations docs): Update snowflake-tag-propagation.md to include permissions required for the Automation by @jjoyce0510 in #11977
chore(bump): bump version of akka for datahub-frontend by @david-leifker in #11979
feat(ingestion): extend feast plugin to ingest tags and owners by @margaridafernandes-trip in #11784
fix(validation): additional URN validation adjustments by @david-leifker in #11973
feat(search): Update search_config.yaml by @david-leifker in #11971
docs(release): update recommended CLI by @anshbansal in #11986
fix(ingest/kafka):add poll for admin client for oauth_cb by @mayurinehate in #11985
fix(ingestion/iceberg): Improvements to iceberg source by @skrydal in #11987
feat(ingest): standardize sql type mappings by @hsheth2 in #11982
feat(ingest): bump typing_extensions dep by @hsheth2 in #11965
feat(ingest): add tests for colon characters in urns by @hsheth2 in #11976
feat(ingest/athena): handle partition fetching errors by @hsheth2 in #11966
fix: Add option for disabling ownership extraction by @sagar-salvi-apptware in #11970
feat(ingest/dremio): Retrieve default_schema for SQL views by @acrylJonny in #11832
fix(docs): fix sample business glossary by @acrylJonny in #11669
fix(java-sdk): custom properties patch client by @shirshanka in #11984
fix[ingest/build]: Disable preflight script as it is not needed anymore by @treff7es in #11989
feat: connector for Neo4j by @k-bartlett in #11526
fix(ingestion/dremio): Fixed lineage view for dremio EE by @sagar-salvi-apptware in #11990
fix(ingest/gc): delete invalid dpis by @anshbansal in #11998
feat(airflow): show dag/task logs in CI by @hsheth2 in #11981
chore(ingest): remove deprecated calls to Urn.create_from_string by @hsheth2 in #11983
fix(ingest): resolve missing numeric types for profiling by @mayurinehate in #11991
fix(docs): Add spark.datahub.stage_metadata_coalescing to recommended configuration for databricks by @acrylJonny in #11800
build(coverage): enable code coverage for java and python by @chakru-r in #11992
chore(docs): Update v_0_3_7.md - v0.3.7.5 by @david-leifker in #12005
feat(java-sdk): add utils classes to give equivalence with python uti… by @shirshanka in #12002
fix(ingest/sagemaker): Gracefully handle missing model group by @treff7es in #12000
fix(ingest/gc): typo fix, do not delete empty entities by @anshbansal in #12011
fix(ingest/gc): do not cleanup empty job/flow by @anshbansal in #12013
fix(test): fix metadata-io tests by @david-leifker in #12006
fix(ingest/looker): Don't fail on unknown liquid filters by @treff7es in #12014
feat(docs-website) fix links by @jayacryl in #12019
fix(ci): fix datahub-client validatePythonEnv by @david-leifker in #12023
test(urn-validation): additional test case by @david-leifker in #12001
feat(hudi): add hudi platform to the list of default platforms by @shirshanka in #11993
fix(airflow): fix AthenaOperator extraction by @steffengr in #11857
feat(tableau): review reporting and debug traces by @sgomezvillamor in #12015
fix(ingest/tableau): make sites.get_by_id call optional by @hsheth2 in #12024
feat(cli): add platform filter for undo soft delete by @anshbansal in #12012
feat(mcp): add kafka batch processing mode option (#4449) by @david-leifker in #12021
chore: update label for team by @anshbansal in #12032
fix(ui): Adding overflow handling (also goes to oss) by @jjoyce0510 in #12022
fix(ingest/pulsar): handle missing/invalid schema objects by @Alice-608 in #11945
fix(filters) Fix issues with structured properties filters by @chriscollins3456 in #11946
fix(ingest): avoid bad IPython version by @hsheth2 in #12035
feat(ingest/kafka): additional validation for oauth_db signature by @mayurinehate in #11996
fix(ingest/gc): Adding test and more checks to gc source by @treff7es in #12027
fix(graph-edge): fix graph edge delete exception by @david-leifker in #12025
feat(ingest): add urn validation test files by @hsheth2 in #12036
chore(deps): bump cross-spawn from 7.0.3 to 7.0.6 in /datahub-web-react by @dependabot in #11978
fix(datahub-client): prevent unneeded classes in datahub-client jar by @david-leifker in #12037
fix(entity-service): no-op batches by @david-leifker in #12047
docs(compliance-forms) update guide for creating form via UI by @maggiehays in #11936
feat(snowflake): adding oauth token bypass to snowflake by @gabe-lyons in #12048
fix(ingest): avoid shell entities during view lineage generation by @mayurinehate in #12044
fix(logs): add actor urn on unauthorised by @anshbansal in #12030
fix(ingest/snowflake): Add handling of Hybrid Table type for Snowflake ingestion by @siong-tcha in #12039
fix(ingest/powerbi): reduce type cast usage by @hsheth2 in #12004
refactor(ingest/sql): add _get_view_definition helper method by @hsheth2 in #12033
feat(ingest/superset): initial support for superset datasets by @hwmarkcheng in #11972
fix(ingest/sagemaker): Adding option to control retry for any aws source by @treff7es in #8727
fix(ingest/gc): Additional dataprocess cleanup fixes by @treff7es in #12049
feat(tableau): adds more reporting metrics to better understand lineage construction in tableau ingestion by @sgomezvillamor in #12008
feat(ingestion/tableau): hidden asset handling by @haeniya in #11559
feat(airflow): drop Airflow < 2.3 support + make plugin v2 the default by @hsheth2 in #12056
fix(web) disallow deselecting all degrees on impact analysis view by @jayacryl in #12063
feat: Add parent container hierarchy label to the container by @kanavnarula in #11705
fix(py-sdk): DataJobPatchBuilder handling timestamps, output edges by @shirshanka in #12067
fix(plugin-logging): adjust error logging in plugin registry by @david-leifker in https://github.com/datahub-proj...

Contributors

sgomezvillamor, shirshanka, and 25 other contributors

27 Nov 03:03

david-leifker

v0.15.0rc2

3b00fd7

v0.15.0rc2 Pre-release

What's Changed

fix(shadowJar): fix shadowJar by @david-leifker in #11968

Full Changelog: v0.15.0rc1...v0.15.0rc2

Contributors

david-leifker

17 Sep 21:48

david-leifker

v0.14.1

6a165a8

v0.14.1 Latest

DataHub v0.14.1 Release Notes

User Experience

Enhanced Data Propagation UI: New features allow viewing propagated column documentation, source information, and asset-level propagation details. This improves visibility into data lineage and enables better understanding of data flow across the organization. (#11047)
Improved Search Result Tracking: Added page number to search result click events, enabling better measurement of search ranking performance. This helps users understand and optimize their search experience. (#11151)
Fixed Display Issues: Resolved issues with displaying "0" values for last ingested data and improved handling of multilingual characters in descriptions. These fixes ensure more accurate and readable information presentation. (#10840, #10975)

Developer Experience

Performance Improvements:
- Implemented lazy dataLoaders for GraphQL queries, significantly reducing latency for local environments. (#11293)
- Added option to log slow GraphQL queries, helping identify and address performance bottlenecks. (#11308)
- Introduced session authorization caching for faster access checks. (#11327)
Enhanced Search Capabilities:
- Added support for custom highlighting fields in GraphQL queries, allowing faster and more customizable data retrieval. (#11339)
- Implemented new search query functionality to filter by parents/children of Domains or Containers. (#11279)
- Added support for multiple values in 'CONTAIN', 'START_WITH', and 'END_WITH' operators, enabling more flexible and precise searches. (#11068)
API Improvements:
- Extended throttling to API requests, supporting non-browser ingestion/write requests and manual throttling for better control over system load. (#11325)
- Added support for 'START_WITH' and 'END_WITH' operators in GraphQL API, enhancing string query capabilities. (#11026)
Bug Fixes:
- Resolved issues with forward slash handling in search queries, empty key-value pairs in Elasticsearch mapping, and support for various data types in object fields. These fixes improve search accuracy and data representation. (#10932, #11004, #11066)
- Addressed Postgres regression by upgrading the ebean library from version 12.x to 15.x, resolving a read lock NPE issue. (#11379)
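The multi-value support for 'CONTAIN', 'START_WITH', and 'END_WITH' (#11068) can be pictured with a small sketch. It assumes that a criterion matches when any one of its values matches, which is an interpretation and not something these notes state. `Criterion` and `matches` are hypothetical names for illustration, not DataHub APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified matchers for the three string operators named above.
MATCHERS: Dict[str, Callable[[str, str], bool]] = {
    "CONTAIN": lambda field_value, v: v in field_value,
    "START_WITH": lambda field_value, v: field_value.startswith(v),
    "END_WITH": lambda field_value, v: field_value.endswith(v),
}


@dataclass
class Criterion:
    """Illustrative filter criterion: one field, one operator, many values."""
    field: str
    condition: str
    values: List[str]


def matches(doc: Dict[str, str], criterion: Criterion) -> bool:
    """Return True if any value satisfies the operator (assumed OR semantics)."""
    matcher = MATCHERS[criterion.condition]
    field_value = doc.get(criterion.field, "")
    return any(matcher(field_value, v) for v in criterion.values)


# One criterion covers two prefixes instead of needing two separate filters.
prefix_filter = Criterion("name", "START_WITH", ["mkt_", "sales_"])
print(matches({"name": "sales_orders"}, prefix_filter))
print(matches({"name": "hr_people"}, prefix_filter))
```

Before multi-value support, the same query would have needed one criterion per prefix combined by the caller.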

Metadata Ingestion

S3 Integration Enhancements:
- Enhanced partition support for S3 dataset ingestion, improving metadata representation and enabling advanced partition detection. (#11083)
- Enhanced S3 ingestion process to support reading specific file types, allowing more granular control over data ingestion. (#11177)
BigQuery Improvements:
- Implemented query log extractor for BigQuery, creating "Query" entities with usage statistics, lineage, and operation details. (#10994)
- Added support for filtering GCP project ingestion based on project labels, enabling more targeted data collection. (#11169)
- Implemented query job retries for transient errors, improving system robustness. (#11162)
Snowflake Updates:
- Added support for Iceberg tables in Snowflake access history, enhancing lineage capture capabilities. (#10961)
- Introduced ability to define clustering key formulas for Snowflake datasets. (#11254)
- Fixed tag exclusion issues in Snowflake ingestion process. (#11250)
New and Updated Connectors:
- Added ingestion source for SAP Analytics Cloud, expanding DataHub's integration capabilities. (#10958)
- Enhanced Salesforce connector with customizable API version and improved error messages. (#11145, #11266)
- Updated Tableau ingestion process with new parameters and improved field type parsing. (#11255, #11202)
Other Ingestion Improvements:
- Added support for MongoDB database ingestion as containers. (#11178)
- Implemented automatic capturing of Snowflake assets with Pandas I/O Manager in Dagster module. (#11189)
- Enhanced Fivetran ingestion with destination ID filtering capabilities. (#11277)
- Added support for browse-only tables in Databricks ingestion. (#10766)

Other Improvements and Fixes

Upgraded various dependencies including Kafka, Azure Identity, Acryl-SQLglot, and GraphQL/Spring versions.
Improved error handling and logging across multiple components.
Enhanced test coverage and reliability.
Updated documentation for various features and processes.

Breaking Changes

Notable breaking changes include:

Removal of lower method from get_db_name in SQLAlchemySource, affecting URNs of related entities.
Changes to default sink mode and aspect handling that require server version 0.14.0+.

See the full details here.
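To see why dropping the lowercasing step in get_db_name changes URNs, consider the sketch below. The `dataset_urn` helper, the platform, and the database and table names are all hypothetical; which sources are affected and how they compose dataset names is not stated in these notes.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in the urn:li:dataset form (local helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


db_name = "Analytics"    # hypothetical mixed-case database name
table = "public.orders"  # hypothetical schema-qualified table name

# Behaviour assumed before the change: the database name is lowercased.
old_urn = dataset_urn("postgres", f"{db_name.lower()}.{table}")

# Behaviour assumed after the change: the database name keeps its casing.
new_urn = dataset_urn("postgres", f"{db_name}.{table}")

print(old_urn)
print(new_urn)
print(old_urn == new_urn)
```

Because the two strings differ, an entity ingested after the upgrade would be treated as a different entity from the one ingested before it, unless the database name was already lowercase.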

Contributors

We extend our heartfelt thanks to all contributors for their valuable work on this release:

Your contributions are invaluable in making DataHub better for everyone. Thank you!

What's Changed

test(smoke-test): updates to smoke-tests by @david-leifker in #11152
feat(dbt): support prefer_sql_parser_lineage with sources enabled by @hsheth2 in #11168
feat(actions): updates to gha workflows by @david-leifker in #11150
build: fix docker warnings by @anshbansal in #11163
feat(hooks): Make hook enable flag non-default by @pedro93 in #11159
fix(ci): smoke-test changes do not need to build images by @david-leifker in #11174
fix(ci): fix single tag comma split by @david-leifker in #11179
lint(restore-indices): clean-up restore indices class by @david-leifker in #11176
fix(ci): typo by @david-leifker in #11180
fix(ci): additional ci and smoke-test updates by @david-leifker in #11183
test(smoke-test): minor update to openapi test by @david-leifker in #11184
feat(ingest): use pre-built dockerize binary by @hsheth2 in #11181
doc: mark deprecated feature by @anshbansal in #11175
fix(delete) Fix removing completed/verified forms references by @chriscollins3456 in #11172
feat(docs): update docs for new release by @RyanHolstien in #11164
fix(ingest): invalid urn should not fail full batch of changes by @RyanHolstien in #11187
fix(kafka-setup): add missing script to image by @david-leifker in #11190
fix(config): fix hash algo config by @david-leifker in #11191
feat(ingest): allow custom SF API version by @skrydal in #11145
fix(ingestion/transformer): extend dataset_to_data_product_urns_pattern to support containers by @sagar-salvi-apptware in #11124
fix(ui) Fix bug with editing entity names by @chriscollins3456 in #11186
ci(smoke-test): allow smoke-test only PRs by @david-leifker in #11194
feat(ingestion/lookml): support looker -- if comments by @sid-acryl in #11113
fix(elasticsearch): refactor idHashAlgo setting by @david-leifker in #11193
fix(ingestion/airflow-plugin): fixed missing inlet/outlets by @dushayntAW in #11101
docs(readme): add security notes by @david-leifker in #11196
docs: Update README.md by @prashanthic23 in #11144
feat(ingest/dbt): skip CLL on sources with skip_sources_in_lineage by @hsheth2 in #11195
fix(graphql): Correct ownership check when removing owners by @pedro93 in #11154
feat(propagation): UI for rendering propagated column documentation by @jjoyce0510 in #11047
fix(ui): checks truthy value for last ingested by @pinakipb2 in #10840
docs(scim): document okta integration with datahub for scim provisioning by @ksrinath in #11120
fix(ingestion/tableau): Tableau field type parsing by @skrydal in #11202
feat(analytics): Add page numb...

Contributors

shirshanka, esselius, and 42 other contributors

21 Aug 15:29

RyanHolstien

v0.14.0.2

98ad824

DataHub v0.14.0.2 Release Notes

User Experience

Renamed: Validation --> Quality: The Validation tab has been renamed to Quality to make it more intuitive to end-users that it contains outcomes from data quality checks. [#10935]
Data Contract UI: A new Data Contract UI is now available under the Quality Tab, allowing users to handle various data assertion types and add/remove contracts more easily. [#10625]
Updates to Customized Search Ranking: By default, explore (*) query results are ranked based on enrichment (tags, terms, owners, description, domains, row/column counts) as well as incident status. [#10774]
Custom Dataset Names: Business users can now maintain an editable dataset name separate from default properties, providing more control over dataset identification. [#10608]
Documentation Propagation Setting Page: A new settings page has been added to the UI for managing Documentation Propagation, giving users more control over how documentation is shared across the platform. [#11038]

Developer Experience

NEW: DataHub Open Assertions Specification:
- Announcing a universal assertions specification for declaring Data Quality checks and compiling them into artifacts for use by 3rd party Data Quality tools like Great Expectations, dbt tests, and Snowflake via Data Quality DMFs. [#10609]
- Added ability to define data quality rules using a YAML specification file, enabling users to set assertions like volume metrics and conditions, with the ability to compile and schedule them to run on Snowflake as the assertion backend. [#10602]
API and SDK Enhancements:
- New GraphQL APIs added for managing forms, structured properties, and data contracts. [#10826, #10825, #10632]
- Updates to Java and Python SDKs to support creating and updating structured properties on assets. [#10823, #10824]
- Support for conditional write semantics including If-Modified-Since, If-Unmodified-Since, and If-Version-Match in MetadataChangeProposals (MCP) and OpenAPI. [#10868]
CLI Improvements:
- A new check server-config command has been added to test server credentials and retrieve diagnostic information. [#10990]
- The get command now includes a --details/--no-details flag for more detailed output, facilitating easier issue debugging. [#10815]
- Update to CLI to optionally display server configuration settings. [#10676]
- The forms YAML API in the CLI can now assign actors (users or groups) to forms. [#10683]
Improved Logging and Monitoring:
- Unified request logging implemented across GraphQL, OpenAPI, and Restli requests, including additional information like actor, IP address, and API type. [#10802]
Performance Optimizations:
- Implemented throttling for the mce-consumer based on mae-consumer lag. [#10626]
- Added an ASYNC_BATCH mode to the rest sink for improved performance. [#10733]
- Improved Neo4j read query performance by specifying labels, and combined the multiple Neo4j statements in the addEdge function into a single statement. [#10593, #10598]
Security Enhancements:
- Updated encryption and decryption methods with a stronger cryptographic algorithm. [#11059]
- Optimized regular expressions to prevent potential ReDoS vulnerabilities. [#10315]
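The conditional write semantics listed under API and SDK Enhancements (#10868) follow the general idea of precondition-guarded writes. The sketch below is a toy in-memory model, not DataHub code: `AspectStore`, `StoredAspect`, and `PreconditionFailed` are hypothetical names, and the meaning given to each precondition mirrors the usual HTTP-style reading, which is an assumption about how DataHub applies them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredAspect:
    value: dict
    version: int      # incremented on every successful write
    modified_at: int  # epoch milliseconds of the last write


class PreconditionFailed(Exception):
    """Raised when a conditional write's precondition does not hold."""


class AspectStore:
    """Toy store illustrating If-Version-Match and If-Unmodified-Since."""

    def __init__(self) -> None:
        self._aspects: Dict[str, StoredAspect] = {}

    def write(
        self,
        key: str,
        value: dict,
        now: int,
        if_version_match: Optional[int] = None,
        if_unmodified_since: Optional[int] = None,
    ) -> StoredAspect:
        current = self._aspects.get(key)
        current_version = current.version if current else 0

        # Assumed If-Version-Match meaning: write only if the caller's
        # expected version equals the stored version.
        if if_version_match is not None and if_version_match != current_version:
            raise PreconditionFailed(
                f"expected version {if_version_match}, found {current_version}"
            )

        # Assumed If-Unmodified-Since meaning: write only if nothing has
        # changed after the given timestamp.
        if (
            if_unmodified_since is not None
            and current is not None
            and current.modified_at > if_unmodified_since
        ):
            raise PreconditionFailed("aspect was modified after the given time")

        updated = StoredAspect(value, current_version + 1, now)
        self._aspects[key] = updated
        return updated


store = AspectStore()
store.write("dataset-1/status", {"removed": False}, now=100, if_version_match=0)
try:
    # A stale writer still believes the aspect is at version 0.
    store.write("dataset-1/status", {"removed": True}, now=200, if_version_match=0)
except PreconditionFailed as err:
    print(f"rejected: {err}")
```

The point of such preconditions is that two writers racing on the same aspect cannot silently overwrite each other; the slower one is rejected and can re-read before retrying.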

Metadata Ingestion

New Ingestion Sources:
- Azure Blob Storage: Added as a new ingestion source with support for Path Specs. [#10813]
- Grafana: New connector to ingest dashboards, providing documentation within DataHub for DevOps members on call. [#10891]
- IBM DB2: Added support for this platform. [#10601]
Snowflake Improvements:
- Enhanced view lineage parsing without query-based lineage/usage. [#10905]
- Added support for more than 10k views in a Snowflake database. [#10718]
- Implemented parallel schema extraction for improved performance. [#10653]
- Added snowflake-queries source for lineage, usage, queries, and operational metadata to improve performance and configurability. [#10835]
BigQuery Enhancements:
- Refactored and parallelized dataset metadata extraction for better performance. [#10884]
- Added support for new data types including BIGNUMERIC, NUMERIC, DECIMAL, BIGDECIMAL, FLOAT64, and RANGE. [#10950]
- Added support for ingesting View labels during ingestion. [#10648]
Looker Updates:
- Ingested explore tags into DataHub. [#10547]
- Fixed issues related to CLL generation when the view definition language is SQL. [#10542]
- Added support for including platform instance details in URNs for dashboards and charts. [#10771]
Other Improvements:
- dbt: Enhanced flexibility in lineage generation with the new experimental prefer_sql_parser_lineage flag. [#11039]
- Airflow: Task ownership info can now be set as a group rather than an individual user. [#10742]
- Athena: Enhanced profiling capabilities to support column quantiles and medians. [#10723]
- Fivetran: Improved connector performance for faster ingestion. [#10556]
- SageMaker: Added stateful ingestion capability to remove deleted assets during ingestion runs. [#10573]
- Tableau: Support added for ingesting multiple Tableau sites in a single configuration, with sites appearing as containers in DataHub. [#10498]
- Added support for ingesting schemas from schema registry in the Kafka module. [#10612]
- Introduced a TagsToTermMapper transformer for mapping specific tags to glossary terms. [#10758]
- Enhanced the SQL lineage parser with an optional default_dialect parameter for customized dialect selection. [#10830]

Other Improvements and Fixes

Fixed high-severity vulnerabilities related to logging of sensitive information. [#11088]
Improved error handling and logging across various modules.
Enhanced test coverage for new features and existing functionality.

Breaking Changes

Protobuf CLI will no longer create binary encoded protoc custom properties by default.
Changes to Data flow info and data job info aspects may require a server upgrade.
OpenAPI V3 - Creation of aspects now requires wrapping within a value key.
Profiling configuration for Glue source has been updated.

For full details on breaking changes, please refer to the updating guide.
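As an illustration of the OpenAPI V3 change, the sketch below wraps each aspect payload in a `value` key. Only the `value` wrapping comes from the note above; the `wrap_aspects` helper, the top-level `urn` field, and keying aspects by name are assumptions made for the example, so the updating guide remains the reference for the real request shape.

```python
import json
from typing import Any, Dict


def wrap_aspects(urn: str, aspects: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return an entity body in which every aspect payload sits under "value".

    Hypothetical helper: the overall body layout is assumed for illustration.
    """
    body: Dict[str, Any] = {"urn": urn}
    for aspect_name, payload in aspects.items():
        body[aspect_name] = {"value": payload}
    return body


entity = wrap_aspects(
    "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)",
    {"status": {"removed": False}},
)
print(json.dumps(entity, indent=2))
```

A client that previously sent the bare aspect payload would need a wrapping step like this before creating aspects.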

Contributors

Massive shoutout to all of the contributors who made this release possible:

First-Time Contributors

@aabharti-visa, @acrylJonny, @amit-apptware, @AndreasHegerNuritas, @aviv-julienjehannet, @brbrown25, @chardaway, @dragontail, @ipolding-cais, @joelmataKPN, @john-claro-cko, @jordanjeremy, @lima-renan, @nadavgross, @nephtyws, @obaltian, @PeamThom, @pie1nthesky, @pulsar256, @samblackk, @shtephlee, @simaov, @steffengr, @tkdrahn, @TristanHeisler, @wornjs, @xkollar

Repeat Contributors

@ajoymajumdar, @bossenti, @cburroughs, @cccs-eric, @deepgarg-visa, @dushayntAW, @fjmacagno, @githendrik, @haeniya, @jayasimhankv, @k7ragav, @kevin1chun, @ksrinath, @Kunal-kankriya, @looppi, @Masterchen09, @mayurinehate, @ngamanda, @nmbryant, @noggi, @pankajmahato-visa, @PatrickfBraz, @pinakipb2, @Rajasekhar-Vuppala, @rtekal, @sagar-salvi-apptware, @shubhamjagtap639, @siladitya2, @ssilb4, @Sukeerthi31, @sumitappt, @TonyOuyangGit, @walter9388

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @ethan-cartwright, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

What's Changed

fix(ingest/unity-catalog) upstream lineage for hive_metastore external table with s3 location by @dushayntAW in #10546
feat(ingestion/looker): ingest explore tags into the DataHub by @sid-acryl in #10547
fix(instropection): fix configuration application order by @david-leifker in #10579
fix(ingest/slack): pull real names by @hsheth2 in #10565
fix(ingest): Remove env deprecation message by @treff7es in #10581
test(ingest/sql): refactor CLL generator + add tests by @hsheth2 in #10580
docs(remote-ingestion): update description and deployment instructions by @darnaut in #10574
fix(ingest): DataProcessInstance.emit_process_end() ignored start_timestamp_millis by @obaltian in #10539
fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards by @pulsar256 in #10381
feat(graphql): Support tagging incidents and assertions via GraphQL API by @jjoyce0510 in #10575
docs(update): updating-datahub by @david-leifker in #10585
docs: reorder semantics guide to the bottom by @yoonhyejin in #10541
feat(auth): add viewTests platform privilege by @ksrinath in https://github.com...

Contributors

cburroughs, githendrik, and 78 other contributors

13 Aug 18:40

RyanHolstien

v0.14.0

5e9188c

Known Issues

kafka-setup is missing a script needed for new deployments; a hotfix will be released shortly.

What's Changed

fix(ingest/unity-catalog) upstream lineage for hive_metastore external table with s3 location by @dushayntAW in #10546
feat(ingestion/looker): ingest explore tags into the DataHub by @sid-acryl in #10547
fix(instropection): fix configuration application order by @david-leifker in #10579
fix(ingest/slack): pull real names by @hsheth2 in #10565
fix(ingest): Remove env deprecation message by @treff7es in #10581
test(ingest/sql): refactor CLL generator + add tests by @hsheth2 in #10580
docs(remote-ingestion): update description and deployment instructions by @darnaut in #10574
fix(ingest): DataProcessInstance.emit_process_end() ignored start_timestamp_millis by @obaltian in #10539
fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards by @pulsar256 in #10381
feat(graphql): Support tagging incidents and assertions via GraphQL API by @jjoyce0510 in #10575
docs(update): updating-datahub by @david-leifker in #10585
docs: reorder semantics guide to the bottom by @yoonhyejin in #10541
feat(auth): add viewTests platform privilege by @ksrinath in #10413
feat(ingestion/SageMaker): Remove deprecated apis and add stateful ingestion capability by @TonyOuyangGit in #10573
fix(search): fix autocomplete filter by @david-leifker in #10599
fix(ingest/snowflake): handle column level lineage for dbt temporary tables by @john-claro-cko in #10258
fix(mae-consumer): fix UpdateIndicesHook ignoring events with forceIndexing property set to true by @Masterchen09 in #10586
feat(fieldpaths): prevent duplicate field paths by @david-leifker in #10590
docs: update Town Hall page by @maggiehays in #10588
fix(search): implement queryByDefault annotation for SearchableRef by @david-leifker in #10603
fix(ingest/sagemaker): remove unsupported config by @hsheth2 in #10606
feat(neo4j): combine neo4j statements in addEdge into one statement by @deepgarg-visa in #10598
feat(neo4j): improve neo4j read query performance by specifying labels by @deepgarg-visa in #10593
feat(ingest): fetch connections from the backend by @hsheth2 in #10511
feat(graphql): custom complexity calculator and separate configurable thread pool for graphQL by @RyanHolstien in #10562
feat(ingest): enable stateful ingestion safety threshold by @hsheth2 in #10516
fix(ingest/spark): Bumping OpenLineage version to 0.14.0 by @treff7es in #10559
fix(ingest/dbt): only generate one subtype by @hsheth2 in #10615
fix(ingest/snowflake): make test connection logs less noisy by @hsheth2 in #10587
fix(ingest): move status aspect fixer logic by @hsheth2 in #10591
feat(data quality): update models, add assertions cli with snowflake integration by @mayurinehate in #10602
fix(gms/autosuggestion): autosuggestion query not returning the result if the query text has a prefix or suffix '-' on the search field by @siladitya2 in #10512
feat(consumers): mce-consumer throttling based on mae-consumer lag by @david-leifker in #10626
Add support for runAssertion, runAssertions, and runAssertionsForAsset APIs by @noggi in #10605
feat(graphql) data contract resolvers for graphql by @jayacryl in #10618
Revert "feat(graphql) data contract resolvers for graphql" by @jayacryl in #10631
fix(views): Add relationship annotation to GlobalViewsSettings urn by @pedro93 in #10597
feat(cli) Delete form references when using delete CLI by @chriscollins3456 in #10629
feat(ingest/looker): add ownership info to independent looks by @k7ragav in #10624
log(custom-plugins): add additional logging for spring plugins by @david-leifker in #10627
refactor(ui/glossary): Clean up term deletion by @asikowitz in #10589
fix(views): handle unknown view when resolving a view to a filter by @darnaut in #10640
feat(lineage): change query structure for explored hop limit by @RyanHolstien in #10607
feat(ingest): measure sink bottlenecking by @hsheth2 in #10628
fix(ingest/iceberg): update iceberg source to support newer versions of pyiceberg at runtime by @cccs-eric in #10614
feat(ingest/redshift): Adding way to filter s3 paths in Redshift Source by @treff7es in #10622
feat(businessAttribute): parallelize-business-attribute-propagation by @deepgarg-visa in #10638
docs(ingest): remove trailing comma on athena permission by @nephtyws in #10634
doc(roles): update privileges by @ksrinath in #10528
docs(subscriptions): adding docs for assertion level subscriptions on managed DH by @jayacryl in #10495
feat(ingest): add fast query fingerprinting by @hsheth2 in #10619
fix(ingestion/airflow-plugin): updated the document for developers by @dushayntAW in #10633
fix(ingest/trino): variable reference before define by @anshbansal in #10646
feat(entity-client): restli batchGetV2 batchSize fix and concurrency by @david-leifker in #10630
docs(): Adding API docs for incidents, operations, and assertions by @jjoyce0510 in #10522
feat(ci): fix conditionals and consolidate change detection by @david-leifker in #10649
fix(ingest/snowflake): avoid overfetching schemas from datahub by @hsheth2 in #10527
docs: add note for subResourceType being a fieldPath by @anshbansal in #10660
fix(ingest/qlik): improve logging for debug by @anshbansal in #10659
fix(doc): Fix doc typo in transformer by @sid-acryl in #10658
feat(graphql) data contract resolvers by @jayacryl in #10632
fix(openapiv3): v3 scroll response fix by @david-leifker in #10654
Use type: string for enum schemas by @kevin1chun in #10663
fix(ingestion/airflow-plugin): airflow remove old tasks by @dushayntAW in #10485
feat(platform): added db2 platform by @pankajmahato-visa in #10601
feat(ingestion/kafka)-Add support for ingesting schemas from schema registry by @aabharti-visa in #10612
fix(azure_ad): print request URL on error by @darnaut in #10677
docs(ingest): Rename csv / s3 / file source and sink by @asikowitz in #10675
feat(ingest/glue): database parameters extraction by @skrydal in #10665
fix(azure_ad): fix infinite loop on request error by @darnaut in #10679
perf(ingestion/fivetran): Connector performance optimization by @shubhamjagtap639 in #10556
feat(ingest): make query formatting more robust by @hsheth2 in #10678
feat(cli) Add actors to forms yaml API by @chriscollins3456 in #10683
doc(glossary): add note for github action for glossary by @anshbansal in http...

Contributors

cburroughs, githendrik, and 78 other contributors

23 May 23:11

david-leifker

v0.13.3

121e08c

v0.13.3

DataHub Release Notes

User Experience

NEW: Business Attributes: Business Attributes are used to standardize and manage data elements across multiple domains, projects, and applications. By linking dataset attributes to Business Attributes, organizations ensure uniformity and ease of updates, as changes made to a Business Attribute are automatically propagated across all linked datasets. #9863
Improved UI for Dataset Properties: Added collapse functionality for long dataset properties, making it easier to navigate and view relevant information. #10203
Pagination for Ingestion Tasks Listing: Added pagination to the tasks listing page, making it easier to manage and navigate through tasks. #10293
Rich Text Support for Form Descriptions: Added support for rich text in form descriptions, enhancing the user experience. #10425
New Analytics Charts: Added charts in the Analytics tab to identify Top Users and New Users. #10344
Enhanced search functionality with customizable autocomplete configuration. #10426

Developer Experience

Unified CI Workflow Updates: Improved CI build with unified workflow updates and disk space cleanup, making the build process more efficient. #10353
Improved Logging for GraphQL Requests: Enhanced logging for GraphQL requests, providing better insights and debugging capabilities. #10404
Enhanced Documentation for Lineage Feature Guide: Updated documentation for the lineage feature guide, making it easier to understand and implement. #10401
Improved Documentation for SchemaField.label: Updated documentation for SchemaField.label, providing clearer guidance for developers. #10251
Enhanced CI with Docker Image Publishing: Added Docker image publishing capabilities to the CI workflow, streamlining the deployment process. #10193
Redesigned Docs Site Feedback Button: Improved the design of the feedback button in the documentation, making it more user-friendly. #10182

Metadata Ingestion

Improved Data Profiling by early filtering of tables, correctly computing sample row counts, and combining unique count queries per table. #10378, #10319, #10322
Airflow: Introduced support for BigQueryInsertJobOperator. #10452
BigQuery: Added support for Table Clones and incremental column-level lineage.
Snowflake: Improved reporting for usage aggregation and handling of lineage errors; improved ingestion performance with system sampling on very large tables. #10279, #10430
Glue: Introduced support for delta schemas. #10299
Redshift: Improved usage extraction by filtering out system queries. #10247
Mode: Enhanced ingestion for Mode by adding dashboards into containers, improving data visualization and management. #10563
PowerBI: Added support to automatically extract table lineage between PowerBI and Databricks. #10416
dbt: Improved dbt ingestion by handling complex SQL and enhancing documentation, providing better data management and insights. #10323
NiFi: Enhanced ingestion for NiFi with process group as browse path and incremental lineage, improving data organization and tracking. #10202
Incubating Sigma and CockroachDB sources. #10037, #10226
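
The "combining unique count queries per table" item in the Data Profiling note above can be pictured with a small sketch: instead of one query per column, a single statement computes every column's distinct count. This is only an illustration of the general idea, not DataHub's profiler code; the function name, the alias suffix, and the table and column names are hypothetical.

```python
from typing import List


def combined_unique_count_query(table: str, columns: List[str]) -> str:
    """Build one SELECT that returns a distinct count for every column.

    Hypothetical helper for illustration; identifier quoting is simplistic.
    """
    parts = [f'COUNT(DISTINCT "{c}") AS "{c}_unique_count"' for c in columns]
    return f"SELECT {', '.join(parts)} FROM {table}"


# One round trip to the database instead of one query per column.
print(combined_unique_count_query("orders", ["id", "email"]))
```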

Breaking Changes

DynamoDB Connector: aws_region is now a required configuration. The connector will no longer loop through all AWS regions; instead, it will only use the region passed into the recipe configuration. #10419
Custom Validators and Mutators: Dropped a previously required constructor. #10389
FabricType RVW: Added as a new FabricType. Once metadata with this fabric type has been written, rolling back to an earlier version requires manual cleanup in the databases. #10472

For full details on breaking changes, please refer to the updating DataHub documentation.
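
To make the DynamoDB change concrete, the sketch below checks a recipe dictionary for aws_region before it is handed to an ingestion run. The recipe layout (source.type, source.config) follows the usual recipe shape; the helper name missing_aws_region and the sample region value are assumptions made for this example and are not part of DataHub.

```python
from typing import Any, Dict


def missing_aws_region(recipe: Dict[str, Any]) -> bool:
    """Return True if a DynamoDB recipe lacks the now-required aws_region.

    Hypothetical helper for illustration; not a DataHub API.
    """
    source = recipe.get("source", {})
    if source.get("type") != "dynamodb":
        return False  # the requirement only applies to the DynamoDB source
    region = source.get("config", {}).get("aws_region")
    return not region  # None or an empty string both count as missing


# Sample recipe; the region value is a placeholder.
recipe = {
    "source": {
        "type": "dynamodb",
        "config": {"aws_region": "us-west-2"},
    }
}

print(missing_aws_region(recipe))
```

A check like this could run in CI against stored recipes so that a missing region is caught before upgrading.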

Contributors

A big thank you to all our contributors for this release!

First-Time Contributors

@bouaouda-achraf, @camilogutierrez, @dotan-mor, @egemenberk, @erikkvale, @guyr-ziprecruiter, @ishtartec, @jonasHanhan, @mrjefflewis, @noggi, @olgapenedo, @paguos, @richenc, @Rosmirose, @sagar-salvi-apptware, @timothyjin

Repeat Contributors

@ajoymajumdar, @deepgarg-visa, @dushayntAW, @filipe-caetano-ovo, @gaurav2733, @kevin1chun, @ksrinath, @Masterchen09, @mayurinehate, @ms32035, @Nelvin73, @rtekal, @sgomezvillamor, @shubhamjagtap639, @siladitya2, @skrydal

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @RyanHolstien, @shirshanka, @sid-acryl, @treff7es, @yoonhyejin

Thank you all for your hard work and contributions!

What's Changed

fix(ingest/bigquery): Supporting lineage extraction in case the select query result's target table is set on job by @treff7es in #10191
fix(retention): fix time-based retention by @trialiya in #10118
feat(lineage): give via and paths in entity lineage response by @RyanHolstien in #10192
fix(ingestion/datahub): implemented the filter to ignore/include URN for ingestion by @dushayntAW in #10174
fix(ingestion/glue): fix to ingest the comment for partition key as description by @dushayntAW in #10189
feat(ingest/looker): cleanup usage generation code by @hsheth2 in #10153
fix(dev): fix env file overrides for profiles by @hsheth2 in #10194
fix(ingestion/hive): ignore sampling for tagged column/table by @dushayntAW in #10096
fix(ui/property): add collapse for long dataset properties by @gaurav2733 in #10203
saas release v0.3.1 release notes by @david-leifker in #10205
fix(ingest/databricks): pin pandas for databricks ingestion by @mayurinehate in #10204
Fixed issue where the custom defined aspects were missing from the API specification. by @ajoymajumdar in #10208
feat(ingestion/transformer): Handle overlapping while mapping in extract ownership from tags transformer by @shubhamjagtap639 in #10201
fix(build): avoid nested gradle commands by @hsheth2 in #10198
feat(ingest/great_expectations): support in-memory (Pandas) data assets by @bouaouda-achraf in #9811
ci(workflow): publish docker from pr with label by @david-leifker in #10193
bump(version): bump classgraph version, add early package filter by @david-leifker in #10207
fix(ingestion/mongodb): MongoDB source unable to parse datetimes with years > 9999 by @jonasHanhan in #10110
fix(graphql-core): DomainEntitiesResolver does not support values FacetFilterInput parameter by @siladitya2 in #10188
fix(graphql-core):Auto completion/suggestion of Domains are not working by @siladitya2 in #10150
chore(usage-stats): measure time for getting buckets and aggregations by @darnaut in #10220
test(search): introduce retry for search test by @david-leifker in #10206
feat(ingest/bigquery): fix support for incremental column lineage by @hsheth2 in #10222
fix(ingest/dbt): better dbt timestamp parsing by @hsheth2 in #10223
feat(ingest/sql): normalize bigquery partitioned tables when parsing by @hsheth2 in #10224
docs: fix feedback button design by @yoonhyejin in #10182
docs: add discourse to community tab by @yoonhyejin in #10181
docs: edit the text and destination for sign up link by @yoonhyejin in #10183
fix(ingestion/datahub): moved urn_pattern config to source config by @dushayntAW in #10215
fix(ingestio...

Contributors

sgomezvillamor, shirshanka, and 47 other contributors

16 Apr 19:09

david-leifker

v0.13.2

0a8ec37

v0.13.2

Hotfix Release

Fixes an MCL message deserialization bug that occurs when using the internal schema registry and running specific upgrade jobs:

policyFields (enabled by default):
BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED:true

dataJobNodeCLL (disabled by default):
BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED:false

Example Error:

Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 13 out of bounds for length 2
        at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
        at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)

Recovery Directions:

If you are currently affected, remove the topic before upgrading to v0.13.2 so that the corrupted message is discarded. The default topic name is MetadataChangeLog_Versioned_v1; if you have customized the topic name, remove that topic instead.

If you run Kafka using the example prerequisites Helm chart, the following command deletes the topic.

kubectl exec -it prerequisites-kafka-broker-0 -c kafka -- kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic MetadataChangeLog_Versioned_v1
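
If the topic name, broker pod, or bootstrap server differ from the defaults, the same command can be assembled programmatically. The sketch below is a minimal illustration that only builds the command line shown above with substitutable parts; the function name delete_topic_command and the custom topic name are hypothetical, and nothing is executed.

```python
import shlex
from typing import List

DEFAULT_TOPIC = "MetadataChangeLog_Versioned_v1"


def delete_topic_command(
    topic: str = DEFAULT_TOPIC,
    pod: str = "prerequisites-kafka-broker-0",
    bootstrap_server: str = "localhost:9092",
) -> List[str]:
    """Return the kubectl argument list that deletes the given Kafka topic.

    Hypothetical helper; it mirrors the command from the release notes.
    """
    return [
        "kubectl", "exec", "-it", pod, "-c", "kafka", "--",
        "kafka-topics.sh", "--bootstrap-server", bootstrap_server,
        "--delete", "--topic", topic,
    ]


# Print the command for a customized topic name (placeholder value).
print(shlex.join(delete_topic_command("MyCustomMCL_v1")))
```

Printing the command first, rather than running it directly, leaves room to review the topic name before anything is deleted.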

Full Changelog: v0.13.1...v0.13.2

02 Apr 19:40

david-leifker

v0.13.1

2873736

v0.13.1

DataHub Release Notes

User Experience

Capture and Manage Common Joins between Datasets: Users can now view and manage common join relationships between datasets, making it easier than ever to capture best practices and bespoke join logic. Watch the walkthrough here! 8325
- Heads up: you'll need to enable the ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable to use this feature!
Enhanced UI Interactions: Users can now enjoy an improved markdown editor and filter policies by active/inactive statuses, resulting in a more intuitive and manageable interface. 9949, 9958
Visual Context for Groups: You can now include picture links for groups in the UI, adding a richer visual context and enhancing the navigational experience. 9882
Improved Error Visibility: The UI now displays error messages related to data size limitations, allowing for better troubleshooting and user experience. 10038

Developer Experience

Enhanced Kafka Compatibility: Updated client version for Kafka setup ensures better compatibility and functionality for developers. 9962
Optimized Docker Build: Docker setups now respect pip mirrors, optimizing the build process especially in restricted network environments. 9963
Advanced Error Handling: New error handling for duplicate class names and improved fspath lint error management enhance the code reliability and quality. 9960, 9976
Latest OpenSearch Image: Incorporation of OpenSearch image version 2.11.0 aligns with the latest stable releases, boosting performance and security. 9984

Metadata Ingestion

NEW: Dagster Integration: You can now seamlessly ingest your Dagster Pipelines, Jobs, Ops, and lineage into DataHub. 10071
Expanded Field Classification Support: This release introduces support for field-level classification during ingestion for Redshift, BigQuery, DynamoDB, and SQL Sources. 10013, 10031
Enhanced Ingestion Capabilities: DataHub now offers stateful ingestion by default, optimizing routines for REST sinks and improving metadata accuracy across diverse sources like dbt and BigQuery. 9934, 10158, 10080
Better Data Lineage: This release introduces support for OpenLineage in service of the Spark Lineage Beta Plugin; additionally, we now support incremental Column-Level Lineage, improving the accuracy of detecting column-level relationships during ingestion. 9870, 9967, 10090
Schema Clarity: New descriptions support for JSON schema arrays and a mechanism to escape special characters in BigQuery table descriptions aid in clearer schema validation and ingestion processes. Databricks ingestion now supports Hive Metastore schemas with special characters. 9757, 9932, 10049

Version Upgrades

Kafka client and OpenSearch image were updated to the latest versions.

Breaking Changes

This release introduces default settings for stateful ingestion and updates in handling dbt ingestion. For details on all breaking changes, view the full documentation here.

Contributors

MASSIVE shoutout to our contributors!

First-Time Contributors

akarsh991, alexs-101, AvaniSiddhapuraAPT, diegmonti, dushayntAW, filipe-caetano-ovo, HuanjieGuo, jayacryl, k7ragav, kopax-polyconseil, LePuppy, Nelvin73, pinakipb2, poorvi767, rae89, trialiya, valeral.

Repeat Contributors

ANich, shubhamjagtap639, sgomezvillamor, siladitya2, skrydal, sumitappt, Masterchen09, mayurinehate, ngamanda, gaurav2733, githendrik, jayasimhankv.

DataHub Maintainers

anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, pedro93, RyanHolstien, treff7es, yoonhyejin.

What's Changed

bump(kafka-setup): client version bump by @david-leifker in #9962
feat(ingest): throw codegen error on duplicate class names by @hsheth2 in #9960
feat(docker): respect pip mirrors with uv by @hsheth2 in #9963
Openlineage endpoint and Spark Lineage Beta Plugin by @treff7es in #9870
fix(ingest/json-schema): adding support descriptions for array by @AvaniSiddhapuraAPT in #9757
fix(ingest/redshift): fix bug in lineage v2 table renames by @hsheth2 in #9967
feat(ingest): speed up to_obj() and validate() by @hsheth2 in #9969
feat(ingest): fix fspath lint error by @hsheth2 in #9976
docs: archive old version before 0.12.0 & fix broken links by @yoonhyejin in #9957
fix(ui/markdown-editor): arrows change field when editing description… by @gaurav2733 in #9949
feat(ui/policies): add filter for Active/Inactive/All on policy page by @gaurav2733 in #9958
feat(ui): add option to add picture link for groups by @akarsh991 in #9882
feat(ingest): add Looks subtype + stop reemitting browsePathV2 by @hsheth2 in #9978
fix(ingest/bigquery): escape special characters for table descriptions by @AvaniSiddhapuraAPT in #9932
feat(ui): add loading spin to access management table by @filipe-caetano-ovo in #9974
fix(ingestion/fivetran): Fix fivetran get connector jobs bug by @shubhamjagtap639 in #9975
feat(ingest/dbt): generate CLL for all node types by @hsheth2 in #9964
chore(search): bump OpenSearch image version to 2.11.0 by @darnaut in #9984
feat(ingest): enable stateful_ingestion by default for DataHub rest sink by @shubhamjagtap639 in #9934
feat(ingestion/cli): Adding check option to validate allow/deny and path_specs by @treff7es in #9983
fix(ingest): only import PathSpec when necessary by @hsheth2 in #9989
feat(config): add configuration to reprocess UI sourced events by @RyanHolstien in #9988
feat(pluginRegistry): add configuration to reduce runnable frequency by @RyanHolstien in #9990
build(react): Fix typescript errors in test files by @sumitappt in #9982
feat(docs): disable last update timestamps by @hsheth2 in #9987
feat: add versioned content for 0.12.1 by @yoonhyejin in #9944
doc: add version 0.13.0 by @yoonhyej...

Contributors

githendrik, sgomezvillamor, and 41 other contributors

29 Feb 23:20

RyanHolstien

v0.13.0

8b6790e

v0.13.0

DataHub v0.13.0 Release Notes Summary

User Experience

NEW - Asset Documentation Forms & UI-Editable Properties: Define specific documentation requirements via a Form, and empower your asset owners to capture their valuable knowledge via UI-Editable Properties. Watch the demo here!
NEW - DataHub Incidents: Create and communicate data quality and observability incidents when they inevitably arise. Watch the demo here!
UI Improvements: Editing secrets, handling forms, and rendering token pages and lineage diagrams have been improved for a smoother user interface experience.

Developer Experience

Security Upgrades: Core dependencies like shiro-core and FastAPI have been upgraded to fix vulnerabilities, ensuring a safer development environment.
GraphQL/OpenAPI Enhancements: New GraphQL endpoints and better OpenAPI documentation provide more powerful tools for API interaction, making developers' jobs easier.
Performance Tuning: Backend improvements for search operations and ingestion processes make the platform faster and more reliable.

Metadata Ingestion

Platform Integrations: Enhanced support for dbt, Metabase, BigQuery, AWS Glue, Oracle, and Redshift allows for more comprehensive metadata capture, making integration with these platforms smoother.
Ingestion Framework: The reliability of ingestion has been improved, with new capabilities like support for tags from Tableau datasources and compatibility with Airflow 2.5.0, facilitating a broader range of data synchronization tasks.
Connector Improvements: Ingestion connectors for external data tools have been streamlined, ensuring easier integration and data synchronization.

Other Improvements and Fixes

Enhanced internal testing frameworks with Cypress and pytest-random-order for ingestion tests.
Simplified developer workflows with configurable Docker Compose project names in CLI.
Addressed various ingestion-related bugs for platforms like Feast and Snowflake.
Enhanced the UI codebase with TypeScript compilation linting and updated styles.
Streamlined CI processes for pull requests and linting conditions.
Version Upgrades: Upgraded pytest-docker, Pegasus, and SQLglot, among others, to improve stability and performance. Security vulnerabilities addressed by upgrading FastAPI, gitdb, and follow-redirects.

Notable Breaking Changes

Updates to MySQL version for quickstarts and migration to Neo4j 5.x may impact existing setups.
Building DataHub now requires JDK17 and Docker Compose > 2.20.
Python 3.8+ requirement for the acryl-datahub CLI.
Changes in Unity Catalog ingestion source configs and Redshift lineage generation.
Deprecation of Spark 2.x and associated JDK8 build requirements.

For full details on breaking changes, please visit DataHub's update guide.
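
For the Python 3.8+ requirement listed above, a quick interpreter check before installing or upgrading the acryl-datahub CLI might look like the sketch below. It uses only the standard library; the function name meets_cli_requirement is hypothetical.

```python
import sys
from typing import Sequence

MIN_PYTHON = (3, 8)  # minimum version stated in the breaking changes above


def meets_cli_requirement(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is 3.8 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not meets_cli_requirement():
    print("Python 3.8+ is required for the acryl-datahub CLI")
```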

Acknowledgements

A huge thank you to all our contributors for making this release possible. Your hard work and dedication are greatly appreciated.

First-Time Contributors

7onn, Adityamalik123, atjones0011, BlueHorn07, diegoreico, dim-ops, fer-marino, Gerrit-K, gp1105739, ilpianista, ingthorb, KaYunKIM, Kunal-kankriya, muzzacode, nnnkkk7, pankajmahato-visa, rubiojr, ryaminal, scalvanese452, sleeperdeep, stevenayers.

Repeat Contributors

allizex, arunvasudevan, cburroughs, feldjay, gaurav2733, iprentic, KulykDmytro, kushagra-apptware, mayurinehate, nmbryant, noggi, purnimagarg1, rinzool, sgomezvillamor, shubhamjagtap639, siddiquebagwan-gslab, siladitya2, skrydal, sumitappt, TonyOuyangGit, wngus606, yangjiandan, Salman-Apptware.

DataHub Maintainers

anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, jjoyce0510, maggiehays, pedro93, RyanHolstien, shirshanka, sid-acryl, treff7es, yoonhyejin.

What's Changed

fix(ingest/transformer): correct registration by @anshbansal in #9418
docs(ingest/sql-queries): Rearrange sections by @asikowitz in #9426
fix: Adjusting the view of the Column Stats by @Salman-Apptware in #9430
feat(patch): support fine grained lineage patches by @RyanHolstien in #9408
fix(CVE-2023-6378): update logback classic by @RyanHolstien in #9438
feat: allow the sidebar size to be draggable by @Salman-Apptware in #9401
fix(json-schema): do not send invalid URLs by @anshbansal in #9417
fix(ingest/profiling) Fixing profile eligibility check by @treff7es in #9446
fix(ingest): avoid git dependency in dbt by @hsheth2 in #9447
feat(ingest): add retries for tableau by @hsheth2 in #9437
docs(updating-datahub): update docs for v0.12.1 by @david-leifker in #9441
feat: Allow specifying Data Product URN via UI by @Salman-Apptware in #9386
Add button to copy urn of an Ownership Type by @Salman-Apptware in #9452
docs(ingest/tableau): add token to sink config in sample recipe by @KaYunKIM in #9411
feat(glossary): add ability to clone glossary term(name and documentation) from term profile menu by @allizex in #9445
feat(ingestion): Add typeUrn handling to ownership transformers by @skrydal in #9370
fix(ingest): reduce GraphQL Logs to warning for circuit breaker by @arunvasudevan in #9436
fix: support Apollo caching for settings / Policies by @Salman-Apptware in #9442
refactor | PRD-785 | datahub oss: migrate use of useGetAuthenticatedU… by @sumitappt in #9456
refactor(ui): Minor improvements & refactoring by @jjoyce0510 in #9420
feat(ingest): add ingest --no-progress option by @BlueHorn07 in #9300
fix(powerbi): add access token refresh by @anshbansal in #9405
fix | PRD-463 | Stop trying to ping the track endpoint on login home … by @sumitappt in #9462
feat(ingest/unity): enable hive metastore ingestion by @mayurinehate in #9416
feat(ingestion/transformer): create tag if not exist by @siddiquebagwan-gslab in #9076
fix(ingest): make user_urn and group_urn generation consider user and… by @shirshanka in #9026
feat(ingestion): Add test_connection methods for important sources by @shubhamjagtap639 in #9334
docs: fix sample command for container logs by @nnnkkk7 in #9427
fix(ingest): bump source configs json schema version by @hsheth2 in https://github...