DataHub Roadmap

Here is DataHub's roadmap for the next six months (starting Jan 2021).

We publish only a short-term roadmap because the project is evolving quickly and we want to adapt to the community's needs. We will check items off this roadmap as we make progress over the next few months.

Caveat: ETAs are subject to change. Please check with us before you commit to your stakeholders on deploying these capabilities at your company.

If you would like to suggest new items or request timeline changes to the existing items, please submit your request through this form or submit a GitHub feature request.

Of course, you always have access to our community through Slack or our town halls to chat with us live!

Q1 2021 [Jan - Mar 2021]

React UI

Build a new UI based on React
Deprecate open-source support for Ember UI

Python-based Metadata Integration

Build a Python-based Ingestion Framework
Support common people repositories (LDAP)
Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
Support common transformation sources (dbt, Looker)
Support for push-based metadata emission from Python (e.g. Airflow DAGs)

Dashboards and Charts

Support for dashboard and chart entity page
Support browse, search and discovery

SSO for Authentication

Support for authentication (login) using OIDC providers (Okta, Google, etc.)

Business Glossary

Support for business glossary model (definition + storage)
Browse taxonomy
UI support for attaching business terms to entities and fields

Jobs, Flows / Pipelines

Use Case: Search and discover your pipelines (e.g. Airflow DAGs) and understand lineage with datasets

Support for Metadata Models + Backend Implementation
Metadata integrations with systems like Airflow

Data Profiling and Dataset Previews

Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)

Support for data profiling and preview extraction through ingestion pipeline
Out of scope for Q1: Access control of data profiles and sample data
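
To make the use case above concrete, here is a minimal sketch of the kind of statistics a profiling step might compute (null fraction, distinct count, most common values). It is not DataHub's implementation: the function names `profile_column` and `profile_rows`, the output keys, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]], top_n: int = 3) -> Dict[str, Any]:
    """Summarize one column: row count, null fraction, distinct count, top values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": total,
        # Fraction of rows where the value is missing (nullability).
        "null_fraction": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(counts),
        # Most frequent non-null values (a coarse view of the distribution).
        "top_values": counts.most_common(top_n),
    }


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column that appears in a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data; a real pipeline would read rows from the source system.
sample_rows = [
    {"country": "US", "age": 34},
    {"country": "DE", "age": None},
    {"country": "US", "age": 41},
    {"country": None, "age": 34},
]
profile = profile_rows(sample_rows)
```

Because sample values and distributions can expose sensitive data, a sketch like this also shows why access control for profiles is called out separately as out of scope for Q1.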

Q2 2021 [Apr - Jun 2021]

Cloud Deployment

Production-grade Helm charts for Kubernetes-based deployment
How-to guides for deploying DataHub to all the major cloud providers (AWS, Azure, GCP)

Data Quality

Support for data quality visualization
Support for data health score based on data quality results and pipeline observability
Integration with systems like Great Expectations, AWS Deequ, etc.

Product Analytics for DataHub

Helping you understand how your users are interacting with DataHub
Integration with common systems like Google Analytics etc.

Usage-Based Insights

Display frequently used datasets and dashboards
Improved search relevance through usage data

Role-based Access Control

Support for fine-grained access control for metadata operations (read, write, modify)
Scope: Access control at the entity level, at the aspect level, and within aspects.
This provides the foundation for Tag Governance, Dataset Preview access control etc.
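
As an illustration of the scoping described above, the sketch below checks a request against policies that apply either to a whole entity type or to a single aspect of it. This is only one possible shape, not DataHub's design: the `Policy` class, the `is_allowed` function, and the role and aspect names are hypothetical, and control within an aspect is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    role: str
    operation: str                # "read", "write", or "modify"
    entity_type: str              # e.g. "dataset"
    aspect: Optional[str] = None  # None means every aspect of the entity type


def is_allowed(policies: List[Policy], role: str, operation: str,
               entity_type: str, aspect: str) -> bool:
    """Return True if any policy grants the role this operation on the aspect."""
    for policy in policies:
        if (policy.role == role
                and policy.operation == operation
                and policy.entity_type == entity_type
                and (policy.aspect is None or policy.aspect == aspect)):
            return True
    return False  # Deny by default when no policy matches.


# Hypothetical policies: analysts read everything on datasets (entity level),
# stewards may additionally write the ownership aspect (aspect level).
policies = [
    Policy(role="analyst", operation="read", entity_type="dataset"),
    Policy(role="steward", operation="write", entity_type="dataset", aspect="ownership"),
]

analyst_can_read_schema = is_allowed(policies, "analyst", "read", "dataset", "schema")
analyst_can_write_owner = is_allowed(policies, "analyst", "write", "dataset", "ownership")
steward_can_write_owner = is_allowed(policies, "steward", "write", "dataset", "ownership")
```

Denying by default keeps the check conservative: an operation is permitted only when some policy explicitly grants it.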

No-code Metadata Model Additions

Use Case: Developers should be able to add new entities and aspects to the metadata model easily

No need to write any code (in Java or Python) to store, retrieve, search and query metadata

Beyond the horizon

Let us know what you want!

Submit requests here or
Submit a GitHub feature request.
