Unable to create Lineage from a SQL query #12838

Open
HectorPascual opened this issue Mar 11, 2025 · 0 comments
Labels
bug Bug report

HectorPascual commented Mar 11, 2025

I am unable to ingest a basic SQL query through the CLI in order to generate lineage between tables that already exist in DataHub. I am using the following command:

❯ datahub ingest -c lineage_datahub.yaml

Error:

'Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.dataset.UpstreamLineage:'

My YAML config file looks like this:

source:
  type: sql-queries
  config:
    platform: "athena"
    query_file: "./queries.json"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
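For debugging, the same recipe can also be run in-process from Python, which may make the full validation error easier to inspect than the truncated CLI message. This is a minimal sketch, assuming `Pipeline.create` from `datahub.ingestion.run.pipeline` accepts the recipe as a plain dict; `build_recipe` and `run_recipe` are hypothetical helper names, not part of DataHub.

```python
from typing import Any, Dict


def build_recipe(query_file: str, server: str, platform: str = "athena") -> Dict[str, Any]:
    """Return the recipe as a plain dict with the same keys as the YAML file."""
    return {
        "source": {
            "type": "sql-queries",
            "config": {"platform": platform, "query_file": query_file},
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process; needs acryl-datahub and a reachable GMS."""
    # Imported lazily so build_recipe stays usable without the package installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raises if the source or the sink reported failures.
    pipeline.raise_from_status()
```

Usage would be `run_recipe(build_recipe("./queries.json", "http://localhost:8080"))`, with the server URL taken from the recipe above.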

queries.json contains only the following:

{"query": "CREATE TABLE \"dev-datalake-gold-entities\".countries AS select * FROM\n \"dev-datalake-silver\".countries_hub_vw"}

I also attempted this through the Python SDK. I have a minimal example that obtains the lineage results, but I am unsure whether I can emit this metadata from the Python SDK; I haven't found any example of a similar use case.

from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

query = """        
        INSERT INTO "dev-datalake-gold-entities".countries
        SELECT
        countries_hub.country_code,
        FROM
        "dev-datalake-silver".countries_hub_vw countries_hub
"""

gms_endpoint = "http://localhost:8080"
token = ""
graph = DataHubGraph(
    DatahubClientConfig(
        server=gms_endpoint,
        token=token,
    )
)

lineage = graph.parse_sql_lineage(sql=query, platform="athena")
print(lineage)


emitter = DatahubRestEmitter(
    gms_server=gms_endpoint,
    token=token,
)

# TODO: how to emit the above lineage to DataHub?
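One possible way to fill in the TODO, as a sketch rather than a confirmed answer: it assumes the parse result exposes dataset URNs as `in_tables` and `out_tables`, and it uses `make_lineage_mce` from `datahub.emitter.mce_builder` together with the emitter's `emit_mce`. `lineage_pairs` and `emit_table_lineage` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple


def lineage_pairs(in_tables: Iterable[str], out_tables: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Pair every downstream URN with the de-duplicated, sorted upstream URNs."""
    upstreams = sorted({str(u) for u in in_tables})
    downstreams = sorted({str(d) for d in out_tables})
    # A table is never listed as its own upstream.
    return [(d, [u for u in upstreams if u != d]) for d in downstreams]


def emit_table_lineage(lineage, emitter) -> int:
    """Emit one table-level lineage event per output table; return how many were sent."""
    # Lazy import: requires acryl-datahub.
    from datahub.emitter.mce_builder import make_lineage_mce

    sent = 0
    # Assumption: lineage.in_tables / lineage.out_tables hold dataset URNs.
    for downstream, upstreams in lineage_pairs(lineage.in_tables, lineage.out_tables):
        if not upstreams:
            continue
        emitter.emit_mce(make_lineage_mce(upstreams, downstream))
        sent += 1
    return sent
```

With the objects above this would be called as `emit_table_lineage(lineage, emitter)`. Note that writing an upstream lineage aspect this way would replace whatever lineage the downstream dataset already has, and column-level lineage is not covered.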

Environment

acryl-datahub 0.15.0.5
macOS 14.5 (Apple M3)

HectorPascual added the bug (Bug report) label on Mar 11, 2025