[datahub migration] Failed to configure the sink (datahub-rest) #12164

Open
laurajsdias opened this issue Dec 18, 2024 · 3 comments
Labels
bug Bug report

Comments

@laurajsdias

laurajsdias commented Dec 18, 2024

Describe the bug
GMS connection not working via token.

To Reproduce
Steps to reproduce the behavior:

I'm trying to migrate DataHub from GKE to another cluster in AWS EKS. Even after redeploying DataHub and restoring the database, the data is not there. Then I found this: https://datahubproject.io/docs/generated/ingestion/sources/datahub/

So, when I try to run this recipe on the source DataHub, it fails to configure the sink on the target DataHub. I have the right token (I've already recreated it a few times) and the right permissions, but no luck.

Also, not sure if it's important, but I've exposed the GMS service at a URL like https://datahub-aws.dev.example.com/gms (an example domain, so as not to disclose the name of the company).

And I do have METADATA_SERVICE_AUTH_ENABLED set to true both on frontend and gms.

Expected behavior
I need the sink to be created.

Error

raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the sink (datahub-rest): 💥 Failed to connect to DataHub with DataHubRestEmitter: configured to talk to https://datahub-aws.dev.example.com/gms with token: eyJh**********2NTw

This is the recipe I have:

run_id: 'urn:li:dataHubExecutionRequest:90b157aa-870e-4362-8135-391b890286c9'
sink:
  type: datahub-rest
  config:
    server: 'https://datahub-aws.dev.example.com/gms'
    token: >token<
datahub_api:
  server: 'http://datahub-datahub-gms.datahub-dev:8080'
  token: >token<
flags:
  set_system_metadata: false
source:
  type: datahub
  config:
    include_all_versions: false
    kafka_connection:
      bootstrap: 'http://datahub-prerequisites-kafka.datahub-dev:9092'
      schema_registry_url: 'https://datahub.dev.example.com/schema-registry/api/'
    urn_pattern:
      allow:
        - '^allowed.urn.*'
      deny:
        - '^denied.urn.*'
    stateful_ingestion:
      ignore_old_state: false
      enabled: true
    database_connection:
      password: ""
      database: datahub
      scheme: postgresql+psycopg2
      host_port: 'datahub-prerequisites-gcloud-sqlproxy.datahub-dev:5432'
      username: [email protected]
pipeline_name: datahub_source_1
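
For anyone hitting the same sink error, a quick way to narrow it down is to call the target GMS directly with the same server URL and token, outside of the ingestion pipeline. The snippet below is only a minimal sketch: it assumes GMS answers a GET on `<server>/config` and accepts the token as a Bearer header, and the helper names `build_config_request` and `check_gms` are made up for illustration. It uses only the Python standard library.

```python
import json
import urllib.request


def build_config_request(server: str, token: str) -> urllib.request.Request:
    """Build a GET request for <server>/config carrying a Bearer token.

    Assumption: the GMS base URL exposes a /config endpoint.
    """
    url = server.rstrip("/") + "/config"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def check_gms(server: str, token: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON body.

    urllib raises HTTPError on 4xx/5xx responses, so a 401 (bad token) and a
    404 (wrong base path) show up as different exceptions.
    """
    request = build_config_request(server, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `check_gms("https://datahub-aws.dev.example.com/gms", token)` with the sink's values should help tell apart an authentication problem from a wrong path on the ingress.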
laurajsdias added the "bug" (Bug report) label Dec 18, 2024
laurajsdias changed the title from "Failed to configure the sink (datahub-rest)" to "[datahub migration] Failed to configure the sink (datahub-rest)" Dec 18, 2024
@deepgarg-visa
Contributor

@laurajsdias
Author

@deepgarg-visa hey, thanks, now the sink is working. But I'm getting a different error:

[2025-01-07 12:44:34,367] INFO     {datahub.cli.ingest_cli:149} - DataHub CLI version: 0.14.0.5
[2025-01-07 12:44:34,590] INFO     {datahub.ingestion.run.pipeline:271} - Sink configured successfully. DataHubRestEmitter: configured to talk to https://datahub-aws.dev.example.com/api/gms with token: eyJh**********2NTw
[2025-01-07 12:44:35,675] ERROR    {datahub.entrypoints:218} - Command failed: Failed to find a registered source for type datahub: datahub is disabled; try running: pip install 'acryl-datahub[datahub]'

@laurajsdias
Author

It ran successfully now. I had to add this in the last step of the Ingestion setup, under Advanced:

  • Extra DataHub plugins field:
    ["acryl-datahub"]

  • Extra Pip Libraries field:
    ["acryl-datahub==0.14.0.5","httpx","cachetools","psycopg2"]
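
To confirm that the extra libraries actually ended up in the environment that runs the recipe, something like the following can be run there. It is a rough sketch, not part of DataHub: `missing_modules` is a hypothetical helper, and the module names are assumed to match the import names of the packages listed above (`datahub` for `acryl-datahub`, plus `httpx`, `cachetools`, and `psycopg2`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages from the Extra Pip Libraries field.
    required = ["datahub", "httpx", "cachetools", "psycopg2"]
    missing = missing_modules(required)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

An empty result only shows that the modules can be located; it does not prove the `datahub` source plugin is enabled, so the ingestion log remains the real check.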

After the recipe ran successfully, I was able to restore the Elasticsearch indices:

kubectl create job --from=cronjob/datahub-datahub-restore-indices-job-template datahub-restore-indices-adhoc
