
Airbyte 1.2.0 on Kubernetes/EKS: Cannot create new sources as "Test the Source" fails with HTTP 500 #48809

Open
rpopov opened this issue Dec 4, 2024 · 2 comments


rpopov commented Dec 4, 2024

Helm Chart Version

1.2.0

What step the error happened?

Other

Relevant information

Observation
I just installed Airbyte 1.2.0 on Kubernetes / AWS EKS. When creating a new connection and clicking the "Set up source" button, the "Test the Source" panel shows "An unexpected error occurred. Please report this if the issue persists. (HTTP 500)". This always happens when setting up JIRA and GitLab sources (other connectors not tested).

Analysis
The source testing involves a step where the airbyte-workload-launcher pod spawns a new pod, source-jira-check, which terminates with exceptions. The launcher then reports an internal server error to the UI because of the exception:

The exception in workload-launcher

Caused by: java.lang.RuntimeException: Init container for Pod: pods did not complete successfully. Actual termination reason: Error
   at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInitComplete(KubePodLauncher.kt:118)
   at io.airbyte.workload.launcher.pods.KubePodClient.waitForPodInitComplete(KubePodClient.kt:270)
   at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:234)
   at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:167)
   at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:49)
   at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24)
   at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42)
   ... 53 common frames omitted

The exception in the source-jira-check pod is:

init com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@729d6ee2: profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@611587f7: Unauthorized (Service: null; Status Code: 401; Error Code: null; Request ID: null; Proxy: null)]
init     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:142)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
init     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
init     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
init     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
init     at com.amazonaws.services.secretsmanager.AWSSecretsManagerClient.doInvoke(AWSSecretsManagerClient.java:2334)
init     at com.amazonaws.services.secretsmanager.AWSSecretsManagerClient.invoke(AWSSecretsManagerClient.java:2301)
init     at com.amazonaws.services.secretsmanager.AWSSecretsManagerClient.invoke(AWSSecretsManagerClient.java:2290)
init     at com.amazonaws.services.secretsmanager.AWSSecretsManagerClient.executeDescribeSecret(AWSSecretsManagerClient.java:669)
init     at com.amazonaws.services.secretsmanager.AWSSecretsManagerClient.describeSecret(AWSSecretsManagerClient.java:638)
init     at com.amazonaws.secretsmanager.caching.cache.SecretCacheItem.executeRefresh(SecretCacheItem.java:102)
init     at com.amazonaws.secretsmanager.caching.cache.SecretCacheItem.executeRefresh(SecretCacheItem.java:32)
init     at com.amazonaws.secretsmanager.caching.cache.SecretCacheObject.refresh(SecretCacheObject.java:188)
init     at com.amazonaws.secretsmanager.caching.cache.SecretCacheObject.getSecretValue(SecretCacheObject.java:286)
init     at com.amazonaws.secretsmanager.caching.SecretCache.getSecretString(SecretCache.java:123)
init     at io.airbyte.config.secrets.persistence.AwsSecretManagerPersistence.read(AwsSecretManagerPersistence.kt:46)
init     at io.airbyte.config.secrets.SecretsHelpers.getOrThrowSecretValue(SecretsHelpers.kt:280)
init     at io.airbyte.config.secrets.SecretsHelpers.combineConfig(SecretsHelpers.kt:169)
init     at io.airbyte.config.secrets.SecretsHelpers$combineConfig$1.invoke(SecretsHelpers.kt:179)
init     at io.airbyte.config.secrets.SecretsHelpers$combineConfig$1.invoke(SecretsHelpers.kt:173)
init     at io.airbyte.config.secrets.SecretsHelpers.combineConfig$lambda$2(SecretsHelpers.kt:173)
init     at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
init     at io.airbyte.config.secrets.SecretsHelpers.combineConfig(SecretsHelpers.kt:173)
init     at io.airbyte.config.secrets.hydration.RealSecretsHydrator.hydrateFromDefaultSecretPersistence(RealSecretsHydrator.kt:21)
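The failure shape above is characteristic of a credentials provider chain: each provider is tried in order, and only when all of them fail does the chain raise one error listing every individual failure. A minimal sketch of that resolution logic (not the actual AWS SDK code; provider names and messages are simplified):

```python
import os

class CredentialsError(Exception):
    pass

def env_provider():
    # Mirrors EnvironmentVariableCredentialsProvider: read creds from env vars.
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        raise CredentialsError("Unable to load AWS credentials from environment variables")
    return {"access_key_id": key, "secret_access_key": secret}

def web_identity_provider():
    # Mirrors WebIdentityTokenCredentialsProvider: requires a role ARN.
    if not os.environ.get("AWS_ROLE_ARN"):
        raise CredentialsError("You must specify a value for roleArn and roleSessionName")
    return {"access_key_id": "from-web-identity", "secret_access_key": "..."}

def resolve(chain):
    failures = []
    for provider in chain:
        try:
            return provider()
        except CredentialsError as exc:
            failures.append(f"{provider.__name__}: {exc}")
    # No provider succeeded: report all failures at once, which is
    # exactly the shape of the SdkClientException in the log above.
    raise CredentialsError(
        "Unable to load AWS credentials from any provider in the chain: "
        + "; ".join(failures))
```

Because the init container's environment lacks AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (and no IRSA role is configured), every provider fails and the chain aborts with the aggregated error.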

Indeed, the source test pod is instantiated with the following manifest:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-12-04T20:59:04Z"
  labels:
    actor_type: source
    airbyte: job-pod
    attempt_id: "0"
    auto_id: 7eb53de9-7519-432c-8f5d-c4847c521a75
    job_id: 8808e119-d606-4868-a0f4-0d7c534f7cff
    job_type: check
    workspace_id: fe6d8ede-6956-4c1c-9d9a-8e95712efd27
  name: source-jira-check-8808e119-d606-4868-a0f4-0d7c534f7cff-0-onxwn
  namespace: airbyte
  resourceVersion: "345822213"
  uid: 198055a6-d1f4-4809-889b-0cd3d9021c78
spec:
  automountServiceAccountToken: true
  containers:
  - command:
    - sh
    - -c
    - |-
      trap "touch TERMINATED" EXIT
      /app/airbyte-app/bin/airbyte-connector-sidecar
    env:
    - name: ACCEPTANCE_TEST_ENABLED
      value: "false"
    - name: AWS_SECRET_ACCESS_KEY
      value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    - name: WORKLOAD_API_READ_TIMEOUT_SECONDS
      value: "300"
    - name: KEYCLOAK_CLIENT_ID
    - name: AWS_DEFAULT_REGION
      value: eu-west-1
    - name: AIRBYTE_API_AUTH_HEADER_NAME
    - name: STORAGE_BUCKET_ACTIVITY_PAYLOAD
      value: airbyte-storage
    - name: MICRONAUT_ENVIRONMENTS
      value: worker-v2,control-plane,oss
    - name: AWS_ACCESS_KEY_ID
      value: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    - name: WORKLOAD_API_RETRY_DELAY_SECONDS
      value: "2"
    - name: DATA_PLANE_SERVICE_ACCOUNT_EMAIL
    - name: LOG_LEVEL
      value: INFO
    - name: INTERNAL_API_HOST
      value: http://airbyte-airbyte-server-svc:8001
    - name: WORKLOAD_API_CONNECT_TIMEOUT_SECONDS
      value: "30"
    - name: STORAGE_BUCKET_LOG
      value: airbyte-poc-s3-653136ba0da14602a6e6e01836a4edb2
    - name: WORKLOAD_API_MAX_RETRIES
      value: "5"
    - name: WORKLOAD_API_HOST
      value: http://airbyte-workload-api-server-svc:8007
    - name: STORAGE_BUCKET_STATE
      value: airbyte-poc-s3-653136ba0da14602a6e6e01836a4edb2
    - name: S3_PATH_STYLE_ACCESS
    - name: AIRBYTE_API_AUTH_HEADER_VALUE
    - name: DATA_PLANE_SERVICE_ACCOUNT_CREDENTIALS_PATH
    - name: CLOUD_STORAGE_APPENDER_THREADS
      value: "1"
    - name: STORAGE_BUCKET_WORKLOAD_OUTPUT
      value: airbyte-poc-s3-653136ba0da14602a6e6e01836a4edb2
    - name: KEYCLOAK_INTERNAL_REALM_ISSUER
    - name: STORAGE_TYPE
      value: S3
    - name: CONTROL_PLANE_AUTH_ENDPOINT
    - name: WORKLOAD_API_BEARER_TOKEN
      valueFrom:
        secretKeyRef:
          key: WORKLOAD_API_BEARER_TOKEN
          name: airbyte-airbyte-secrets
    image: airbyte/connector-sidecar:1.2.0
    imagePullPolicy: IfNotPresent
    name: connector-sidecar
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: airbyte-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2c7d2
      readOnly: true
    workingDir: /config
  - command:
    - sh
    - -c
    - |-
      # fail loudly if entry point not set
      if [ -z "$AIRBYTE_ENTRYPOINT" ]; then
        echo "Entrypoint was not set! AIRBYTE_ENTRYPOINT must be set in the container."
        exit 127
      else
        echo "Using AIRBYTE_ENTRYPOINT: $AIRBYTE_ENTRYPOINT"
      fi

      # run connector in background and store PID
      (eval "$AIRBYTE_ENTRYPOINT check  --config /config/connectionConfiguration.json" > /config/jobOutput.json) &
      CHILD_PID=$!

      # run busy loop in background that checks for termination file and if present kills the connector operation and exits
      (while true; do if [ -f TERMINATED ]; then kill $CHILD_PID; exit 0; fi; sleep 10; done) &

      # wait on connector operation
      wait $CHILD_PID
      EXIT_CODE=$?

      # write its exit code to a file for the sidecar
      echo $EXIT_CODE > TEMP_EXIT_CODE.txt
      # use a swap file to make creation and writing atomic
      mv TEMP_EXIT_CODE.txt exitCode.txt

      # propagate connector exit code by assuming it
      exit $EXIT_CODE
    env:
    - name: AIRBYTE_VERSION
      value: 1.2.0
    - name: AIRBYTE_ROLE
    - name: DEPLOYMENT_MODE
      value: OSS
    - name: USE_RUNTIME_SECRET_PERSISTENCE
      value: "false"
    - name: OPERATION_TYPE
      value: check
    - name: WORKLOAD_ID
      value: 68e63de2-bb83-4c7e-93fa-a8a9051e3993_8808e119-d606-4868-a0f4-0d7c534f7cff_0_check
    image: airbyte/source-jira:3.2.1
    imagePullPolicy: IfNotPresent
    name: main
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: airbyte-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2c7d2
      readOnly: true
    workingDir: /config
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - {}
  initContainers:
  - env:
    - name: PUBLISH_METRICS
      value: "false"
    - name: ACCEPTANCE_TEST_ENABLED
      value: "false"
    - name: SECRET_PERSISTENCE
      value: aws_secret_manager
    - name: SECRET_STORE_GCP_PROJECT_ID
    - name: WORKLOAD_API_READ_TIMEOUT_SECONDS
      value: "300"
    - name: KEYCLOAK_CLIENT_ID
    - name: AIRBYTE_API_AUTH_HEADER_NAME
    - name: MICRONAUT_ENVIRONMENTS
      value: worker-v2,control-plane,oss
    - name: DD_INTEGRATION_GOOGLE_HTTP_CLIENT_ENABLED
      value: "false"
    - name: AWS_KMS_KEY_ARN
    - name: WORKLOAD_API_RETRY_DELAY_SECONDS
      value: "2"
    - name: DD_SERVICE
      value: airbyte-workload-init-container
    - name: METRIC_CLIENT
    - name: DATA_PLANE_SERVICE_ACCOUNT_EMAIL
    - name: LOG_LEVEL
      value: INFO
    - name: INTERNAL_API_HOST
      value: http://airbyte-airbyte-server-svc:8001
    - name: FEATURE_FLAG_CLIENT
    - name: WORKLOAD_API_CONNECT_TIMEOUT_SECONDS
      value: "30"
    - name: AB_AZURE_KEY_VAULT_VAULT_URL
    - name: AB_AZURE_KEY_VAULT_TENANT_ID
    - name: DD_INTEGRATION_NETTY_4_1_ENABLED
      value: "false"
    - name: DD_INTEGRATION_HTTPURLCONNECTION_ENABLED
      value: "false"
    - name: DD_INTEGRATION_URLCONNECTION_ENABLED
      value: "false"
    - name: WORKLOAD_API_MAX_RETRIES
      value: "5"
    - name: AB_AZURE_KEY_VAULT_TAGS
    - name: VAULT_ADDRESS
    - name: FEATURE_FLAG_BASEURL
    - name: DD_AGENT_HOST
    - name: DD_DOGSTATSD_PORT
    - name: WORKLOAD_API_HOST
      value: http://airbyte-workload-api-server-svc:8007
    - name: LAUNCHDARKLY_KEY
    - name: DD_INTEGRATION_GRPC_ENABLED
      value: "false"
    - name: DD_INTEGRATION_GRPC_CLIENT_ENABLED
      value: "false"
    - name: S3_PATH_STYLE_ACCESS
    - name: AWS_SECRET_MANAGER_SECRET_TAGS
    - name: AIRBYTE_API_AUTH_HEADER_VALUE
    - name: DATA_PLANE_SERVICE_ACCOUNT_CREDENTIALS_PATH
    - name: AWS_SECRET_MANAGER_REGION
      value: eu-west-1
    - name: DD_INTEGRATION_NETTY_ENABLED
      value: "false"
    - name: VAULT_PREFIX
    - name: FEATURE_FLAG_PATH
      value: /flags
    - name: CLOUD_STORAGE_APPENDER_THREADS
      value: "1"
    - name: OTEL_COLLECTOR_ENDPOINT
    - name: KEYCLOAK_INTERNAL_REALM_ISSUER
    - name: DD_INTEGRATION_GRPC_SERVER_ENABLED
      value: "false"
    - name: CONTROL_PLANE_AUTH_ENDPOINT
    - name: WORKLOAD_API_BEARER_TOKEN
      valueFrom:
        secretKeyRef:
          key: WORKLOAD_API_BEARER_TOKEN
          name: airbyte-airbyte-secrets
    - name: USE_RUNTIME_SECRET_PERSISTENCE
      value: "false"
    - name: OPERATION_TYPE
      value: check
    - name: WORKLOAD_ID
      value: 68e63de2-bb83-4c7e-93fa-a8a9051e3993_8808e119-d606-4868-a0f4-0d7c534f7cff_0_check
    image: airbyte/workload-init-container:1.2.0
    imagePullPolicy: IfNotPresent
    name: init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: airbyte-config
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2c7d2
      readOnly: true
    workingDir: /config
  nodeName: ip-10-80-193-52.eu-west-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: airbyte-admin
  serviceAccountName: airbyte-admin
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: paysafe.com/purpose
    operator: Equal
    value: foundation
  - effect: NoSchedule
    key: paysafe.com/purpose
    operator: Equal
    value: internal-tools
  - effect: NoSchedule
    key: paysafe.com/purpose
    operator: Equal
    value: deployers
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir:
      medium: Memory
    name: airbyte-config
  - name: kube-api-access-2c7d2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-12-04T20:59:12Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-12-04T20:59:04Z"
    message: 'containers with incomplete status: [init]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-12-04T20:59:04Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-12-04T20:59:04Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-12-04T20:59:04Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: airbyte/connector-sidecar:1.2.0
    imageID: ""
    lastState: {}
    name: connector-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  - image: airbyte/source-jira:3.2.1
    imageID: ""
    lastState: {}
    name: main
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: 10.80.193.52
  hostIPs:
  - ip: 10.80.193.52
  initContainerStatuses:
  - containerID: containerd://d6298b3480696892bfc324fc7201e99890f0839033706f47a7339ef60e42fedd
    image: docker.io/airbyte/workload-init-container:1.2.0
    imageID: docker.io/airbyte/workload-init-container@sha256:6e28850e18f621334a3545acfb4526c22f9a60e5d7104905103d872e29185b47
    lastState: {}
    name: init
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://d6298b3480696892bfc324fc7201e99890f0839033706f47a7339ef60e42fedd
        exitCode: 1
        finishedAt: "2024-12-04T20:59:10Z"
        reason: Error
        startedAt: "2024-12-04T20:59:05Z"
  phase: Failed
  podIP: 10.80.193.39
  podIPs:
  - ip: 10.80.193.39
  qosClass: BestEffort
  startTime: "2024-12-04T20:59:04Z"

Among the three containers in the source test pod, the init container fails with the exception above and stays in the not-ready Error state, while the other two containers remain in the not-ready PodInitializing state:

IDX↑   NAME                PF  IMAGE                                   READY   STATE               RESTARTS PROBES(L:R:S)     CPU   MEM   CPU/R:L   MEM/R:L   %CPU/R   %CPU/L   %MEM/R   %MEM/L PORTS   AGE 
I1     init                ●   airbyte/workload-init-container:1.2.0   false   Error                      0 off:off:off         0     0       0:0       0:0      n/a      n/a      n/a      n/a         34m 
M1     connector-sidecar   ●   airbyte/connector-sidecar:1.2.0         false   PodInitializing            0 off:off:off         0     0       0:0       0:0      n/a      n/a      n/a      n/a         34m 
M2     main                ●   airbyte/source-jira:3.2.1               false   PodInitializing            0 off:off:off         0     0       0:0       0:0      n/a      n/a      n/a      n/a         34m 

Inspecting the environments shows that the init container lacks the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables (among others), while the connector-sidecar container has them. Note also that the environment of the main container contains almost nothing, so it might fail after the init container as well.
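This kind of per-container environment diff is easy to automate. A small diagnostic sketch that, given a pod spec parsed from `kubectl get pod ... -o json` (here just a plain dict), lists the env variable names one container has and another lacks (the helper names are mine, not part of any tool):

```python
def env_names(pod_spec, container_name):
    # Search regular and init containers alike for the named container.
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for c in containers:
        if c["name"] == container_name:
            return {e["name"] for e in c.get("env", [])}
    raise KeyError(container_name)

def missing_env(pod_spec, have, lack):
    """Env var names present in container `have` but absent from `lack`."""
    return sorted(env_names(pod_spec, have) - env_names(pod_spec, lack))
```

Running `missing_env(spec, "connector-sidecar", "init")` against the manifest above would surface AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as the missing variables.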

The values.yaml this Helm chart is installed with is:

global:
  serviceAccountName: &service-account-name "airbyte-admin"

  deploymentMode: "oss"
  edition: "community"

  auth:
    enabled: false
    instanceAdmin:
      secretName: "airbyte-config-secrets"
      firstName: ""
      lastName:  ""
      emailSecretKey: "instance-admin-email"
      passwordSecretKey: "instance-admin-password"
    
  env_vars: 
    AWS_ACCESS_KEY_ID: "XXXXXXXXXXXXXXXXXXXXX"
    AWS_SECRET_ACCESS_KEY: "XXXXXXXXXXXXXXXXXXXXX"

Note:

  • Some irrelevant details are skipped, including the additional tolerations needed for our specific EKS configuration.
  • The tests with Airbyte 1.1.0 failed due to missing tolerations, so the source test pods did not start at all.

Suggestion

  • Make sure the init container and the main container have at least the environment the sidecar container runs in (with possible overrides).
  • This seems to be related to differences between the InitContainerFactory and ConnectorPodFactory classes.
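The first suggestion amounts to a simple merge rule: start from the sidecar's environment as a baseline and let each container apply its own overrides. A minimal sketch of that rule (`merged_env` is a hypothetical helper, not Airbyte's actual factory API):

```python
def merged_env(base_env, overrides):
    """Build a container's env from the sidecar's env plus per-container overrides."""
    env = dict(base_env)   # baseline: everything the sidecar container gets
    env.update(overrides)  # per-container values win on conflict
    return env
```

With this rule, the init container would inherit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the sidecar baseline while still being able to set its own OPERATION_TYPE, WORKLOAD_ID, etc.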

Relevant log output

The logs are provided above.
rpopov commented Dec 7, 2024

I realized that the workload-launcher pod needs the following values provided in values.yaml so that the spawned source test pods do not fail:

workload-launcher:
    
  env_vars:
    # Controls the presence of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the spawned pods' environment
    AWS_ASSUME_ROLE_SECRET_NAME: airbyte-config-secrets
    
    AWS_SECRET_MANAGER_ACCESS_KEY_ID_REF_NAME: airbyte-config-secrets
    AWS_SECRET_MANAGER_ACCESS_KEY_ID_REF_KEY: aws-secret-manager-access-key-id
    
    AWS_SECRET_MANAGER_SECRET_ACCESS_KEY_REF_NAME: airbyte-config-secrets
    AWS_SECRET_MANAGER_SECRET_ACCESS_KEY_REF_KEY: aws-secret-manager-secret-access-key

This solved the problem, though it revealed a significant gap in the Airbyte 1.2.0 documentation.
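To avoid rediscovering this the hard way, the required keys can be checked in the parsed values.yaml before running `helm install`. A hedged sketch (hypothetical helper, not part of the chart; the key list is taken from the workaround above):

```python
# Env vars the workaround above adds under workload-launcher.env_vars.
REQUIRED_LAUNCHER_ENV = [
    "AWS_ASSUME_ROLE_SECRET_NAME",
    "AWS_SECRET_MANAGER_ACCESS_KEY_ID_REF_NAME",
    "AWS_SECRET_MANAGER_ACCESS_KEY_ID_REF_KEY",
    "AWS_SECRET_MANAGER_SECRET_ACCESS_KEY_REF_NAME",
    "AWS_SECRET_MANAGER_SECRET_ACCESS_KEY_REF_KEY",
]

def missing_launcher_env(values):
    """Return the required workload-launcher env var names absent from values."""
    env = values.get("workload-launcher", {}).get("env_vars", {})
    return [k for k in REQUIRED_LAUNCHER_ENV if k not in env]
```

Feed it the dict produced by `yaml.safe_load(open("values.yaml"))`; an empty result means all five keys are present.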

@joeybenamy

I have a similar issue in Airbyte 1.3.0 with assuming a role via a service account for S3 access.
