Getting a "Failed to find data source" error when writing to Kinesis from AWS Glue (Spark) #211

Open
RichardChester opened this issue Aug 14, 2024 · 0 comments


RichardChester commented Aug 14, 2024

I'm trying to take data from a few sources, apply some transformations, and load it into Kinesis using AWS Glue and Scala. The data comes from static sources like tables and S3 buckets, so it's not a streaming ETL job. Currently I'm working with a DynamicFrame, and I'm trying to set up a data sink and simply call writeDynamicFrame, like so:

// some logic to set up a source and do some transformations, ending up with a DynamicFrame called myDynamicFrame

val kinesis = glueContext.getSinkWithFormat(
  connectionType = "kinesis",
  options = JsonOptions(
    Map(
      "streamArn" -> "arn:aws:kinesis:xxxxxxxxxxx/sink-stream",
      "startingPosition" -> "TRIM_HORIZON",
      "inferSchema" -> "true"
    )
  )
)
kinesis.writeDynamicFrame(myDynamicFrame)

My thought was that this would take the data from the DynamicFrame and push it into Kinesis; instead I get this error:

Error writing to Kinesis: Failed to find data source: kinesis. Please find packages at https://spark.apache.org/third-party-projects.html

I'm using Glue version 4, and the documentation says you can specify kinesis as a connection type: https://docs.aws.amazon.com/glue/latest/dg/glue-etl-scala-apis-glue-gluecontext.html#glue-etl-scala-apis-glue-gluecontext-defs-getSinkWithFormat
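For what it's worth, the signature on that page also takes transformation-context and format arguments, so my reading is that the full call would look something like this (a sketch only; the transformation context name and the choice of "json" as the format are my guesses, and the ARN is still a placeholder):

    // Sketch based on the getSinkWithFormat signature in the Glue Scala docs;
    // "writeKinesis" and format = "json" are assumptions on my part.
    val kinesis = glueContext.getSinkWithFormat(
      connectionType = "kinesis",
      options = JsonOptions(
        Map(
          "streamArn" -> "arn:aws:kinesis:xxxxxxxxxxx/sink-stream"
        )
      ),
      transformationContext = "writeKinesis",
      format = "json"
    )
    kinesis.writeDynamicFrame(myDynamicFrame)

I get the same "Failed to find data source" behavior either way, so I don't think the missing format arguments are the cause, but I'm including it in case the call shape matters.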

There is some other documentation that talks about creating a writer from a DataFrame and using the foreachBatch method, but that looks like it refers to jobs where Kinesis is the source and it's a streaming ETL job, which I wouldn't think this is, since we're getting the data in batches from S3.
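In case it clarifies what I mean, the pattern in that documentation looks roughly like this (my sketch, with placeholder stream/option names; it assumes a streaming source, which is exactly why it doesn't seem to fit my batch job):

    // Sketch of the foreachBatch pattern from the streaming docs.
    // Assumes `streamingDf` is a streaming DataFrame (e.g. read from Kinesis);
    // the "streamArn" option name mirrors the one I used above and is an assumption.
    import org.apache.spark.sql.DataFrame

    streamingDf.writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        batchDf.write
          .format("kinesis") // the same data source Spark says it can't find
          .option("streamArn", "arn:aws:kinesis:xxxxxxxxxxx/sink-stream")
          .save()
      }
      .start()
      .awaitTermination()

Since my DynamicFrame comes from static tables and S3, I don't have a streaming DataFrame to call writeStream on in the first place.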

Also, if it helps: Scala 2.12.19, Spark 3.3, Glue v4.
