I'm trying to take some data from a few sources, do some transformations on it, and load it into Kinesis using AWS Glue and Scala. The data comes from static sources like tables and S3 buckets, so it's not a streaming ETL job. Currently I'm using a DynamicFrame and trying to take my data sink and simply call writeDynamicFrame, like so:
// some logic to set up a source and do some transformations,
// ending up with a DynamicFrame called myDynamicFrame
val kinesis = glueContext.getSinkWithFormat(
  connectionType = "kinesis",
  options = JsonOptions(
    Map(
      "streamArn" -> "arn:aws:kinesis:xxxxxxxxxxx/sink-stream",
      "startingPosition" -> "TRIM_HORIZON",
      "inferSchema" -> "true"
    )
  )
)
kinesis.writeDynamicFrame(myDynamicFrame)
My thought was that this would take the data from the DynamicFrame and push it into Kinesis; however, I instead get this error:
Error writing to Kinesis: Failed to find data source: kinesis. Please find packages at https://spark.apache.org/third-party-projects.html
There is some other documentation that talks about creating a writer from a DataFrame and using foreachBatch methods, but those appear to refer to jobs where Kinesis is the source and it's a streaming ETL job, which I wouldn't think this is, since we're getting the data in batches from S3.
Also, if it helps, it's Scala 2.12.19, Spark 3.3, and Glue v4.
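In case it's useful for discussion: one batch-friendly workaround I've been considering is bypassing getSinkWithFormat entirely, converting the DynamicFrame to a DataFrame, and writing each partition to Kinesis with the AWS SDK's PutRecords call. This is only a sketch under assumptions: the stream name "sink-stream", JSON serialization of each row, and a random partition key are all placeholders, and PutRecords accepts at most 500 records per request.

```scala
import java.nio.ByteBuffer
import scala.collection.JavaConverters._
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder
import com.amazonaws.services.kinesis.model.{PutRecordsRequest, PutRecordsRequestEntry}

// Convert the DynamicFrame to a DataFrame, serialize each row as JSON,
// and push records to Kinesis from each partition via the AWS SDK.
val df = myDynamicFrame.toDF()

df.toJSON.foreachPartition { rows: Iterator[String] =>
  // One client per partition; created on the executor, not the driver.
  val client = AmazonKinesisClientBuilder.defaultClient()

  // PutRecords allows at most 500 records per call, so batch accordingly.
  rows.grouped(500).foreach { batch =>
    val entries = batch.map { json =>
      new PutRecordsRequestEntry()
        .withData(ByteBuffer.wrap(json.getBytes("UTF-8")))
        // Placeholder partition key; a real job would likely key on a field.
        .withPartitionKey(java.util.UUID.randomUUID().toString)
    }.asJava

    client.putRecords(
      new PutRecordsRequest()
        .withStreamName("sink-stream") // assumed stream name
        .withRecords(entries)
    )
  }
}
```

Note that this doesn't retry records that PutRecords reports as failed (the response's FailedRecordCount), so a production version would need to inspect the response and re-send failures.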
I'm using Glue version 4, and the documentation says you can specify kinesis: https://docs.aws.amazon.com/glue/latest/dg/glue-etl-scala-apis-glue-gluecontext.html#glue-etl-scala-apis-glue-gluecontext-defs-getSinkWithFormat