Hello, I am hitting several parser errors while processing many CSV files with Glue. The environment is configured to run Glue 4.0 with all the default configurations (I also tried 3.0). The ETL script I'm using is as follows:
The structure of my S3 bucket is:
deviceId (eg: 21134412AB)
|--timestamp (eg: 2023-05-10)
|--|--data.csv
There are many folders, each representing a deviceId, and each contains many timestamp folders. When I run the ETL job, it fails with the following error:
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 29745 in stage 0.0 failed 4 times, most recent failure: Lost task 29745.3 in stage 0.0 (TID 30301) (172.34.138.241 executor 8): com.amazonaws.services.glue.util.FatalException: Unable to parse file: data.csv
As you can see, the error gives no indication of which file is failing: with this layout, every file is named "data.csv". The message should include at least the full path of the file, something like "s3://bucket/21134412AB/2023-05-10/data.csv", so that I can find and fix it.
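Until the message carries the full path, one possible stopgap is to pre-scan the files outside Glue and report the full path of any file that fails a basic CSV check. The following is only a minimal sketch under stated assumptions: it does not reproduce Glue's parser, the function name `find_unparseable_csv` is hypothetical, and it assumes the files are UTF-8 with a header row and that a "bad" file is one that cannot be decoded, is empty, or has a row whose field count differs from the header.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_unparseable_csv(objects: Iterable[Tuple[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (full_path, reason) for every object that fails a basic CSV check.

    `objects` yields (full_path, raw_bytes) pairs. This is a hypothetical helper,
    not part of Glue, and its checks are assumptions about what "unparseable" means.
    """
    failures = []
    for path, raw in objects:
        try:
            text = raw.decode("utf-8")  # assumption: files are UTF-8
            reader = csv.reader(io.StringIO(text, newline=""), strict=True)
            header = next(reader, None)
            if header is None:
                raise ValueError("file is empty")
            for row in reader:
                # Blank lines are skipped; any other row must match the header width.
                if row and len(row) != len(header):
                    raise ValueError(
                        f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
                    )
        except (UnicodeDecodeError, csv.Error, ValueError) as exc:
            failures.append((path, str(exc)))
    return failures


if __name__ == "__main__":
    sample = [
        ("s3://bucket/21134412AB/2023-05-10/data.csv", b"a,b\n1,2\n"),
        ("s3://bucket/21134412AB/2023-05-11/data.csv", b"a,b\n1,2,3\n"),
    ]
    for path, reason in find_unparseable_csv(sample):
        print(path, "->", reason)
```

In practice the `(path, bytes)` pairs would have to come from listing and downloading the objects under each deviceId/timestamp prefix (for example with boto3's `list_objects_v2` paginator and `get_object`), which is left out here to keep the sketch self-contained.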