Replies: 1 comment
-
Yes, Hive's string limit is 2 GB. We should add `large_string` to the data type mappings. Would you open an issue for this, please?
-
I am trying to create an Athena table on top of existing Parquet files in S3. These files use the pyarrow `large_string` type for some columns, so when calling `wr.s3.read_parquet_metadata` I get `awswrangler.exceptions.UnsupportedType: Unsupported Pyarrow type: large_string`. The `pyarrow2athena` function only checks for `string`, not `large_string`:
https://github.com/aws/aws-sdk-pandas/blob/6c0f65b6b63b223bec1059ecd037697b068f7e63/awswrangler/_data_types.py#L41C8-L41C8
Curious if the pyarrow large_string type could be supported here?
When I create a Glue table with `string` as the type for these existing Parquet files, Athena queries seem to function normally. I believe the string limit is 2 GB in Athena, so I'm not sure if that's the motivation for not supporting the type.