[BUG]: Failed to create a batch reader for type System.String[] #497
Comments
Hi @pathacke, what was used to write this file? That doesn't match the expected schema for a Map logical type: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps Do other libraries like PyArrow handle reading this column? It seems like whatever is writing these files should be fixed to use the correct schema, but if this is actually widely used and accepted behaviour then ParquetSharp might need to handle this.
Hi @adamreeve We have been using the Parquet.Net library, which processes the same file without any issues. Since our customers use various engines to write checkpoint Parquet files, controlling this aspect would be challenging. Would it be possible to make the key_value field check in ParquetSharp less strict?
OK yes allowing a map annotation on the inner
Could you provide an estimated timeline for when this change will be available so we can plan accordingly?
We don't have a fixed release schedule but I was planning on making a new beta release based on Arrow C++ 18.1.0 some time in the next couple of weeks and should be able to include a fix for this.
This should be fixed by #499
This fix is now published in the 18.1.0-beta1 release.
Issue Description
Getting the below error while reading a checkpoint Parquet file.
Checkpoint file schema is attached.
DIM_Calendar_CheckpointFileSchema.txt
In the checkpoint file, there is a configuration Group node of type Map, which contains a key_value field with a LogicalType of Map. According to ParquetSharp's implementation, this field must be of None type. However, the checkpoint Parquet file defines it as a Map type, leading to the above exception.
The failure is caused by a check in SchemaUtils.cs that validates the key_value field. This check expects key_value to have a None type, but in the checkpoint Parquet file, it is defined as a Map type, leading to the exception.
Here is the condition which checks for None type.
https://github.com/G-Research/ParquetSharp/blob/master/csharp/Schema/SchemaUtils.cs#L34
The exception is thrown from this line.
ParquetSharp/csharp/LogicalBatchReader/LogicalBatchReaderFactory.cs
Line 145 in 6a0bea7
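The check described above can be modelled roughly as follows. This is a minimal sketch in Python, not ParquetSharp's actual code: the Node class, the is_map function, the allow_annotated_key_value flag, and the String key/value types are all hypothetical names chosen for illustration, and requiring exactly two fields in key_value is a simplification. It only shows why a key_value group annotated as Map fails a strict check and passes a relaxed one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Parquet schema node (not a ParquetSharp type)."""
    name: str
    repetition: str                        # "required", "optional" or "repeated"
    logical_type: str = "None"             # e.g. "None", "Map", "String"
    fields: Optional[List["Node"]] = None  # None for primitive nodes


def is_map(node: Node, allow_annotated_key_value: bool = False) -> bool:
    """Return True if the node looks like a Map group in this simplified model."""
    # Outer node: a required/optional group annotated as Map with a single child.
    if node.fields is None or node.logical_type != "Map":
        return False
    if node.repetition not in ("required", "optional") or len(node.fields) != 1:
        return False
    inner = node.fields[0]
    # Inner node: a repeated group holding the key and value fields.
    if inner.fields is None or inner.repetition != "repeated" or len(inner.fields) != 2:
        return False
    # Strict mode accepts only an unannotated key_value group;
    # relaxed mode also tolerates a Map annotation on it.
    accepted = {"None", "Map"} if allow_annotated_key_value else {"None"}
    return inner.logical_type in accepted


def make_map(inner_logical_type: str) -> Node:
    """Build a 'configuration' map whose key_value group has the given annotation."""
    return Node("configuration", "optional", "Map", [
        Node("key_value", "repeated", inner_logical_type, [
            Node("key", "required", "String"),
            Node("value", "optional", "String"),
        ]),
    ])


standard = make_map("None")         # layout described in the Parquet format spec
checkpoint_style = make_map("Map")  # layout reported in this issue

print(is_map(standard))
print(is_map(checkpoint_style))
print(is_map(checkpoint_style, allow_annotated_key_value=True))
```

In this model the strict check rejects the checkpoint-style schema, while the relaxed variant accepts both layouts, which corresponds to the behaviour requested in this issue.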
Environment Information
Steps To Reproduce
Read the checkpoint Parquet file containing a Map-type field where the key_value field is also of type Map.
Expected Behavior
The ParquetSharp library should be able to read a checkpoint Parquet file in which the key_value field has a Logical type of Map.
Additional Context (Optional)
No response