[BUG]: Failed to create a batch reader for type System.String[] #497

Closed
pathacke opened this issue Jan 21, 2025 · 7 comments

Comments

@pathacke

Issue Description

I get the error below when reading a checkpoint Parquet file.

  at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at ParquetSharp.LogicalBatchReaderFactory`2.MakeGenericReader(Type elementType, Node[] schemaNodes, Int16 nullDefinitionLevel, Int16 repetitionLevel)
   at ParquetSharp.LogicalBatchReaderFactory`2.GetCompoundReader[TElement](Node[] schemaNodes, Int16 definitionLevel, Int16 repetitionLevel)","innerExceptionData":[{"code":"InternalError","subCode":0,"message":"Exception has been thrown by the target of an invocation.","timeStamp":"2025-01-15T22:59:49.5463229Z","httpStatusCodeInternal":500,"exceptionType":"System.Reflection.TargetInvocationException","hresult":-2146232828,"details":[{"code":"RootActivityId","message":"ccc36f36-2e99-48a7-894e-3cbbe7ef7222"},{"code":"RootActivityId","message":"ccc36f36-2e99-48a7-894e-3cbbe7ef7222"},{"code":"ProcessId","message":"7400"},{"code":"Param1","message":"Exception has been thrown by the target of an invocation."}],"callStack":"   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at ParquetSharp.LogicalBatchReaderFactory`2.MakeGenericReader(Type elementType, Node[] schemaNodes, Int16 nullDefinitionLevel, Int16 repetitionLevel)
   at ParquetSharp.LogicalBatchReaderFactory`2.GetCompoundReader[TElement](Node[] schemaNodes, Int16 definitionLevel, Int16 repetitionLevel)","innerExceptionData":[{"code":"InternalError","subCode":0,"message":"Failed to create a batch reader for type System.String[]","timeStamp":"2025-01-15T22:59:49.5463229Z","httpStatusCodeInternal":500,"exceptionType":"System.Exception","hresult":-2146233088,"details":[{"code":"RootActivityId","message":"ccc36f36-2e99-48a7-894e-3cbbe7ef7222"},{"code":"RootActivityId","message":"ccc36f36-2e99-48a7-894e-3cbbe7ef7222"},{"code":"ProcessId","message":"7400"},{"code":"Param1","message":"Failed to create a batch reader for type System.String[]"}],"callStack":"   at ParquetSharp.LogicalBatchReaderFactory`2.GetCompoundReader[TElement](Node[] schemaNodes, Int16 definitionLevel, Int16 repetitionLevel)"}]}]}]}

Checkpoint file schema is attached.
DIM_Calendar_CheckpointFileSchema.txt

In the checkpoint file, the configuration group node is annotated with the Map logical type, and its inner key_value field is also annotated with the Map logical type. ParquetSharp expects key_value to have the None logical type, so the check in SchemaUtils.cs that validates the key_value field rejects this column, leading to the above exception.

Here is the condition that checks for the None type:
https://github.com/G-Research/ParquetSharp/blob/master/csharp/Schema/SchemaUtils.cs#L34

The exception is thrown from this line.

throw new Exception($"Failed to create a batch reader for type {typeof(TElement)}");
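
To make the mismatch concrete, here is a minimal, library-free sketch of the kind of check described above. It is not ParquetSharp's code: the `Node` class, the `is_map_strict` function, and the field types inside `key_value` are hypothetical stand-ins, and the only behaviour taken from this report is that a Map annotation on `key_value` is rejected.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Parquet schema node (not ParquetSharp's Node)."""
    name: str
    repetition: str                     # "required", "optional" or "repeated"
    logical_type: Optional[str] = None  # None models the "None" logical type
    children: List["Node"] = field(default_factory=list)


def is_map_strict(node: Node) -> bool:
    """Accept a MAP group only if its single repeated child is unannotated."""
    if node.logical_type != "MAP" or len(node.children) != 1:
        return False
    key_value = node.children[0]
    return key_value.repetition == "repeated" and key_value.logical_type is None


def make_key_value(annotation: Optional[str]) -> Node:
    # Key and value types are assumed for illustration only.
    return Node("key_value", "repeated", annotation, [
        Node("key", "required", "STRING"),
        Node("value", "optional", "STRING"),
    ])


# Layout the reader expects: key_value carries no annotation.
spec_map = Node("configuration", "optional", "MAP", [make_key_value(None)])
# Layout described in this issue: key_value is itself annotated as MAP.
checkpoint_map = Node("configuration", "optional", "MAP", [make_key_value("MAP")])

print(is_map_strict(spec_map))        # True
print(is_map_strict(checkpoint_map))  # False
```

Under this sketch the second schema is not recognised as a map at all, which is consistent with the reader falling through to the generic "Failed to create a batch reader" error.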

Environment Information

  • ParquetSharp Version: [e.g. 1.0.1]
  • .NET Framework/SDK Version: [e.g. .NET Framework 4.7.2]
  • Operating System: [e.g. Windows 10]

Steps To Reproduce

Read the checkpoint Parquet file containing a Map-type field where the key_value field is also of type Map.


Expected Behavior

The ParquetSharp library should be able to read checkpoint Parquet files that contain a field with the Map logical type.

Additional Context (Optional)

No response

@adamreeve
Contributor

Hi @pathacke, what was used to write this file? That doesn't match the expected schema for a Map logical type: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps

Do other libraries like PyArrow handle reading this column?

It seems like whatever is writing these files should be fixed to use the correct schema, but if this is actually widely used and accepted behaviour then ParquetSharp might need to handle this.

@pathacke
Author

Hi @adamreeve, we have been using the Parquet.Net library, which processes the same file without any issues. Since our customers use various engines to write checkpoint Parquet files, controlling this aspect would be challenging.

Would it be possible to make the key_value field check in ParquetSharp less strict?

@adamreeve
Contributor

OK, yes, allowing a Map annotation on the inner key_value node seems reasonable, as this doesn't introduce any ambiguity.
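
As a rough illustration of what such a relaxation could look like (a sketch only, not the actual change made in ParquetSharp), the check below accepts a `key_value` node that is either unannotated or annotated as MAP, and still rejects any other annotation. The `Node` class and `is_map_relaxed` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Parquet schema node (not ParquetSharp's Node)."""
    name: str
    repetition: str                     # "required", "optional" or "repeated"
    logical_type: Optional[str] = None  # None models the "None" logical type
    children: List["Node"] = field(default_factory=list)


# Annotations tolerated on the inner key_value node in this sketch.
ALLOWED_KEY_VALUE_ANNOTATIONS = (None, "MAP")


def is_map_relaxed(node: Node) -> bool:
    """Accept a MAP group whose repeated child is unannotated or annotated MAP."""
    if node.logical_type != "MAP" or len(node.children) != 1:
        return False
    key_value = node.children[0]
    return (
        key_value.repetition == "repeated"
        and key_value.logical_type in ALLOWED_KEY_VALUE_ANNOTATIONS
    )


def make_map(inner_annotation: Optional[str]) -> Node:
    # Key and value types are assumed for illustration only.
    key_value = Node("key_value", "repeated", inner_annotation, [
        Node("key", "required", "STRING"),
        Node("value", "optional", "STRING"),
    ])
    return Node("configuration", "optional", "MAP", [key_value])


print(is_map_relaxed(make_map(None)))    # True
print(is_map_relaxed(make_map("MAP")))   # True
print(is_map_relaxed(make_map("LIST")))  # False
```

Because the outer group is already annotated as MAP, tolerating the same annotation on its single repeated child does not make any other schema shape look like a map.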

@pathacke
Author

Could you provide an estimated timeline for when this change will be available so we can plan accordingly?

@adamreeve
Contributor

We don't have a fixed release schedule, but I was planning on making a new beta release based on Arrow C++ 18.1.0 some time in the next couple of weeks and should be able to include a fix for this.

@adamreeve
Contributor

This should be fixed by #499

@adamreeve
Contributor

This fix is now published in the 18.1.0-beta1 release.
