Describe the bug
When reading a compressed JSON file with `repartition_file_scans = true` (the default value), DataFusion tries to decompress the file with parallel reads. This causes `ArrowError(IoError("invalid gzip header", Custom { kind: InvalidInput, error: "invalid gzip header" }), None)`, because a gzip header exists only at the start of the file and not in the middle, where the other partitions begin reading.
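To illustrate why a byte range that starts in the middle of a gzip file is rejected, here is a minimal standard-library sketch. It is not DataFusion's code: `fake_gzip_file`, `split_into_ranges`, and `ranges_with_header` are hypothetical helpers, and the payload is filler bytes rather than a valid deflate stream. It only shows that, when a file is cut into byte ranges, the gzip magic number appears at the start of the first range alone.

```rust
/// The two magic bytes that open every gzip member (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Returns true if `bytes` begins with the gzip magic number.
fn starts_with_gzip_header(bytes: &[u8]) -> bool {
    bytes.starts_with(&GZIP_MAGIC)
}

/// Hypothetical stand-in for a gzip file: a 10-byte header followed by
/// filler bytes. The filler is NOT a valid deflate stream.
fn fake_gzip_file() -> Vec<u8> {
    // ID1, ID2, CM = 8 (deflate), FLG, MTIME (4 bytes), XFL, OS = 3 (Unix)
    let mut file = vec![0x1f, 0x8b, 0x08, 0x00, 0, 0, 0, 0, 0x00, 0x03];
    file.extend(std::iter::repeat(0x42u8).take(22));
    file
}

/// Splits `len` bytes into at most `n` contiguous, non-empty
/// (start, end) ranges, the way a naive byte-range repartitioning might.
fn split_into_ranges(len: usize, n: usize) -> Vec<(usize, usize)> {
    assert!(n > 0, "need at least one partition");
    let chunk = (len + n - 1) / n; // ceiling division
    (0..n)
        .map(|i| (i * chunk, ((i + 1) * chunk).min(len)))
        .filter(|(start, end)| start < end)
        .collect()
}

/// For each range, reports whether a gzip decoder started at that
/// offset would find a gzip header.
fn ranges_with_header(file: &[u8], n: usize) -> Vec<bool> {
    split_into_ranges(file.len(), n)
        .iter()
        .map(|&(start, end)| starts_with_gzip_header(&file[start..end]))
        .collect()
}
```

With four partitions, only the first range begins with a header, so a decoder started on any of the other ranges would have nothing valid to parse. With a single partition, the whole file is read from its start and the header is found.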
To Reproduce
    let df = ctx
        .read_json(
            "C:/path/to/file.gz",
            NdJsonReadOptions::default()
                .file_compression_type(FileCompressionType::GZIP)
                .file_extension("gz")
                .schema(&s3_user_schema()),
        )
        .await
        .unwrap();
Expected behavior
The data should be read correctly, without errors.
Additional context
Putting a print statement before the `JsonOpener` produces output which suggests that it is indeed reading the compressed JSON file in parallel.