Hey, I'm trying to concatenate two Parquet files. To avoid reading everything into memory at once, I wanted to use read_row_group for my solution, but it fails.
I think it's due to fields like these:
pyarrow.Field<to: list<item: string>>
But I'm not sure. Is this a duplicate? The issue linked in the code is resolved:
// ARROW-3762(wesm): If inout_array is a chunked array, we reject as this is
Stacktrace is:
File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
  table = pf.read_row_group(ix, columns=self._columns)
File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
  use_threads=use_threads)
File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
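For readers following along, here is a minimal sketch of the row-group-at-a-time concatenation described above. It is not the reporter's actual parquet_stream.py; the helper names iter_row_groups and concat_parquet_files are made up for illustration, and the sketch assumes both input files share the same schema. It uses pyarrow's ParquetFile, read_row_group and ParquetWriter, so on an affected file it would presumably raise the same ArrowNotImplementedError rather than avoid it.

```python
def iter_row_groups(parquet_file, columns=None):
    """Yield one table per row group, so only one group is held at a time.

    Works with any object exposing num_row_groups and read_row_group,
    such as pyarrow.parquet.ParquetFile.
    """
    for ix in range(parquet_file.num_row_groups):
        yield parquet_file.read_row_group(ix, columns=columns)


def concat_parquet_files(in_paths, out_path, columns=None):
    """Append every row group of each input file to one output file.

    Assumption: all inputs have the same schema (not checked here).
    If the inputs contain no row groups, no output file is written.
    """
    # Imported here so iter_row_groups can be used without pyarrow installed.
    import pyarrow.parquet as pq

    writer = None
    try:
        for path in in_paths:
            pf = pq.ParquetFile(path)
            for table in iter_row_groups(pf, columns=columns):
                if writer is None:
                    # Create the writer lazily from the first table's schema.
                    writer = pq.ParquetWriter(out_path, table.schema)
                writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()
```

Something like concat_parquet_files(["a.parquet", "b.parquet"], "out.parquet") would be the intended call, with placeholder file names.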
Wes McKinney / @wesm:
I fixed some cases where this occurs in ARROW-4688, but it is still possible to hit this error for very large row groups (> 2GB of string data in a row group). I didn't see a follow-up JIRA for this or ARROW-3762, so we can use this one to track the issue.
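To check whether a given file falls into the "very large row group" case, one option is to scan the Parquet footer metadata before reading any data. The sketch below is only a rough heuristic: the helper name large_column_chunks is hypothetical, and treating total_uncompressed_size above 2**31 bytes as a stand-in for "> 2GB of string data" is an assumption, not the exact condition the reader checks.

```python
TWO_GIB = 2**31  # assumed stand-in for the "> 2GB" figure mentioned above


def large_column_chunks(metadata, limit=TWO_GIB):
    """Return (row_group_index, column_path, uncompressed_bytes) tuples
    for every column chunk whose uncompressed size exceeds limit.

    metadata is expected to look like pyarrow's FileMetaData, e.g.
    pyarrow.parquet.ParquetFile(path).metadata.
    """
    hits = []
    for rg_ix in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_ix)
        for col_ix in range(row_group.num_columns):
            col = row_group.column(col_ix)
            if col.total_uncompressed_size > limit:
                hits.append(
                    (rg_ix, col.path_in_schema, col.total_uncompressed_size)
                )
    return hits


# Usage with pyarrow (the file name is a placeholder):
#   import pyarrow.parquet as pq
#   print(large_column_chunks(pq.ParquetFile("data.parquet").metadata))
```

An empty result does not guarantee the error cannot occur; it only rules out the size-based case under the assumption stated above.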
Judah:
@wesm I'm also running into this issue. Is this likely to be fixed / easy to fix? I'd be happy to give it a go, but I'm not really sure where to start.
Code referenced in the issue description: arrow/cpp/src/parquet/arrow/reader.cc, line 915 at commit fd0b90a
Reporter: Jakub Okoński
Note: This issue was originally created as ARROW-5030. Please see the migration documentation for further details.