
[Python] read_row_group fails with Nested data conversions not implemented for chunked array outputs #21526

Open
asfimport opened this issue Mar 27, 2019 · 4 comments


asfimport commented Mar 27, 2019

Hey, I'm trying to concatenate two Parquet files, and to avoid reading everything into memory at once I wanted to use read_row_group for my solution, but it fails.

I think it's due to fields like these:

pyarrow.Field<to: list<item: string>>

But I'm not sure. Is this a duplicate? The issue linked in the code is resolved:

// ARROW-3762(wesm): If inout_array is a chunked array, we reject as this is

The stack trace is:

  File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
    table = pf.read_row_group(ix, columns=self._columns)
  File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
    use_threads=use_threads)
  File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs

Reporter: Jakub Okoński

Note: This issue was originally created as ARROW-5030. Please see the migration documentation for further details.


Wes McKinney / @wesm:
I fixed some cases where this occurs in ARROW-4688, but it is still possible to hit this error for very large row groups (> 2 GB of string data in a row group). I didn't see a follow-up JIRA to this or ARROW-3762, so we can use this one to track the issue:

https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.cc#L915


Judah:
[~wesm_impala_7e40] I'm also running into this issue. Is this likely to be fixed, or easy to fix? I'd be happy to give it a go, but I'm not really sure where to start.

Maxl94 commented May 22, 2024

In case someone wants to load a pandas DataFrame, I want to share my workaround.

For me, installing fastparquet and specifying the engine='fastparquet' argument in pd.read_parquet worked.

@yuxi-liu-wired:

> In case someone wants to load a pandas DataFrame, I want to share my workaround.
>
> For me, installing fastparquet and specifying the engine='fastparquet' argument in pd.read_parquet worked.

Concurring. If the parquet file contains a dictionary/list/struct, then the following

import pandas as pd
df = pd.read_parquet(parquet_path)

throws an error "ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs"

But only if the parquet file is over 1 GB. If it is under 1 GB, then it loads with no problems.
