Description
Pandas version checks

- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# df = pd.read_parquet("gs://cloud-samples-data/bigquery/us-states/us-states.parquet") # works
df = pd.read_parquet(["gs://cloud-samples-data/bigquery/us-states/us-states.parquet"]) # fails
df = pd.read_parquet(["gs://cloud-samples-data/bigquery/us-states/us-states.parquet"], storage_options={"token": "anon"})  # fails with a different error

Issue Description
When I pass a list of (remote) files to read_parquet, it fails, whereas passing the same file path in directly works.

When I don't pass storage options, here's the error:
File /.venv/lib/python3.11/site-packages/pyarrow/dataset.py:368, in <listcomp>(.0)
    361 is_local = (
    362     isinstance(filesystem, (LocalFileSystem, _MockFileSystem)) or
    363     (isinstance(filesystem, SubTreeFileSystem) and
    364      isinstance(filesystem.base_fs, LocalFileSystem))
    365 )
    367 # allow normalizing irregular paths such as Windows local paths
--> 368 paths = [filesystem.normalize_path(_stringify_path(p)) for p in paths]
    370 # validate that all of the paths are pointing to existing *files*
    371 # possible improvement is to group the file_infos by type and raise for
    372 # multiple paths per error category
    373 if is_local:
File /.venv/lib/python3.11/site-packages/pyarrow/_fs.pyx:1012, in pyarrow._fs.FileSystem.normalize_path()
File /.venv/lib/python3.11/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File /.venv/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowInvalid: Expected a local filesystem path, got a URI: 'gs://cloud-samples-data/bigquery/us-states/us-states.parquet'

When I pass in any storage options:
File .venv/lib/python3.11/site-packages/pandas/io/parquet.py:258, in PyArrowImpl.read(self, path, columns, filters, use_nullable_dtypes, dtype_backend, storage_options, filesystem, **kwargs)
    256 if manager == "array":
    257     to_pandas_kwargs["split_blocks"] = True
--> 258 path_or_handle, handles, filesystem = _get_path_or_handle(
    259     path,
    260     filesystem,
    261     storage_options=storage_options,
    262     mode="rb",
    263 )
    264 try:
    265     pa_table = self.api.parquet.read_table(
    266         path_or_handle,
    267         columns=columns,
   (...)    270         **kwargs,
    271     )
File .venv/lib/python3.11/site-packages/pandas/io/parquet.py:129, in _get_path_or_handle(path, fs, storage_options, mode, is_dir)
    123         fs, path_or_handle = fsspec.core.url_to_fs(
    124             path_or_handle, **(storage_options or {})
    125         )
    126 elif storage_options and (not is_url(path_or_handle) or mode != "rb"):
    127     # can't write to a remote url
    128     # without making use of fsspec at the moment
--> 129     raise ValueError("storage_options passed with buffer, or non-supported URL")
    131 handles = None
    132 if (
    133     not fs
    134     and not is_dir
   (...)    139     # fsspec resources can also point to directories
    140     # this branch is used for example when reading from non-fsspec URLs
ValueError: storage_options passed with buffer, or non-supported URL

Expected Behavior
read_parquet should succeed whether we pass a single path to a directory/file or a list of files to read.

Workarounds:
Creating a filesystem object explicitly and passing it to read_parquet works; reading the files one by one and concatenating them is another option.
Installed Versions
pandas                : 2.3.3
numpy                 : 1.26.4
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : None
Cython                : None
sphinx                : None
IPython               : 9.6.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.14.2
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2024.12.0
html5lib              : None
hypothesis            : None
gcsfs                 : 2024.12.0
jinja2                : 3.1.6
lxml.etree            : 5.4.0
matplotlib            : None
numba                 : 0.61.2
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 22.0.0
pyreadstat            : None
pytest                : 8.4.2
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.16.3
sqlalchemy            : None
tables                : None
tabulate              : 0.9.0
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None