feat(python): Support loading data from multiple Excel/ODS workbooks
alexander-beedie committed Dec 22, 2024
1 parent 676f10d commit fedb71f
Showing 9 changed files with 171 additions and 96 deletions.
2 changes: 1 addition & 1 deletion py-polars/polars/_typing.py
@@ -299,7 +299,7 @@ def fetchmany(self, *args: Any, **kwargs: Any) -> Any:
# LazyFrame engine selection
EngineType: TypeAlias = Union[Literal["cpu", "gpu"], "GPUEngine"]

-ScanSource: TypeAlias = Union[
+FileSource: TypeAlias = Union[
str,
Path,
IO[bytes],
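
For context, a minimal sketch of how an alias such as `FileSource` might be consumed by a caller. `ExampleSource` and `describe_source` are hypothetical names, and the union below lists only the three members visible in the truncated hunk above; the real alias continues past what is shown here.

```python
from io import BytesIO
from pathlib import Path
from typing import IO, Union

# Hypothetical stand-in for the alias: only the members visible in the hunk.
ExampleSource = Union[str, Path, IO[bytes]]


def describe_source(source: ExampleSource) -> str:
    """Classify a source as a filesystem path or a readable binary stream."""
    if isinstance(source, (str, Path)):
        return "path"
    if hasattr(source, "read"):
        return "stream"
    msg = f"unsupported source type: {type(source).__name__}"
    raise TypeError(msg)


# One example of each visible member of the union.
kinds = [
    describe_source(s)
    for s in ("data.parquet", Path("data.parquet"), BytesIO(b""))
]
```

The rename in this hunk changes only the alias name, so code written against the old name would need the same one-word substitution seen in the parquet module further down.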
5 changes: 2 additions & 3 deletions py-polars/polars/io/avro.py
@@ -29,9 +29,8 @@ def read_avro(
source
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance).
-For file-like objects,
-stream position may not be updated accordingly after reading.
+function, or a `BytesIO` instance). For file-like objects, the stream position
+may not be updated accordingly after reading.
columns
Columns to select. Accepts a list of column indices (starting at zero) or a list
of column names.
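
The stream-position caveat in the `source` docstring can be handled defensively by rewinding before each read. The following is a minimal sketch using only the standard library. `read_from_start` is a hypothetical helper, and its `reader` argument stands in for any function that accepts a file-like object (for example `read_avro`, per the docstring above).

```python
from io import BytesIO
from typing import IO, Callable, TypeVar

T = TypeVar("T")


def read_from_start(source: IO[bytes], reader: Callable[[IO[bytes]], T]) -> T:
    """Rewind `source`, then hand it to `reader`.

    The position after `reader` returns is treated as unspecified, so every
    call rewinds first instead of relying on where the last read stopped.
    """
    source.seek(0)
    return reader(source)


buf = BytesIO(b"a,b\n1,2\n")

# A plain `read()` stands in for a real reader in this sketch.
first = read_from_start(buf, lambda f: f.read())
second = read_from_start(buf, lambda f: f.read())
```

Both calls see the full payload because neither depends on the position left behind by the previous read.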
10 changes: 4 additions & 6 deletions py-polars/polars/io/csv/functions.py
@@ -90,9 +90,8 @@ def read_csv(
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
function, or a `BytesIO` instance). If `fsspec` is installed, it will be used
-to open remote files.
-For file-like objects,
-stream position may not be updated accordingly after reading.
+to open remote files. For file-like objects, the stream position may not be
+updated accordingly after reading.
has_header
Indicate if the first row of the dataset is a header or not. If set to False,
column names will be autogenerated in the following format: `column_x`, with
@@ -764,9 +763,8 @@ def read_csv_batched(
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
function, or a `BytesIO` instance). If `fsspec` is installed, it will be used
-to open remote files.
-For file-like objects,
-stream position may not be updated accordingly after reading.
+to open remote files. For file-like objects, the stream position may not be
+updated accordingly after reading.
has_header
Indicate if the first row of the dataset is a header or not. If set to False,
column names will be autogenerated in the following format: `column_x`, with
15 changes: 6 additions & 9 deletions py-polars/polars/io/ipc/functions.py
@@ -62,9 +62,8 @@ def read_ipc(
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
function, or a `BytesIO` instance). If `fsspec` is installed, it will be used
-to open remote files.
-For file-like objects,
-stream position may not be updated accordingly after reading.
+to open remote files. For file-like objects, the stream position may not be
+updated accordingly after reading.
columns
Columns to select. Accepts a list of column indices (starting at zero) or a list
of column names.
@@ -241,9 +240,8 @@ def read_ipc_stream(
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
function, or a `BytesIO` instance). If `fsspec` is installed, it will be used
-to open remote files.
-For file-like objects,
-stream position may not be updated accordingly after reading.
+to open remote files. For file-like objects, the stream position may not be
+updated accordingly after reading.
columns
Columns to select. Accepts a list of column indices (starting at zero) or a list
of column names.
@@ -331,9 +329,8 @@ def read_ipc_schema(source: str | Path | IO[bytes] | bytes) -> dict[str, DataTyp
source
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance).
-For file-like objects,
-stream position may not be updated accordingly after reading.
+function, or a `BytesIO` instance). For file-like objects, the stream position
+may not be updated accordingly after reading.
Returns
-------
5 changes: 2 additions & 3 deletions py-polars/polars/io/json/read.py
@@ -34,9 +34,8 @@ def read_json(
source
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance).
-For file-like objects,
-stream position may not be updated accordingly after reading.
+function, or a `BytesIO` instance). For file-like objects, the stream position
+may not be updated accordingly after reading.
schema : Sequence of str, (str,DataType) pairs, or a {str:DataType,} dict
The DataFrame schema may be declared in several ways:
5 changes: 2 additions & 3 deletions py-polars/polars/io/ndjson.py
@@ -51,9 +51,8 @@ def read_ndjson(
source
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance).
-For file-like objects,
-stream position may not be updated accordingly after reading.
+function, or a `BytesIO` instance). For file-like objects, the stream position
+may not be updated accordingly after reading.
schema : Sequence of str, (str,DataType) pairs, or a {str:DataType,} dict
The DataFrame schema may be declared in several ways:
13 changes: 6 additions & 7 deletions py-polars/polars/io/parquet/functions.py
@@ -31,14 +31,14 @@
from typing import Literal

from polars import DataFrame, DataType, LazyFrame
-from polars._typing import ParallelStrategy, ScanSource, SchemaDict
+from polars._typing import FileSource, ParallelStrategy, SchemaDict
from polars.io.cloud import CredentialProviderFunction


@deprecate_renamed_parameter("row_count_name", "row_index_name", version="0.20.4")
@deprecate_renamed_parameter("row_count_offset", "row_index_offset", version="0.20.4")
def read_parquet(
-source: ScanSource,
+source: FileSource,
*,
columns: list[int] | list[str] | None = None,
n_rows: int | None = None,
@@ -74,7 +74,7 @@ def read_parquet(
File-like objects are supported (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance) For file-like objects, stream position
+function, or a `BytesIO` instance). For file-like objects, the stream position
may not be updated accordingly after reading.
columns
Columns to select. Accepts a list of column indices (starting at zero) or a list
@@ -304,9 +304,8 @@ def read_parquet_schema(source: str | Path | IO[bytes] | bytes) -> dict[str, Dat
source
Path to a file or a file-like object (by "file-like object" we refer to objects
that have a `read()` method, such as a file handler like the builtin `open`
-function, or a `BytesIO` instance).
-For file-like objects,
-stream position may not be updated accordingly after reading.
+function, or a `BytesIO` instance). For file-like objects, the stream position
+may not be updated accordingly after reading.
Returns
-------
@@ -322,7 +321,7 @@ def read_parquet_schema(source: str | Path | IO[bytes] | bytes) -> dict[str, Dat
@deprecate_renamed_parameter("row_count_name", "row_index_name", version="0.20.4")
@deprecate_renamed_parameter("row_count_offset", "row_index_offset", version="0.20.4")
def scan_parquet(
-source: ScanSource,
+source: FileSource,
*,
n_rows: int | None = None,
row_index_name: str | None = None,