feat(python): Support loading data from multiple Excel/ODS workbooks #20404

Merged

@alexander-beedie (Collaborator) commented Dec 22, 2024

Closes #20354.

Allows read_excel and read_ods to accept a list of paths or a glob pattern in the "source" parameter. This makes it possible to load a given sheet from multiple workbooks into a single frame. For example, a directory tree of workbooks that each hold the same sheet for a different date can be loaded in one call.

Also: tidied up some "source" docstrings (removed stray line breaks), and renamed the "ScanSource" type to "FileSource", since it is not only used by the scan functions.

Example

Load the "data" sheet from all "trades" workbooks found in subdirs of the "2024" directory into a single DataFrame.

```python
import polars as pl

df = pl.read_excel("~/2024/**/trades*.xlsx", sheet_name="data")
```
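To make the "list or glob" behaviour concrete, here is a minimal stdlib sketch of how a "source" argument that accepts a single path, a glob pattern, or a list of either might be normalised into concrete file paths before each workbook is read. The names expand_sources and SourceLike are hypothetical and this is not Polars' internal implementation; it only assumes glob-style wildcards (*, ?, [) and "~" home-directory expansion, as in the example above.

```python
from __future__ import annotations

import glob
import os
from pathlib import Path
from typing import Sequence, Union

# Hypothetical type: a single path/pattern, or a list of them.
SourceLike = Union[str, Path, Sequence[Union[str, Path]]]


def _is_glob(pattern: str) -> bool:
    # The same wildcard characters that the stdlib glob module treats as magic.
    return any(ch in pattern for ch in "*?[")


def expand_sources(source: SourceLike) -> list[Path]:
    """Normalise a path, a glob pattern, or a list of either into file paths (hypothetical helper)."""
    # A str is itself a Sequence, so check for single values first.
    items = [source] if isinstance(source, (str, Path)) else list(source)
    paths: list[Path] = []
    for item in items:
        pattern = os.path.expanduser(str(item))  # expand a leading "~"
        if _is_glob(pattern):
            # recursive=True lets "**" match zero or more nested subdirectories.
            matches = sorted(glob.glob(pattern, recursive=True))
            if not matches:
                raise FileNotFoundError(f"no files match {pattern!r}")
            paths.extend(Path(m) for m in matches)
        else:
            # Plain paths are passed through; the reader reports a missing file itself.
            paths.append(Path(pattern))
    return paths
```

Each resulting path would then be read with the same sheet selection and the per-workbook frames concatenated vertically. Sorting the glob matches keeps the row order deterministic across runs.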

github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels Dec 22, 2024
@alexander-beedie added the A-io-spreadsheet (Area: reading/writing Excel/ODS files) label Dec 22, 2024
codecov bot commented Dec 22, 2024

Codecov Report

Attention: Patch coverage is 85.29412% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.96%. Comparing base (676f10d) to head (fedb71f).
Report is 31 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| py-polars/polars/io/spreadsheet/functions.py | 84.84% | 3 missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20404      +/-   ##
==========================================
- Coverage   79.13%   78.96%   -0.17%     
==========================================
  Files        1572     1562      -10     
  Lines      219839   220103     +264     
  Branches     2462     2486      +24     
==========================================
- Hits       173961   173811     -150     
- Misses      45310    45719     +409     
- Partials      568      573       +5     


@ritchie46 ritchie46 merged commit 62ebbe5 into pola-rs:main Dec 22, 2024
22 checks passed
@ritchie46 (Member) commented:

Nice!

@alexander-beedie alexander-beedie deleted the read-excel-multiple-workbooks branch December 22, 2024 17:48
```python
        sheet_id, sheet_name, worksheets
    )
    if read_multiple_workbooks and return_multiple_sheets:
        msg = "cannot return multiple sheets from multiple workbooks"
```


In theory, I imagine it's possible that you might have multiple files, each containing multiple sheets of the same names and structure, and you might want to read all sheets from all files into a big dictionary of sheets. What was the reason for making these options mutually exclusive?
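The scenario described above can still be assembled outside a single read_excel call. Below is a minimal sketch, assuming a caller-supplied function that returns a {sheet_name: frame} mapping for one workbook; group_sheets_across_workbooks is a hypothetical helper and does not describe why the PR made the two options mutually exclusive. It groups same-named sheets from every workbook into one dictionary.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path
from typing import Any, Callable, Iterable, Union


def group_sheets_across_workbooks(
    paths: Iterable[Union[str, Path]],
    read_all_sheets: Callable[[Path], dict[str, Any]],
) -> dict[str, list[Any]]:
    """Collect every sheet from every workbook, grouped by sheet name (hypothetical helper).

    read_all_sheets is injected so the grouping logic does not depend on a
    particular spreadsheet engine; it must return a {sheet_name: frame} mapping.
    """
    grouped: dict[str, list[Any]] = defaultdict(list)
    for path in (Path(p) for p in paths):
        for sheet_name, frame in read_all_sheets(path).items():
            # Frames are appended in workbook order within each sheet name.
            grouped[sheet_name].append(frame)
    return dict(grouped)
```

With Polars, read_all_sheets could be `lambda p: pl.read_excel(p, sheet_id=0)`, since sheet_id=0 loads every sheet and returns a {sheet_name: DataFrame} dict, and each list of frames could then be combined with pl.concat.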

Successfully merging this pull request may close these issues.

Support glob paths in read_excel
3 participants