feat(python): Add a show method to DataFrame and LazyFrame #19634

Open
guilhem-dvr wants to merge 24 commits into main from python-add-show-methods

guilhem-dvr

@guilhem-dvr guilhem-dvr commented Nov 4, 2024

This adds a show method for both DataFrame and LazyFrame objects, taking inspiration from pyspark's show method and taking into account the requirements from @stinodego in #16534.

I chose to expose only the config options that influence the width of the result, to mimic pyspark's truncate option.

I've provided tests, but I'm not super satisfied with them: they could break when changing the default display options. I was thinking of mocking Config, print and display_html, would that be okay?

@github-actions github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels on Nov 4, 2024
@alexander-beedie
Collaborator

alexander-beedie commented Nov 5, 2024

I've provided tests, but I'm not super satisfied with them: they could break when changing the default display options. I was thinking of mocking Config, print and display_html, would that be okay?

No need to mock Config; it can act as a decorator [1], so you could decorate your tests such that you explicitly set the Config to some known values (e.g. the current defaults), and then modify the values away from these defaults inside the test 👍

There are a few other options it would be nice to expose as well, such as tbl_formatting, tbl_cell_alignment, tbl_cell_numeric_alignment, etc.

I'd also change the parameter name n to limit, and allow it to be None (but maintain the default of 5). This would allow the caller to print the entire table (eg: with no limit).

Footnotes

  1. Config as a decorator:
    https://docs.pola.rs/api/python/stable/reference/config.html#use-as-a-decorator


codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.

Project coverage is 79.88%. Comparing base (e8e0295) to head (399ddb1).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| py-polars/polars/dataframe/frame.py | 77.27% | 3 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19634      +/-   ##
==========================================
- Coverage   79.88%   79.88%   -0.01%     
==========================================
  Files        1593     1593              
  Lines      227649   227676      +27     
  Branches     2600     2607       +7     
==========================================
+ Hits       181860   181880      +20     
- Misses      45192    45198       +6     
- Partials      597      598       +1     


@guilhem-dvr
Author

@alexander-beedie I've added all config options that impact the display format of a dataframe.

I was thinking of hiding the dataframe shape by default because I find it a bit irrelevant when showing a dataframe; usually you would already know the shape of the frame you are working with. But given the option to show a frame with no limit, and for the sake of consistency, I think I will leave it visible.

@alexander-beedie
Collaborator

@alexander-beedie I've added all config options that impact the display format of a dataframe.

Good stuff, will take a look.

I was thinking of hiding the dataframe shape by default because I find it a bit irrelevant when showing a dataframe; usually you would already know the shape of the frame you are working with.

Not necessarily - if it's wide (so cols are truncated in the repr) or you've just filtered the data you won't know the shape; definitely want to keep it 👍

@alexander-beedie
Collaborator

alexander-beedie commented Nov 17, 2024

@guilhem-dvr: the tests seem to show a few issues still to resolve. Looks like some state might be escaping (either from the function itself or, more likely, the tests?) 🤔

@guilhem-dvr
Author

@guilhem-dvr: the tests seem to show a few issues still to resolve. Looks like some state might be escaping (either from the function itself or, more likely, the tests?) 🤔

Somehow I was expecting this to be unrelated 😓 I will have a look.

@guilhem-dvr guilhem-dvr force-pushed the python-add-show-methods branch from 977e981 to 02d0bf1 on November 19, 2024 08:50
@guilhem-dvr
Author

I found the issue @alexander-beedie: decorating tests with Config was causing a config leak to other tests. I tried to replace the Config decorators with calls to Config.set inside the tests, but this made it worse and the tests became flaky. I ended up wrapping each test body in a context-managed Config, and this worked.
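
The difference between an unscoped setter and a context-managed override can be sketched without Polars. Everything below is hypothetical (`SETTINGS`, `set_option`, `scoped_options`); it only shows why a change that is never undone can leak into later tests, while a scoped override restores the prior state on exit.

```python
from contextlib import contextmanager

# Hypothetical process-wide settings shared by every test in a session.
SETTINGS = {"tbl_rows": 8}


def set_option(name, value):
    """Unscoped setter: the change outlives the caller."""
    SETTINGS[name] = value


@contextmanager
def scoped_options(**overrides):
    """Apply overrides, then restore the prior state even if the body raises."""
    saved = dict(SETTINGS)
    SETTINGS.update(overrides)
    try:
        yield SETTINGS
    finally:
        SETTINGS.clear()
        SETTINGS.update(saved)


def leaky_test():
    set_option("tbl_rows", 2)  # never undone, so later tests see 2
    return SETTINGS["tbl_rows"]


def isolated_test():
    with scoped_options(tbl_rows=2):
        return SETTINGS["tbl_rows"]  # restored as soon as the block exits
```

Running `isolated_test()` leaves `SETTINGS` untouched, whereas running `leaky_test()` changes what every subsequent test observes, which is one way order-dependent flakiness can arise.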

@guilhem-dvr guilhem-dvr force-pushed the python-add-show-methods branch from 02d0bf1 to aca552e on November 19, 2024 09:24
@guilhem-dvr
Author

guilhem-dvr commented Dec 2, 2024

Hi @alexander-beedie, I think this PR is ready; would you care to have a look at it?

@alexander-beedie alexander-beedie changed the title from "feat(python): Add show methods to DataFrame and LazyFrame" to "feat(python): Add a show method to DataFrame and LazyFrame" on Jan 31, 2025
@alexander-beedie
Collaborator

alexander-beedie commented Jan 31, 2025

Apologies, lost track of this over the end of year crunch at work and Xmas break - will take a fresh look over it at the weekend, as I think this will be a nice addition 👌

Refactor both `show` methods to use a limit rather than a number of rows
to show.

The `limit` parameter is an extension of the `n` number of rows
parameter, and can be set to None. In that case, the `show` method will
display all frame rows.
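
The `limit` semantics described in this commit message can be sketched on a plain iterable. This is not the PR's implementation: `rows_to_show` is a hypothetical helper that only mirrors the stated behaviour (default of 5, `None` meaning every row).

```python
from itertools import islice


def rows_to_show(rows, limit=5):
    """Return the first `limit` rows, or every row when `limit` is None."""
    # islice treats a stop value of None as "no upper bound".
    return list(islice(rows, limit))
```

For example, `rows_to_show(range(10))` yields the first five items, while `rows_to_show(range(10), limit=None)` yields all ten.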
@guilhem-dvr guilhem-dvr force-pushed the python-add-show-methods branch from aca552e to 399ddb1 on February 9, 2025 00:56
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
2 participants