feat(data frame): Support `polars` #1474

machow · 2024-06-21T14:21:06Z

This PR addresses #1439, by generalizing pandas.DataFrame specific logic to include Polars. It adds a module for DataFrame specific logic (_tbl_data.py) and simple tests for each piece.

From pairing with @schloerke, it seems like these next steps might be useful:

Move DataFrame specific logic into _tbl_data.py
Test and verify it works as expected on both Polars and Pandas
Evaluate subbing in narwhals for _tbl_data.py logic (cc @MarcoGorelli)
- This should be easy with all the logic in one place / tests around behavior. Seems doable!

Notes:

It appears that shiny currently evaluates datetime columns to "unknown"
Polars coerces subclasses of str back up to str, so certain htmltools tags can't be identified in a Polars Series.
- (we discussed subclassing collections.UserString instead, but also probably worth bringing up w/ Polars)
Barret and I have pairing time set up Monday and Tues, so should be able to knock it out then!

schloerke · 2024-06-21T15:22:11Z

Notes:

It appears that shiny currently evaluates datetime columns to "unknown"

We currently do not have special support in the browser... so we let the json to-string method handle it.

But we should definitely add support at some point!

machow · 2024-06-22T14:09:34Z

It seems like the JSON string method converts it to an int

opened issue here: #1477

MarcoGorelli · 2024-06-23T18:18:25Z

Evaluate subbing in narwhals for _tbl_data.py logic (cc @MarcoGorelli)

Thanks for the ping! Happy to help here, please do let me know if there's anything missing which you'd need or which would make your work easier

As a potential consumer, I'd be particularly interested in your thoughts on the StableAPI proposal here: narwhals-dev/narwhals#259 (comment). The objective is to give consumers a way such that, so long as they use StableAPI with a given version, then we will never break any public function they use. The idea's inspired by Rust's Editions

machow · 2024-06-24T15:30:48Z

From pairing w/ Barret, we're able to serialize pandas datetime series to ISO 8601 strings. There's a note in the code about possible results from pandas infer_dtype() function. A key to this function is that afaik it's responsible for inferring types from lists, so some of the possible outputs are induced by list[some_type].

# investigating the different outputs from infer_dtype
import pandas as pd
from datetime import date, datetime

val_dt = pd.to_datetime(["2020-01-01"])
val_per = pd.Period('2012-1-1', freq='D')

col_per = pd.Series([val_per])    # period
col_date = [date.today()]   # date
col_datetime = [datetime.now()]    # datetime
col_datetime64 = pd.Series(pd.to_datetime(["2020-01-01"]))    # datetime64


pd.api.types.infer_dtype(col_date)

machow · 2024-06-27T15:46:33Z

From pairing w/ @schloerke, we discovered that pandas.DataFrame.to_json has an interesting serialization strategy for custom objects.

import pandas as pd
from dataclasses import dataclass

class C:
    x: int

    def __init__(self, x: int):
        self.x = x

    def __str__(self):
        return f"I am C({self.x})"


@dataclass
class D:
    y: int

df = pd.DataFrame({"x": [C(1), D(2)]})
df.to_json()
#>  {"x":{"0":{"x":1},"1":{"y":2}}}

Notice that it somehow serialized C(1) to {"x": 1}. This is because it seems to use C(1).__dict__. However, you can pass a default handler to override this behavior

df.to_json(default_handler=str)
#> {"x":{"0":"I am C(1)","1":"D(y=2)"}}

Notice that the outputs are now the result of called .__str__(). Critically, .to_json() passes things like built-in dictionaries and lists through, so default_handler does not affect them.

pd.DataFrame({"x": [{"A": 1}, [8, 9]]}).to_json()
#> {"x":{"0":{"A":1},"1":[8,9]}

machow · 2024-06-27T16:49:19Z

Alright, handing off to @schloerke. There are two outstanding pieces:

replacing pandas logic in _data_frame.py (and gridtable file). The functions in _tbl_data are stubbed out with basic tests, and should be rough-but-ready to replace them. We need renderer tests (playwright?) to validate they work as expected when swapped in.
swapping in narwhals: I think a nice approach to this is the strangler fig strategy. Essentially, we could register narwhals implementations in _tbl_data.py. Once all paths work with narhwals, we could always wrap dataframes to be narwhals objects (then, can eventually remove other dataframe logic from _tbl_data.py / delete it.).

More on swapping in narwhals

Currently, _tbl_data.py means to hold all DataFrame specific logic. This makes it easy to communicate and test what DataFrame actions we need. This way, we can start on refactoring in narwhals, while manually adding extra bits of DataFrame logic as needed.

For example, @schloerke mentioned needing a get_column_names() function.

@singledispatch
def get_column_names(data: DataFrameLike) -> list[str]:
    raise TypeError()

@get_column_names.register
def _(data: pd.DataFrame) -> list[str]:
    # note that technically column names don't have to be strings in Pandas
    # so you might add validation, etc.. here
    return list(data.columns)

@get_column_names.register
def _(data: pl.DataFrame) -> list[str]:
    return data.columns

@get_column_names.register
def _(data: nw.DataFrame) -> list[str]:
    return data.columns

Notice that once narwhals is fully wired up everywhere, we can just always wrap inputs to the functions with narwhals.DataFrame. This is a classic refactor approach (strangler fig pattern), because we just wrap the old system, and keep things rolling, until we can fully flip the new one on (e.g. everything narwhals).

The number 1 reason IMO for not going directly to narwhals is that the requirements on the shiny side need to be fleshed out. I think it'll help to flesh out support for pandas DataFrames, and add test cases against Polars and Pandas, to indicate when a refactor has succeeded (or is blocked).

* main: fix(tests): dynamically determine the path to the shiny app (posit-dev#1485) tests(deploys): use a stable version of html tools instead of main branch (posit-dev#1483) feat(data frame): Support basic cell styling (posit-dev#1475) fix: support static files on pyodide / py.cafe under a prefix (posit-dev#1486) feat: Dynamic theming (posit-dev#1358) Add return type for `_task()` (posit-dev#1484) tests(controls): Change API from controls to controller (posit-dev#1481) fix(docs): Update path to reflect correct one (posit-dev#1478) docs(testing): Add quarto page for testing (posit-dev#1461) fix(test): Remove unused testrail reporting from nightly builds (posit-dev#1476)

…ifiableDict`

…StyleInfo

* main: test(controllers): Refactor column sort and filter methods for Dataframe class (posit-dev#1496) Follow up to posit-dev#1453: allow user roles when normalizing a dictionary (posit-dev#1495) fix(layout_columns): Fix coercion of scalar row height to list for python <= 3.9 (posit-dev#1494) Add `shiny.ui.Chat` (posit-dev#1453) docs(Theme): Fix example and clarify usage (posit-dev#1491) chore(pyright): Pin pyright version to `1.1.369` to avoid CI failures (posit-dev#1493) tests(dataframe): Add additional tests for dataframe (posit-dev#1487) bug(data frame): Export `render.StyleInfo` (posit-dev#1488)

… of scope

… by col

machow · 2024-07-08T14:25:30Z

From chatting with @schloerke, I glanced over the code just now and it LGTM. I think the _pandas.py module could be moved into _tbl_data.py, so there's a very clear rule: all pandas and polars logic live in one place. But IMO having an extra file is reasonable.

I didn't know enough about a lot of the shiny bits to have an opinion on things outside _tbl_data.py 😅

shiny/render/_data_frame_utils/_tbl_data.py

* main: api(playwright): Code review of complete playwright API (posit-dev#1501) fix: Move `www/shared/py-shiny` to `www/py-shiny` (posit-dev#1499)

* main: feat(data frame): Support `polars` (#1474) api(playwright): Code review of complete playwright API (#1501) fix: Move `www/shared/py-shiny` to `www/py-shiny` (#1499) test(controllers): Refactor column sort and filter methods for Dataframe class (#1496) Follow up to #1453: allow user roles when normalizing a dictionary (#1495) fix(layout_columns): Fix coercion of scalar row height to list for python <= 3.9 (#1494) Add `shiny.ui.Chat` (#1453) docs(Theme): Fix example and clarify usage (#1491) chore(pyright): Pin pyright version to `1.1.369` to avoid CI failures (#1493)

machow added 3 commits June 6, 2024 15:53

Stub out support for polars in data grid

f9c8e84

Stub out more grid support

90f8d01

flesh out tbl_data module and test

63101b7

Merge branch 'main' into feat-data-grid-polars

7af8424

schloerke added the data frame Related to @render.data_frame label Jun 21, 2024

schloerke added this to the v0.11.0 milestone Jun 21, 2024

fix: wrong variable in error messages

6f2403b

fix: serialize pandas datetimes as ISO8601

f6827d5

chore: remove unused import

9f753a3

machow added 3 commits June 27, 2024 11:47

fix: to_json by default converts objects to strings

8160692

tests: add playwright polars data_frame app (untested)

4239590

fix: add more _tbl_data tests, fixes and more typing

31daefe

schloerke added 12 commits July 2, 2024 10:23

Fix typing for df. Add render._types file. Expose `shiny.types.Json…

c31737d

…ifiableDict`

Update types for testing file

22d32a8

More robust tests from posit-dev#1475; Test and fix class support in …

7de1130

…StyleInfo

First pass at integration. Still more work to do

4fe7cf2

Update _controls.py

9686cc0

Expose render.DataFrameLike

aa1b9c6

lints

1ae281e

Use more tbl methods

2a81f5e

More tbl integrations; Add note for future integrations currently out…

54f37dc

… of scope

Fix styles app

61b1a3f

schloerke added 5 commits July 7, 2024 22:19

Fix styles class app / test

af6894b

lints

0406c65

Cleanups

534bd82

bug: Fix polars serialized data into an array of arrays by row, not…

89e0aa8

… by col

polars Html support updated tests

ea4be98

More testing with polars data frame apps

1a6128f

jcheng5 reviewed Jul 8, 2024

View reviewed changes

shiny/render/_data_frame_utils/_tbl_data.py Outdated Show resolved Hide resolved

jcheng5 reviewed Jul 8, 2024

View reviewed changes

shiny/render/_data_frame_utils/_tbl_data.py Show resolved Hide resolved

jcheng5 approved these changes Jul 8, 2024

View reviewed changes

schloerke changed the title ~~feat: support polars in data grid~~ feat(data frame): Support polars Jul 8, 2024

schloerke added 3 commits July 8, 2024 17:04

Use more generic base type for is_data_frame_like()

b2047ba

Test PandasCompatible object for @render.data_frame

dd2f867

Merge branch 'main' into machow-feat-data-grid-polars

9cfc3ec

* main: api(playwright): Code review of complete playwright API (posit-dev#1501) fix: Move `www/shared/py-shiny` to `www/py-shiny` (posit-dev#1499)

schloerke marked this pull request as ready for review July 8, 2024 21:05

schloerke added 7 commits July 8, 2024 17:08

Update _tbl_data.py

8683bf9

lints

fcf7d76

Update _datagridtable.py

28f3a14

lints

57bc292

splode more single dispatch registrations

1b8ce68

Update _tbl_data.py

c9d36e3

Update _tbl_data.py

5369ad2

schloerke added this pull request to the merge queue Jul 9, 2024

Merged via the queue into posit-dev:main with commit 08cd1f0 Jul 9, 2024
31 checks passed

schloerke mentioned this pull request Jul 9, 2024

Support polars directly in DataGrid and render.data_frame #1439

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(data frame): Support `polars` #1474

feat(data frame): Support `polars` #1474

machow commented Jun 21, 2024 •

edited

Loading

schloerke commented Jun 21, 2024 •

edited

Loading

machow commented Jun 22, 2024 •

edited

Loading

MarcoGorelli commented Jun 23, 2024

machow commented Jun 24, 2024

machow commented Jun 27, 2024

machow commented Jun 27, 2024 •

edited

Loading

machow commented Jul 8, 2024

feat(data frame): Support polars #1474

feat(data frame): Support polars #1474

Conversation

machow commented Jun 21, 2024 • edited Loading

schloerke commented Jun 21, 2024 • edited Loading

machow commented Jun 22, 2024 • edited Loading

MarcoGorelli commented Jun 23, 2024

machow commented Jun 24, 2024

machow commented Jun 27, 2024

machow commented Jun 27, 2024 • edited Loading

More on swapping in narwhals

machow commented Jul 8, 2024

feat(data frame): Support `polars` #1474

feat(data frame): Support `polars` #1474

machow commented Jun 21, 2024 •

edited

Loading

schloerke commented Jun 21, 2024 •

edited

Loading

machow commented Jun 22, 2024 •

edited

Loading

machow commented Jun 27, 2024 •

edited

Loading