feat(data frame): Support polars
#1474
We currently do not have special support in the browser... so we let the JSON to-string method handle it. But we should definitely add support at some point!
It seems like the JSON string method converts it to an int. Opened an issue here: #1477
Thanks for the ping! Happy to help here, please do let me know if there's anything missing which you'd need or which would make your work easier. As a potential consumer, I'd be particularly interested in your thoughts on the
From pairing w/ Barret, we're able to serialize pandas datetime series to ISO 8601 strings. There's a note in the code about possible results from pandas
# investigating the different outputs from infer_dtype
import pandas as pd
from datetime import date, datetime

val_dt = pd.to_datetime(["2020-01-01"])
val_per = pd.Period('2012-1-1', freq='D')
col_per = pd.Series([val_per])  # period
col_date = [date.today()]  # date
col_datetime = [datetime.now()]  # datetime
col_datetime64 = pd.Series(pd.to_datetime(["2020-01-01"]))  # datetime64
pd.api.types.infer_dtype(col_date)
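The ISO 8601 step can be sketched with the standard library alone. This is a stand-in for illustration, not the code from this PR: `to_iso` is a hypothetical helper, and the pandas path discussed above may handle more cases (periods, datetime64, time zones).

```python
import json
from datetime import date, datetime


def to_iso(value):
    """Hypothetical JSON fallback: render date-like values as ISO 8601 strings."""
    # datetime is a subclass of date, so one isinstance check covers both.
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


col_date = [date(2020, 1, 1)]
col_datetime = [datetime(2020, 1, 1, 12, 30)]

# json.dumps calls `default` only for values it cannot encode natively.
payload = json.dumps({"date": col_date, "datetime": col_datetime}, default=to_iso)
print(payload)
```

Raising `TypeError` for anything else keeps unsupported column types visible instead of silently stringifying them.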
From pairing w/ @schloerke, we discovered that pandas.DataFrame.to_json has an interesting serialization strategy for custom objects.
import pandas as pd
from dataclasses import dataclass

class C:
    x: int
    def __init__(self, x: int):
        self.x = x
    def __str__(self):
        return f"I am C({self.x})"

@dataclass
class D:
    y: int

df = pd.DataFrame({"x": [C(1), D(2)]})
df.to_json()
#> {"x":{"0":{"x":1},"1":{"y":2}}}
Notice that it somehow serialized C(1) to {"x": 1}. This is because it seems to use
df.to_json(default_handler=str)
#> {"x":{"0":"I am C(1)","1":"D(y=2)"}}
Notice that the outputs are now the result of calling
pd.DataFrame({"x": [{"A": 1}, [8, 9]]}).to_json()
#> {"x":{"0":{"A":1},"1":[8,9]}}
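The two outputs above can be reproduced with the standard library's `json.dumps`, which has a comparable `default=` hook. This is only an analogy: whether pandas reads the attribute dictionary internally is an assumption here, not something the thread confirms.

```python
import json
from dataclasses import dataclass


class C:
    def __init__(self, x: int):
        self.x = x

    def __str__(self) -> str:
        return f"I am C({self.x})"


@dataclass
class D:
    y: int


data = {"x": {"0": C(1), "1": D(2)}}

# Fallback 1 (assumed analogue of the default behaviour): expose each
# object's attribute dictionary via vars().
as_attrs = json.dumps(data, default=vars, separators=(",", ":"))

# Fallback 2 (analogue of default_handler=str): stringify each object.
as_text = json.dumps(data, default=str, separators=(",", ":"))

print(as_attrs)
print(as_text)
```

Passing an explicit handler makes the choice visible at the call site, which matters when a column holds arbitrary Python objects.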
Alright, handing off to @schloerke. There are two outstanding pieces:
More on swapping in narwhals
Currently, For example, @schloerke mentioned needing a
from functools import singledispatch

import narwhals as nw
import pandas as pd
import polars as pl

# DataFrameLike is assumed to be defined elsewhere (not shown in this comment)

@singledispatch
def get_column_names(data: DataFrameLike) -> list[str]:
    raise TypeError()

@get_column_names.register
def _(data: pd.DataFrame) -> list[str]:
    # note that technically column names don't have to be strings in Pandas
    # so you might add validation, etc. here
    return list(data.columns)

@get_column_names.register
def _(data: pl.DataFrame) -> list[str]:
    return data.columns

@get_column_names.register
def _(data: nw.DataFrame) -> list[str]:
    return data.columns
Notice that once narwhals is fully wired up everywhere, we can just always wrap inputs to the functions with
The number 1 reason IMO for not going directly to narwhals is that the requirements on the shiny side need to be fleshed out. I think it'll help to flesh out support for pandas DataFrames, and add test cases against Polars and Pandas, to indicate when a refactor has succeeded (or is blocked).
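The idea of running the same test against each backend can be sketched without installing pandas or Polars. `PandasLikeFrame` and `PolarsLikeFrame` below are hypothetical stand-ins, not real library classes; a real suite would register the actual DataFrame types instead.

```python
from functools import singledispatch


class PandasLikeFrame:
    """Hypothetical stand-in: columns held in an index-like tuple."""

    def __init__(self, columns):
        self.columns = tuple(columns)


class PolarsLikeFrame:
    """Hypothetical stand-in: columns already a list of strings."""

    def __init__(self, columns):
        self.columns = list(columns)


@singledispatch
def get_column_names(data) -> list[str]:
    # Fallback for types with no registered implementation.
    raise TypeError(f"Unsupported type: {type(data).__name__}")


@get_column_names.register
def _(data: PandasLikeFrame) -> list[str]:
    return list(data.columns)


@get_column_names.register
def _(data: PolarsLikeFrame) -> list[str]:
    return data.columns


# One expectation, checked against every backend.
backends = [PandasLikeFrame(["a", "b"]), PolarsLikeFrame(["a", "b"])]
results = [get_column_names(frame) for frame in backends]
print(results)
```

If a later refactor replaces the per-backend implementations with a single wrapper, the same loop should still pass unchanged, which is what makes it useful as a signal that the refactor succeeded.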
* main:
  fix(tests): dynamically determine the path to the shiny app (posit-dev#1485)
  tests(deploys): use a stable version of html tools instead of main branch (posit-dev#1483)
  feat(data frame): Support basic cell styling (posit-dev#1475)
  fix: support static files on pyodide / py.cafe under a prefix (posit-dev#1486)
  feat: Dynamic theming (posit-dev#1358)
  Add return type for `_task()` (posit-dev#1484)
  tests(controls): Change API from controls to controller (posit-dev#1481)
  fix(docs): Update path to reflect correct one (posit-dev#1478)
  docs(testing): Add quarto page for testing (posit-dev#1461)
  fix(test): Remove unused testrail reporting from nightly builds (posit-dev#1476)
* main:
  test(controllers): Refactor column sort and filter methods for Dataframe class (posit-dev#1496)
  Follow up to posit-dev#1453: allow user roles when normalizing a dictionary (posit-dev#1495)
  fix(layout_columns): Fix coercion of scalar row height to list for python <= 3.9 (posit-dev#1494)
  Add `shiny.ui.Chat` (posit-dev#1453)
  docs(Theme): Fix example and clarify usage (posit-dev#1491)
  chore(pyright): Pin pyright version to `1.1.369` to avoid CI failures (posit-dev#1493)
  tests(dataframe): Add additional tests for dataframe (posit-dev#1487)
  bug(data frame): Export `render.StyleInfo` (posit-dev#1488)
From chatting with @schloerke, I glanced over the code just now and it LGTM. I think the
I didn't know enough about a lot of the shiny bits to have an opinion on things outside
* main:
  api(playwright): Code review of complete playwright API (posit-dev#1501)
  fix: Move `www/shared/py-shiny` to `www/py-shiny` (posit-dev#1499)
* main:
  feat(data frame): Support `polars` (#1474)
  api(playwright): Code review of complete playwright API (#1501)
  fix: Move `www/shared/py-shiny` to `www/py-shiny` (#1499)
  test(controllers): Refactor column sort and filter methods for Dataframe class (#1496)
  Follow up to #1453: allow user roles when normalizing a dictionary (#1495)
  fix(layout_columns): Fix coercion of scalar row height to list for python <= 3.9 (#1494)
  Add `shiny.ui.Chat` (#1453)
  docs(Theme): Fix example and clarify usage (#1491)
  chore(pyright): Pin pyright version to `1.1.369` to avoid CI failures (#1493)
This PR addresses #1439 by generalizing pandas.DataFrame-specific logic to include Polars. It adds a module for DataFrame-specific logic (_tbl_data.py) and simple tests for each piece.
From pairing with @schloerke, it seems like these next steps might be useful:
_tbl_data.py
_tbl_data.py logic (cc @MarcoGorelli)
Notes:
str back up to str, so certain htmltools tags can't be identified in a Polars Series.