See #2064 for context
For Dask / PySpark / DuckDB / polars.LazyFrame, we should probably cache schema and column names
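A minimal sketch of the idea. `LazyFrameSketch`, `_compute_schema` and `schema_computations` are hypothetical names, not Narwhals' actual implementation. `_compute_schema` stands in for an expensive call into the backend (for example resolving a query plan). The schema is computed on first access and reused afterwards, and `columns` is derived from the cached schema. This assumes the wrapped native object is never modified in place, so the cache cannot go stale.

```python
from __future__ import annotations


class LazyFrameSketch:
    """Hypothetical wrapper around a lazy (query-plan based) native frame."""

    def __init__(self, native_plan: dict[str, str]) -> None:
        # Stand-in for a native lazy object: here just a {column: dtype} mapping.
        self._native_plan = native_plan
        self._cached_schema: dict[str, str] | None = None
        self.schema_computations = 0  # counts "expensive" calls, for illustration

    def _compute_schema(self) -> dict[str, str]:
        # In a real lazy backend this might trigger query-plan resolution.
        self.schema_computations += 1
        return dict(self._native_plan)

    @property
    def schema(self) -> dict[str, str]:
        # Compute once, then reuse the cached result on later accesses.
        if self._cached_schema is None:
            self._cached_schema = self._compute_schema()
        return self._cached_schema

    @property
    def columns(self) -> list[str]:
        # Derived from the cached schema, so no second computation happens.
        return list(self.schema)


lf = LazyFrameSketch({"a": "Int64", "b": "Int64"})
print(lf.schema)               # {'a': 'Int64', 'b': 'Int64'}
print(lf.columns)              # ['a', 'b']
print(lf.schema_computations)  # 1
```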
Some guidelines:
First, do not do this for eager backends, because those may allow in-place operations which modify the data type. Example:
```python
# Assumes: import pandas as pd; import narwhals as nw
In [49]: df_pd = pd.DataFrame({'a':[1,1,2], 'b': [4,5,6]})

In [50]: df = nw.from_native(df_pd)

In [51]: df.schema
Out[51]: Schema([('a', Int64), ('b', Int64)])

In [52]: df
Out[52]:
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        a  b      |
|     0  1  4      |
|     1  1  5      |
|     2  2  6      |
└──────────────────┘

In [53]: df.schema
Out[53]: Schema([('a', Int64), ('b', Int64)])

In [54]: df_native = df.to_native()

In [55]: df_native.loc['a', 0] = 1.5

In [56]: df
Out[56]:
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|      a    b    0 |
| 0  1.0  4.0  NaN |
| 1  1.0  5.0  NaN |
| 2  2.0  6.0  NaN |
| a  NaN  NaN  1.5 |
└──────────────────┘

In [57]: df.schema
Out[57]: Schema([('a', Float64), ('b', Float64), (0, Float64)])
```
Is this poor design on pandas' part? Arguably. But it's just what we have to deal with.
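To make the first guideline concrete, here is a toy sketch in plain Python. A dict of lists stands in for a pandas DataFrame, and `EagerFrameSketch` and `_infer_dtype` are hypothetical. If the wrapper cached its schema while the native object stayed mutable, an in-place change would leave the cached schema out of date, just as column `a` changes from `Int64` to `Float64` in the session above.

```python
from __future__ import annotations


def _infer_dtype(values: list) -> str:
    # Very rough dtype inference, for illustration only.
    if all(isinstance(v, int) for v in values):
        return "Int64"
    return "Float64"


class EagerFrameSketch:
    """Hypothetical eager wrapper sharing a mutable native object with the user."""

    def __init__(self, native: dict[str, list]) -> None:
        self._native = native
        self._cached_schema: dict[str, str] | None = None

    def to_native(self) -> dict[str, list]:
        # Returns the same object, so callers can mutate it in place.
        return self._native

    @property
    def schema_cached(self) -> dict[str, str]:
        # What we should NOT do for eager backends: cache on first access.
        if self._cached_schema is None:
            self._cached_schema = {k: _infer_dtype(v) for k, v in self._native.items()}
        return self._cached_schema

    @property
    def schema_fresh(self) -> dict[str, str]:
        # Recomputed on every access, so it always reflects the native data.
        return {k: _infer_dtype(v) for k, v in self._native.items()}


df = EagerFrameSketch({"a": [1, 1, 2], "b": [4, 5, 6]})
print(df.schema_cached)  # {'a': 'Int64', 'b': 'Int64'}

df.to_native()["a"][0] = 1.5  # in-place modification of the native object
print(df.schema_cached)  # {'a': 'Int64', 'b': 'Int64'}  (stale)
print(df.schema_fresh)   # {'a': 'Float64', 'b': 'Int64'}
```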
Second, be careful about using `lru_cache` on properties: https://youtu.be/sVjtp6tGo0g
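One well-known pitfall with stacking `functools.lru_cache` under `@property`: the cache is attached to the function, so it is shared at class level, and its keys include `self`. Every instance that has ever accessed the property is kept alive by the cache until its entry is evicted (never, with `maxsize=None`). A small stdlib demonstration, using hypothetical classes `WithLruCache` and `WithoutCache`:

```python
from __future__ import annotations

import functools
import gc
import weakref


class WithLruCache:
    @property
    @functools.lru_cache(maxsize=None)
    def schema(self) -> dict[str, str]:
        # The cache key is (self,), so the cache holds a strong reference to self.
        return {"a": "Int64"}


class WithoutCache:
    @property
    def schema(self) -> dict[str, str]:
        return {"a": "Int64"}


def survives_after_del(cls: type) -> bool:
    """Return True if an instance stays alive after its only user reference is dropped."""
    obj = cls()
    _ = obj.schema  # populate any cache
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is not None


print(survives_after_del(WithLruCache))  # True: lru_cache still references the instance
print(survives_after_del(WithoutCache))  # False
```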
It is a bit hidden, but it is stated here (https://arrow.apache.org/docs/python/data.html#arrays):

> Arrow data is immutable, so values can be selected but not assigned.
> Second, careful about using `lru_cache` on properties: youtu.be/sVjtp6tGo0g
@MarcoGorelli I assume `functools.cached_property` would be preferred for this case?
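A sketch of what the `functools.cached_property` approach could look like, with a hypothetical `LazyFrameCachedSketch` class. `cached_property` stores the computed value in the instance's own `__dict__`, so the cache lives and dies with the instance and can be invalidated with `del`. Two caveats worth checking before adopting it: the class needs a `__dict__` (it does not work with `__slots__` alone), and since Python 3.12 it no longer holds a lock, so concurrent first accesses may compute the value more than once.

```python
from __future__ import annotations

import functools
import gc
import weakref


class LazyFrameCachedSketch:
    """Hypothetical lazy-frame wrapper using cached_property."""

    def __init__(self, native_plan: dict[str, str]) -> None:
        self._native_plan = native_plan
        self.schema_computations = 0  # counts computations, for illustration

    @functools.cached_property
    def schema(self) -> dict[str, str]:
        # Runs once per instance; the result is stored in self.__dict__["schema"].
        self.schema_computations += 1
        return dict(self._native_plan)

    @property
    def columns(self) -> list[str]:
        return list(self.schema)


lf = LazyFrameCachedSketch({"a": "Int64", "b": "Int64"})
_ = lf.schema
_ = lf.columns
print(lf.schema_computations)   # 1
print("schema" in lf.__dict__)  # True

del lf.schema                   # invalidate the cached value
_ = lf.schema
print(lf.schema_computations)   # 2

# Unlike lru_cache on a method, nothing outside the instance keeps it alive.
ref = weakref.ref(lf)
del lf
gc.collect()
print(ref() is None)            # True
```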