Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: reuse Schema.to_<backend>() in from_numpy #2024

Merged
merged 2 commits into from
Feb 16, 2025

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Feb 16, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Mentioned in #2023 (comment).

I made an attempt at (#2024 (comment)) and (#2024 (comment)) - but ended up at almost the same LOC πŸ€¦β€β™‚οΈ

)
elif is_sequence_but_not_str(schema):
native_frame = native_namespace.DataFrame(data, columns=list(schema))
elif schema is None:
native_frame = native_namespace.DataFrame(
data, columns=["column_" + str(x) for x in range(data.shape[1])]
data, columns=[f"column_{x}" for x in range(data.shape[1])]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical line at

pa_arrays, names=[f"column_{x}" for x in range(data.shape[1])]

Comment on lines 496 to +501
if implementation is Implementation.POLARS:
if isinstance(schema, (Mapping, Schema)):
from narwhals._polars.utils import (
narwhals_to_native_dtype as polars_narwhals_to_native_dtype,
)

backend_version = parse_version(native_namespace.__version__)
schema = {
name: polars_narwhals_to_native_dtype( # type: ignore[misc]
dtype,
version=version,
backend_version=backend_version,
)
for name, dtype in schema.items()
}
elif schema is None:
native_frame = native_namespace.from_numpy(data)
elif not is_sequence_but_not_str(schema):
schema_pl: pl.Schema | Sequence[str] | None = Schema(schema).to_polars()
elif is_sequence_but_not_str(schema) or schema is None:
schema_pl = schema
else:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about normalizing to this before dispatching on implementation:

schema: nw.Schema | list[str] | None

That way the error message thats repeated 3x, can just be written and raised once:

msg = (
"`schema` is expected to be one of the following types: "
"Mapping[str, DType] | Schema | Sequence[str]. "
f"Got {type(schema)}."
)
raise TypeError(msg)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would only really work for polars and pyarrow though

@dangotbanned dangotbanned marked this pull request as ready for review February 16, 2025 15:52
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dangotbanned, much simpler now πŸš€

SwiftmoTwitchGIF

@dangotbanned dangotbanned merged commit 43d072b into main Feb 16, 2025
29 of 30 checks passed
@dangotbanned dangotbanned deleted the refactor-from-numpy-schema branch February 16, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants