GH-44962: [Python] Clean-up name / field_name handling in pandas compat #44963

jorisvandenbossche · 2024-12-08T09:26:37Z

Rationale for this change

Small part of #44195 factored out into its own PR because this change is just a small refactor making #44195 easier to do, but in itself not changing any logic.

We currently store both name and field_name in the pandas metadata. field_name is guaranteed to be a string, and is always exactly the name used in the arrow schema. name can also be None, either because the original pandas DataFrame used None as the column label or because it comes from an index level without a name.

Until now we had several places where we used name but then checked for it being None. With this PR, the code more consistently uses field_name in the cases where it needs the string version, by consistently passing through both names and field_names.
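To make the difference between the two keys concrete, here is a minimal sketch using hand-written dictionaries shaped like entries of the "columns" list in the pandas metadata. The helper names `arrow_names_from_name` and `arrow_names_from_field_name` are hypothetical and only illustrate the pattern; they are not functions in pyarrow, and the `__index_level_0__` field name for the unnamed index level is an assumption made for the example.

```python
# Hand-written entries shaped like the "columns" list of the pandas metadata.
# The second entry mimics an index level without a name (assumed field name).
columns = [
    {"name": "a", "field_name": "a"},
    {"name": None, "field_name": "__index_level_0__"},
]


def arrow_names_from_name(entries):
    """Derive schema names from `name`, special-casing None every time."""
    return [
        entry["name"] if entry["name"] is not None else entry["field_name"]
        for entry in entries
    ]


def arrow_names_from_field_name(entries):
    """Read `field_name` directly; it is always a string."""
    return [entry["field_name"] for entry in entries]
```

Both helpers return the same list for these entries, but the second one needs no None check, which is the kind of simplification this refactor is after.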

Are these changes tested?

Existing tests should cover this

Are there any user-facing changes?

No

jorisvandenbossche · 2024-12-08T09:29:55Z

@github-actions crossbow submit -g python

github-actions · 2024-12-08T09:32:34Z

Revision: 251cd97

Submitted crossbow builds: ursacomputing/crossbow @ actions-6fcc46cbf4

Task	Status
example-python-minimal-build-fedora-conda
example-python-minimal-build-ubuntu-venv
test-conda-python-3.10
test-conda-python-3.10-cython2
test-conda-python-3.10-hdfs-2.9.2
test-conda-python-3.10-hdfs-3.2.1
test-conda-python-3.10-pandas-latest-numpy-latest
test-conda-python-3.10-substrait
test-conda-python-3.11
test-conda-python-3.11-dask-latest
test-conda-python-3.11-dask-upstream_devel
test-conda-python-3.11-hypothesis
test-conda-python-3.11-pandas-latest-numpy-1.26
test-conda-python-3.11-pandas-latest-numpy-latest
test-conda-python-3.11-pandas-nightly-numpy-nightly
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly
test-conda-python-3.11-spark-master
test-conda-python-3.12
test-conda-python-3.12-cpython-debug
test-conda-python-3.13
test-conda-python-3.9
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5
test-conda-python-emscripten
test-cuda-python-ubuntu-22.04-cuda-11.7.1
test-debian-12-python-3-amd64
test-debian-12-python-3-i386
test-fedora-39-python-3
test-ubuntu-22.04-python-3
test-ubuntu-22.04-python-313-freethreading
test-ubuntu-24.04-python-3


raulcd

Not an expert here but tests and CI are passing so it LGTM, just a minor question

raulcd · 2024-12-09T10:53:09Z

python/pyarrow/pandas_compat.py

+def construct_metadata(columns_to_convert, df, column_names, column_field_names,
+                       index_levels, index_descriptors, preserve_index, types):


This API is supposed to be for internal use only, right?
Should we update the docstring? It seems it was already missing column_names.

jorisvandenbossche · 2024-12-11

This should indeed be internal, yes (although it is accessible as pyarrow.pandas_compat.construct_metadata, which doesn't necessarily look private).

The docstring is not the most informative, but I pushed an update to at least keep it up to date.

jorisvandenbossche · 2024-12-11

And good that you asked, because based on a quick GitHub search it seems that cudf is using this (https://github.com/search?q=repo%3Arapidsai%2Fcudf%20construct_metadata&type=code, e.g. https://github.com/rapidsai/cudf/blob/0c5bd6627159fe44a49e56020f0c0842696bc397/python/cudf/cudf/core/dataframe.py#L5772).
legate uses it as well (https://github.com/nv-legate/legate.pandas/blob/7a97b455999e49c328c1873e49fb65d2eade7f2a/legate/pandas/core/table.py#L1230), but that is not an active project.

While I don't think we should keep too many compatibility guarantees here, let me just make it backwards compatible by making it an optional keyword (and later I think we should consider deprecating this and making it private).
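As a rough illustration of the "optional keyword" idea, the sketch below shows one way a new argument can be added without breaking existing callers. `construct_metadata_sketch` is a hypothetical, heavily simplified stand-in and not the real `construct_metadata`; in particular, deriving the missing field names with `str(name)` is an assumption made for this example.

```python
def construct_metadata_sketch(column_names, column_field_names=None):
    """Hypothetical, simplified stand-in; not the real pyarrow function."""
    if column_field_names is None:
        # Older callers do not pass field names, so fall back to the
        # string form of each name (an assumption for this sketch).
        column_field_names = [str(name) for name in column_names]
    if len(column_field_names) != len(column_names):
        raise ValueError("column_names and column_field_names differ in length")
    return [
        {"name": name, "field_name": field_name}
        for name, field_name in zip(column_names, column_field_names)
    ]


# Existing call sites keep working without the new keyword.
old_style = construct_metadata_sketch(["a", 1])

# Updated call sites pass the field names explicitly.
new_style = construct_metadata_sketch(
    [None], column_field_names=["__index_level_0__"]
)
```

In this sketch the fallback would turn a None name into the string "None", so callers that care about unnamed columns or index levels pass the field names explicitly.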

github-actions · 2024-12-11T08:05:38Z

⚠️ GitHub issue #44962 has been automatically assigned in GitHub to PR creator.


jorisvandenbossche · 2024-12-11T08:31:56Z

@github-actions crossbow submit -g python

github-actions · 2024-12-11T08:34:35Z

Revision: fc92d71

Submitted crossbow builds: ursacomputing/crossbow @ actions-5e2c1c2e60

Task	Status
example-python-minimal-build-fedora-conda
example-python-minimal-build-ubuntu-venv
test-conda-python-3.10
test-conda-python-3.10-cython2
test-conda-python-3.10-hdfs-2.9.2
test-conda-python-3.10-hdfs-3.2.1
test-conda-python-3.10-pandas-latest-numpy-latest
test-conda-python-3.10-substrait
test-conda-python-3.11
test-conda-python-3.11-dask-latest
test-conda-python-3.11-dask-upstream_devel
test-conda-python-3.11-hypothesis
test-conda-python-3.11-pandas-latest-numpy-1.26
test-conda-python-3.11-pandas-latest-numpy-latest
test-conda-python-3.11-pandas-nightly-numpy-nightly
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly
test-conda-python-3.11-spark-master
test-conda-python-3.12
test-conda-python-3.12-cpython-debug
test-conda-python-3.13
test-conda-python-3.9
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5
test-conda-python-emscripten
test-cuda-python-ubuntu-22.04-cuda-11.7.1
test-debian-12-python-3-amd64
test-debian-12-python-3-i386
test-fedora-39-python-3
test-ubuntu-22.04-python-3
test-ubuntu-22.04-python-313-freethreading
test-ubuntu-24.04-python-3

raulcd

Thanks @jorisvandenbossche , LGTM
The CI failures are unrelated

[Python] Clean-up name / field_name handling in pandas compat

251cd97

github-actions bot added the "awaiting committer review" label Dec 8, 2024

jorisvandenbossche mentioned this pull request Dec 8, 2024

[Python] Clean-up name / field_name handling in pandas compat #44962

Closed

Merge remote-tracking branch 'upstream/main' into pandas-metadata-fie…

9db7b0b

…ld-name

jorisvandenbossche requested a review from raulcd December 9, 2024 09:11

github-actions bot added the Component: Python label Dec 9, 2024

raulcd approved these changes Dec 9, 2024


github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label Dec 9, 2024

update docstring

57d8686

github-actions bot added the "awaiting changes" label and removed the "awaiting merge" label Dec 11, 2024

make change back compat

796eaa3

github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Dec 11, 2024

jorisvandenbossche added 2 commits December 11, 2024 09:31

add comment

900a5cd

Merge remote-tracking branch 'upstream/main' into pandas-metadata-fie…

fc92d71

…ld-name

raulcd approved these changes Dec 11, 2024


github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label Dec 11, 2024

raulcd merged commit 6252e9c into apache:main Dec 11, 2024
12 of 14 checks passed

raulcd removed the "awaiting merge" label Dec 11, 2024

jorisvandenbossche deleted the pandas-metadata-field-name branch December 11, 2024 09:29
