fix: Avoid failure when index level shares name with a column
Previously, report generation failed for DataFrames in which an index level
had the same name as a column, raising "ValueError: 'foo' is both an index
level and a column label, which is ambiguous."
This update clears the index level names before the groupby in
pandas_get_duplicates, so the grouping keys resolve unambiguously to columns.
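A minimal reproduction of the failure and of the fix, assuming pandas 2.x. `count_duplicate_rows` is a hypothetical helper that mirrors the patched method chain in `pandas_get_duplicates`; it is not part of the ydata_profiling API.

```python
import pandas as pd


def count_duplicate_rows(df: pd.DataFrame, columns: list, key: str = "count") -> pd.DataFrame:
    """Hypothetical helper mirroring the patched chain in pandas_get_duplicates."""
    is_dup = df.duplicated(subset=columns, keep=False)
    return (
        df[is_dup]
        # Clear every index level name so groupby resolves `columns` to columns only.
        .rename_axis(index=lambda _: None)
        .groupby(columns, dropna=False, observed=True)
        .size()
        .reset_index(name=key)
    )


# Index level "foo" shares its name with column "foo", as in the commit's test fixture.
df = pd.DataFrame({"foo": [1, 1, 2]}, index=pd.Index([10, 11, 12], name="foo"))

# Without clearing the names, pandas refuses to choose between the level and the column.
try:
    df.groupby(["foo"]).size()
except ValueError as exc:
    print("unpatched:", exc)

# With the names cleared, the duplicated rows (foo == 1) are counted normally.
print(count_duplicate_rows(df, ["foo"]))
```

Renaming the axis only touches the index names; the row values and columns are unchanged, so the duplicate counts come out as before.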
ssiegel committed Nov 10, 2024
1 parent 920f8df commit f39f669
Showing 2 changed files with 21 additions and 0 deletions.
src/ydata_profiling/model/pandas/duplicates_pandas.py: 1 addition, 0 deletions
@@ -35,6 +35,7 @@ def pandas_get_duplicates(
 duplicated_rows = df.duplicated(subset=supported_columns, keep=False)
 duplicated_rows = (
     df[duplicated_rows]
+    .rename_axis(index=lambda _: None)
     .groupby(supported_columns, dropna=False, observed=True)
     .size()
     .reset_index(name=duplicates_key)
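The added `rename_axis(index=lambda _: None)` call passes a function as the index mapper. pandas applies such a function to each index level name, so the same line also clears every level of a MultiIndex, not just a single named index. A short check on a made-up two-level frame (the level names and values are illustrative only):

```python
import pandas as pd

# Two index levels, one of which ("foo") clashes with a column name.
idx = pd.MultiIndex.from_tuples([(1, "a"), (2, "b")], names=["foo", "bar"])
df = pd.DataFrame({"foo": [3, 4]}, index=idx)

# The lambda is called once per level name, so every level name becomes None.
cleared = df.rename_axis(index=lambda _: None)
print(list(cleared.index.names))

# Grouping by "foo" now refers unambiguously to the column.
print(cleared.groupby(["foo"]).size())
```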
tests/unit/test_index_column_name_clash.py (new file): 20 additions, 0 deletions
@@ -0,0 +1,20 @@
import pandas as pd
import pytest

from ydata_profiling import ProfileReport


@pytest.fixture()
def df():
    df = pd.DataFrame(
        {
            "foo": [1, 2, 3],
        },
        index=pd.Index([1, 2, 3], name="foo"),
    )
    return df


def test_index_column_name_clash(df: pd.DataFrame):
    profile_report = ProfileReport(df, title="Test Report", progress_bar=False)
    assert len(profile_report.to_html()) > 0
