Check non-numeric columns with dataframe_regression #47

Open
DrGFreeman opened this issue Jan 28, 2021 · 1 comment
@DrGFreeman
Contributor

It would be very useful if the dataframe_regression fixture could check non-numeric columns in dataframes.

One simple workaround is to use data_regression and convert the dataframe to a list of row dictionaries:

data_regression.check(df.to_dict("records"))

However, this does not allow tolerances to be applied to numerical values.
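To illustrate why the lack of tolerances matters, here is a minimal standard-library sketch. It assumes the records are compared exactly, as the sentence above implies. The helper `records_close` and its default tolerances are hypothetical and are not part of pytest-regressions; it only shows the kind of comparison that exact matching cannot provide.

```python
import math


def records_close(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Hypothetical helper: compare two lists of row dicts.

    Floats are compared within a tolerance, every other value exactly.
    """
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if row_a.keys() != row_e.keys():
            return False
        for key, value_e in row_e.items():
            value_a = row_a[key]
            if isinstance(value_a, float) and isinstance(value_e, float):
                if not math.isclose(value_a, value_e, rel_tol=rel_tol, abs_tol=abs_tol):
                    return False
            elif value_a != value_e:
                return False
    return True


expected = [{"name": "a", "value": 0.3}]
actual = [{"name": "a", "value": 0.1 + 0.2}]  # floating-point rounding noise

exact_match = actual == expected                   # exact comparison rejects the noise
tolerant_match = records_close(actual, expected)   # tolerance-based comparison accepts it
```

Here `exact_match` is `False` while `tolerant_match` is `True`, which is the gap the tolerances of dataframe_regression are meant to close.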

As a workaround, I am currently defining a fixture in my conftest.py that leverages data_regression for the non-numeric columns and dataframe_regression for the numeric columns (with tolerances):

# conftest.py

import pytest


@pytest.fixture()
def check_df(dataframe_regression, data_regression):
    """Fixture to check a dataframe against expected values using the pytest-regressions
    dataframe_regression and data_regression fixtures. This fixture allows verification
    of non-numeric columns as well as application of tolerances on numeric columns."""

    def check(df, basename=None, fullpath=None, tolerances=None, default_tolerance=None):
        data_regression.check(
            df.select_dtypes(exclude="number").to_dict("records"),
            basename=basename,
            fullpath=fullpath,
        )
        dataframe_regression.check(
            df.select_dtypes(include="number"),
            basename=basename,
            fullpath=fullpath,
            tolerances=tolerances,
            default_tolerance=default_tolerance,
        )

    yield check

# test_something.py

def test_something(check_df):
    df = some_operation()

    check_df(df, default_tolerance=dict(atol=1e-8, rtol=1e-5))

While this works, it is less elegant, and the tests must be run twice to generate both the YAML and CSV files of expected results.
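For reference, the column split that the fixture relies on can be tried in isolation. This is a small sketch with a made-up two-column dataframe (the column names `name` and `value` are illustrative only); it uses just `DataFrame.select_dtypes` and `DataFrame.to_dict("records")`, the same calls as in the fixture above.

```python
import pandas as pd

# Illustrative dataframe with one non-numeric and one numeric column.
df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

# Numeric columns: the part passed to dataframe_regression (with tolerances).
numeric = df.select_dtypes(include="number")

# Non-numeric columns: the part converted to row dicts for data_regression.
non_numeric = df.select_dtypes(exclude="number")
records = non_numeric.to_dict("records")

numeric_columns = list(numeric.columns)
non_numeric_columns = list(non_numeric.columns)
```

With this input, `numeric_columns` contains only `value`, `non_numeric_columns` contains only `name`, and `records` holds one dictionary per row with the `name` values.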

@nicoddemus
Member

Hi @DrGFreeman,

Indeed, this would be a nice feature. Right now there are no plans to implement it, but we would be glad to review a PR adding this feature. 👍
