Add comparison, update some small things
kwinkunks committed Oct 1, 2023
1 parent f454f02 · commit a303f7e
Showing 2 changed files with 20 additions and 34 deletions.
paper/paper.bib: 18 changes (0 additions & 18 deletions)
@@ -23,15 +23,6 @@ @InProceedings{mckinney-2010
doi = {10.25080/Majora-92bf1922-00a}
}

-@article{pandas-2.1.0,
-  title = {pandas-dev/pandas: Pandas},
-  doi = {10.5281/zenodo.8301632},
-  publisher = {Zenodo},
-  author = {The pandas development team},
-  year = {2023},
-  month = {Aug}
-}

@article{pedregosa-etal-2011,
title = {Scikit-learn: Machine Learning in {P}ython},
author = {Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
@@ -43,12 +34,3 @@ @article{pedregosa-etal-2011
pages = {2825-2830},
year = {2011}
}

-@article{sklearn-1.3.0,
-  title={scikit-learn/scikit-learn: Scikit-learn 1.3.0},
-  doi={10.5281/zenodo.8098905},
-  publisher={Zenodo},
-  author={Olivier Grisel, Andreas Mueller, Lars, Alexandre Gramfort, Gilles Louppe, Thomas J. Fan and Peter Prettenhofer and Mathieu Blondel and Vlad Niculae and Joel Nothman and et al.},
-  year={2023},
-  month={Jun}
-}
paper/paper.md: 36 changes (20 additions & 16 deletions)
@@ -22,37 +22,41 @@ bibliography: paper.bib

# Summary

_Redflag_ is a Python library that applies "safety by design" to machine learning. It helps researchers and practitioners in this field ensure their models are safe and reliable by alerting them to potential pitfalls. These pitfalls could lead to overconfidence in the model or wildly spurious predictions. _Redflag_ offers accessible ways for users to integrate safety checks into their workflows by providing `scikit-learn` transformers, `pandas` accessors, and standalone functions. These components can easily be incorporated into existing workflows, helping identify issues and enhance the quality and safety of predictive models. _Redflag_'s aim is to empower users to design and implement higher-quality models that prioritize safety from the start.

Redflag is distributed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). The source code is available [on GitHub](https://github.com/scienxlab/redflag) and includes tests and [documentation](https://scienxlab.org/redflag/). The package can be installed from the [Python Package Index](https://pypi.org/project/redflag/) with `pip install redflag` or using [Conda](https://anaconda.org/conda-forge/redflag) with `conda install -c conda-forge redflag`.

# Statement of need

_Safety by design_ means to 'design out' hazardous situations from complex machines or processes before they can do harm. The concept, also known as _prevention through design_, has been applied to civil engineering and industrial design for decades. Recently it has also been applied to software engineering and, more recently still, to machine learning [@van-gelder-etal-2021]. _Redflag_ aims to help machine learning researchers and practitioners design safety into their workflows.

The practice of machine learning features a great many pitfalls that threaten the safe application of the resulting model. These pitfalls vary in the type and seriousness of their symptoms:

1. **Minor issues** resulting in overconfidence in the model (or, equivalently, underperformance of the model compared to expectations), such as having insufficient data, a few spurious data points, or failing to compute feature interactions.
2. **Moderate issues** arising from incorrect assumptions or incorrect application of the tools, such as not dealing appropriately with class imbalance, not recognizing spatial, temporal, or other correlation in the data, or overfitting to the training or test data.
3. **Major issues** resulting in egregiously spurious predictions. Causes include feature leakage (using features unavailable in application), using distance-dependent algorithms on unscaled data, or forgetting to scale input features in application.
4. **Critical issues**, especially project design and implementation issues, that result in total failure. For example, asking the wrong question, not writing tests or documentation, not training users of the model, or violating ethical standards.

While some of these pathologies are difficult to check with code (especially those in class 4, above), many of them could in principle be caught automatically by inserting checks into the workflow that trains, evaluates, and implements the predictive model. The goal of _Redflag_ is to provide those checks.

In the Python machine learning world, [`pandas`](https://pandas.pydata.org/) [@mckinney-2010] is the _de facto_ tabular data manipulation package, and [`scikit-learn`](https://scikit-learn.org/) [@pedregosa-etal-2011] is the preeminent prototyping and implementation framework. By integrating with these packages, providing accessors and transformers respectively, _Redflag_ aims to be as simple as possible to include in existing workflows.

_Redflag_ offers three ways for users to insert safety checks into their machine learning workflows:

1. **`scikit-learn` transformers** which fit directly into the pipelines that most data scientists are already using, e.g. `redflag.ImbalanceDetector().fit_transform(X, y)`.
2. **`pandas` accessors** on Series and DataFrames, which can be called like a method on existing Pandas objects, e.g. `df['target'].redflag.is_imbalanced()`, where `df` is an instance of `pd.DataFrame`.
3. **Standalone functions** with which users can compose their own checks and tests, e.g. `redflag.is_imbalanced(y)` (all three entry points are exercised in the sketch below).
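
To make this concrete, here is a minimal sketch exercising all three entry points on a deliberately imbalanced target. The class and function names are those given above; the synthetic data, the boolean return values, and the assumption that importing `redflag` registers the `pandas` accessor are illustrative.

```python
import numpy as np
import pandas as pd
import redflag  # assumed to register the .redflag accessor on import
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))        # two arbitrary features
y = np.r_[np.zeros(95), np.ones(5)]  # a deliberately imbalanced target

# 1. As a scikit-learn transformer inside an ordinary pipeline;
#    fitting should warn about the class imbalance.
pipe = make_pipeline(redflag.ImbalanceDetector(), StandardScaler())
pipe.fit_transform(X, y)

# 2. As a pandas accessor, called like a method on an existing Series.
df = pd.DataFrame({"target": y})
print(df["target"].redflag.is_imbalanced())  # expected: True

# 3. As a standalone function, composable into custom checks.
print(redflag.is_imbalanced(y))              # expected: True
```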

There are two kinds of `scikit-learn` transformer, both of which appear in the sketch after the list:

- **Detectors** check every dataset they encounter. For example, `redflag.sklearn.ClippingDetector` checks for clipped data during both model fitting and prediction.
- **Comparators** learn some parameter in the model fitting step, then check subsequent data against those parameters. For example, `redflag.sklearn.DistributionComparator` learns the empirical univariate distributions of the training features, then checks that the features in subsequent datasets are tolerably close to these baselines (based on the Wasserstein metric $W_1$, also known as the earth mover's distance).
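
The sketch below puts one of each kind into a pipeline. The two class names are taken from the text above; the regression model, the synthetic data, and the size of the shift needed to trigger a warning are illustrative assumptions.

```python
import numpy as np
from redflag.sklearn import ClippingDetector, DistributionComparator
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

pipe = make_pipeline(
    ClippingDetector(),        # detector: checks every dataset it sees
    DistributionComparator(),  # comparator: learns baselines during fit
    LinearRegression(),
)
pipe.fit(X_train, y_train)

# At prediction time, the detector re-checks for clipping and the
# comparator measures each feature's distance from its baseline.
X_shifted = rng.normal(loc=3.0, size=(50, 3))  # large shift: should warn
pipe.predict(X_shifted)
```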

Although the `scikit-learn` components are implemented as transformers, subclassing `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`, they do not transform the data. They only raise warnings (or, optionally, exceptions) when a check fails. _Redflag_ does not attempt to fix any problems it encounters.
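
Because failed checks surface as ordinary Python warnings, one generic, library-agnostic way to make them fatal is the standard library's `warnings` machinery. This is an assumption about usage, not _Redflag_'s own mechanism for raising exceptions, which is not shown here.

```python
import warnings

# Reusing `pipe`, `X_train`, and `y_train` from the sketch above:
# escalate warnings to exceptions so any failed check halts the run.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    pipe.fit(X_train, y_train)
```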

There are some other packages with similar goals. For example, [`great_expectations`](https://greatexpectations.io/) provides a full-featured framework with a great deal of capability, especially oriented around cloud services, and a correspondingly large API. Meanwhile, [`pandas_dq`](https://github.com/AutoViML/pandas_dq), [`pandera`](https://github.com/unionai-oss/pandera), and [`ydata-profiling`](https://github.com/ydataai/ydata-profiling) (formerly `pandas-profiling`) are all oriented around Pandas, Spark, or DataFrame-like structures. Finally, [`evidently`](https://github.com/evidentlyai/evidently) provides a Jupyter-oriented interface with many plots. _Redflag_ is lightweight, supports ordinary array-like data structures as well as Pandas objects, and focuses on concise calculations and emitting warnings rather than visualization, reporting, or data transformation.

By providing machine learning practitioners with a range of alerts and alarms, each of which can easily be inserted into existing workflows and pipelines, _Redflag_ aims to allow anyone to create higher-quality, more trustworthy prediction models that are safer by design.

# Acknowledgements

