Complete the documentation
mirkobunse committed Jun 12, 2023
1 parent 5faa46b commit 51466ad
Showing 9 changed files with 92 additions and 17 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@

# qunfold | Quantification & Unfolding

This Python package implements composable methods for quantification and unfolding.
This Python package implements our unified framework of algorithms for quantification and unfolding. It is designed to enable the composition of novel methods from existing and easily customized loss functions and data representations. Moreover, this package leverages a powerful optimization back-end to yield state-of-the-art performance for all compositions.


## Installation
37 changes: 35 additions & 2 deletions docs/source/api.md
@@ -2,7 +2,7 @@

The `GenericMethod` defines the interface for many common quantification and unfolding algorithms. Most importantly, this interface consists of their `fit` and `predict` methods.

Instances of [](#popular-algorithms) for quantification and unfolding are created through specialized constructors. However, you can also define your own quantification algorithm as a `GenericMethod` that combines an arbitrary choice of [](#losses), [](#regularizers) and [](#feature-transformations).
Instances of [](#popular-algorithms) for quantification and unfolding are created through the corresponding constructors. However, you can also define your own quantification methods as a `GenericMethod` that combines an arbitrary choice of [](#losses), [](#regularizers) and [](#feature-transformations).

```{eval-rst}
.. autoclass:: qunfold.GenericMethod
@@ -27,6 +27,10 @@ We categorize existing, well-known quantification and unfolding algorithms into
### Distribution matching

```{eval-rst}
.. autoclass:: qunfold.EDx
.. autoclass:: qunfold.EDy
.. autoclass:: qunfold.HDx
.. autoclass:: qunfold.HDy
@@ -45,10 +49,12 @@ We categorize existing, well-known quantification and unfolding algorithms into
```{eval-rst}
.. autoclass:: qunfold.LeastSquaresLoss
.. autoclass:: qunfold.BlobelLoss
.. autoclass:: qunfold.EnergyLoss
.. autoclass:: qunfold.HellingerSurrogateLoss
.. autoclass:: qunfold.BlobelLoss
.. autoclass:: qunfold.CombinedLoss
```

@@ -71,5 +77,32 @@ You can use the `CombinedLoss` to create arbitrary, weighted sums of losses and
```{eval-rst}
.. autoclass:: qunfold.ClassTransformer
.. autoclass:: qunfold.DistanceTransformer
.. autoclass:: qunfold.HistogramTransformer
```


## Utilities

The following classes provide functionalities that go beyond the composition of quantification methods.

### QuaPy

The `qunfold.quapy` module allows you to wrap any quantification method so that it can be used in [QuaPy](https://github.com/HLT-ISTI/QuaPy).

```{eval-rst}
.. autoclass:: qunfold.quapy.QuaPyWrapper
```

### Cross-validated training

The `qunfold.sklearn` module allows you to train classification-based quantification methods through cross-validation. Importing this module requires [scikit-learn](https://scikit-learn.org/stable/) to be installed.

```{eval-rst}
.. autoclass:: qunfold.sklearn.CVClassifier
```

```{hint}
If you use a bagging classifier (like random forests) with `oob_score=True`, you do not need to use cross-validation. Instead, the quantification method is then trained on the out-of-bag predictions of the bagging classifier.
```
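Both options in the hint serve the same purpose: the quantifier is fitted on predictions for items that the classifier did not see during training. The following is a minimal sketch of such out-of-fold predictions, using only the standard library. The helper names `fit_majority` and `out_of_fold_predictions` and the simple interleaved fold assignment are illustrative assumptions; they are not part of `qunfold`, whose `CVClassifier` uses stratified folds.

```python
from collections import Counter

def fit_majority(X, y):
    """Hypothetical stand-in classifier: always predicts the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda X_new: [majority for _ in X_new]

def out_of_fold_predictions(X, y, fit, n_folds):
    """Predict every item with a model that was trained without that item."""
    y_pred = [None] * len(X)
    for fold in range(n_folds):
        # interleaved folds (an assumption): item i belongs to fold i % n_folds
        tst = [i for i in range(len(X)) if i % n_folds == fold]
        trn = [i for i in range(len(X)) if i % n_folds != fold]
        predict = fit([X[i] for i in trn], [y[i] for i in trn])
        for i, y_i in zip(tst, predict([X[i] for i in tst])):
            y_pred[i] = y_i
    return y_pred

X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 0, 0, 0, 0, 1]
y_oof = out_of_fold_predictions(X, y, fit_majority, n_folds=3)
```

Every entry of `y_oof` comes from a model that never saw the corresponding item, which is the property that out-of-bag predictions of a bagging classifier provide without extra training runs.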
2 changes: 1 addition & 1 deletion docs/source/developer-guide.md
@@ -1,6 +1,6 @@
# Developer guide

We provide best practices regarding the implementation [](#workflow) before going into detail about how to take out [](#custom-implementations).
In the following, we introduce best practices regarding the implementation [workflow](#workflow) before going into detail about how to carry out [custom implementations](#custom-implementations).

## Workflow

23 changes: 20 additions & 3 deletions docs/source/index.md
@@ -8,7 +8,7 @@ developer-guide

# Quickstart

This Python package implements our unified framework of algorithms for quantification and unfolding.
The Python package [qunfold](https://github.com/mirkobunse/qunfold) implements our unified framework of algorithms for quantification and unfolding. It is designed to enable the composition of novel methods from existing and easily customized loss functions and data representations. Moreover, this package leverages a powerful optimization back-end to yield state-of-the-art performance for all compositions.


## Installation
@@ -23,13 +23,17 @@ Moreover, you will need a [JAX](https://jax.readthedocs.io/) backend. Typically,
pip install "jax[cpu]"
```

**Updating:** To update an existing installation of `qunfold`, run
### Upgrading

To upgrade an existing installation of `qunfold`, run

```
pip install --force-reinstall --no-deps 'qunfold @ git+https://github.com/mirkobunse/qunfold@main'
```

**Troubleshooting:** Starting from `pip 23.1.2`, you have to install `setuptools` and `wheel` explicitly. If you receive a "NameError: name 'setuptools' is not defined", you need to execute the following command before installing `qunfold`.
### Troubleshooting

Starting from `pip 23.1.2`, you have to install `setuptools` and `wheel` explicitly. If you receive a "NameError: name 'setuptools' is not defined", you need to execute the following command before installing `qunfold`.

```
pip install --upgrade pip setuptools wheel
@@ -50,3 +54,16 @@ acc = ACC( # use OOB predictions for training the quantifier
acc.fit(X_trn, y_trn) # fit to training data
p_hat = acc.predict(X_tst) # estimate a prevalence vector
```

You can easily compose new quantification methods from existing loss functions and feature transformations. In the following example, we compose the ordinal variant of ACC and prepare it for use in [QuaPy](https://github.com/HLT-ISTI/QuaPy).

```python
# the ACC loss, regularized with strength 0.01 for ordinal quantification
loss = TikhonovRegularized(LeastSquaresLoss(), 0.01)

# the original data representation of ACC with 10-fold cross-validation
transformer = ClassTransformer(CVClassifier(LogisticRegression(), 10))

# the ordinal variant of ACC, ready for being used in QuaPy
ordinal_acc = QuaPyWrapper(GenericMethod(loss, transformer))
```
6 changes: 3 additions & 3 deletions qunfold/losses.py
@@ -103,7 +103,7 @@ def _instantiate(self, q, M, N=None):
return lambda p: self.loss_function(p, q, M, N)

class LeastSquaresLoss(FunctionLoss):
"""The loss function of ACC, PACC, and ReadMe.
"""The loss function of ACC (Forman, 2008), PACC (Bella et al., 2010), and ReadMe (Hopkins & King, 2010).
This loss function computes the sum of squares of element-wise errors between `q` and `M*p`.
"""
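As a rough illustration of the docstring above, the sum of squared element-wise errors between `q` and `M*p` can be sketched in plain Python. This is not the package's JAX implementation; the function name `least_squares_loss` and the example values are assumptions, as is the reading of `M` as a matrix with one column per class and of `p` as a prevalence vector.

```python
def least_squares_loss(p, q, M):
    """Sum of squared element-wise errors between q and the product M*p."""
    # matrix-vector product M*p, computed row by row
    Mp = [sum(M_ij * p_j for M_ij, p_j in zip(row, p)) for row in M]
    return sum((q_i - Mp_i) ** 2 for q_i, Mp_i in zip(q, Mp))

# hypothetical example: 2 classes, 2 representation dimensions
M = [[0.9, 0.2],
     [0.1, 0.8]]
p_true = [0.25, 0.75]
q = [0.375, 0.625]  # equals M*p_true, so the loss vanishes at p_true

loss_at_truth = least_squares_loss(p_true, q, M)
loss_at_uniform = least_squares_loss([0.5, 0.5], q, M)
```

The loss is (numerically) zero at `p_true` and strictly positive at the uniform vector, which is why minimizing it over `p` recovers the prevalences that best explain `q`.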
@@ -132,7 +132,7 @@ def _hellinger_surrogate(p, q, M, indices):
return jnp.sum(jnp.array([ jnp.sum(v[i]) for i in indices ]))

class HellingerSurrogateLoss(AbstractLoss):
"""The loss function of HDx and HDy.
"""The loss function of HDx and HDy (González-Castro et al., 2013).
This loss function computes the average of the squared Hellinger distances between feature-wise (or class-wise) histograms. Note that the original HDx and HDy by González-Castro et al. (2013) do not use the squared but the regular Hellinger distance. This approach is problematic because the regular distance is not always twice differentiable and, hence, complicates numerical optimizations.
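To make the distinction between the squared and the regular distance concrete, here is a small sketch based on the textbook definition of the Hellinger distance between two normalized histograms. It is not the surrogate that the package implements; the function names and the example histograms are assumptions.

```python
import math

def squared_hellinger(a, b):
    """Squared Hellinger distance between two normalized histograms."""
    return 0.5 * sum(
        (math.sqrt(a_i) - math.sqrt(b_i)) ** 2 for a_i, b_i in zip(a, b)
    )

def hellinger(a, b):
    """Regular Hellinger distance: the square root of the squared distance."""
    return math.sqrt(squared_hellinger(a, b))

a = [0.5, 0.5, 0.0]
b = [0.0, 0.5, 0.5]

h2_ab = squared_hellinger(a, b)  # partially overlapping histograms
h_ab = hellinger(a, b)
h2_aa = squared_hellinger(a, a)  # identical histograms
```

The outer square root in `hellinger` has an unbounded slope where the squared distance approaches zero, which hints at why the squared variant is easier to handle for solvers that rely on first and second derivatives.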
@@ -204,7 +204,7 @@ def _instantiate(self, q, M, N):
# the inspection that the QuaPyWrapper takes out.

def TikhonovRegularized(loss, tau=0.):
"""Add TikhonovRegularization to any loss.
"""Add TikhonovRegularization (Blobel, 1985) to any loss.
Calling this function is equivalent to calling
20 changes: 14 additions & 6 deletions qunfold/methods.py
@@ -65,6 +65,14 @@ class GenericMethod:
solver (optional): The `method` argument in `scipy.optimize.minimize`. Defaults to `"trust-ncg"`.
solver_options (optional): The `options` argument in `scipy.optimize.minimize`. Defaults to `{"gtol": 1e-8, "maxiter": 1000}`.
seed (optional): A random number generator seed from which a numpy RandomState is created. Defaults to `None`.
Examples:
Here, we create the ordinal variant of ACC (Bunse et al., 2023). This variant consists of the original feature transformation of ACC and of the original loss of ACC, the latter of which is regularized towards smooth solutions.
>>> GenericMethod(
>>> TikhonovRegularized(LeastSquaresLoss(), 0.01),
>>> ClassTransformer(RandomForestClassifier(oob_score=True))
>>> )
"""
def __init__(self, loss, transformer,
solver = "trust-ncg",
@@ -136,7 +144,7 @@ def solve(self, q, M, N=None): # TODO add argument p_trn
return Result(_np_softmax(opt.x), opt.nit, opt.message)

class ACC(GenericMethod):
"""Adjusted Classify & Count.
"""Adjusted Classify & Count by Forman (2008).
This subclass of `GenericMethod` is instantiated with a `LeastSquaresLoss` and a `ClassTransformer`.
@@ -157,7 +165,7 @@ def __init__(self, classifier, fit_classifier=True, **kwargs):
)

class PACC(GenericMethod):
"""Probabilistic Adjusted Classify & Count.
"""Probabilistic Adjusted Classify & Count by Bella et al. (2010).
This subclass of `GenericMethod` is instantiated with a `LeastSquaresLoss` and a `ClassTransformer`.
@@ -199,10 +207,10 @@ def __init__(self, transformer, *, tau=0., **kwargs):
class EDx(GenericMethod):
"""The energy distance-based EDx method by Kawakubo et al. (2016).
This subclass of `GenericMethod` is instantiated with a `EnergyLoss` and a `DistanceTransformer`.
This subclass of `GenericMethod` is instantiated with an `EnergyLoss` and a `DistanceTransformer`.
Args:
metric (optional): The metric with which the distance between data items is measured. Defaults to `"euclidean"`.
metric (optional): The metric with which the distance between data items is measured. Can take any value that is accepted by `scipy.spatial.distance.cdist`. Defaults to `"euclidean"`.
**kwargs: Keyword arguments accepted by `GenericMethod`.
"""
def __init__(self, metric="euclidean", **kwargs):
@@ -216,11 +224,11 @@ def __init__(self, metric="euclidean", **kwargs):
class EDy(GenericMethod):
"""The energy distance-based EDy method by Castaño et al. (2022).
This subclass of `GenericMethod` is instantiated with a `EnergyLoss` and a `DistanceTransformer`, the latter of which uses a `ClassTransformer` as a preprocessor.
This subclass of `GenericMethod` is instantiated with an `EnergyLoss` and a `DistanceTransformer`, the latter of which uses a `ClassTransformer` as a preprocessor.
Args:
classifier: A classifier that implements the API of scikit-learn.
metric (optional): The metric with which the distance between data items is measured. Defaults to `"euclidean"`.
metric (optional): The metric with which the distance between data items is measured. Can take any value that is accepted by `scipy.spatial.distance.cdist`. Defaults to `"euclidean"`.
fit_classifier (optional): Whether to fit the `classifier` when this quantifier is fitted. Defaults to `True`.
**kwargs: Keyword arguments accepted by `GenericMethod`.
"""
12 changes: 12 additions & 0 deletions qunfold/quapy.py
@@ -63,6 +63,18 @@ class QuaPyWrapper(BaseQuantifier):
Args:
generic_method: A GenericMethod method to wrap.
Examples:
Here, we wrap an instance of ACC to perform a grid search with QuaPy.
>>> qunfold_method = QuaPyWrapper(ACC(RandomForestClassifier(oob_score=True)))
>>> quapy.model_selection.GridSearchQ(
>>> model = qunfold_method,
>>> param_grid = { # try both splitting criteria
>>> "transformer__classifier__estimator__criterion": ["gini", "entropy"],
>>> },
>>> # ...
>>> )
"""
def __init__(self, generic_method):
self.generic_method = generic_method
5 changes: 5 additions & 0 deletions qunfold/sklearn.py
@@ -11,6 +11,11 @@ class CVClassifier(BaseEstimator, ClassifierMixin):
Args:
estimator: A classifier that implements the API of scikit-learn.
n_estimators: The number of stratified cross-validation folds.
Examples:
Here, we create an instance of ACC that trains a logistic regression classifier with 10 cross-validation folds.
>>> ACC(CVClassifier(LogisticRegression(), 10))
"""
def __init__(self, estimator, n_estimators, random_state=None):
self.estimator = estimator
2 changes: 1 addition & 1 deletion qunfold/transformers.py
@@ -79,7 +79,7 @@ class DistanceTransformer(AbstractTransformer):
"""A distance-based feature transformation, as it is used in `EDx` and `EDy`.
Args:
metric (optional): The metric with which the distance between data items is measured. Defaults to `"euclidean"`.
metric (optional): The metric with which the distance between data items is measured. Can take any value that is accepted by `scipy.spatial.distance.cdist`. Defaults to `"euclidean"`.
preprocessor (optional): Another `AbstractTransformer` that is called before this transformer. Defaults to `None`.
"""
def __init__(self, metric="euclidean", preprocessor=None):