Conformal prediction #1171

Open
wants to merge 32 commits into
base: main
32 commits
ddb09ad
conformal_prediction
mverontarabeux Jan 5, 2023
922263a
conformal_prediction
mverontarabeux Jan 7, 2023
a9d31d8
conformal_prediction
mverontarabeux Jan 8, 2023
8e3ec68
new file: river/conformal_predictions/__init__.py
mverontarabeux Jan 8, 2023
5755d16
modified: river/conf/__init__.py
mverontarabeux Jan 8, 2023
6e2e783
modified: river/conf/__init__.py
mverontarabeux Jan 8, 2023
ecb6993
modified: river/conf/__init__.py
mverontarabeux Jan 8, 2023
682d058
modified: river/conf/base.py
mverontarabeux Jan 8, 2023
e313b53
modified: river/time_series/holt_winters.py
mverontarabeux Jan 8, 2023
97cc962
modified: river/time_series/evaluate.py
mverontarabeux Jan 8, 2023
e23e146
modified: river/conf/__init__.py
mverontarabeux Jan 12, 2023
5afc5f9
modified: river/conf/__init__.py
mverontarabeux Jan 13, 2023
f3c4bdd
modified: river/conf/base.py
mverontarabeux Jan 13, 2023
f4e88e9
modified: river/conf/gaussian.py
mverontarabeux Jan 13, 2023
c888ad0
modified: river/time_series/evaluate.py
mverontarabeux Jan 13, 2023
13bde68
modified: river/time_series/evaluate.py
mverontarabeux Jan 13, 2023
4ff354e
modified: river/time_series/intervals.py
mverontarabeux Jan 13, 2023
1ecc56b
new file: river/Prices_2016_2019_extract.csv
mverontarabeux Jan 14, 2023
4c4445c
Add files via upload
rboggio Jan 15, 2023
a0f74c0
new file: river/conf/ACP.py
mverontarabeux Jan 15, 2023
4051df1
CP_update
mverontarabeux Jan 15, 2023
7dcd0cd
modified: river/conf/CP.py
mverontarabeux Jan 16, 2023
bfc83a7
modified: river/conf/ACP.py
mverontarabeux Jan 16, 2023
13be3bf
CP_update
mverontarabeux Jan 16, 2023
a179a7f
modified: river/conf/ACP.py
mverontarabeux Jan 16, 2023
484d5ff
deleted: river/Prices_2016_2019_extract.csv
mverontarabeux Jan 29, 2023
770b62c
renamed: river/presentation.ipynb -> river/Notebook_ConformalPred…
mverontarabeux Jan 29, 2023
8181ee7
modified: river/Notebook_ConformalPrediction.ipynb
mverontarabeux Jan 29, 2023
166a44b
renamed: river/README_ConformalPrediction.md -> river/README.md
mverontarabeux Jan 29, 2023
4c1abeb
modified: river/README.md
mverontarabeux Jan 29, 2023
1d8fab0
modified: river/README.md
mverontarabeux Jan 29, 2023
b1e88bd
Présentations as pdf
mverontarabeux Jan 30, 2023
Binary file added river/Diapo_ConformalPrediction.pdf
Binary file not shown.
940 changes: 940 additions & 0 deletions river/Notebook_ConformalPrediction.ipynb

Large diffs are not rendered by default.

Binary file added river/Notebook_ConformalPrediction.pdf
Binary file not shown.
68 changes: 68 additions & 0 deletions river/README.md
@@ -0,0 +1,68 @@
# Conformal prediction implementation in River

- BOGGIO Richard
- MDIHI Samy
- VERON Marc

This conformal prediction implementation relies on the paper by Margaux Zaffran et al., "Adaptive Conformal Predictions for Time Series" (https://arxiv.org/abs/2202.07282). The paper has two parts: expert aggregation for regression or classification, and the definition of **confidence intervals on streaming data**. We focus here on implementing these interval estimation techniques in River. We build on the work of the research group, implemented in Python and available on GitHub: https://github.com/mzaffran/AdaptiveConformalPredictionsTimeSeries

Conformal prediction is a general term for methods that define confidence intervals in machine learning and go beyond the simple Gaussian approach.
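
As a rough illustration of the idea, here is a minimal sketch of a split conformal interval built from absolute residuals on a calibration set. It is not the code of this pull request: the function name `conformal_half_width` and the sample values are hypothetical, and absolute residuals are assumed as the nonconformity score.

```python
import math


def conformal_half_width(residuals, alpha):
    """Return the conformal quantile of a list of absolute residuals.

    The quantile is the ceil((n + 1) * (1 - alpha))-th smallest residual.
    When that rank exceeds n, the interval is the whole real line.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(residuals)[rank - 1]


# Hypothetical absolute residuals |y_true - y_pred| from a calibration period
calibration_residuals = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5, 0.1]

half_width = conformal_half_width(calibration_residuals, alpha=0.25)

# Interval centered on a new point prediction
y_pred = 10.0
interval = (y_pred - half_width, y_pred + half_width)
```

With too few calibration residuals the rank exceeds `n` and the sketch returns an infinite half-width, which is why a calibration period is needed before the intervals become informative.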

The aim of our work is to let users benefit from these methods when using regression and forecasting models with River. We first extended the **conf** module. This module is present in the River Git repository, but it is not shipped in the downloadable Python package. It contains the parent class **Interval**, which serves as the base for the different methods: **Gaussian**, **ConformalPrediction**, and **AdaptativeConformalPrediction**. Next, we extended the **time_series** module, updating the evaluation method to support intervals at different horizons. The logic of this module is to forecast not only at horizon 1 but further ahead. The interval computation must therefore be integrated in the same way, hence the base definition in conf.
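
The adaptive variant differs from plain conformal prediction in that it adjusts its working miscoverage level after every observation. Below is a minimal sketch of that update rule as described in the referenced paper; `next_alpha` is a hypothetical helper used for illustration only and is not part of the module.

```python
def next_alpha(alpha_t: float, alpha: float, gamma: float, covered: bool) -> float:
    """One step of the adaptive update: alpha_t + gamma * (alpha - err)."""
    # err is 1 when the true value fell outside the interval, 0 otherwise
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (alpha - err)


alpha, gamma = 0.1, 0.05  # target miscoverage and step size (illustrative values)
alpha_t = alpha
history = []

# Hypothetical sequence of coverage outcomes
for covered in [True, True, False, True]:
    alpha_t = next_alpha(alpha_t, alpha, gamma, covered)
    history.append(alpha_t)
```

A miss lowers `alpha_t`, which widens the next interval, while a covered point raises it slightly.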

To integrate all these methods, the **\_\_init\_\_** and **base** files have been updated. This provides an almost functional environment, as described below. A notebook is also provided in our Git repo: https://github.com/mverontarabeux/river/tree/ConformalPrediction/river


## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install river.

```bash
pip install river
```

## Usage

```python
# import the relevant River modules
from river import datasets, metrics
from time_series.holt_winters import HoltWinters

# import the interval methods from the custom conf module (defined as a folder)
import time_series
import conf.ACP
import conf.CP
import conf.gaussian

# Get some data
dataset = datasets.AirlinePassengers()

# Define a forecasting model
model = HoltWinters(
    alpha=0.3,
    beta=0.1,
    gamma=0.6,
    seasonality=12,
    multiplicative=True,
)

# Number of observations used to calibrate the residuals (illustrative value)
calib_period = 12

# Set the metric and interval methods
metric = metrics.MAE()
interval = conf.gaussian.Gaussian(window_size=calib_period, alpha=0.10)

# Evaluate the model
time_series.evaluate(
    dataset,
    model,
    metric,
    interval,
    horizon=12,
    residual_calibration_period=calib_period,
)

```
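
To judge the resulting intervals, two simple summaries are validity (the share of true values that fall inside their interval) and efficiency (the average interval width). Here is a minimal sketch on made-up `(lower, upper)` pairs, independent of the River API; `coverage_and_width` is a hypothetical helper.

```python
def coverage_and_width(intervals, targets):
    """Return (validity, efficiency) for a sequence of (lower, upper) intervals."""
    hits = [lower <= y <= upper for (lower, upper), y in zip(intervals, targets)]
    widths = [upper - lower for lower, upper in intervals]
    return sum(hits) / len(hits), sum(widths) / len(widths)


# Hypothetical intervals and the true values observed afterwards
intervals = [(9.0, 11.0), (10.0, 14.0), (12.0, 13.0), (11.0, 15.0)]
targets = [10.5, 15.0, 12.5, 11.0]

validity, efficiency = coverage_and_width(intervals, targets)
```

With a target miscoverage `alpha`, validity is expected to be close to `1 - alpha`; among methods with similar validity, the one with the smaller average width is preferable.
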
## Contribution

A pull request has been sent with the latest updates.
The code is flake8 compliant.

Please also check the Notebook_ConformalPrediction.ipynb for more insight.
121 changes: 121 additions & 0 deletions river/conf/ACP.py
@@ -0,0 +1,121 @@
from river import stats
from collections import deque
from typing import Tuple
import math

from conf.base import Interval


class AdaptativeConformalPrediction(Interval):
    """Adaptive Conformal Prediction method

    Parameters
    ----------
    window_size
        The number of most recent absolute residuals used to compute the quantile.
    gamma
        The step size used to update the working miscoverage level `alpha_t` after each
        observation.
    alpha
        The target miscoverage level. The intervals aim for a coverage of `1 - alpha`.

    Examples
    --------

    >>> from river import conf
    >>> from river import datasets
    >>> from river import linear_model
    >>> from river import metrics
    >>> from river import preprocessing
    >>> from river import stats

    >>> dataset = datasets.TrumpApproval()

    >>> model = conf.AdaptativeConformalPrediction(
    ...     (
    ...         preprocessing.StandardScaler() |
    ...         linear_model.LinearRegression(intercept_lr=.1)
    ...     ),
    ...     confidence_level=0.9,
    ...     gamma=.05
    ... )

    >>> validity = stats.Mean()
    >>> efficiency = stats.Mean()

    >>> for x, y in dataset:
    ...     interval = model.predict_one(x, with_interval=True)
    ...     validity = validity.update(y in interval)
    ...     efficiency = efficiency.update(interval.width)
    ...     model = model.learn_one(x, y)

    The interval's validity is the proportion of times the true value is within the interval. We
    specified a confidence level of 90%, so we expect the validity to be around 90%.

    >>> validity
    Mean: 0.903097

    The interval's efficiency is the average width of the intervals.

    >>> efficiency
    Mean: 3.430883

    Lowering the confidence level will mechanically improve the efficiency.

    References
    ----------
    [^1]: [Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, Aymeric Dieuleveut.
    "Adaptive Conformal Predictions for Time Series"](https://arxiv.org/abs/2202.07282)

    """

    def __init__(
        self, window_size: int, gamma: float = 0.5, alpha: float = 0.05,
    ):
        self.gamma = gamma
        self.alpha = alpha
        self.alpha_t = alpha
        self.window_size = window_size
        self.residuals = deque()
        # Until the window is full, the interval is the whole real line
        self.lower, self.upper = -math.inf, math.inf

    def update(self, y_true: float, y_pred: float) -> "Interval":
        """Update the Interval."""

        if len(self.residuals) == self.window_size:
            # Remove the oldest residual
            self.residuals.popleft()
            # Add the new one
            self.residuals.append(abs(y_true - y_pred))

            if self.alpha_t >= 1:  # => 1-alpha_t <= 0 => predict empty set
                err = 1  # err = 1 if the point is not included, 0 otherwise

            elif self.alpha_t <= 0:  # => 1-alpha_t >= 1 => predict the whole real line
                self.lower, self.upper = -math.inf, math.inf
                err = 0

            else:  # => 1-alpha_t in ]0,1[ => compute the quantile
                # Estimate the (1 - alpha_t) quantile of the windowed residuals
                rolling_quantile = stats.Quantile(1 - self.alpha_t)
                for x in self.residuals:
                    rolling_quantile.update(x)

                # Get the half-width of the interval
                half_inter = rolling_quantile.get()

                # Create the bounds for the ACP interval, centered around y_pred
                self.lower, self.upper = y_pred - half_inter, y_pred + half_inter

                err = 1 - float(self.lower <= y_true <= self.upper)

            # Compute the next value of alpha_t: alpha_t + gamma * (alpha - err),
            # where alpha is the fixed target level (not alpha_t)
            self.alpha_t = self.alpha_t + self.gamma * (self.alpha - err)

        else:
            # Fill the residuals until it reaches window size
            self.residuals.append(abs(y_true - y_pred))

        return self

    def get(self) -> Tuple[float, float]:
        """Return the current value of the Interval."""
        return (self.lower, self.upper)
103 changes: 103 additions & 0 deletions river/conf/CP.py
@@ -0,0 +1,103 @@
from river import stats
from collections import deque
from typing import Tuple

from conf.base import Interval


class ConformalPrediction(Interval):
    """Conformal Prediction method

    Parameters
    ----------
    window_size
        The size of the window of absolute residuals used to calibrate the interval.
    alpha
        The target miscoverage level. The intervals aim for a coverage of `1 - alpha`.

    Examples
    --------

    >>> from river import conf
    >>> from river import datasets
    >>> from river import linear_model
    >>> from river import metrics
    >>> from river import preprocessing
    >>> from river import stats

    >>> dataset = datasets.TrumpApproval()

    >>> model = conf.ConformalPrediction(
    ...     (
    ...         preprocessing.StandardScaler() |
    ...         linear_model.LinearRegression(intercept_lr=.1)
    ...     ),
    ...     confidence_level=0.9,
    ...     gamma=.05
    ... )

    >>> validity = stats.Mean()
    >>> efficiency = stats.Mean()

    >>> for x, y in dataset:
    ...     interval = model.predict_one(x, with_interval=True)
    ...     validity = validity.update(y in interval)
    ...     efficiency = efficiency.update(interval.width)
    ...     model = model.learn_one(x, y)

    The interval's validity is the proportion of times the true value is within the interval. We
    specified a confidence level of 90%, so we expect the validity to be around 90%.

    >>> validity
    Mean: 0.903097

    The interval's efficiency is the average width of the intervals.

    >>> efficiency
    Mean: 3.430883

    Lowering the confidence level will mechanically improve the efficiency.

    References
    ----------
    [^1]: [Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, Aymeric Dieuleveut.
    "Adaptive Conformal Predictions for Time Series"](https://arxiv.org/abs/2202.07282)

    """

    def __init__(
        self, window_size: int, alpha: float = 0.05
    ):
        self.alpha = alpha
        self.window_size = window_size
        self.residuals = deque()
        self.rolling_quantile = stats.Quantile((1 - self.alpha) * (1 + 1 / self.window_size))
        # Until the window is full, the interval is the whole real line
        self.lower, self.upper = float("-inf"), float("inf")

    def update(self, y_true: float, y_pred: float) -> "Interval":
        """Update the Interval."""

        if len(self.residuals) == self.window_size:
            # Remove the oldest residual
            self.residuals.popleft()
            # Add the new one
            self.residuals.append(abs(y_true - y_pred))

            # Update the quantile
            self.rolling_quantile.update(self.residuals[-1])

            # Compute the half-width of the interval
            half_inter = self.rolling_quantile.get()

            # And set the bounds
            self.lower, self.upper = y_pred - half_inter, y_pred + half_inter
        else:
            # Fill the residuals until it reaches window size
            self.residuals.append(abs(y_true - y_pred))

        return self

    def get(self) -> Tuple[float, float]:
        """Return the current value of the Interval."""
        return (self.lower, self.upper)
10 changes: 8 additions & 2 deletions river/conf/__init__.py
@@ -1,9 +1,15 @@
 """Conformal predictions. This modules contains wrappers to enable conformal predictions on any
 regressor or classifier."""
-from .interval import Interval
-from .jackknife import RegressionJackknife
+from conf.base import Interval
+from conf.jackknife import RegressionJackknife
+from conf.gaussian import Gaussian
+from conf.ACP import AdaptativeConformalPrediction
+from conf.CP import ConformalPrediction
 
 __all__ = [
     "Interval",
+    "Gaussian",
+    "ConformalPrediction",
+    'AdaptativeConformalPrediction',
     "RegressionJackknife",
 ]
71 changes: 71 additions & 0 deletions river/conf/base.py
@@ -0,0 +1,71 @@
import abc
from typing import Tuple

from river import base

# The concrete interval classes import Interval from this module, so they are
# not imported here (doing so would create a circular import).
__all__ = ["Interval"]


# from river/metrics/base.py
#@dataclasses.dataclass
class Interval(base.Base, abc.ABC):
    """Base class for all intervals

    An object to represent a (prediction) interval.

    Users are not expected to use this class as-is. Instead, they should use the `with_interval`
    parameter of the `predict_one` method of any regressor or classifier wrapped with a conformal
    prediction method.

    Parameters
    ----------
    lower
        The lower bound of the interval.
    upper
        The upper bound of the interval.

    """

    lower: float
    upper: float

    # Format specification applied to each bound in __repr__
    # (same convention as river/metrics/base.py)
    _fmt = ",.6f"

    # Begin : From initial interval.py in conf
    @property
    def center(self):
        """The center of the interval."""
        return (self.lower + self.upper) / 2

    @property
    def width(self):
        """The width of the interval."""
        return self.upper - self.lower

    def __contains__(self, x):
        return self.lower <= x <= self.upper
    # End : From initial interval.py in conf

    @abc.abstractmethod
    def update(self, y_true, y_pred) -> "Interval":
        """Update the Interval."""

    @abc.abstractmethod
    def get(self) -> Tuple[float, float]:
        """Return the current value of the Interval."""

    def __repr__(self) -> str:
        """Return the class name along with the current value of the Interval."""
        # A tuple cannot take a float format spec, so format each bound separately
        lower, upper = self.get()
        return f"{self.__class__.__name__}: ({lower:{self._fmt}}, {upper:{self._fmt}})"

    def __str__(self) -> str:
        return repr(self)

    def __eq__(self, other) -> bool:
        """Compare two instances of Interval"""
        if isinstance(other, Interval):
            return self.lower == other.lower and self.upper == other.upper
        return False