online-ml · mverontarabeux · Jan 5, 2023 · Jan 7, 2023 · Jan 8, 2023 · Jan 8, 2023
@@ -0,0 +1,68 @@
+# Conformal prediction implementation in River
+
+- BOGGIO Richard
+- MDIHI Samy
+- VERON Marc
+
+This conformal prediction implementation relies on the paper by Margaux Zaffran et al., "Adaptive Conformal Predictions for Time Series" (https://arxiv.org/abs/2202.07282). The paper has two parts: expert aggregation for regression or classification, and the definition of **confidence intervals on streaming data**. We focus here on implementing these interval estimation techniques in River. We rely on the authors' Python implementation, available on GitHub: https://github.com/mzaffran/AdaptiveConformalPredictionsTimeSeries
+
+Conformal prediction is a general term for methods that build prediction intervals in machine learning without relying on a simple Gaussian assumption.
+
+The aim of our work is to let users benefit from these methods when using regression and forecasting models with River. We first extended the **conf** module. It is present in the River Git repository but is not shipped in the released Python package. It contains the parent class **Interval**, which serves as the base for the different methods: **Gaussian**, **ConformalPrediction**, and **AdaptativeConformalPrediction**. Next, we extended the **time_series** module, updating its evaluation method to support intervals at different horizons. The logic of this module is to predict not only at horizon 1 but further ahead, so the interval computation must be integrated into it, hence the base definition in conf.
+
+To integrate all these methods, the **\_\_init\_\_** and **base** files have been updated. This gives an almost fully functional environment, as described below. A notebook is also provided in our Git repo: https://github.com/mverontarabeux/river/tree/ConformalPrediction/river
+
+
+## Installation
+
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install river.
+
+```bash
+pip install river
+```
+
+## Usage
+
+```python
+# import the relevant River modules
+from river import datasets, metrics
+from time_series.holt_winters import HoltWinters
+
+# import the interval methods from the custom conf module (defined as a folder)
+import time_series
+import conf.ACP
+import conf.CP
+import conf.gaussian
+
+# Get some data
+dataset = datasets.AirlinePassengers()
+
+# Define a forecasting model 
+model = HoltWinters(
+        alpha=0.3,
+        beta=0.1,
+        gamma=0.6,
+        seasonality=12,
+        multiplicative=True
+        )
+
+# Calibration window for the residuals (12 is an arbitrary example value)
+calib_period = 12
+
+# Set the metric and interval methods
+metric = metrics.MAE()
+interval = conf.gaussian.Gaussian(window_size=calib_period, alpha=0.10)
+
+# Evaluate the model
+time_series.evaluate(
+    dataset, model, metric, interval,
+    horizon=12,
+    residual_calibration_period=calib_period,
+)
+```
+## Contribution
+
+A pull request has been submitted with the latest updates.
+The code is flake8 compliant.
+
+Please also check the Notebook_ConformalPrediction.ipynb for more insight.
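
The README contrasts conformal methods with "the simple gaussian approach". The `conf.gaussian.Gaussian` class itself is not shown in this diff, so the following is only a minimal sketch of what such a baseline typically computes, not that class's implementation. The function name `gaussian_interval` and the choice of the population standard deviation are assumptions made for illustration.

```python
from statistics import NormalDist, pstdev
from typing import Sequence, Tuple


def gaussian_interval(
    y_pred: float, residuals: Sequence[float], alpha: float = 0.10
) -> Tuple[float, float]:
    """Symmetric interval y_pred +/- z * sigma (hypothetical helper)."""
    # Two-sided standard normal quantile for a miscoverage rate alpha
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Spread of the signed residuals observed during calibration
    sigma = pstdev(residuals)
    return (y_pred - z * sigma, y_pred + z * sigma)


lower, upper = gaussian_interval(100.0, residuals=[-1.0, 1.0, -1.0, 1.0], alpha=0.10)
print(lower, upper)
```

The interval width depends only on the residual spread and on `alpha`, which is why this baseline can under-cover when residuals are far from normally distributed.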
@@ -0,0 +1,121 @@
+from river import stats
+from collections import deque
+from typing import Tuple
+import math
+
+from conf.base import Interval
+
+
+class AdaptativeConformalPrediction(Interval):
+    """Adaptive Conformal Prediction (ACP) method.
+
+    Parameters
+    ----------
+    window_size
+        The number of most recent absolute residuals used to compute the quantile. No interval
+        is produced until the window is full.
+    gamma
+        The learning rate of the `alpha_t` update. Higher values make the interval width react
+        faster to coverage errors.
+    alpha
+        The target miscoverage rate. The target coverage is `1 - alpha`.
+
+    Examples
+    --------
+
+    >>> from river import conf
+    >>> from river import datasets
+    >>> from river import linear_model
+    >>> from river import metrics
+    >>> from river import preprocessing
+    >>> from river import stats
+
+    >>> dataset = datasets.TrumpApproval()
+
+    >>> model = conf.AdaptativeConformalPrediction(
+    ...     (
+    ...         preprocessing.StandardScaler() |
+    ...         linear_model.LinearRegression(intercept_lr=.1)
+    ...     ),
+    ...     confidence_level=0.9,
+    ...     gamma=.05
+    ... )
+
+    >>> validity = stats.Mean()
+    >>> efficiency = stats.Mean()
+
+    >>> for x, y in dataset:
+    ...     interval = model.predict_one(x, with_interval=True)
+    ...     validity = validity.update(y in interval)
+    ...     efficiency = efficiency.update(interval.width)
+    ...     model = model.learn_one(x, y)
+
+    The interval's validity is the proportion of times the true value is within the interval. We
+    specified a confidence level of 90%, so we expect the validity to be around 90%.
+
+    >>> validity
+    Mean: 0.903097
+
+    The interval's efficiency is the average width of the intervals.
+
+    >>> efficiency
+    Mean: 3.430883
+
+    Lowering the confidence level will mechanically improve the efficiency.
+
+    References
+    ----------
+    [^1]: [Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, Aymeric Dieuleveut.
+    "Adaptive Conformal Predictions for Time Series"](https://arxiv.org/abs/2202.07282)
+
+    """
+
+    def __init__(
+        self, window_size: int, gamma: float = 0.5, alpha: float = 0.05,
+    ):
+        self.gamma = gamma
+        self.alpha = alpha
+        self.alpha_t = alpha
+        self.window_size = window_size
+        self.residuals = deque()
+
+    def update(self, y_true: float, y_pred: float) -> "Interval":
+        """Update the Interval."""
+
+        if len(self.residuals) == self.window_size:
+            # Remove the oldest residuals
+            self.residuals.popleft()
+            # Add the new one
+            self.residuals.append(abs(y_true - y_pred))
+
+            if self.alpha_t >= 1:  # => 1-alpha_t <= 0 => predict empty set
+                err = 1 # err = 1 if the point is not included, 0 otherwise
+
+            elif self.alpha_t <= 0:  # => 1-alpha_t >= 1 => predict the whole real line
+                self.lower, self.upper = -math.inf, math.inf
+                err = 0
+
+            else: # => 1-alpha_t in ]0,1[ => compute the quantiles
+                # Estimate the (1 - alpha_t) quantile of the residuals in the window
+                rolling_quantile = stats.Quantile(1 - self.alpha_t)
+                for x in self.residuals:
+                    _ = rolling_quantile.update(x)
+
+                # Half-width of the interval
+                half_inter = rolling_quantile.get()
+
+                # create the bounds for the ACP interval, centered around y_pred
+                self.lower, self.upper = y_pred-half_inter, y_pred+half_inter
+
+                err = 1-float((self.lower <= y_true) & (y_true <= self.upper))
+
+            # compute the next value of alpha_t: alpha_{t+1} = alpha_t + gamma * (alpha - err_t)
+            self.alpha_t = self.alpha_t + self.gamma * (self.alpha - err)
+
+        else:
+            # Fill the residuals until it reaches window size
+            self.residuals.append(abs(y_true - y_pred))
+
+    def get(self) -> Tuple[float, float]:
+        """Return the current value of the Interval."""
+        return (self.lower, self.upper)
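
The adaptive scheme in `AdaptativeConformalPrediction.update` above can be sketched in isolation. This is a simplified illustration, not the class itself: it uses an exact empirical quantile over a plain list instead of `stats.Quantile`, the helper names `next_alpha` and `acp_interval` are hypothetical, and representing the empty set as `(inf, -inf)` is an assumption. The update rule follows the adaptive conformal inference recursion, in which the error is compared with the fixed target `alpha`.

```python
import math
from typing import Sequence, Tuple


def next_alpha(alpha_t: float, alpha: float, gamma: float, err: int) -> float:
    """alpha_{t+1} = alpha_t + gamma * (alpha - err_t)."""
    return alpha_t + gamma * (alpha - err)


def acp_interval(
    alpha_t: float, y_pred: float, residuals: Sequence[float]
) -> Tuple[float, float]:
    """Interval centred on y_pred for the current working level alpha_t."""
    if alpha_t >= 1:  # empty set, encoded here as lower > upper (assumption)
        return (math.inf, -math.inf)
    if alpha_t <= 0:  # whole real line
        return (-math.inf, math.inf)
    ordered = sorted(residuals)
    # Smallest residual such that a fraction (1 - alpha_t) of the window is below it
    k = max(math.ceil((1 - alpha_t) * len(ordered)) - 1, 0)
    half = ordered[k]
    return (y_pred - half, y_pred + half)


alpha, gamma, alpha_t = 0.1, 0.05, 0.1
residuals = [1.0, 2.0, 3.0, 4.0]  # absolute residuals from a full window
for y_pred, y_true in [(10.0, 11.0), (10.0, 20.0)]:
    lower, upper = acp_interval(alpha_t, y_pred, residuals)
    err = 0 if lower <= y_true <= upper else 1
    alpha_t = next_alpha(alpha_t, alpha, gamma, err)
    print(lower, upper, err, alpha_t)
```

A covered point nudges `alpha_t` up (narrower intervals next time), while a miss pulls it down by a larger step, so the long-run miss frequency is steered towards `alpha`.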
@@ -0,0 +1,103 @@
+from river import stats
+from collections import deque
+from typing import Tuple
+
+from conf.base import Interval
+
+
+class ConformalPrediction(Interval):
+    """Conformal Prediction method.
+
+    Parameters
+    ----------
+    window_size
+        The number of absolute residuals collected before intervals are produced. It also sets
+        the finite-sample correction `1 + 1 / window_size` applied to the quantile level.
+    alpha
+        The target miscoverage rate. The target coverage is `1 - alpha`.
+
+    Examples
+    --------
+
+    >>> from river import conf
+    >>> from river import datasets
+    >>> from river import linear_model
+    >>> from river import metrics
+    >>> from river import preprocessing
+    >>> from river import stats
+
+    >>> dataset = datasets.TrumpApproval()
+
+    >>> model = conf.ConformalPrediction(
+    ...     (
+    ...         preprocessing.StandardScaler() |
+    ...         linear_model.LinearRegression(intercept_lr=.1)
+    ...     ),
+    ...     confidence_level=0.9,
+    ... )
+
+    >>> validity = stats.Mean()
+    >>> efficiency = stats.Mean()
+
+    >>> for x, y in dataset:
+    ...     interval = model.predict_one(x, with_interval=True)
+    ...     validity = validity.update(y in interval)
+    ...     efficiency = efficiency.update(interval.width)
+    ...     model = model.learn_one(x, y)
+
+    The interval's validity is the proportion of times the true value is within the interval. We
+    specified a confidence level of 90%, so we expect the validity to be around 90%.
+
+    >>> validity
+    Mean: 0.903097
+
+    The interval's efficiency is the average width of the intervals.
+
+    >>> efficiency
+    Mean: 3.430883
+
+    Lowering the confidence level will mechanically improve the efficiency.
+
+    References
+    ----------
+    [^1]: [Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, Aymeric Dieuleveut.
+    "Adaptive Conformal Predictions for Time Series"](https://arxiv.org/abs/2202.07282)
+
+    """
+
+    def __init__(
+        self, window_size: int, alpha: float = 0.05
+    ):
+        self.alpha = alpha
+        self.window_size = window_size
+        self.residuals = deque()
+        self.rolling_quantile = stats.Quantile((1-self.alpha)*(1+1/self.window_size))
+
+    def update(self, y_true: float, y_pred: float) -> "Interval":
+        """Update the Interval."""
+
+        if len(self.residuals) == self.window_size:
+            # Remove the oldest residuals
+            self.residuals.popleft()
+            # Add the new one
+            self.residuals.append(abs(y_true - y_pred))
+
+            # Update the quantile
+            _ = self.rolling_quantile.update(self.residuals[-1])
+
+            # Half-width of the interval
+            half_inter = self.rolling_quantile.get()
+
+            # Set the interval bounds, centered around y_pred
+            self.lower, self.upper = y_pred-half_inter, y_pred+half_inter
+        else:
+            # Fill the residuals until it reaches window size
+            self.residuals.append(abs(y_true - y_pred))
+
+    def get(self) -> Tuple[float, float]:
+        """Return the current value of the Interval."""
+        return (self.lower, self.upper)
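
The quantile level `(1 - alpha) * (1 + 1 / window_size)` used in `ConformalPrediction.__init__` above corresponds, for `n` calibration residuals, to taking the residual of rank `ceil((1 - alpha) * (n + 1))`. The sketch below makes that explicit with an exact sort rather than `stats.Quantile`. The helper names are hypothetical, and returning the whole real line when the rank exceeds `n` is the usual split conformal convention, assumed here rather than taken from the diff.

```python
import math
from typing import Sequence, Tuple


def conformal_rank(alpha: float, n: int) -> int:
    """Rank of the calibration residual used as half-width (1-indexed)."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round((1 - alpha) * (n + 1), 9))


def split_conformal_interval(
    y_pred: float, residuals: Sequence[float], alpha: float
) -> Tuple[float, float]:
    n = len(residuals)
    rank = conformal_rank(alpha, n)
    if rank > n:
        # The corrected level exceeds 1: too few residuals for this alpha
        return (-math.inf, math.inf)
    half = sorted(abs(r) for r in residuals)[rank - 1]
    return (y_pred - half, y_pred + half)


print(split_conformal_interval(10.0, [3.0, -1.0, 2.0], alpha=0.5))
print(conformal_rank(0.05, 10))  # 11 > 10, so a window of 10 is too short for alpha=0.05
```

In practice this means `window_size` should be at least `(1 - alpha) / alpha` for the corrected level to stay at or below 1, for example 19 for `alpha=0.05`.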
@@ -1,9 +1,15 @@
 """Conformal predictions. This modules contains wrappers to enable conformal predictions on any
 regressor or classifier."""
-from .interval import Interval
-from .jackknife import RegressionJackknife
+from conf.base import Interval
+from conf.jackknife import RegressionJackknife
+from conf.gaussian import Gaussian
+from conf.ACP import AdaptativeConformalPrediction
+from conf.CP import ConformalPrediction
 
 __all__ = [
     "Interval",
+    "Gaussian",
+    "ConformalPrediction",
+    "AdaptativeConformalPrediction",
     "RegressionJackknife",
 ]
@@ -0,0 +1,71 @@
+import abc
+from typing import Tuple
+from river import base
+
+# Only Interval is defined here. Importing the concrete classes from `conf` in this
+# module would create a circular import, since `conf/__init__.py` imports `conf.base`.
+__all__ = [
+    "Interval",
+]
+
+
+# from river/metrics/base.py
+#@dataclasses.dataclass
+class Interval(base.Base, abc.ABC):
+    """Base class for all intervals.
+
+    An object to represent a (prediction) interval.
+
+    Users are not expected to use this class as-is. Instead, they should use the `with_interval`
+    parameter of the `predict_one` method of any regressor or classifier wrapped with a conformal
+    prediction method.
+
+    Parameters
+    ----------
+    lower
+        The lower bound of the interval.
+    upper
+        The upper bound of the interval.
+
+    """
+
+    lower: float
+    upper: float
+
+    # Begin : From initial interval.py in conf
+    @property
+    def center(self):
+        """The center of the interval."""
+        return (self.lower + self.upper) / 2
+
+    @property
+    def width(self):
+        """The width of the interval."""
+        return self.upper - self.lower
+
+    def __contains__(self, x):
+        return self.lower <= x <= self.upper
+    # End : From initial interval.py in conf
+
+    @abc.abstractmethod
+    def update(self, y_true, y_pred) -> "Interval":
+        """Update the Interval."""
+
+    @abc.abstractmethod
+    def get(self) -> Tuple[float, float]:
+        """Return the current value of the Interval."""
+
+    def __repr__(self) -> str:
+        """Return the class name along with the current bounds of the Interval."""
+        lower, upper = self.get()
+        return f"{self.__class__.__name__}: [{lower:.6f}, {upper:.6f}]"
+
+    def __str__(self) -> str:
+        return repr(self)
+
+    def __eq__(self, other) -> bool:
+        """Compare two intervals for equality."""
+        if isinstance(other, Interval):
+            return self.lower == other.lower and self.upper == other.upper
+        return False
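
To show how a concrete method plugs into the `Interval` contract above (`update` sets the bounds, `get` returns them, and `center`, `width` and `in` come for free), here is a self-contained sketch. `IntervalSketch` is a stripped-down stand-in that omits the `river.base.Base` parent so the example runs without River, and `FixedHalfWidth` is a hypothetical toy subclass, not one of the methods in this pull request.

```python
import abc
import math
from typing import Tuple


class IntervalSketch(abc.ABC):
    """Stand-in for conf.base.Interval without the river.base.Base parent."""

    lower: float
    upper: float

    @property
    def center(self) -> float:
        return (self.lower + self.upper) / 2

    @property
    def width(self) -> float:
        return self.upper - self.lower

    def __contains__(self, x: float) -> bool:
        return self.lower <= x <= self.upper

    @abc.abstractmethod
    def update(self, y_true: float, y_pred: float) -> "IntervalSketch":
        """Update the bounds from the latest observation and prediction."""

    @abc.abstractmethod
    def get(self) -> Tuple[float, float]:
        """Return the current bounds."""


class FixedHalfWidth(IntervalSketch):
    """Toy method: a constant half-width around the latest prediction."""

    def __init__(self, half_width: float):
        self.half_width = half_width
        # Start from the whole real line so get() works before any update
        self.lower, self.upper = -math.inf, math.inf

    def update(self, y_true: float, y_pred: float) -> "FixedHalfWidth":
        self.lower, self.upper = y_pred - self.half_width, y_pred + self.half_width
        return self

    def get(self) -> Tuple[float, float]:
        return (self.lower, self.upper)


interval = FixedHalfWidth(2.0).update(y_true=9.0, y_pred=10.0)
print(interval.get(), interval.width, interval.center, 9.0 in interval)
```

Initialising the bounds in `__init__` is a design choice of this sketch: it keeps `get()` usable before the first update, at the cost of reporting an uninformative interval until then.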