[ENH] Bayesian Linear Regression using Normal Conjugate Prior #500

Merged on Jan 12, 2025 (57 commits)
Changes from 50 commits
Commits
9589ef8
Initialized bayesian conjugate class
Nov 21, 2024
2d58430
Added docstring example
Nov 21, 2024
08ae9d8
Added BayesianConjugateLinearRegressor
Nov 21, 2024
153f561
Changed parameterization
Nov 21, 2024
de68ef5
Removed dependency
Nov 21, 2024
fc3baac
added init
Nov 21, 2024
419f0c7
Changed test
Nov 22, 2024
82a8c75
Changed test
Nov 22, 2024
c3e6c56
Changed test
Nov 22, 2024
17d8e6a
Removed numpy array conversion in init
Nov 22, 2024
fc07dd9
Only allows initialization using Normal distribution from skpro
Nov 22, 2024
f35d064
Naming changes
Nov 22, 2024
87ca46a
Changed the shape of the test samples
Nov 22, 2024
ed5f81f
Changed the initiation process to numpy
Nov 22, 2024
ab59af2
Reworked logic
Nov 22, 2024
df7d0bb
Reworked recentering logic
Nov 22, 2024
cdb46b9
Make y_pred a series
Nov 22, 2024
4bae2f3
Only infer num_features during fit
Nov 22, 2024
d765033
Change col names
Nov 22, 2024
270be35
Change col names
Nov 22, 2024
fdd246f
Change col names
Nov 22, 2024
fb9f8db
Change col names
Nov 22, 2024
6439dc9
added second test param
Nov 22, 2024
9801221
changed update logic
Nov 22, 2024
8d2bb57
changed update logic
Nov 22, 2024
5e8c069
Remove centering
Nov 22, 2024
86b3aa0
Added example notebook
Nov 22, 2024
b20a255
Clarified that Normal doesn't result in multivariate normal
Nov 22, 2024
dc6cc54
removed kernelspec
Nov 22, 2024
f9a942f
reverted changes to 03_ example notebook
Dec 6, 2024
f32f887
formatting
Dec 6, 2024
4e998df
restructured example folder
Dec 6, 2024
d6342b9
Renamed notebook for consistency with the name of estimator
Dec 6, 2024
dddaa6c
Added line on BayesianMCMCLinearRegressor
Dec 6, 2024
9807a93
Renamed subfolder to abyesian
Dec 6, 2024
4ee08e6
Changed estimator name removed MCMC
Dec 6, 2024
0044efa
Changed the paths for Bayesian estimators
Jan 3, 2025
e5a6643
Renamed variables
Jan 3, 2025
6069012
Updated docstrings by specifying Gaussian conjugacy
Jan 3, 2025
6d12dca
Modified examples
Jan 3, 2025
a93c963
Modified examples
Jan 3, 2025
bdf5687
Ensured the provided mu prior is a column vector
Jan 3, 2025
8af237e
Ensured the provided mu prior is a column vector
Jan 3, 2025
eaa67ac
Ensured the provided mu prior is a column vector
Jan 3, 2025
54f7d97
Added example notebooks
Jan 3, 2025
6430f08
Added example notebooks
Jan 3, 2025
b03df94
Merge remote-tracking branch 'upstream/main' into pymc_dev_conjugate
Jan 3, 2025
bda0dc5
Added example notebooks
Jan 3, 2025
93c5ad9
Added example notebooks
Jan 3, 2025
a297acb
Added artefacts for example notebooks
Jan 4, 2025
b2f2f19
Removed notebooks and artefacts from this PR
Jan 9, 2025
4b3c84b
Added math info on docstring
Jan 9, 2025
490a100
Renamed submodule, making it private
Jan 9, 2025
d962798
Changed imports
Jan 9, 2025
f7afaa6
Changed imports
Jan 9, 2025
ff2bbb7
changed API name to BayesianRegressor
Jan 10, 2025
3466182
Merge remote-tracking branch 'upstream/main' into pymc_dev_conjugate
Jan 10, 2025
12 changes: 12 additions & 0 deletions docs/source/api_reference/regression.rst
@@ -195,3 +195,15 @@ Base classes
    :template: class.rst

    BaseProbaRegressor

Bayesian
--------

.. currentmodule:: skpro.regression.bayesian

.. autosummary::
    :toctree: auto_generated/
    :template: class.rst

    BayesianConjugateLinearRegressor
    BayesianLinearRegressor
884 changes: 884 additions & 0 deletions examples/bayesian/A_Linear_Regression_Introduction.ipynb

Large diffs are not rendered by default.

993 changes: 993 additions & 0 deletions examples/bayesian/B_Conjugate_Prior.ipynb

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions examples/bayesian/test_data.csv
@@ -0,0 +1,6 @@
,x1,x2,y_true,y_train
11,0.9699098521619943,0.3046137691733707,3.8536610118441006,3.6611198716359423
16,0.3042422429595377,0.4951769101112702,3.094015216252886,2.674406454641567
1,0.9507143064099162,0.19967378215835974,3.5004499592949117,3.5861340998898967
5,0.15599452033620265,0.6075448519014384,3.1346235963767204,2.774701492179366
3,0.5986584841970366,0.5924145688620425,3.9745606749802005,3.824008827185556
21 changes: 21 additions & 0 deletions examples/bayesian/train_data.csv
@@ -0,0 +1,21 @@
,x1,x2,y_true,y_train
17,0.5247564316322378,0.034388521115218396,2.152678426610131,1.9980722386845238
7,0.8661761457749352,0.06505159298527952,2.927507070505709,3.4560681836151668
12,0.8324426408004217,0.09767211400638387,2.9579016236199953,2.619440623467016
18,0.43194501864211576,0.9093204020787821,4.591851243520578,4.75748295922236
15,0.18340450985343382,0.12203823484477883,1.732923724241204,2.1985637837993033
8,0.6011150117432088,0.9488855372533332,5.048886635246418,5.220695780030648
10,0.020584494295802447,0.8083973481164611,3.466361032940988,3.6284030176383855
20,0.6118528947223795,0.662522284353982,4.211272642506705,3.97168552358406
19,0.2912291401980419,0.2587799816000169,2.358798225196135,2.8465707887573144
6,0.05808361216819946,0.17052412368729153,1.6277395953982734,1.3974202099183797
9,0.7080725777960455,0.9656320330745594,5.313041254815769,4.431521177134401
21,0.13949386065204183,0.31171107608941095,2.2141209495723166,2.121291461240408
23,0.3663618432936917,0.5467102793432796,3.3728545246172223,2.774751212576887
22,0.29214464853521815,0.5200680211778108,3.1444933606038687,2.5913258736008546
13,0.21233911067827616,0.6842330265121569,3.477377300893023,3.783215445313457
14,0.18182496720710062,0.4401524937396013,2.6841074156330054,3.199607176880981
4,0.15601864044243652,0.046450412719997725,1.4513885190448663,0.7121275238611526
24,0.45606998421703593,0.18485445552552704,2.466703335010653,2.872966246207752
0,0.3745401188473625,0.7851759613930136,4.104608121873766,4.473841411871471
2,0.7319939418114051,0.5142344384136116,4.0066911988636456,3.9488670576695255
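
A note on the two CSV files above: the `y_true` column appears to follow a noiseless linear rule, `y_true = 1 + 2*x1 + 3*x2`, with `y_train` being a noisy copy of it. This is inferred from the numbers, not stated anywhere in the PR, so treat the coefficients as an assumption. A minimal check on four rows copied from the files:

```python
import numpy as np

# Rows copied verbatim from test_data.csv and train_data.csv:
# columns are x1, x2, y_true, y_train
rows = np.array([
    [0.9699098521619943, 0.3046137691733707, 3.8536610118441006, 3.6611198716359423],
    [0.3042422429595377, 0.4951769101112702, 3.094015216252886, 2.674406454641567],
    [0.5247564316322378, 0.034388521115218396, 2.152678426610131, 1.9980722386845238],
    [0.3745401188473625, 0.7851759613930136, 4.104608121873766, 4.473841411871471],
])
x1, x2, y_true, y_train = rows.T

# Assumed generating rule (inferred from the data, not documented in the PR)
y_rule = 1.0 + 2.0 * x1 + 3.0 * x2

# Largest disagreement between the assumed rule and the y_true column
max_abs_dev = np.max(np.abs(y_rule - y_true))

# Residual noise carried by y_train relative to y_true
noise = y_train - y_true

print(max_abs_dev)
print(noise)
```

If the rule holds, a fitted Bayesian linear model should recover an intercept near 1 and slopes near 2 and 3, which makes these files convenient for sanity-checking posterior means.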
76 changes: 76 additions & 0 deletions examples/bayesian/utils.py
@@ -0,0 +1,76 @@
"""Utility function for Bayesian example notebooks."""

import pandas as pd


def style_data(data, vmax=None, subset=None, cmap="coolwarm", hide_index=False):
    """
    Apply styling to a Series or DataFrame.

    Parameters
    ----------
    data : pd.DataFrame or pd.Series
        The data to style.
    vmax : float, optional
        The maximum numeric value for the color spectrum.
        Defaults to the max value in the data.
    subset : list, optional
        List of columns to which the formatting is to be applied (for DataFrames).
    cmap : str, optional
        The color map to apply for the gradient. Defaults to 'coolwarm'.
    hide_index : bool, optional
        If True, hide the index in the output. Defaults to False.

    Returns
    -------
    pd.io.formats.style.Styler
        The styled Series or DataFrame.
    """
    # Check if input is a Series or DataFrame
    if isinstance(data, pd.Series):
        # For Series, directly apply background gradient and format
        if vmax is None:
            vmax = data.max()
        styled_data = (
            data.to_frame()
            .style.background_gradient(cmap=cmap, axis=0, vmin=-vmax, vmax=vmax)
            .format("{:.3f}")
        )

        # Hide the index if requested
        if hide_index:
            styled_data = styled_data.hide(axis="index")

    elif isinstance(data, pd.DataFrame):
        # Determine the max value for the gradient if not provided
        if vmax is None:
            vmax = data.select_dtypes(include=["number"]).max().max()

        # If no subset provided, apply to all float columns by default
        if subset is None:
            subset = pd.IndexSlice[:, data.select_dtypes(include=["float64"]).columns]

        # Apply background gradient to numeric columns and format to 3 decimal points
        styled_data = data.style.background_gradient(
            cmap=cmap, axis=None, vmin=-vmax, vmax=vmax, subset=subset
        ).format("{:.3f}", subset=subset)

        # Color boolean columns (pink for False, lightblue for True)
        bool_columns = data.select_dtypes(include=["bool"]).columns

        def color_boolean(val):
            color = "lightblue" if val else "pink"
            return f"background-color: {color}"

        # Apply the boolean-specific styling if any boolean columns exist
        if not bool_columns.empty:
            styled_data = styled_data.applymap(color_boolean, subset=bool_columns)

        # Hide the index if requested
        if hide_index:
            styled_data = styled_data.hide(axis="index")

    else:
        raise TypeError("Input must be a pandas DataFrame or Series.")

    return styled_data
8 changes: 8 additions & 0 deletions skpro/regression/bayesian/__init__.py
@@ -0,0 +1,8 @@
"""Base classes for Bayesian probabilistic regression."""
# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)

__all__ = ["BayesianConjugateLinearRegressor"]

from skpro.regression.bayesian.bayesian_conjugate import (
    BayesianConjugateLinearRegressor,
)
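
For readers unfamiliar with the "Normal conjugate prior" in the PR title, the following is a minimal NumPy sketch of the textbook closed-form update for a Gaussian prior on the coefficients with a known noise precision. It is not the code of `BayesianConjugateLinearRegressor`, whose implementation is not shown in this diff; the function name `conjugate_posterior` and its parameters are hypothetical, and 1-D arrays are used for the prior mean purely for brevity.

```python
import numpy as np


def conjugate_posterior(X, y, prior_mean, prior_cov, noise_precision):
    """Gaussian posterior over coefficients (hypothetical helper, not skpro API).

    Assumes y = X @ w + noise, noise ~ N(0, 1 / noise_precision),
    and prior w ~ N(prior_mean, prior_cov).
    """
    prior_precision = np.linalg.inv(prior_cov)
    # Posterior precision = prior precision + noise_precision * X^T X
    post_precision = prior_precision + noise_precision * (X.T @ X)
    post_cov = np.linalg.inv(post_precision)
    # Posterior mean = post_cov @ (prior_precision @ prior_mean + noise_precision * X^T y)
    post_mean = post_cov @ (prior_precision @ prior_mean + noise_precision * (X.T @ y))
    return post_mean, post_cov


# Synthetic data with known coefficients (2, 3) and noise std 0.1
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.1, size=200)

post_mean, post_cov = conjugate_posterior(
    X,
    y,
    prior_mean=np.zeros(2),
    prior_cov=10.0 * np.eye(2),
    noise_precision=100.0,  # 1 / 0.1**2
)
print(post_mean)
print(post_cov)
```

Because the update is closed-form, no sampling is needed; the posterior covariance shrinks relative to the prior as more rows are observed.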