
Rudimentary fluid.polynomialregressor object #246

Open · wants to merge 30 commits into main

Conversation

@lewardo (Member) commented Aug 18, 2023

A simple N-to-N polynomial regressor, implemented as N parallel monovariate polynomial regressors.

  • DataClient-inherited member functions all work, including print, read, write etc.
  • same interface as fluid.MLPRegressor, including fit, predictpoint, predict, etc.
  • regressors parameter sets input and output dataset size, namely the number of parallel regressors
  • degree parameter sets the degree of the polynomial regression, currently all regressors are set to the same degree at once
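For reference, the core of each parallel monovariate regressor amounts to a least-squares polynomial fit. The sketch below is an illustrative reimplementation, not the PR's actual code (which uses Eigen, as discussed later in this thread): it accumulates the normal equations (XᵀX)c = Xᵀy from the Vandermonde design matrix and solves them by Gauss-Jordan elimination.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Illustrative monovariate polynomial least-squares fit (not the PR's code).
// Builds A = X^T X and b = X^T y for the Vandermonde matrix X, then solves
// A c = b by Gauss-Jordan elimination with partial pivoting.
std::vector<double> polyfit(const std::vector<double>& x,
                            const std::vector<double>& y, int degree)
{
  int n = degree + 1;
  std::vector<double> A(n * n, 0.0), b(n, 0.0);
  for (std::size_t s = 0; s < x.size(); ++s)
  {
    // powers of x[s]: [1, x, x^2, ..., x^degree]
    std::vector<double> p(n);
    p[0] = 1.0;
    for (int j = 1; j < n; ++j) p[j] = p[j - 1] * x[s];
    for (int i = 0; i < n; ++i)
    {
      b[i] += p[i] * y[s];
      for (int j = 0; j < n; ++j) A[i * n + j] += p[i] * p[j];
    }
  }
  // Gauss-Jordan elimination: reduce A to diagonal form.
  for (int col = 0; col < n; ++col)
  {
    int pivot = col;
    for (int r = col + 1; r < n; ++r)
      if (std::fabs(A[r * n + col]) > std::fabs(A[pivot * n + col])) pivot = r;
    for (int j = 0; j < n; ++j) std::swap(A[col * n + j], A[pivot * n + j]);
    std::swap(b[col], b[pivot]);
    for (int r = 0; r < n; ++r)
    {
      if (r == col) continue;
      double f = A[r * n + col] / A[col * n + col];
      for (int j = 0; j < n; ++j) A[r * n + j] -= f * A[col * n + j];
      b[r] -= f * b[col];
    }
  }
  std::vector<double> coeffs(n);
  for (int i = 0; i < n; ++i) coeffs[i] = b[i] / A[i * n + i];
  return coeffs;
}
```

Fitting, say, samples of y = 1 + 2x + 3x² at degree 2 recovers coefficients close to [1, 2, 3]; the N-to-N object simply runs N of these fits side by side.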

@weefuzzy There is potential for discussion on, and rethinking of, the implemented regression algorithm: Gaussian process regression, for example, may be more appropriate than simple polynomial regression. This is just a proof of concept and an initial posit for a deterministic alternative to the artificial neural network objects.

tested in max against lewardo/flucoma-max@02d3b22

current issues:

  • when reading from JSON, the parameters update in the algorithm but not in the client, so you have to set the parameters (degree, number of regressors) correctly before reading for the state to be recognised by the client

patch used to test new object


----------begin_max5_patcher----------
1806.3oc2assaaiCD84Tf9OPXzBjtqq.ueI+JKVTHayjnt1RBRxsIaw1u8kj
RJwwlVlpkVtMEAwUzzdl4vCOb3Plu812b0rEEOnqmAtA7Wfqt5alVtx0lskq
5a3pYaReX45zZWGmsrXyFcdyr4cuYi9gF2abM6CIfk2mlemFrJsIsV2.pzqS
axJxquOq7oOw5rb8xhs4tOFtu0xJcs46008O4uK4a2TrsYstw4Iv9lyV4rew
hO+QJ9IqTl1r79r769TkdYSaHREpDnfBUHk.pvPFmLGfj3DNR.wPEmxwXpRZ
ZjASfyADZBD726X9r7dqibM9eu8M1WMuLOB.nv.fU5zUfaqJ1.9bcQ9rQD5n
gBckhjvPRFFyfPkxDzHSTZhOIhZ9+BLDgIXE5oPGCmxPmNGbsxD82Y3Ll3rb
aCXSZYoIFpmEWBBbHTRJkGhRhVPBwULjv7CTY.IkLw1Fky3JAVJYnoFxtlXv
quVk0nAMEikrPTCRVvXOjEDJgRPRhPg3DAECEWHxB1PVjlf+1rFmPyXhawPw
s.ScipLLU.YRIWR8O7KQIRK.QgRlArvzy0neZSS01riDfn9laaq4wRcabLa1
yNRYZU5Fcit5S57zEq0u.Yre8N6TouyL6ptn54oa6hZrwKsvUIT6+3JrQlQx
D3mYK3ojsXblq4F1x0U5ObmNWWk17z5S0ig5PGBDvbbBFgMKgvfTESfTJG0A
Z0JnsPAVIEleCQheyYKqrrEsOlBlLZlBFR7nuRkmMlxFCQO8N8gLEmX5OC5s
KaAMdAVeqFSnm.GvwGGrofDIX.O75LLYnv.4rAC45uZ7zCQA8CkUf2cKB7Gc
+FAgwBUFjbPwrDAyFslWwbqvAkRjTAm2tJ74ZhwoPhHE8nAyQGIPsQOhaCTF
d50AJqx1Ygjexg5AS3fK3IXdHBhD7jOAnAr.r3Hv.9nvvBy9+lMu6U+PxfYS
H4POZBX0O.j7iSMVrso4n4SiFN1CbwzcAD5nADeIkhaWpf96OdLn.gDJt33w
wlxb65sYqR5Rt76.yNVWM5IP1IOGYhCb7ZILlmjqfxKgnpdU1xFPV9PvxnWK
QM5oNdkWkrIWdskqTVr9w7hMYoqeZKXeGrPmlWGQhCYzym7yaNIJ8K9hwngS
JmJBjsbFWLdR1bhbvDvb4cQX79BfN0QZuRQLkIDidF.Ax8ssT0ziG1xbYTMM
gZjPC13kC7MKfglbMyZmZvs.cdS0ie78Y2.d+slehEKYPcRBRztwDFMwHORP
JybDC9PHLaBphS.FjnCF1IIIkoK+GW0rtArrXc8M3QiE1ulrh7zpGOBpPCAU
HXWo8P3omT3vg6pJ1VF+XmEDiP5TI47oe0w0Eoq.OGEf2gmCh5Jl7fF7aWr.
cApRyYVQ.N3VQ3jNEgNt+qVM.AJ.X30oDf.FBC30qB.kDxP+kS.Xusbe7rj9
Q13DBFh7GE5xJDAgSdoY2K5yxiXvqBXfuKzUjKVQo6JO+eBhU4oGbqBjtpS6
+RibFI.SaE3FbUeL1MlKdp7ZSbQ1h7wQL7oQzmh2Y+zHNVzt8eyLdA3XG8DI
rBwad0tjfeQt.FrQP237IO+9802a6l6Fes+0LyYL6arGpTWrsZYePz4e.7y1
akttIK2cKf1sSlgQvNd0lrUkElPtt+34UIr1qsi8hcfITHmq3D6kRPhL6my2
6872VQ0Jc0IxfK73YOW8HAjM0SabGAKZu+Fm1hjAQPi3lQj+fClQgOrv.Je2
hpv6IlB8ciaDGZcO8LJ3ElbtvKLEGbrgOKwVDnBg5uSbjYuxE+rglxVZ2Cnt
ROm9nv6sdJ3dd4I4AwwsENEfFISP4oXg1hqGROiy7WUDXBboGl.yy3qh6oho
inmuJYWnHLDHEveE0YPwPmg8KoDpLjHSMblVLahVDkKeYAwkZk6IOYSglJW1
tO9P8YBVLnSCifSKBhBQ63PwvfvPLXuaECCFzvBoc0kHXOdP1KZgGkLhYJwX
EsfvSLLdFLn7D22sBQ.26pXbIKQvOcOiWrgNIYYXcNojDXrEJJDmXiFz3FKd
Dkf1MaDsGJXkk3r44Qra83Hkg3iYIsXXvflqinwyfAQR6wgXXvv1rCavY78I
FfkjcySv8Xb7RXPbMwvIyzkyUmWt6SwwICQ6Tbp7s59CiwcOD4vCd.K1M8Q2
SmozGCp3hhfpsH2E.j9xIZe7R5z6KT52qwHW8coDVKTae7LkzaXdcvkws8.2
epxsdbZ5TQ2OgvwNzcImIYDNTvYT2eRbDH7TuGVndgdi3BODgHAQrHhW301G
8303IyqgigZ0o7bLlEy+wbjVV9EcUcmgac3YaR+bg6CIm29bVd6yr1mqzeIq
+iz0TZ0x6yZzKa1V4hyYOv6tDVy1TXbg7sY8RIVHyZZ2AIkmtQWWl1BNtCb5
suwzg+G.nuC+O
-----------end_max5_patcher-----------

@lewardo lewardo requested review from weefuzzy and AlexHarker August 18, 2023 17:08
@weefuzzy (Member) left a comment

Thanks @lewardo and congrats on pushing your way through a complete implementation of algo + client, especially with all the boilerplate needed for the dataset stuff. I've not looked in huge detail over the code, but just have general thoughts.

I've warmed a bit to the idea of this object since it was first suggested. I still don't think that higher-degree regressions are (often) all that useful (because of the overfitting), but certainly having a linear regressor is a generally handy thing.

We can think / dream about gaussian process regressors in the future. They're much more complex to implement (we'll need a sampling framework, for one thing) and open all kinds of interface questions (e.g. they produce generative models, which we don't have any precedent for yet).

The problem with refreshing the client's hyperparameters when json is read is meant to have been fixed for the MLPs, so perhaps check the clients for the MLP regressor to see what's done there.

Meanwhile, other thoughts:

  • if I ever get round to implementing an echo state network for flucoma, we'll need a linear fitter for that as well, except with Tikhonov regularization added ('ridge regression'). That might also be useful for this object, to mitigate overfitting. Thoughts?
  • Can you take me through why we need the regressors parameter in the client, rather than just inferring this from the dimensionality of the input dataset?
  • A UX challenge will be trying to know which degree works best for the data in question (and we have parallel challenges for the MLPs, in terms of making principled comparisons between alternative models). There are a few ways of approaching this, but I'm wondering if a separate object that implements something contemporary like Pareto Smoothed Importance Sampling (PSIS) or the Widely Applicable Information Criterion (WAIC) would be interesting / feasible. And how, regardless of algorithm, the interface for a model comparison object could work. If you have opinions / interest in looking into that...(those algos might well be too heavy / Bayes-ey for general use)

include/algorithms/public/PolynomialRegressor.hpp (outdated review thread, resolved)
@lewardo (Member, Author) commented Aug 21, 2023

I've implemented some of the changes you mentioned, and am working on various others as of writing
Changes I've implemented:

  • the regressors parameter is no more; it has been axed in favour of automatic setting when saving or fitting the object
  • the client parameters now update to reflect the algorithm's state on load and read messages

I will look into implementing a b-spline regressor alongside this polynomial regressor, since it seems to be more appropriate and to have fewer statistical issues when dealing with noisy data. I'm also looking into ridge regression as you suggested, as it too seems an interesting path to follow.

@lewardo (Member, Author) commented Aug 21, 2023

After a discussion with @tremblap we conjured up a potential UI for 1-to-N as well as N-to-N regressors: simply repeating the single input N times. This could be interesting for users who want to map the same domain to many ranges; let me know what you think of this and I could add it in.

@weefuzzy (Member)

> I've implemented some of the changes you mentioned, and am working on various others as of writing
> Changes I've implemented:

Many thanks. Those changes would seem to bring the object up to readiness, save for all the documentation, testing, help files etc etc etc. If you wanted to be ultra virtuous (and postpone some of the foregoing), you could knock up some C++ tests for this algorithm and add them to the flucoma-core test suite (which, IIRC, would be the first algo to receive such hallowed treatment).

FWIW, adding regularisation for ridge regression should be a very simple change (addition of a single scalar parameter, and an extra term in the linear algebra), at least in the linear case. Anyway, it can be left till later.

@weefuzzy (Member)

> After a discussion with @tremblap we conjured up a potential UI for 1-to-N as well as N-to-N regressors: simply repeating the single input N times, which could be interesting for users who want to map the same domain to many ranges; let me know what you think of this and I could add it in

Sounds like Surprising Behaviour to me. Would rather a more explicit workflow with less magic (i.e. user duplicates cols of dataset n times if that's what they want)

@tremblap (Member)

help and examples are definitely on @MattS6464's todo list (with yours truly), indeed.

@MattS6464

I'd be more than happy to give it a look!

@lewardo (Member, Author) commented Aug 21, 2023

@weefuzzy I've been reading into spline regression and wanted to know your thoughts on the interface

  1. since polynomial regression is merely a special case of spline regression with zero knots, would integrating that algorithm into the same object be a wise UI choice? It could be as simple as specifying the number of knots (or even the degrees of freedom) and having the object place those knots automatically (or perhaps specifying them as percentiles along the input domain), falling back on vanilla polynomial regression in the case of 0 knots.
  2. would natural splines be something I should look into? They are very similar, except with additional constraints at the edges to curb overfitting.
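The "special case with zero knots" relationship can be sketched with a truncated-power basis (an illustration, not proposed code): with no knots the feature map is exactly the polynomial basis [1, x, ..., x^d], and each knot k adds one hinge feature max(0, x - k)^d.

```cpp
#include <cmath>
#include <vector>

// Illustrative truncated-power spline basis of a given degree.
// knots empty  -> plain polynomial basis (the zero-knot special case)
// each knot k  -> one extra feature max(0, x - k)^degree
std::vector<double> splineBasis(double x, int degree,
                                const std::vector<double>& knots)
{
  std::vector<double> phi;
  double p = 1.0;
  for (int j = 0; j <= degree; ++j)
  {
    phi.push_back(p); // 1, x, x^2, ..., x^degree
    p *= x;
  }
  for (double k : knots)
    phi.push_back(x > k ? std::pow(x - k, degree) : 0.0);
  return phi;
}
```

Fitting against this basis with the same least-squares machinery gives the spline regressor, which is why the two objects could share the algorithm behind the scenes.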

@weefuzzy (Member)

My inkling is that knots could end up requiring some complex interface decisions (like making it easy to have equidistant knots, vs say using k quantiles w.r.t the independent variable). As such, I'd opt for separate user-facing objects, with the option of some elegant code reuse behind the scenes if it seems to make sense. That way, we can defer thinking too hard about it until this simpler object is more done, and then refactor as desired once a putative b-spline regressor becomes a thing. I certainly wouldn't want to make this object more mysterious-seeming by adding knots, nor have to think through forms of magic to switch modes.

@lewardo (Member, Author) commented Aug 21, 2023

I'll write another client for the user-facing interface. As for the elegant code reuse behind the scenes, the same algorithm can be used with little to no change, so I'll make a branch off of here and start the very few edits required to make it work. It definitely makes sense to separate the two in terms of user understanding, as putting them in one client would just cause confusion.

@lewardo lewardo closed this Aug 28, 2023
@lewardo lewardo deleted the polynomial-regressor branch August 28, 2023 08:55
@lewardo lewardo restored the polynomial-regressor branch August 28, 2023 08:57
@lewardo lewardo reopened this Aug 28, 2023
@tremblap (Member)

Hello!

So I get an instacrash with the patch above if I do something naughty (setting the numberbox to 1 to declare 1 dim, yet having 3 dims generated), then fit, then predict (= crash). Otherwise it seems to work fine. Also, we spoke about the 1-to-many approach: @weefuzzy thought it was magic, but did that grow on him? Otherwise this is great, thanks

@AlexHarker (Contributor)

@lewardo - is there anything specific you need from me in terms of review? It looks like @tremblap is on testing, so it probably doesn't make sense to double up too much; please direct my attention as needed.

@lewardo (Member, Author) commented Aug 28, 2023

> So I get an instacrash with the patch above if I do something naughty (setting the numberbox to 1 to declare 1 dim, yet having 3 dims generated), then fit, then predict (= crash)

@tremblap Could you give me the patch to reproduce this? I'll look into the crash before finishing it off.

I'm not sure whether the idea of 1-to-N grew on anyone, although in terms of implementation it would not take very long to do

@AlexHarker since @tremblap is already on bug testing, would it be possible for you to check for any terrible transgressions in the code? I tried to stay as faithful to the flucoma coding style and formatting as possible, but some aspects may have slipped by me

@tremblap (Member)

the format can be clang-formatted, right?

as for code style, @weefuzzy would have cried, I reckon (and he and @AlexHarker might have different tastes :) )

bug patch: take your patch above and follow my instructions. I can help in person.

@AlexHarker (Contributor)

It's unclear to me why Owen would have cried; I see no obvious issues with code formatting.

@weefuzzy (Member) commented Aug 28, 2023

Yeah, not unduly tearful (though clang-formatting is always appreciated). I didn't warm to the magic, no; much rather we just made it easier to duplicate a dataset column instead

@lewardo (Member, Author) commented Aug 28, 2023

@weefuzzy while talking to @AlexHarker we came upon the subject of allocators, and hence the ScopedEigenMap vs raw Eigen object debate for controlled memory allocation. Do you think this object requires any custom allocation considerations? At the moment each object has a raw Eigen matrix for the coefficients, design matrix and filter matrix; would a scoped map be more appropriate?

@weefuzzy (Member) commented Aug 28, 2023

The simple (or simple-sounding) guideline is: does it ever need to allocate in the audio thread? Eigen's built-in types are hardcoded to use malloc, so the ScopedEigenMap stuff is there to let us step around that.

However, where you have matrix multiplications, it's not always obvious where allocation is happening, because Eigen's default behaviour is to create temporaries behind the scenes on the assumption that there is aliasing. If you know there isn't aliasing, e.g. A = BC, then you can signal that with A.noalias() = B*C. For more complex expressions it's not always easy to tell, though: Eigen has some utility macros you can use to throw asserts when it tries to allocate, which are useful for working out if you need to, say, make your own class-scope temporaries or whatnot.

(Eventually I'd be inclined to move towards using the ScopedMaps everywhere, so that we can use things like memory arenas for better in-algorithm locality where it makes sense, but that's still a pipedream)

Eigen docs on aliasing: https://eigen.tuxfamily.org/dox/group__TopicAliasing.html
Eigen docs on macros (see EIGEN_RUNTIME_NO_MALLOC): https://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html
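The hazard Eigen is guarding against can be shown without Eigen at all. In this plain-C++ sketch (illustrative only), multiplying a matrix into itself in place reads entries that have already been overwritten, which is exactly why Eigen materialises a hidden temporary unless you assert .noalias().

```cpp
#include <vector>

using Mat = std::vector<double>; // 2x2 row-major matrix, for illustration

// Writes a*b into dst. If dst aliases a or b, later products read entries
// that earlier iterations have already overwritten: the aliasing hazard.
void multiplyInto(Mat& dst, const Mat& a, const Mat& b)
{
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
    {
      double sum = 0;
      for (int k = 0; k < 2; ++k) sum += a[i * 2 + k] * b[k * 2 + j];
      dst[i * 2 + j] = sum; // hazard if dst aliases a or b
    }
}

// Safe version: route through an explicit temporary. This allocation is
// the one Eigen performs behind the scenes by default, and the one that
// .noalias() (and ScopedEigenMap-style preallocation) lets you avoid.
Mat multiplySafe(const Mat& a, const Mat& b)
{
  Mat tmp(4);
  multiplyInto(tmp, a, b);
  return tmp;
}
```

For A = [[1,2],[3,4]], multiplySafe(A, A) gives the correct A² = [[7,10],[15,22]], whereas calling multiplyInto(A, A, A) corrupts the off-diagonal entries, mirroring what would happen with an unguarded A.noalias() = A*A in Eigen.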

@lewardo (Member, Author) commented Aug 28, 2023

I'll take some time to refactor this to only use ScopedEigenMaps just for paranoid safety then, if it's the goal in the long run anyway

Very interesting read about the Eigen aliasing; not something I knew about before...

@lewardo (Member, Author) commented Aug 28, 2023

@tremblap's bug has now been fixed, although the fix will be overwritten when I port the Eigen stuff over to the custom allocator. For now, however, it seems to be a clean object that works: it could be drip-fed to testers for UI testing, and the memory work done in the background while the front-end is road-tested

@lewardo (Member, Author) commented Aug 28, 2023

The underlying storage for the various matrices has been ported to use the ScopedEigenMap object with the FluidDefaultAllocator, so all allocated memory is either through a FluidTensor or via the allocator

@lewardo (Member, Author) commented Sep 1, 2023

Thank you @weefuzzy for the message, it really clarified my understanding of the ScopedEigenMap/FluidTensor memory allocation paradigm. I am aware that this code now needs refactoring for best memory practice, especially during the matrix multiplications: it allocates correctly, but copies the data when it needn't.

I'll finish this at some point, in the meantime the eureka is really helping me with the LSTM object I'm working on (foreshadowing intended), so thank you for the clarification !!

5 participants