SDRClasifier for Prediction? #646

Thanh-Binh · 2019-08-22T14:56:06Z

Hi David @ctrl-z-9000-times
I am looking at your Predictor now.
If I understand correctly, it can be used to predict any data based on an SDR pattern.
I do not really understand what "recordNum" means exactly for infer and learn:

Predictions infer(UInt recordNum, const SDR &pattern);
void learn(UInt recordNum, const SDR &pattern, const std::vector<UInt> &bucketIdxList);

If I have 1D data such as a vector v = {1.1, 5, 3, 9, ...}, is recordNum a data index or a frame index? For example, for the SDR of v[0] = 1.1, is recordNum = 0?

Secondly:
If I use it to predict 1D data, I think the bucket index is not enough, because inference provides only the bucket index, not the real value.
To get the full functionality of the Predictor, such as decode(), we would have to supply a map from bucket to real value during learning, like

std::map<UInt, Real32> data; // 1st for bucket, 2nd for value

Do you think this should be done in the Predictor class, or should users map them individually?
Thanks
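A minimal sketch of the user-side mapping proposed above. It assumes the caller fills `data` with one real value per bucket while learning, and that inference returns a likelihood vector with one entry per bucket. `decodeValue` is a hypothetical helper, not part of htm.core. It takes the most likely bucket and looks that bucket up in the map.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <optional>
#include <vector>

using UInt = unsigned int;
using Real32 = float;

// Hypothetical helper: decode a likelihood vector (one entry per bucket) into
// a real value via a bucket -> value map that the caller filled while learning.
std::optional<Real32> decodeValue(const std::vector<double> &pdf,
                                  const std::map<UInt, Real32> &data) {
  if (pdf.empty()) return std::nullopt;
  // Most likely bucket = index of the largest likelihood.
  const auto best = static_cast<UInt>(
      std::distance(pdf.begin(), std::max_element(pdf.begin(), pdf.end())));
  const auto it = data.find(best);
  if (it == data.end()) return std::nullopt;  // bucket never seen while learning
  return it->second;
}
```

Returning `std::optional` makes the "bucket never seen during learning" case explicit, so no value has to be invented for it.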

EDIT (edited by breznak):
The problem seems to be insufficient precision of Real; Real64 should be used for the PDF in SDRClassifier/Predictor. No test to replicate, though. #646 (comment)


ctrl-z-9000-times · 2019-08-23T13:35:39Z

Hi Thanh-Binh,

I really do not understand what does "recordNum" mean exactly for infer and learn?

The Predictor class is used for time-series data, and it needs to know the record num to know how to process a data sample. If you don't want time-based functions you should use the Classifier which is like the Predictor but does not have any concept of time. Internally, the Predictor contains several Classifiers, a time-delay queue, and logic to sort out which classifier to use with which data.
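The recordNum logic described above can be made concrete with a minimal, hypothetical sketch. It does not reproduce htm.core's code. `ToyPredictor` and `CountClassifier` are invented names. The count-based classifier is a simplified stand-in for the real softmax Classifier, and the sketch uses one bucket per sample instead of a bucket list. What it shows is the time-delay idea: the pattern seen `s` records ago is taught to predict the current bucket, and recordNum is what identifies "s records ago".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <utility>
#include <vector>

using UInt = unsigned int;
using Pattern = std::vector<UInt>;  // indices of the active SDR bits
using PDF = std::vector<double>;    // one likelihood per bucket

// Hypothetical stand-in classifier: counts how often each active bit was seen
// together with each bucket (htm.core's Classifier uses softmax weights instead).
class CountClassifier {
public:
  void learn(const Pattern &pattern, UInt bucket) {
    if (bucket >= numBuckets_) numBuckets_ = bucket + 1;
    for (UInt bit : pattern) {
      auto &row = counts_[bit];
      if (row.size() <= bucket) row.resize(bucket + 1, 0.0);
      row[bucket] += 1.0;
    }
  }
  PDF infer(const Pattern &pattern) const {
    PDF pdf(numBuckets_, 0.0);
    double total = 0.0;
    for (UInt bit : pattern) {
      const auto it = counts_.find(bit);
      if (it == counts_.end()) continue;
      for (std::size_t b = 0; b < it->second.size(); ++b) {
        pdf[b] += it->second[b];
        total += it->second[b];
      }
    }
    if (total > 0.0)
      for (double &p : pdf) p /= total;  // normalize to a probability distribution
    return pdf;
  }

private:
  std::map<UInt, std::vector<double>> counts_;
  UInt numBuckets_ = 0;
};

// Hypothetical predictor: one classifier per horizon ("steps") plus a
// time-delay queue of recent patterns keyed by recordNum.
class ToyPredictor {
public:
  explicit ToyPredictor(std::vector<UInt> steps) : steps_(std::move(steps)) {
    for (UInt s : steps_) {
      classifiers_.emplace(s, CountClassifier{});
      if (s > maxStep_) maxStep_ = s;
    }
  }
  // The pattern recorded at (recordNum - s) learns to predict this bucket.
  void learn(UInt recordNum, const Pattern &pattern, UInt bucket) {
    for (const auto &[pastRecord, pastPattern] : history_)
      for (UInt s : steps_)
        if (pastRecord + s == recordNum) classifiers_.at(s).learn(pastPattern, bucket);
    history_.emplace_back(recordNum, pattern);
    // Forget records that no future recordNum can still use.
    while (!history_.empty() && history_.front().first + maxStep_ <= recordNum)
      history_.pop_front();
  }
  // For each horizon s, the PDF over buckets expected at recordNum + s.
  std::map<UInt, PDF> infer(const Pattern &pattern) const {
    std::map<UInt, PDF> result;
    for (const auto &[s, classifier] : classifiers_) result[s] = classifier.infer(pattern);
    return result;
  }

private:
  std::vector<UInt> steps_;
  UInt maxStep_ = 0;
  std::map<UInt, CountClassifier> classifiers_;
  std::deque<std::pair<UInt, Pattern>> history_;
};
```

In this sketch, recordNum identifies a sample's position in time, so for the vector example in the question, v[0] would use recordNum = 0. Gaps in recordNum simply mean that some (past pattern, current bucket) pairs are never formed.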

Inference provides only the bucket index, not the real value

Numenta's SDR Classifier outputs real values. I removed that feature because its implementation was questionable. I kept the system of buckets, which tracks the rough value, but finding an exact real value given an input SDR is impossible. Numenta's classifier did it by taking an exponential moving average of the recently given inputs. This adds detail to the output, but those details are not calculated from the SDR; they come from the input statistics. I think that those extra details are misleading. If a user wants more detail, then they need to add more buckets.

HTH
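For comparison, the per-bucket exponential moving average described above can be sketched as follows. `BucketEma` is a hypothetical name. `alpha` is an assumed smoothing-rate parameter; Numenta's classifier exposed a similar knob, but its name and default are not stated in this thread. As the comment points out, the returned value comes from input statistics, not from the SDR.

```cpp
#include <cassert>
#include <map>

using UInt = unsigned int;

// Hypothetical sketch: per-bucket exponential moving average of the actual
// input values. alpha in (0, 1]; larger alpha weights recent inputs more.
class BucketEma {
public:
  explicit BucketEma(double alpha) : alpha_(alpha) {}
  void update(UInt bucket, double actual) {
    auto it = values_.find(bucket);
    if (it == values_.end())
      values_.emplace(bucket, actual);  // first sample seeds the average
    else
      it->second = (1.0 - alpha_) * it->second + alpha_ * actual;
  }
  // Throws std::out_of_range if the bucket was never updated.
  double value(UInt bucket) const { return values_.at(bucket); }

private:
  double alpha_;
  std::map<UInt, double> values_;
};
```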

Thanh-Binh · 2019-08-23T14:35:12Z

@ctrl-z-9000-times thanks for your answer.
In the meantime, I think we can estimate the predicted values using your Predictor, an exponential moving average of the recently given inputs, and statistics on the data.
Thanks.

Thanh-Binh · 2019-08-24T14:32:30Z

@ctrl-z-9000-times
I am testing the Predictor for predicting the bucket index of sine-wave data, like
Data-> rdse encoder-> predictor-> bucketIdx
I compare the bucket index from the encoder with the index provided by the predictor, but they differ considerably, even after long learning.
Please note that SDRClassifier of nupic.core provides very good results.
Do you have any idea? Thanks
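One pitfall worth ruling out when comparing predicted and actual buckets is time alignment. The thread does not confirm that this happened here. A prediction made at time t for n steps ahead should be scored against the actual bucket at t + n, not at t. `sineBuckets` and `stepAheadAccuracy` are hypothetical helpers. To keep the sketch self-contained, buckets are computed directly from the signal (minimum -1, fixed resolution) rather than read from an RDSE.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using UInt = unsigned int;

// Bucket indices of sin(step * t), t = 0..n-1, with minimum -1 and the given
// resolution (computed directly here instead of being read from an RDSE).
std::vector<UInt> sineBuckets(std::size_t n, double step, double resolution) {
  std::vector<UInt> buckets;
  for (std::size_t t = 0; t < n; ++t)
    buckets.push_back(static_cast<UInt>((std::sin(step * t) + 1.0) / resolution));
  return buckets;
}

// predicted[t] is the bucket predicted at time t for time t + steps, so it is
// compared with actual[t + steps]. Returns the fraction of matches.
double stepAheadAccuracy(const std::vector<UInt> &actual,
                         const std::vector<UInt> &predicted, std::size_t steps) {
  std::size_t hits = 0, total = 0;
  for (std::size_t t = 0; t + steps < actual.size() && t < predicted.size(); ++t) {
    ++total;
    if (predicted[t] == actual[t + steps]) ++hits;
  }
  return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
}
```

With an ideal one-step predictor (`predicted[t] == actual[t + 1]`), scoring with `steps = 1` gives 1.0. Scoring the same predictions with `steps = 0` reports errors even though every prediction is correct.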

breznak · 2019-08-26T10:15:48Z

@Thanh-Binh have you seen

bindings/py/tests/algorithms/sdr_classifier_test.py

especially the def testMultiStepPredictions(self) test? That could help you with your tests.

Thanh-Binh · 2019-08-26T14:05:37Z

@breznak No, I found this issue in my C++ framework. I also analyzed the likelihood and the weight update, but I have no idea how to improve it.

Thanh-Binh · 2019-08-27T17:28:16Z

Hi all, I finally found the reason for the false classifications: limited numerical precision from using the data type Real for the PDF and the weight matrix. Changing from Real to Real64 solves this problem.
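The thread contains no reproduction, so the following only illustrates one well-known way single precision can break a likelihood computation. It is not a confirmed account of this bug. It assumes `Real` is a 32-bit float. A naive softmax overflows `float`'s exponent range (around e^88) long before `double`'s (around e^709), and inf/inf yields NaN. Subtracting the maximum activation first is the standard fix that keeps `exp()` in range. `naiveSoftmax` and `stableSoftmax` are illustrative names, not htm.core functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive softmax: exp(x_i) / sum_j exp(x_j), computed entirely in type T.
template <typename T>
std::vector<T> naiveSoftmax(const std::vector<T> &x) {
  std::vector<T> out(x.size());
  T sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i]);  // overflows to inf for large x when T = float
    sum += out[i];
  }
  for (T &v : out) v /= sum;
  return out;
}

// Numerically stable softmax: subtract the maximum so every exponent is <= 0.
template <typename T>
std::vector<T> stableSoftmax(const std::vector<T> &x) {
  std::vector<T> out(x.size());
  if (x.empty()) return out;
  const T maxVal = *std::max_element(x.begin(), x.end());
  T sum = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - maxVal);
    sum += out[i];
  }
  for (T &v : out) v /= sum;
  return out;
}
```

With activations {100, 90}, the naive float version returns NaN, while the naive double version and the stable float version both give roughly 0.99995 for the first entry.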

breznak · 2019-08-27T18:02:31Z

Thanks @Thanh-Binh 👍
That should be easy to fix. Do you have an example of code that is broken by this, so we can add a unit test?

breznak · 2019-09-04T11:20:56Z

@Thanh-Binh do you have a test-case, so we can replicate the bug, please?

Thanh-Binh · 2019-09-04T15:32:02Z

@breznak No, I do not have a test case. It was just a quick observation and report, and I quickly found a solution for it.

breznak · 2019-09-16T16:35:51Z

Data-> rdse encoder-> predictor-> bucketIdx
I compare the bucket index of encoder with index provided by predictor, but they are very different even after long learning.

@Thanh-Binh could you please share this code, or at least describe the details (encoder precision, etc.) needed to replicate it? I'd like to resolve this issue (using Real64 is the fix you found), but I need a test to replicate and validate the problem.

Thanh-Binh · 2019-09-16T18:40:53Z

@breznak I will share it asap

Thanh-Binh · 2019-09-17T08:20:59Z

@breznak I am trying to port my test code to htm.core, but I see some problems because no bucket information is available from the new encoders in htm.core.
In my framework I use my own encoder, which provides the bucket ID of the current signal value, so validating the SDRClassifier is easy.

breznak · 2019-09-17T09:15:01Z

but see some problems because no bucket information is available in the new encoders of htm.core.

Does SDRClassifier.hpp help a bit with how to obtain the buckets?

* Categories are labeled using unsigned integers.  Other data types must be
* enumerated or transformed into positive integers.  There are as many output
* units as the maximum category label.
*
* Example Usage:
*
*    // Make a random SDR and associate it with the category B.
*    SDR inputData({ 1000 });
*    inputData.randomize( 0.02 );
*    enum Category { A, B, C, D };
*    Classifier clsr;
*    clsr.learn( inputData, { Category::B } );
*    argmax( clsr.infer( inputData ) )  ->  Category::B
*
*    // Estimate a scalar value.  The Classifier only accepts categories, so
*    // put real valued inputs into bins (AKA buckets) by subtracting the
*    // minimum value and dividing by a resolution.
*    double scalar = 567.8;
*    double minimum = 500;
*    double resolution = 10;
*    clsr.learn( inputData, { static_cast<UInt>((scalar - minimum) / resolution) } );
*    argmax( clsr.infer( inputData ) ) * resolution + minimum  ->  560
*
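Encoders that do not expose bucket IDs are not a blocker here. One option is to compute the bucket next to the encoder, exactly as the header comment describes. `Bucketizer` is a hypothetical helper, not an htm.core class. It applies the same minimum/resolution scheme and additionally clamps out-of-range inputs to the edge buckets, which the header example does not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

using UInt = unsigned int;

// Hypothetical helper: minimum/resolution bucketing as in the header comment,
// plus clamping so out-of-range inputs land in the first or last bucket.
struct Bucketizer {
  double minimum;
  double resolution;
  UInt numBuckets;  // must be >= 1

  UInt encode(double value) const {
    const double idx = std::floor((value - minimum) / resolution);
    const double maxIdx = static_cast<double>(numBuckets - 1);
    return static_cast<UInt>(std::clamp(idx, 0.0, maxIdx));
  }
  // Lower edge of the bucket, matching "* resolution + minimum" in the header.
  double decode(UInt bucket) const { return bucket * resolution + minimum; }
};
```

With `Bucketizer{500.0, 10.0, 100}`, the header's example value 567.8 encodes to bucket 6 and decodes to 560, the same result as the comment.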

Thanh-Binh · 2019-09-18T17:03:58Z

@ctrl-z-9000-times
If I look at the likelihood distribution provided by Predictor::infer() for a sine wave, I find that it becomes a uniform distribution (all categories have the same likelihood) after some time.
I really do not understand it, but I suspect an overestimation effect.
What do you think?
Thanks
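One way to quantify "all categories have the same likelihood" over time is the normalized Shannon entropy H(p) / log(n). It equals 1 for a perfectly uniform PDF over n buckets and 0 when all mass sits on a single bucket. `normalizedEntropy` is a hypothetical diagnostic helper, not an htm.core function. Logging it per step would show when the PDF starts to flatten.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized Shannon entropy of a PDF: 1.0 = perfectly uniform (no information),
// 0.0 = all probability mass on a single bucket.
double normalizedEntropy(const std::vector<double> &pdf) {
  if (pdf.size() < 2) return 0.0;
  double h = 0.0;
  for (double p : pdf)
    if (p > 0.0) h -= p * std::log(p);  // 0 * log(0) is treated as 0
  return h / std::log(static_cast<double>(pdf.size()));
}
```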

breznak · 2019-09-18T17:34:49Z

@Thanh-Binh please help review #675; I'm running into a Predictor issue in C++.

breznak added the bug label Aug 27, 2019

breznak added the newbie label Sep 16, 2019

breznak changed the title ~~SDRClaasifier for Prediction?~~ SDRClasifier for Prediction? Sep 16, 2019

breznak mentioned this issue Sep 16, 2019: SDRClassifier: fix precision by using Real64 for PDF #667 (merged)

This was referenced Sep 18, 2019:
Anomaly Likelihood does not work correctly! #665 (open)
Hotgym predictor, anomaly tests #675 (open)

breznak closed this as completed in #667 Sep 20, 2019
