SDRClasifier for Prediction? #646

Closed
Thanh-Binh opened this issue Aug 22, 2019 · 15 comments · Fixed by #667
Labels
bug Something isn't working newbie

Comments

@Thanh-Binh

Thanh-Binh commented Aug 22, 2019

Hi David @ctrl-z-9000-times
I am looking at your Predictor now.
If I understand correctly, this class is used to predict any kind of data based on an SDR pattern.
I do not really understand what exactly "recordNum" means for infer and learn.

Predictions infer(UInt recordNum, const SDR &pattern);
void learn(UInt recordNum, const SDR &pattern, const std::vector<UInt> &bucketIdxList);

If I have 1D data such as a vector v = {1.1, 5, 3, 9, ...}, is recordNum a data index or a frame index?
E.g., for the SDR of v[0] = 1.1, is recordNum = 0?

Secondly:
If I use it to predict 1D data, I think the bucket index is not enough, because inference provides only the bucket index, not the real value.
To get the full performance of the Predictor, with something like decode(), we would have to input a map of buckets to real values during learning, like

std::map<UInt, Real32> data; // 1st for bucket, 2nd for value

Do you think this should be done in the Predictor class, or should users map them individually?
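A minimal sketch of the user-side mapping proposed above, assuming the caller keeps it next to the Predictor and feeds it the same bucket index used for learning. `BucketValueMap` is a hypothetical name, not an htm.core class, and `UInt`/`Real32` are local stand-ins for the htm.core typedefs. Instead of a single value per bucket it stores the running mean of the raw values, so repeated samples in one bucket are averaged.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using UInt = std::uint32_t;   // stand-in for htm.core's UInt (assumption)
using Real32 = float;         // stand-in for htm.core's Real32 (assumption)

// Hypothetical helper kept by the user next to the Predictor: for every
// bucket it stores the running mean of the raw values that fell into it.
class BucketValueMap {
public:
  // Call with the same (bucket, value) pair that was used for learning.
  void add(UInt bucket, Real32 value) {
    Entry &e = entries_[bucket];
    e.count += 1;
    e.mean += (value - e.mean) / static_cast<Real32>(e.count);  // incremental mean
  }

  // Returns the mean value seen for `bucket`, or `fallback` if never seen.
  Real32 decode(UInt bucket, Real32 fallback) const {
    auto it = entries_.find(bucket);
    return it == entries_.end() ? fallback : it->second.mean;
  }

private:
  struct Entry {
    UInt count = 0;
    Real32 mean = 0.0f;
  };
  std::map<UInt, Entry> entries_;
};
```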
Thanks

EDIT:
The problem seems to be the insufficient precision of Real; Real64 should be used for the PDF in SDRClassifier/Predictor. There is no test to replicate it, though. #646 (comment)

@ctrl-z-9000-times
Collaborator

Hi Thanh-Binh,

I really do not understand what does "recordNum" mean exactly for infer and learn?

The Predictor class is used for time-series data, and it needs the record number to know how to process a data sample. If you don't want time-based functions, you should use the Classifier, which is like the Predictor but has no concept of time. Internally, the Predictor contains several Classifiers, a time-delay queue, and logic to sort out which classifier to use with which data.
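To illustrate what a time-delay queue keyed by recordNum can look like, here is a hypothetical sketch; `DelayQueue` is not an htm.core class, and htm.core's internals may differ. It assumes recordNum increases by one per time-series sample (e.g. recordNum = i for v[i]) and that there is one Classifier per prediction step n. The returned pairs say which past pattern should be associated with the bucket observed now.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using UInt = std::uint32_t;
using Pattern = std::vector<UInt>;   // sparse indices of active bits

// Hypothetical time-delay bookkeeping for a multi-step predictor: remember
// recent patterns tagged with their recordNum, and when the bucket for
// record t arrives, pair it with the pattern recorded at t - n for every
// requested step n.
class DelayQueue {
public:
  explicit DelayQueue(std::vector<UInt> steps) : steps_(std::move(steps)) {
    for (UInt s : steps_)
      if (s > maxStep_) maxStep_ = s;
  }

  // Stores `pattern` and returns (step, pastPattern) pairs that should be
  // learned together with the bucket observed at `recordNum`.
  std::vector<std::pair<UInt, Pattern>> push(UInt recordNum, const Pattern &pattern) {
    history_.emplace_back(recordNum, pattern);
    std::vector<std::pair<UInt, Pattern>> pairs;
    for (const auto &entry : history_) {
      const UInt age = recordNum - entry.first;  // assumes recordNum never decreases
      for (UInt s : steps_)
        if (age == s) pairs.emplace_back(s, entry.second);
    }
    // Drop entries too old to be needed for the largest step next time.
    while (!history_.empty() && recordNum - history_.front().first >= maxStep_)
      history_.pop_front();
    return pairs;
  }

private:
  std::vector<UInt> steps_;
  UInt maxStep_ = 0;
  std::deque<std::pair<UInt, Pattern>> history_;
};
```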

Inference provides only the bucket index, not the real value

Numenta's SDR Classifier outputs real values. I removed that feature because its implementation was questionable. I kept the bucket system, which tracks the approximate value, but finding an exact real value from an input SDR is impossible. Numenta's classifier did it by taking an exponential moving average of the recently given inputs. This adds detail to the output, but those details are not calculated from the SDR; they come from the input statistics. I think those extra details are misleading. If users want more detail, they need to add more buckets.
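For comparison, a sketch of a per-bucket exponential moving average like the one described above. The class name `BucketEma` and the smoothing factor `alpha` are assumptions for illustration; this is not Numenta's code. Note that the stored value depends only on the raw inputs, which is exactly the point made above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// Per-bucket exponential moving average (EMA) of the raw input values.
// `alpha` in (0, 1] controls how strongly the newest sample counts.
class BucketEma {
public:
  BucketEma(UInt numBuckets, double alpha)
      : alpha_(alpha), value_(numBuckets, 0.0), seen_(numBuckets, false) {}

  void update(UInt bucket, double actual) {
    if (!seen_[bucket]) {            // first sample initialises the average
      value_[bucket] = actual;
      seen_[bucket] = true;
    } else {
      value_[bucket] = (1.0 - alpha_) * value_[bucket] + alpha_ * actual;
    }
  }

  double value(UInt bucket) const { return value_[bucket]; }

private:
  double alpha_;
  std::vector<double> value_;
  std::vector<bool> seen_;
};
```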

HTH

@Thanh-Binh
Author

@ctrl-z-9000-times thanks for your answer.
In the meantime, I think we can use

  1. your Predictor,
  2. an exponential moving average of the recently given inputs, and
  3. statistics on the data

to estimate the predicted values.
Thanks.

@Thanh-Binh
Author

@ctrl-z-9000-times
I am testing the Predictor for predicting the bucket index of sine-wave data, like this:
Data -> RDSE encoder -> Predictor -> bucketIdx
I compared the bucket index from the encoder with the index provided by the Predictor, but they are very different, even after long learning.
Please note that the SDRClassifier of nupic.core provides very good results.
Do you have any idea? Thanks
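A hypothetical harness for the sine-wave experiment above, with the htm.core encoder and Predictor left out. It computes the encoder-side bucket index and scores one-step predictions against it. The helper names, the [-1, 1] range, and the bucket width are assumptions, not the reporter's actual setup.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// Bucket index as in the Classifier documentation: (x - minimum) / resolution.
UInt toBucket(double x, double minimum, double resolution) {
  return static_cast<UInt>(std::floor((x - minimum) / resolution));
}

// Sampled sine wave in [-1, 1], mapped to buckets of width `resolution`.
std::vector<UInt> sineBuckets(std::size_t n, double period, double resolution) {
  const double pi = std::acos(-1.0);
  std::vector<UInt> buckets;
  buckets.reserve(n);
  for (std::size_t t = 0; t < n; ++t) {
    double x = std::sin(2.0 * pi * static_cast<double>(t) / period);
    buckets.push_back(toBucket(x, -1.0, resolution));
  }
  return buckets;
}

// Fraction of steps where the bucket predicted at time t for time t+1
// matches the encoder's bucket at t+1.
double oneStepAccuracy(const std::vector<UInt> &actual,
                       const std::vector<UInt> &predicted) {
  std::size_t hits = 0, total = 0;
  for (std::size_t t = 0; t + 1 < actual.size() && t < predicted.size(); ++t) {
    ++total;
    if (predicted[t] == actual[t + 1]) ++hits;
  }
  return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
}
```

Scoring a naive "repeat the current bucket" predictor with `oneStepAccuracy` gives a baseline that a trained Predictor should beat.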

@breznak
Member

breznak commented Aug 26, 2019

@Thanh-Binh have you seen

bindings/py/tests/algorithms/sdr_classifier_test.py

especially the def testMultiStepPredictions(self) test? That could help you with your tests.

@Thanh-Binh
Author

@breznak No, I found this issue in my C++ framework. I analyzed the likelihood and the weight update too, but I have no idea how to improve it.

@Thanh-Binh
Author

Hi all, I finally found the reason for the false classification: the limited computing precision of the data type Real used for the PDF and the weight matrix. Changing Real to Real64 solves this problem.
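The thread does not show exactly where Real loses precision. One plausible mechanism, offered as an assumption rather than a diagnosis of htm.core, is overflow when exponentials are computed for a PDF: in 32-bit float, exp(x) overflows for x above roughly 88.7, while a double holds values up to roughly exp(709). The sketch below shows a naive softmax breaking in float but not in double, plus the standard max-subtraction fix that works in both.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustration only (not htm.core code).
// Naive softmax: p_i = exp(a_i) / sum_j exp(a_j).
template <typename T>
std::vector<T> naiveSoftmax(const std::vector<T> &a) {
  std::vector<T> p(a.size());
  T sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    p[i] = std::exp(a[i]);   // overflows to inf for large a[i] on IEEE 754
    sum += p[i];
  }
  for (T &x : p) x /= sum;   // inf / inf yields NaN
  return p;
}

// Stable softmax: subtracting the maximum leaves the result unchanged
// mathematically but keeps every exp() argument <= 0.
template <typename T>
std::vector<T> stableSoftmax(const std::vector<T> &a) {
  T m = *std::max_element(a.begin(), a.end());
  std::vector<T> shifted(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) shifted[i] = a[i] - m;
  return naiveSoftmax(shifted);
}
```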

@breznak breznak added the bug Something isn't working label Aug 27, 2019
@breznak
Member

breznak commented Aug 27, 2019

Thanks @Thanh-Binh 👍
that should be easy to fix. Do you have an example of code that would be broken by this, so we can add a unit test?

@breznak
Member

breznak commented Sep 4, 2019

@Thanh-Binh do you have a test-case, so we can replicate the bug, please?

@Thanh-Binh
Author

@breznak No, I do not have a test case. It was a brief observation and report, and I found a solution quickly.

@breznak breznak added the newbie label Sep 16, 2019
@breznak
Member

breznak commented Sep 16, 2019

Data-> rdse encoder-> predictor-> bucketIdx
I compare the bucket index of encoder with index provided by predictor, but they are very different even after long learning.

@Thanh-Binh could you please share this code, or at least describe the details (encoder precision, etc.) needed to replicate it? I'd like to resolve this issue (using Real64 is the fix you found), but I need a test to replicate and validate the problem.

@breznak breznak changed the title SDRClaasifier for Prediction? SDRClasifier for Prediction? Sep 16, 2019
@Thanh-Binh
Author

@breznak I will share it asap

@Thanh-Binh
Author

@breznak I am trying to port my test code to htm.core, but I see some problems because no bucket information is available from the new encoders in htm.core.
In my framework, I use my own encoder, which provides the bucket ID of the current signal value, so validating the SDRClassifier is easy.
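A hypothetical minimal scalar encoder that returns its bucket together with the active bits, in the spirit of the custom encoder described above. It is not the htm.core ScalarEncoder or RDSE API; the contiguous-block layout, range clipping, and parameter names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

struct Encoded {
  UInt bucket;
  std::vector<UInt> activeBits;   // sparse indices of the active bits
};

// Encodes `x` as a contiguous block of `activeCount` bits starting at its
// bucket index, and reports that bucket so it can be passed to learn().
Encoded encodeWithBucket(double x, double minimum, double maximum,
                         double resolution, UInt activeCount) {
  // Clip to the encoder's range, then bucket as in the Classifier docs.
  double clipped = std::fmin(std::fmax(x, minimum), maximum);
  UInt bucket = static_cast<UInt>(std::floor((clipped - minimum) / resolution));
  Encoded out{bucket, {}};
  for (UInt i = 0; i < activeCount; ++i) out.activeBits.push_back(bucket + i);
  return out;
}
```

With this layout the output SDR needs floor((maximum - minimum) / resolution) + activeCount bits.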

@breznak
Member

breznak commented Sep 17, 2019

but see some problems because no bucket information is available in the new encoders of htm.core.

Does SDRClassifier.hpp help a bit with how to obtain the buckets?

* Categories are labeled using unsigned integers.  Other data types must be
* enumerated or transformed into positive integers.  There are as many output
* units as the maximum category label.
*
* Example Usage:
*
*    // Make a random SDR and associate it with the category B.
*    SDR inputData({ 1000 });
*    inputData.randomize( 0.02 );
*    enum Category { A, B, C, D };
*    Classifier clsr;
*    clsr.learn( inputData, { Category::B } );
*    argmax( clsr.infer( inputData ) )  ->  Category::B
*
*    // Estimate a scalar value.  The Classifier only accepts categories, so
*    // put real valued inputs into bins (AKA buckets) by subtracting the
*    // minimum value and dividing by a resolution.
*    double scalar = 567.8;
*    double minimum = 500;
*    double resolution = 10;
*    clsr.learn( inputData, { static_cast<UInt>((scalar - minimum) / resolution) } );
*    argmax( clsr.infer( inputData ) ) * resolution + minimum  ->  560
*
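The arithmetic in the quoted comment can be checked in isolation. A standalone sketch, assuming the likelihoods returned by infer() are available as a `std::vector<double>`; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <vector>

using UInt = std::uint32_t;

// Index of the most likely category, i.e. argmax over the likelihoods.
UInt argmaxIndex(const std::vector<double> &pdf) {
  return static_cast<UInt>(
      std::distance(pdf.begin(), std::max_element(pdf.begin(), pdf.end())));
}

// Assumes scalar >= minimum; a negative quotient cannot be cast to UInt.
UInt toBucket(double scalar, double minimum, double resolution) {
  return static_cast<UInt>(std::floor((scalar - minimum) / resolution));  // 6.78 -> 6
}

// Lower edge of the bucket, matching `argmax(...) * resolution + minimum`.
double fromBucket(UInt bucket, double minimum, double resolution) {
  return bucket * resolution + minimum;
}
```

The decoded 560 is the lower edge of bucket 6; decoding to the bucket centre (adding resolution / 2, i.e. 565) is an alternative design choice.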

@Thanh-Binh
Author

@ctrl-z-9000-times
If I look at the likelihood distribution provided by Predictor::infer() for a sine wave, I find that it becomes a uniform distribution (all categories have the same likelihood) after a certain time period.
I do not really understand it, but I suspect an overestimation effect.
What do you think?
Thanks
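To quantify "uniform after a certain time period", one option is to track the normalized entropy of each likelihood vector over time. A hypothetical diagnostic helper, not part of htm.core: 1.0 means perfectly uniform, values near 0 mean the mass sits on a few buckets.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized Shannon entropy of a (possibly unnormalised) likelihood vector.
double normalizedEntropy(const std::vector<double> &pdf) {
  if (pdf.size() < 2) return 0.0;
  double sum = 0.0;
  for (double p : pdf) sum += p;
  if (sum <= 0.0) return 0.0;      // no probability mass at all
  double h = 0.0;
  for (double p : pdf) {
    if (p <= 0.0) continue;        // 0 * log 0 is taken as 0
    double q = p / sum;
    h -= q * std::log(q);
  }
  return h / std::log(static_cast<double>(pdf.size()));
}
```

A value that climbs toward 1.0 during training would confirm the symptom described above.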

@breznak
Member

breznak commented Sep 18, 2019

@Thanh-Binh please help review #675; I'm running into some Predictor issues in C++.

3 participants