This issue was moved to a discussion. You can continue the conversation there.
Normalization and Cost #310

Closed
horsto opened this issue Oct 11, 2023 · 3 comments

Comments


horsto commented Oct 11, 2023

Hi, this is a question, not an issue.
I have a bunch of features that I track over time. I am feeding them into

import ruptures as rpt

algo = rpt.Pelt(model=model, min_size=1, jump=1)
algo.fit(signal)
result = algo.predict(pen=p)  # result of change point detection

signal here is, for example, a 500x16 array (timepoints x features). The features themselves live on pretty different scales, so I thought some kind of scaling / normalization (for example via https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html#sklearn.preprocessing.scale) could make sense. Now I wonder, though, how different costs would be affected by that. In the example I am attaching below you can see the normalized signal for the L1 and L2 norms; change points are depicted with dashed lines. You can see that there are some obvious misses (calibrating the penalty helps sometimes, but it is a finicky process).
Should normalization be skipped altogether, or is there a better alternative cost for this kind of signal?

[Screenshot 2023-10-11: normalized signal with detected change points (dashed lines) for L1 and L2 costs]
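As a minimal sketch of the preprocessing step being asked about (the array shape and scales here are hypothetical stand-ins for the real data, and the z-scoring reproduces what sklearn.preprocessing.scale does by default):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real data: 500 timepoints x 16 features
# living on very different scales.
signal = rng.normal(size=(500, 16)) * np.logspace(0, 3, 16)

# Per-feature z-scoring: subtract each column's mean and divide by its
# standard deviation, so every feature ends up on a comparable scale.
signal_scaled = (signal - signal.mean(axis=0)) / signal.std(axis=0)

print(signal_scaled.mean(axis=0).round(6))  # ~0 for every feature
print(signal_scaled.std(axis=0).round(6))   # 1 for every feature
```

The scaled array would then be passed to `algo.fit(signal_scaled)` in place of the raw signal in the snippet above.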

tg12 commented Oct 19, 2023

As an unrelated question: what are you using to draw these graphs?

Should normalization be skipped altogether, or is there a better alternative cost for this kind of signal?

I do agree that in some instances there might be a need to remove any preprocessing of the data; this can be done upstream if needed, unless it's an inherent part of the PELT algorithm.


horsto commented Oct 20, 2023

It's not inherent to the PELT algorithm, I think? Unless there is some hidden preprocessing going on.

I would like to know whether I should do my own normalization up front, and how it might affect the different cost functions in the PELT algorithm (L1, L2, ...).

The plotting is just matplotlib + seaborn!
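To make the scale-dependence concrete, here is a hedged sketch (not from the thread) of how a least-squares segment cost, like the one PELT's "l2" model accumulates, reacts to rescaling a feature: multiplying a feature by a constant c multiplies its contribution to the cost by c², so large-scale features dominate the detection.

```python
import numpy as np

def l2_cost(segment):
    """Sum of squared deviations from the segment mean --
    the per-segment quantity a least-squares cost accumulates."""
    return ((segment - segment.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))   # a feature on unit scale
c = 100.0

cost_raw = l2_cost(x)
cost_scaled = l2_cost(c * x)    # same feature, scale multiplied by c

print(cost_scaled / cost_raw)   # c**2, up to floating point
```

This is why, without normalization, a feature living on a 100x larger scale contributes roughly 10,000x more to the cost, and change points in small-scale features can be missed entirely.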

deepcharles (repository owner) commented

Hi,

Sorry for the late reply.

Whether to normalize or not is task-dependent, and there is no definitive answer. For multivariate signals, PELT will detect the largest shifts, i.e., those with a large norm ||m_before - m_after||, where m_before and m_after are the multivariate means just before and after the change.
As an example, consider the following 2D signal.
[figure: raw 2D signal]

One dimension has large shifts and the other has small shifts. Without normalization, only changes in the large dimension are detected.
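This can be sketched numerically with a hypothetical synthetic 2D signal (not the owner's actual data): one dimension shifts by 10, the other by only 0.5, and the norm of the mean shift is dominated by the large dimension until the features are standardized.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100  # points per segment

# Hypothetical 2D signal with one change point: dimension 0 jumps by 10,
# dimension 1 jumps by only 0.5 (both with unit noise).
before = np.column_stack([rng.normal(0.0, 1, n), rng.normal(0.0, 1, n)])
after  = np.column_stack([rng.normal(10.0, 1, n), rng.normal(0.5, 1, n)])
s = np.vstack([before, after])

shift_raw = np.abs(after.mean(axis=0) - before.mean(axis=0))
print(shift_raw)  # dimension 0 dominates by roughly 20x

# After per-feature standardization the imbalance shrinks a lot
# (dimension 0's own jump inflates its standard deviation).
z = (s - s.mean(axis=0)) / s.std(axis=0)
shift_z = np.abs(z[n:].mean(axis=0) - z[:n].mean(axis=0))
print(shift_z)    # ratio between dimensions is now far smaller
```

With a single penalty value, the raw signal's small-dimension shift can fall below the detection threshold while the standardized one does not, which matches the figures below.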

rpt.display(s, [], rpt.Pelt().fit(s).predict(pen=50))

[figure: detected change points without normalization]

After normalization (here s denotes the normalized signal), all changes are detected.

rpt.display(s, [], rpt.Pelt().fit(s).predict(pen=50))

[figure: detected change points after normalization]

Hope this helps.

Repository owner locked and limited conversation to collaborators Oct 24, 2023
@deepcharles deepcharles converted this issue into discussion #312 Oct 24, 2023

