Breakpoints found in DataFrame but not in NumpyArray with same data #318
Comments
Hi @jdkworld,
Thanks for your interest in ruptures. This is because of the missing values (NaNs) in the data. If you remove the missing data, the outputs "look" fine.

import matplotlib.pyplot as plt
import numpy as np
import ruptures as rpt

# `dataframe` is the pandas DataFrame holding the signal
series = dataframe.to_numpy(dtype='float', na_value=np.nan)
print(f"Raw data : shape is {series.shape}")
# The boolean mask drops the NaNs and flattens the array to 1-D
series = series[~np.isnan(series)]
print(f"After removing the nans : shape is {series.shape}")
algo = rpt.Binseg(model="normal", min_size=12*24*7, jump=12*24).fit(series)
result = algo.predict(pen=100)
rpt.display(series, result)
plt.show()

I hope this helps! Let us know!
Olivier
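One consequence of dropping the NaNs is that the breakpoint indices refer to the compacted series, not to the original rows. The following is a minimal sketch of how they could be mapped back, assuming breakpoints are end-exclusive indices with the last one equal to the number of samples (the convention ruptures uses for `predict`). The helper name `map_breakpoints_to_original` and the toy data are hypothetical and not part of ruptures.

```python
import numpy as np


def map_breakpoints_to_original(raw, breakpoints):
    """Map breakpoints found on a NaN-free copy back to positions in `raw`.

    Hypothetical helper: assumes each breakpoint is the index of the first
    sample of the next segment, and that the last breakpoint equals the
    number of NaN-free samples.
    """
    raw = np.asarray(raw, dtype=float).ravel()
    # Original positions of the samples that survived the NaN filter
    kept = np.flatnonzero(~np.isnan(raw))
    mapped = []
    for b in breakpoints:
        if b < len(kept):
            mapped.append(int(kept[b]))
        else:
            # Final breakpoint: end of the original signal
            mapped.append(len(raw))
    return mapped


# Toy signal with gaps (made-up values, for illustration only)
raw = np.array([1.0, np.nan, 1.1, 0.9, np.nan, np.nan, 5.0, 5.2, 4.9])
compact_breakpoints = [3, 6]  # as if detected on the NaN-free series
original_breakpoints = map_breakpoints_to_original(raw, compact_breakpoints)
print(original_breakpoints)
```

This keeps the detection step unchanged and only translates the result, which may be enough when the gaps are short compared with the segments.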
Hi Olivier, Thanks a lot for your answer. Josien
If you want to keep the time series' structure along the time axis, then yes, you have to fill the missing values with something. There are many strategies (0.0, the last known value, a random draw from the series, the mean or median over a particular time window, etc.), but it all depends on your use case: this is a decision you have to make according to the underlying goal of the task you are trying to solve. Hope it helps! Olivier
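A few of the filling strategies mentioned above could look like this in pandas. This is only a sketch on made-up data; the window size of 3 is an arbitrary assumption, and which strategy is appropriate depends on the use case.

```python
import numpy as np
import pandas as pd

# Toy series with gaps (made-up values, for illustration only)
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# Strategy 1: replace missing values with a constant
filled_zero = s.fillna(0.0)

# Strategy 2: carry the last known value forward
# (a NaN at the very start would remain, since nothing precedes it)
filled_last = s.ffill()

# Strategy 3: median over a centered time window, computed on the
# non-missing values inside each window (window size is an assumption)
window_median = s.rolling(window=3, min_periods=1, center=True).median()
filled_median = s.fillna(window_median)

print(filled_zero.tolist())
print(filled_last.tolist())
print(filled_median.tolist())
```

All three results have the same length as the input, so the time axis is preserved and the filled series can be converted with `to_numpy()` before fitting.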
I have this signal. When I pass it to Binseg as a pandas DataFrame, I get the correct breakpoints, but when I pass it as a NumPy array, no breakpoints are found.
Am I missing something? Why is the behaviour different? Can it be due to the way in which NaNs are handled in both cases?
Also, when I have two the same columns in my dataframe, into breakpoints are found.
signal.csv