Implementation of LODA (Lightweight On-line Detection of Anomalies) #1342
@MaxHalford Would you mind having a look at this PR, and also at why the River build is failing for 3.11? Thank you so much in advance!
Hey @hoanganhngo610! Good job on taking the initiative to implement this. The main feedback I have is that we should avoid using numpy for new implementations. As you know, we prefer working exclusively with dicts. I know you want to use numpy's histogram: we have Regarding CI, I'm aware we have an issue. It's on my todo list :)
Hey @MaxHalford, thank you so much for your response, and for having a look at my PR so promptly. Regarding the implementation using dictionaries and particularly For example, we can take a look at the example provided in
which will return
However, with numpy, under
the result would be
As such, to ensure that the results are consistent with the other implementations in PyOD or PySAD, I decided to continue with the use of
@hoanganhngo610 I believe the difference comes from the histogram definitions in the two cases. The histogram in numpy has equal bin widths, while the one in River is adaptive. The thing is, equal bin widths are not difficult to implement online. I've actually started #1344 to remove
@MaxHalford Thank you for clarifying that! In that case, I will proceed to modify this PR to use dictionaries and
Actually what I'm saying is that River's histogram is not based on equi-width bins. If you use it, it might not yield optimal performance. I suggest doing some speed/accuracy benchmarks with River vs. numpy. If it's more accurate with numpy's equi-width bins, then let's implement them with dictionaries, because it's trivial.
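As a rough illustration of the last point, an equi-width histogram can be kept in a plain dict keyed by bin index. This is only a sketch under stated assumptions: the class name `EquiWidthHistogram` and the `bin_width` parameter are hypothetical and not part of River, and the bin width is fixed up front, whereas `np.histogram` derives its bin edges from the range of the data it is given.

```python
import math
from collections import defaultdict


class EquiWidthHistogram:
    """Hypothetical online histogram with fixed-width bins stored in a dict."""

    def __init__(self, bin_width=1.0):
        self.bin_width = bin_width      # assumed to be chosen by the user
        self.counts = defaultdict(int)  # bin index -> number of values seen
        self.n = 0                      # total number of values seen

    def _bin(self, x):
        # Bin i covers the half-open interval [i * width, (i + 1) * width).
        return math.floor(x / self.bin_width)

    def update(self, x):
        self.counts[self._bin(x)] += 1
        self.n += 1

    def density(self, x):
        # Estimated density: bin frequency divided by bin width.
        if self.n == 0:
            return 0.0
        return self.counts.get(self._bin(x), 0) / (self.n * self.bin_width)


hist = EquiWidthHistogram(bin_width=2.0)
for value in [1.0, 2.0, 2.5, 3.0, 9.0]:
    hist.update(value)
```

Each update touches a single dict entry, so memory grows only with the number of non-empty bins. Whether a fixed width matches numpy's accuracy on a given stream is exactly what the suggested benchmark would have to show.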
LODA (Lightweight On-line Detection of Anomalies) is a popular static/incremental anomaly detection algorithm. LODA is implemented in various frameworks for anomaly detection, including PyOD and PySAD.
Unlike various other algorithms, this implementation of LODA relies primarily on `numpy`, since the function `np.histogram` plays an important part in the learning phase and in the calculation of the anomaly score. This implementation is also an adaptation of the existing PyOD and PySAD implementations to `River`.
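For readers unfamiliar with LODA, the sketch below shows the general idea: several sparse random projections, one one-dimensional histogram per projection, and an anomaly score equal to the average negative log of the estimated bin probability. It is not the code in this PR. It deliberately swaps `np.histogram` for fixed-width dict bins, and the class name `LodaSketch`, its parameters, and the add-one smoothing are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


class LodaSketch:
    """Illustrative LODA-style detector using dicts (not the PR's code)."""

    def __init__(self, n_projections=10, bin_width=0.5, seed=42):
        self.n_projections = n_projections
        self.bin_width = bin_width  # fixed width: an assumption of this sketch
        self.rng = random.Random(seed)
        self.weights = None  # one sparse dict (feature -> weight) per projection
        self.hists = [defaultdict(int) for _ in range(n_projections)]
        self.n = 0

    def _init_weights(self, x):
        # Each projection keeps about sqrt(d) non-zero N(0, 1) weights.
        features = sorted(x)
        n_nonzero = max(1, round(math.sqrt(len(features))))
        self.weights = []
        for _ in range(self.n_projections):
            chosen = self.rng.sample(features, n_nonzero)
            self.weights.append({f: self.rng.gauss(0, 1) for f in chosen})

    def _bins(self, x):
        # Project x onto each sparse vector and map the result to a bin index.
        for w in self.weights:
            z = sum(w_f * x.get(f, 0.0) for f, w_f in w.items())
            yield math.floor(z / self.bin_width)

    def learn_one(self, x):
        if self.weights is None:
            self._init_weights(x)
        for hist, b in zip(self.hists, self._bins(x)):
            hist[b] += 1
        self.n += 1

    def score_one(self, x):
        # Higher score means more anomalous.
        if self.weights is None or self.n == 0:
            return 0.0
        total = 0.0
        for hist, b in zip(self.hists, self._bins(x)):
            # Add-one smoothing avoids log(0) for bins never seen in training.
            p = (hist.get(b, 0) + 1) / (self.n + 1)
            total -= math.log(p)
        return total / self.n_projections
```

A model trained on points clustered around the origin should give a much higher score to a point far from that cluster, because most of its projections land in bins that were never observed.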