
Refactor: Looking for implementation strategies to improve run time efficiency of all algorithms regardless of data type (i.e. discrete/continuous, missing data) #39

Open
ryanurbs opened this issue Apr 5, 2018 · 2 comments

ryanurbs commented Apr 5, 2018

One of the major challenges in making the Relief-based algorithms of ReBATE flexible enough to handle different dataset types, i.e. (1) continuous, discrete, or mixed feature types, (2) binary, multiclass, or continuous outcomes, and (3) the presence of missing data, is doing so in a way that preserves computational efficiency. Presently, scikit-rebate is implemented in a fairly compact manner; however, this may not ultimately be the most efficient implementation. This issue seeks enhancements to ReBATE and its underlying algorithms (i.e. ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, TuRF) that make the respective algorithms run faster and use less memory.
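For anyone picking this up, here is a minimal sketch (not scikit-rebate's actual internals) of one optimization direction: replacing per-instance Python loops with NumPy broadcasting when computing the pairwise distances that every Relief-based scorer relies on. The split into discrete vs. continuous columns and the NaN penalty of 1.0 are illustrative assumptions, not the project's current behavior.

```python
import numpy as np

def pairwise_distances(X, discrete_mask):
    """Range-normalized Manhattan-style distance matrix.

    Discrete features contribute 0/1 mismatches, continuous features
    contribute |a - b| / range, and pairs involving a missing value (NaN)
    contribute the maximum penalty of 1.0 for that feature.
    """
    n_samples, n_features = X.shape
    dist = np.zeros((n_samples, n_samples))

    # Precompute per-feature ranges once instead of inside a sample loop.
    ranges = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
    ranges[ranges == 0] = 1.0

    for j in range(n_features):
        col = X[:, j]
        diff = np.abs(col[:, None] - col[None, :])   # broadcasted n x n block
        nan_mask = np.isnan(diff)                    # pairs with a missing value
        if discrete_mask[j]:
            diff = (diff > 0).astype(float)          # mismatch indicator
        else:
            diff = diff / ranges[j]                  # range-normalized difference
        diff[nan_mask] = 1.0                         # missing-data penalty
        dist += diff
    return dist

# Example: 4 samples, first feature discrete, second continuous, one value missing.
X = np.array([[0, 1.5], [1, 2.0], [0, np.nan], [1, 3.5]])
print(pairwise_distances(X, discrete_mask=np.array([True, False])))
```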


bukson commented Feb 17, 2020

Hello, I would like to help with this issue. First, I will write some tests that guarantee the code after optimization produces the same results as the code before the optimization.
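A minimal regression-test sketch along those lines, assuming pytest, a small fixture dataset, and a baseline of feature scores saved from the current (pre-optimization) code; the file paths and tolerance here are illustrative, not part of the project.

```python
import numpy as np
import pandas as pd
from skrebate import ReliefF

def test_relieff_scores_match_baseline():
    # Hypothetical fixture dataset with a "class" outcome column.
    data = pd.read_csv("data/GAMETES_small.csv")
    X = data.drop("class", axis=1).values
    y = data["class"].values

    scores = ReliefF(n_neighbors=100, n_jobs=1).fit(X, y).feature_importances_

    # Baseline scores saved from the code before refactoring.
    baseline = np.load("tests/baselines/relieff_scores.npy")
    np.testing.assert_allclose(scores, baseline, rtol=1e-8)
```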

CaptainKanuk commented

Hi folks - I took an initial pass at this to see if I could proof-of-concept some changes. I also implemented a benchmarking tool so folks can see how any branch is performing.

See my draft PR below; it's not quite ready yet, as I need to rerun my performance benchmarks. It provides a pattern for one case (ReliefF, binary features, discrete data) that I believe could be applied across all cases to provide clearer code and much more performant operations. I'm working on a full benchmark run, but initial results for the current parallel ReliefF test on binary/discrete data show a runtime improvement from ~1.85 seconds down to ~0.6 seconds on the small testing dataset.

#79
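For reference, a rough sketch of the kind of timing comparison described above, using a small synthetic binary-feature/discrete-outcome dataset; the sizes, seed, and parameter values are illustrative and are not taken from the draft PR's benchmarking tool.

```python
import time
import numpy as np
from skrebate import ReliefF

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 100)).astype(float)  # binary features
y = rng.integers(0, 2, size=200)                        # binary (discrete) outcome

start = time.perf_counter()
ReliefF(n_neighbors=10, n_jobs=-1).fit(X, y)            # parallel ReliefF fit
print(f"parallel ReliefF fit: {time.perf_counter() - start:.2f} s")
```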
