
Refactor: Looking for implementation strategies to improve run time efficiency of all algorithms regardless of data type (i.e. discrete/continuous, missing data) #39

Open
ryanurbs opened this issue Apr 5, 2018 · 2 comments

ryanurbs commented Apr 5, 2018

One of the major challenges in making the Relief-based algorithms of ReBATE flexible enough to handle different dataset types, i.e. (1) continuous, discrete, or mixed feature types, (2) binary, multiclass, or continuous outcomes, and (3) the presence of missing data, is doing so in a way that preserves computational efficiency. Presently, scikit-rebate is implemented in a fairly compact manner; however, this may not ultimately be the most efficient implementation. This issue seeks enhancements to ReBATE and its underlying algorithms (i.e. ReliefF, SURF, SURF*, MultiSURF, MultiSURF*, TuRF) that make the respective algorithms run faster and use less memory.
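For anyone picking this up, here is a minimal sketch (not scikit-rebate's actual internals) of one optimization direction: replacing per-instance Python loops with NumPy broadcasting when computing the pairwise distances that every Relief-based scorer relies on. The split into discrete vs. continuous columns and the NaN penalty of 1.0 are illustrative assumptions, not the project's current behavior.

```python
import numpy as np

def pairwise_distances(X, discrete_mask):
    """Range-normalized Manhattan-style distance matrix.

    Discrete features contribute 0/1 mismatches, continuous features
    contribute |a - b| / range, and pairs involving a missing value (NaN)
    contribute the maximum penalty of 1.0 for that feature.
    """
    n_samples, n_features = X.shape
    dist = np.zeros((n_samples, n_samples))

    # Precompute per-feature ranges once instead of inside a sample loop.
    ranges = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
    ranges[ranges == 0] = 1.0

    for j in range(n_features):
        col = X[:, j]
        diff = np.abs(col[:, None] - col[None, :])   # broadcasted n x n block
        nan_mask = np.isnan(diff)                    # pairs with a missing value
        if discrete_mask[j]:
            diff = (diff > 0).astype(float)          # mismatch indicator
        else:
            diff = diff / ranges[j]                  # range-normalized difference
        diff[nan_mask] = 1.0                         # missing-data penalty
        dist += diff
    return dist

# Example: 4 samples, first feature discrete, second continuous, one value missing.
X = np.array([[0, 1.5], [1, 2.0], [0, np.nan], [1, 3.5]])
print(pairwise_distances(X, discrete_mask=np.array([True, False])))
```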


bukson commented Feb 17, 2020

Hello, I would like to help with this issue. First, I will write some tests that guarantee the code after optimization produces the same results as the code before the optimization.
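A minimal regression-test sketch along those lines, assuming pytest, a small fixture dataset, and a baseline of feature scores saved from the current (pre-optimization) code; the file paths and tolerance here are illustrative, not part of the project.

```python
import numpy as np
import pandas as pd
from skrebate import ReliefF

def test_relieff_scores_match_baseline():
    # Hypothetical fixture dataset with a "class" outcome column.
    data = pd.read_csv("data/GAMETES_small.csv")
    X = data.drop("class", axis=1).values
    y = data["class"].values

    scores = ReliefF(n_neighbors=100, n_jobs=1).fit(X, y).feature_importances_

    # Baseline scores saved from the code before refactoring.
    baseline = np.load("tests/baselines/relieff_scores.npy")
    np.testing.assert_allclose(scores, baseline, rtol=1e-8)
```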

CaptainKanuk commented

Hi folks - I took an initial pass at this to see if I could proof-of-concept some changes. I also implemented a benchmarking tool so folks can see how any branch is performing.

See my draft PR below; it's not quite ready yet, as I need to rerun my performance benchmarks. It provides a pattern for one case (ReliefF, binary features, discrete data) that I believe could be applied across all cases to provide clearer code and much more performant operations. I'm working on a full benchmark run, but initial results for the current parallel ReliefF test on binary/discrete data show a runtime improvement from ~1.85 seconds down to ~0.6 seconds on the small testing dataset.

#79
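For reference, a rough sketch of the kind of timing comparison described above, using a small synthetic binary-feature/discrete-outcome dataset; the sizes, seed, and parameter values are illustrative and are not taken from the draft PR's benchmarking tool.

```python
import time
import numpy as np
from skrebate import ReliefF

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 100)).astype(float)  # binary features
y = rng.integers(0, 2, size=200)                        # binary (discrete) outcome

start = time.perf_counter()
ReliefF(n_neighbors=10, n_jobs=-1).fit(X, y)            # parallel ReliefF fit
print(f"parallel ReliefF fit: {time.perf_counter() - start:.2f} s")
```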
