Using swarm optimization instead of SAGE for stage 2. #48

Open: wants to merge 9 commits into base: main
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
- python-version: [3.8, 3.9, 3.10, 3.11]
+ python-version: [3.8, 3.9, 3.10, 3.11, 3.12]

steps:
- uses: actions/checkout@v2
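
Note on the version matrix in this hunk: if the entries are unquoted as shown, a YAML loader would typically resolve `3.10` to the float 3.1, so quoting each version is the usual safeguard. A minimal Python sketch of that float round trip follows. It only mimics the scalar conversion and does not parse the workflow file; the variable names are illustrative.

```python
# An unquoted YAML scalar such as 3.10 is normally resolved to a float.
# This mimics that conversion with Python's own float parser.
unquoted = float("3.10")

# A quoted scalar stays a string and keeps its trailing zero.
quoted = "3.10"

print(unquoted)                 # 3.1
print(str(unquoted) == quoted)  # False
```

Under that assumption, writing the matrix entries as strings (for example `"3.10"`) keeps the intended version.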
6 changes: 4 additions & 2 deletions README.md
@@ -29,8 +29,10 @@ modifications:
to the 'n_iter_fwer' parameter. For a cluster to be rejected a similar round
of reasoning applies. Clusters that are not rejected remain tentative.

- 4) After the iterative refinement stage SAGE scores could be used to select
- the best feature from each cluster.
+ 4) After the iterative refinement a swarm intelligence algorithm, naked mole rat
+ algorithm, is used to select the most informative feature subset. The user can
+ also choose to use the MultiSURF algorithm as an alternative to swarm
+ intelligence.

While this method may not produce all features important for classification,
it does have some nice properties. First of all, by using an Extremely
48 changes: 19 additions & 29 deletions docs/API.md
@@ -7,29 +7,24 @@ of the `Triglav` class and its methods.

class triglav.Triglav(transformer = NoScale(), sampler = NoResample(), estimator = ExtraTreesClassifier(512, bootstrap = True),
stage_2_estimator = ExtraTreesClassifier(512, bootstrap = True), per_class_imp = False,
- n_iter = 40, n_iter_fwer = 11, p_1 = 0.65, p_2 = 0.30, metric = "correlation", linkage = "complete",
+ n_iter = 40, n_iter_fwer = 11, p_1 = 0.65, p_2 = 0.30, metric = "euclidean", linkage = "ward",
thresh = 2.0, criterion = "distance", run_stage_2 = True, verbose = 0, n_jobs = 10)

### Parameters

transformer: default = NoScale()
- The transformer to be used to scale features. One can use
- the scikit-learn.preprocessing transformers. In addition,
- CLR and Scaler (converts each row into frequencies) are
- available by importing 'CLRTransformer' and 'Scaler' from the
- 'triglav' package.

+ The transformer to be used to scale features.

sampler: default = NoResample()
- The resampling method used for imbalanced classes. Should be
- compatable with 'imblearn' or use an 'imblearn' resampler.
+ The type of sampler (from Imbalanced-learn) to use.

estimator: default = ExtraTreesClassifier(512, bootstrap = True)
The estimator used to calculate Shapley scores.

stage_2_estimator: default = ExtraTreesClassifier(512)
- The estimator used to calculate SAGE values. Only used if the
- 'run_stage_2' is set to True.
+ The estimator used to calculate MultiSURF CV scores.
+ Only used if the 'run_stage_2' is set to True or 'mms'.

per_class_imp: bool, default = False
Specifies if importance scores are calculated globally or per
class. Note, per class importance scores are calculated in a
@@ -47,14 +42,13 @@ of the `Triglav` class and its methods.

p_2: float, default = 0.30
Used to determine the shape of the Beta-Binomial distribution
- modelling failures.
+ modelling misses.

- metric: str, default = "correlation"
+ metric: str, default = "euclidean"
The dissimilarity measure used to calculate distances between
- features. To use Extremely Randomized Trees proximities one
- has to import 'ETCProx' from the 'triglav' package.
+ features.

- linkage: str, default = "complete"
+ linkage: str, default = "ward"
The type of hierarchical clustering method to apply. The available
methods include: single, complete, ward, average, centroid.

@@ -63,15 +57,16 @@ of the `Triglav` class and its methods.

criterion: str, default = "distance"
The method used to form flat clusters. The available methods
- include: inconsistent, distance, maxclust, monocrit,
- maxclust_monocrit.
+ include: distance or maxclust.
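
To illustrate how `metric`, `linkage`, `thresh`, and `criterion` fit together, here is a minimal sketch that calls SciPy's hierarchical clustering directly. It assumes these parameters are forwarded to `scipy.cluster.hierarchy.linkage` and `fcluster` roughly as named, which this diff does not confirm. The toy matrix and variable names are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: 5 samples (rows) x 4 features (columns).
# Features 0 and 1 are nearly identical, as are features 2 and 3.
X = np.array([
    [0.0, 0.1, 10.0, 10.1],
    [0.0, 0.0, 10.0, 10.0],
    [0.0, 0.0, 10.0, 10.0],
    [0.0, 0.0, 10.0, 10.0],
    [0.0, 0.0, 10.0, 10.0],
])

# Features are clustered, so each column becomes one observation.
# Ward linkage is defined on Euclidean distances.
Z = linkage(X.T, method="ward", metric="euclidean")

# Cut the tree at a cophenetic distance of 2.0 (the documented 'thresh').
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```

With `criterion="maxclust"`, the same `t` argument would instead be read as the maximum number of flat clusters.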

alpha: float, default = 0.05
The level at which corrected p-values will be rejected.

- run_stage_2: bool, default = True
- This stage will determine the best feature from each of the
- selected clusters by calculating SAGE values.
+ run_stage_2: str or bool, default = "mms"
+ This stage will determine the best features from the selected
+ Triglav features. If 'str' is "auto", swarm optimization is used.
+ If "mms" (default), a modified version of the MultiSURF algorithm
+ is used. If True, "mms" is used. If False, stage 2 is not run.
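
The accepted values of `run_stage_2` described in the new text can be summarized with a small sketch. The helper `resolve_stage_2_mode` and the mode name "swarm" are hypothetical and are not part of the `triglav` package; the sketch only restates the documented mapping.

```python
def resolve_stage_2_mode(run_stage_2):
    """Map a documented 'run_stage_2' value to a mode name (hypothetical helper)."""
    # False disables stage 2 entirely.
    if run_stage_2 is False:
        return None
    # True is documented as an alias for "mms" (modified MultiSURF).
    if run_stage_2 is True or run_stage_2 == "mms":
        return "mms"
    # "auto" is documented as selecting swarm optimization.
    if run_stage_2 == "auto":
        return "swarm"
    raise ValueError(f"Unsupported run_stage_2 value: {run_stage_2!r}")


for value in ("mms", True, "auto", False):
    print(repr(value), "->", resolve_stage_2_mode(value))
```

Rejecting any other value early keeps a typo such as "Auto" from silently skipping stage 2.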

verbose: int, default = 0
Specifies if basic reporting is sent to the user.
@@ -94,10 +89,8 @@ of the `Triglav` class and its methods.
The mask of the best features from each cluster. Only returns an ndarray
if the 'run_stage_2' parameter is enabled.

- self.sage_values_: SAGE Explanation Object
- A SAGE explanation object created using the set of features in 'selected_'.
- For a detailed explanation on how to use this object, please visit:
- https://github.com/iancovert/sage
+ self.task_opt_: Task Object
+ MealPy task optimizer object.

linkage_matrix_: ndarray
The SciPy hierarchical clustering encoded as a linkage matrix.
@@ -206,8 +199,6 @@ of the `Triglav` class and its methods.

class triglav.Scaler()

- class triglav.CLRTransformer()

class triglav.NoResample()

### Parameters
@@ -237,6 +228,5 @@ of the `Triglav` class and its methods.

NoScale will return X
Scaler will return the closure of X (all rows sum to one, X must be non-negative)
- CLRTransformer will return the CLR Transform of X (X must be non-negative)
NoResample will return X
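
As a concrete reading of "closure" in the list above, the following sketch divides each row by its sum so that every row sums to one. It is a plain-Python illustration, not the `Scaler` implementation; the function name `closure` and the handling of all-zero rows are assumptions.

```python
def closure(rows):
    """Return a copy of 'rows' in which every row sums to one."""
    closed = []
    for row in rows:
        # The documented precondition: X must be non-negative.
        if any(value < 0 for value in row):
            raise ValueError("closure requires non-negative values")
        total = sum(row)
        # Assumption: an all-zero row has no defined closure, so reject it.
        if total == 0:
            raise ValueError("cannot close a row that sums to zero")
        closed.append([value / total for value in row])
    return closed


X = [[1, 1, 2], [0, 5, 5]]
X_closed = closure(X)
print(X_closed)  # [[0.25, 0.25, 0.5], [0.0, 0.5, 0.5]]
```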
