- Removed unused variables and imports
- Fixed location of imports
- Updated API documentation
- Added comments on where some code was adapted from
- UMAP metric is now 'mahalanobis'
jrudar committed Oct 23, 2023
1 parent e851967 commit 9913b3f
Showing 6 changed files with 311 additions and 72 deletions.
9 changes: 4 additions & 5 deletions README.md
@@ -29,11 +29,10 @@ modifications:
to the 'n_iter_fwer' parameter. For a cluster to be rejected a similar round
of reasoning applies. Clusters that are not rejected remain tentative.

4) After the iterative refinement a swarm intelligence algorithm, harris hawks
optimization, is used to select the most informative feature subset. This
procedure mimics the hunting strategies of harris hawks to find the minimum
value of a function. In this case, the optimization strategy is used to find
the subset of features which minimizes classification error.
4) After the iterative refinement a swarm intelligence algorithm, naked mole rat
algorithm, is used to select the most informative feature subset. The user can
also choose to use the MultiSURF algorithm as an alternative to swarm
intelligence.

While this method may not produce all features important for classification,
it does have some nice properties. First of all, by using an Extremely
58 changes: 19 additions & 39 deletions docs/API.md
@@ -7,30 +7,24 @@ of the `Triglav` class and its methods.

class triglav.Triglav(transformer = NoScale(), sampler = NoResample(), estimator = ExtraTreesClassifier(512, bootstrap = True),
stage_2_estimator = ExtraTreesClassifier(512, bootstrap = True), per_class_imp = False,
n_iter = 40, n_iter_fwer = 11, p_1 = 0.65, p_2 = 0.30, metric = "correlation", linkage = "complete",
thresh = 2.0, criterion = "distance", run_stage_2 = True, max_iter_sage_2 = 100, algo = HarrisHawksOptimization(),
alpha_2 = 0.99, verbose = 0, n_jobs = 10)
n_iter = 40, n_iter_fwer = 11, p_1 = 0.65, p_2 = 0.30, metric = "euclidean", linkage = "ward",
thresh = 2.0, criterion = "distance", run_stage_2 = True, verbose = 0, n_jobs = 10)

### Parameters

transformer: default = NoScale()
The transformer to be used to scale features. One can use
the scikit-learn.preprocessing transformers. In addition,
CLR and Scaler (converts each row into frequencies) are
available by importing 'CLRTransformer' and 'Scaler' from the
'triglav' package.

The transformer to be used to scale features.

sampler: default = NoResample()
The resampling method used for imbalanced classes. Should be
compatable with 'imblearn' or use an 'imblearn' resampler.
The type of sampler (from Imbalanced-learn) to use.

estimator: default = ExtraTreesClassifier(512, bootstrap = True)
The estimator used to calculate Shapley scores.

stage_2_estimator: default = ExtraTreesClassifier(512)
The estimator used to calculate SAGE values. Only used if the
'run_stage_2' is set to True.
The estimator used to calculate MultiSURF CV scores.
Only used if the 'run_stage_2' is set to True or 'mms'.

per_class_imp: bool, default = False
Specifies if importance scores are calculated globally or per
class. Note, per class importance scores are calculated in a
@@ -48,14 +42,13 @@ of the `Triglav` class and its methods.

p_2: float, default = 0.30
Used to determine the shape of the Beta-Binomial distribution
modelling failures.
modelling misses.

metric: str, default = "correlation"
metric: str, default = "euclidean"
The dissimilarity measure used to calculate distances between
features. To use Extremely Randomized Trees proximities one
has to import 'ETCProx' from the 'triglav' package.
features.

linkage: str, default = "complete"
linkage: str, default = "ward"
The type of hierarchical clustering method to apply. The available
methods include: single, complete, ward, average, centroid.

@@ -64,26 +57,16 @@ of the `Triglav` class and its methods.

criterion: str, default = "distance"
The method used to form flat clusters. The available methods
include: inconsistent, distance, maxclust, monocrit,
maxclust_monocrit.
include: distance or maxclust.
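
The `metric`, `linkage`, `thresh`, and `criterion` parameters described above correspond closely to arguments of SciPy's hierarchical clustering functions. The following is a minimal sketch of how features could be grouped into flat clusters using the new defaults from this commit ("euclidean", "ward", 2.0, "distance"). The toy data and variable names are assumptions made for illustration, and the diff does not show how Triglav itself wires these parameters together.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 50 samples x 6 features.
# Features 0-2 follow one latent signal and features 3-5 follow another.
base_a = rng.normal(size=(50, 1))
base_b = rng.normal(size=(50, 1))
X = np.hstack([
    base_a + 0.05 * rng.normal(size=(50, 3)),
    base_b + 0.05 * rng.normal(size=(50, 3)),
])

# Features are the objects being clustered, so distances are taken
# between columns of X (rows of X.T).
D = pdist(X.T, metric="euclidean")

# Build the linkage matrix; one row per merge, n_features - 1 rows in total.
Z = linkage(D, method="ward")

# Cut the tree into flat clusters at a cophenetic distance of 2.0.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```

With `criterion="maxclust"`, the same `t` argument would instead be read as the maximum number of clusters to form.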

alpha: float, default = 0.05
The level at which corrected p-values will be rejected.

run_stage_2: bool, default = True
This stage will determine the best feature subset using
the harris hawks (HHO) algorithm.

max_iter_stage_2: int, default = 100
The maximum number of iterations the HHO algorithm will
run.

algo: Algorithm, default = HarrisHawksOptimization()
A NiaPy algorithm.

alpha_2: float, default = 0.99
The weight used to balance model generalization or
number of features selected by the HHO algorithm.
run_stage_2: str or bool, default = "mms"
This stage will determine the best features from the selected
Triglav features. If 'str' is "auto", swarm optimization is used.
If "mms" (default), a modified version of the MultiSURF algorithm
is used. If True, "mms" is used. If False, stage 2 is not run.

verbose: int, default = 0
Specifies if basic reporting is sent to the user.
@@ -107,7 +90,7 @@ of the `Triglav` class and its methods.
if the 'run_stage_2' parameter is enabled.

self.task_opt_: Task Object
NiaPy task optimizer object.
MealPy task optimizer object.

linkage_matrix_: ndarray
The SciPy hierarchical clustering encoded as a linkage matrix.
@@ -216,8 +199,6 @@ of the `Triglav` class and its methods.

class triglav.Scaler()

class triglav.CLRTransformer()

class triglav.NoResample()

### Parameters
@@ -247,6 +228,5 @@ of the `Triglav` class and its methods.

NoScale will return X
Scaler will return the closure of X (all rows sum to one, X must be non-negative)
CLRTransformer will return the CLR Transform of X (X must be non-negative)
NoResample will return X
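
The return values listed above name two compositional transforms: the closure (each row rescaled to sum to one) and the CLR transform. A minimal NumPy sketch of both follows. The function names `closure` and `clr` and the `pseudocount` argument are hypothetical helpers introduced here for illustration, not the `triglav` implementations; in particular, adding a pseudocount before taking logs is an assumption made so that zero entries do not produce `-inf`.

```python
import numpy as np


def closure(X):
    """Rescale each row of a non-negative matrix so that it sums to one."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    # Rows that sum to zero would divide by zero; callers should drop them first.
    return X / X.sum(axis=1, keepdims=True)


def clr(X, pseudocount=1.0):
    """Centered log-ratio: log of each entry minus the row mean of the logs.

    The pseudocount is an assumption used here to avoid log(0).
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    log_x = np.log(X + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


counts = np.array([[1, 1, 2], [0, 5, 5]])
proportions = closure(counts)   # every row now sums to 1
centered = clr(counts)          # every row now sums to 0
print(proportions)
print(centered)
```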
