
Dimensionality_Reduction_Techniques

Dimensionality reduction techniques with UCIML datasets, based on Kirill Eremenko's intuition.

Principal Component Analysis (PCA)

  • The goal is to identify strong correlations among the variables, i.e., to find the directions of maximum variance, and to reduce a d-dimensional dataset by projecting it onto a (k)-dimensional subspace (where k < d).

  • Unlike linear regression, which learns the relationship between X and y, PCA summarizes the feature space itself by finding a list of principal axes.

  • PCA can be highly affected by outliers in the data.

  • A good analysis of the UCIML Wine dataset: applying PCA with 2 components and building a Logistic Regression classifier on the reduced features.

    • PCAclassifier: decision-region visualizations for the train set and the test set.
  • For this type of algorithm we can try different numbers of components, starting at 2 and increasing if the model underperforms, until we reach the optimal number of features; a tuning sketch follows the code block below.

from sklearn.decomposition import PCA
# start with 2 components and increase if the model underperforms
pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
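A minimal sketch of the tuning idea above, assuming the scaled Wine data already exist as X_train, X_test, y_train, y_test (the variable names and classifier settings here are illustrative, not from the original notebook):

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# fit PCA once with all components to inspect how much variance each axis carries
probe = PCA().fit(X_train)
print(probe.explained_variance_ratio_)  # pick the smallest k that keeps enough variance

# reduce to the chosen number of components and train the classifier
pca = PCA(n_components = 2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train_pca, y_train)
print(accuracy_score(y_test, classifier.predict(X_test_pca)))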

Linear Discriminant Analysis (LDA)

  • LDA differs from PCA in that, in addition to finding the component axes, we are interested in the axes that maximize the separation between multiple classes.
  • Both are linear transformation techniques used for dimensionality reduction.
  • PCA is described as unsupervised, but LDA is supervised because it uses the dependent variable (the class labels).
  • The goal of LDA is to project the feature space (a dataset of n-dimensional samples) onto a smaller subspace of k dimensions (where k <= c - 1, c being the number of classes) while maintaining the class-discriminatory information.
  • The algorithm follows a five-step method: compute the mean vectors for each class, compute the within-class and between-class scatter matrices, solve the generalized eigenvalue problem, select the top k eigenvectors, and project the samples onto the new subspace. The application of LDA before the classifier is shown below.
    • LDAclassifier: decision-region visualizations for the train set and the test set.
  • The implementation of LDA uses a different module from PCA, and fitting requires the class labels:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
# LDA is supervised: fitting needs both the features and the dependent variable
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
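A minimal sketch of the full LDA-then-classifier pipeline, again assuming scaled Wine data in X_train, X_test, y_train, y_test (illustrative names, not from the original notebook):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# with the 3 Wine classes, at most n_classes - 1 = 2 discriminants exist
n_components = min(len(np.unique(y_train)) - 1, X_train.shape[1])
lda = LDA(n_components = n_components)
X_train_lda = lda.fit_transform(X_train, y_train)  # supervised: needs y_train
X_test_lda = lda.transform(X_test)
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train_lda, y_train)
print(accuracy_score(y_test, classifier.predict(X_test_lda)))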

Kernel Principal Component Analysis (Kernel PCA)

  • Kernel PCA often outperforms linear PCA when the data are not linearly separable, because the kernel trick lets it capture non-linear structure; it is not guaranteed to win in every case.
  • We applied Kernel PCA to the UCIML Wine dataset.
  • The implementation of KernelPCA:
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components = 2, kernel = 'rbf')  # 'rbf' = radial basis function kernel
X_train = kpca.fit_transform(X_train)
X_test = kpca.transform(X_test)
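Since the best kernel depends on the data, here is a minimal sketch comparing a few kernels by held-out accuracy, under the same assumed X_train, X_test, y_train, y_test Wine split:

from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# try several kernels and keep whichever classifies the test set best
for kernel in ('linear', 'poly', 'rbf'):
    kpca = KernelPCA(n_components = 2, kernel = kernel)
    X_train_k = kpca.fit_transform(X_train)
    X_test_k = kpca.transform(X_test)
    classifier = LogisticRegression(random_state = 0).fit(X_train_k, y_train)
    print(kernel, accuracy_score(y_test, classifier.predict(X_test_k)))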
