(i) Check Numerical Features.
(ii) Check Categorical Features.
(iii) Check Missing Values.
(iv) Check Outliers.
(v) Data Cleaning.
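The first checks above can be sketched in plain Python. This is a minimal illustration, assuming the data is a list of dictionaries with `None` marking a missing value; the column names and the helpers `split_feature_types` and `missing_counts` are hypothetical, not from the original notes.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "city": "Pune", "income": 40000},
    {"age": None, "city": "Delhi", "income": 52000},
    {"age": 31, "city": None, "income": 48000},
    {"age": 40, "city": "Pune", "income": 51000},
]


def split_feature_types(rows):
    """Return (numerical, categorical) column names based on observed values."""
    numerical, categorical = [], []
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        if observed and all(isinstance(v, (int, float)) for v in observed):
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical


def missing_counts(rows):
    """Count missing (None) entries per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


numerical, categorical = split_feature_types(rows)
missing = missing_counts(rows)
```

With a dataframe library the same checks are usually one-liners; the point here is only what each check looks for.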
-
(i) Mean / Median Imputation.
(ii) Random Sample Imputation.
(iii) Capturing NaN Value with a new feature.
(iv) End of Distribution Imputation.
(v) Arbitrary Value Imputation (Min / Max value).
-
(i) Mode (Frequent Category Imputation).
(ii) Capturing NaN Value with a new feature.
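The imputation strategies listed above can be sketched with the standard library alone. This is an illustrative sketch, assuming missing values are represented as `None`; the function names, the sample data, and the choice of mean plus three standard deviations for end-of-distribution imputation are assumptions, not something the notes specify.

```python
import random
import statistics
from collections import Counter


def _observed(values):
    return [v for v in values if v is not None]


def median_impute(values):
    """Replace missing entries with the median of the observed ones."""
    fill = statistics.median(_observed(values))
    return [fill if v is None else v for v in values]


def random_sample_impute(values, seed=0):
    """Replace each missing entry with a randomly drawn observed value."""
    rng = random.Random(seed)
    observed = _observed(values)
    return [rng.choice(observed) if v is None else v for v in values]


def nan_indicator(values):
    """New feature: 1 where the original value was missing, else 0."""
    return [1 if v is None else 0 for v in values]


def end_of_distribution_impute(values, n_std=3):
    """Fill with mean + n_std * std (n_std=3 is an assumed convention)."""
    observed = _observed(values)
    fill = statistics.mean(observed) + n_std * statistics.stdev(observed)
    return [fill if v is None else v for v in values]


def arbitrary_impute(values, fill):
    """Fill with a caller-chosen value, e.g. min(observed) or max(observed)."""
    return [fill if v is None else v for v in values]


def mode_impute(values):
    """Categorical features: fill with the most frequent category."""
    fill = Counter(_observed(values)).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


ages = [22, None, 35, 41, None, 29]        # hypothetical numerical feature
cities = ["Pune", None, "Delhi", "Pune"]   # hypothetical categorical feature
```

Pairing any of the fill strategies with `nan_indicator` keeps the information that a value was originally missing.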
(i) Under Sampling (generally suited to large datasets, since it discards majority-class rows).
(ii) Over Sampling (generally suited to small datasets, since no rows are discarded).
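A minimal sketch of random under- and over-sampling for an imbalanced dataset follows. The helper names and the toy labels are hypothetical; this shows only the simplest random variants, not synthetic methods.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_over_sample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


rows = list(range(10))        # hypothetical row identifiers
labels = [0] * 8 + [1] * 2    # 8 majority rows, 2 minority rows

_, under_labels = random_under_sample(rows, labels)
_, over_labels = random_over_sample(rows, labels)
```

Resampling is normally applied to the training split only, so that the evaluation data keeps the original class balance.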
(i) IQR (Interquartile Range).
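The IQR rule can be sketched as follows: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The multiplier 1.5 is the usual convention and is an assumption here, as are the function names and sample data; quartile definitions also differ slightly between libraries.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k=1.5 is the conventional multiplier."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # hypothetical sample
outliers = find_outliers(data)
```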
(i) Standardization.
(ii) Normalization.
(iii) Robust Scaler (Scaling to Median and Quantiles).
(iv) Gaussian Transformation.
- Logarithmic Transformation.
- Reciprocal Transformation.
- Square Root Transformation.
- Exponential Transformation.
- Box-Cox Transformation.
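The scaling and transformation options above reduce to short formulas, sketched below in plain Python. The function names and sample values are hypothetical. Two points are assumptions: "exponential transformation" is read as raising values to a fixed power (1/1.2 here), and Box-Cox is shown with a caller-supplied lambda rather than one fitted from the data.

```python
import math
import statistics


def standardize(values):
    """Standardization: zero mean, unit variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Normalization: rescale to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Robust scaling: centre on the median, divide by the IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [(v - median) / (q3 - q1) for v in values]


def box_cox(values, lam):
    """Box-Cox for strictly positive values; lam == 0 is the log case."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


values = [1.0, 2.0, 4.0, 8.0, 16.0]                # hypothetical skewed feature

log_values = [math.log(v) for v in values]         # logarithmic
reciprocal_values = [1 / v for v in values]        # reciprocal
sqrt_values = [math.sqrt(v) for v in values]       # square root
power_values = [v ** (1 / 1.2) for v in values]    # "exponential" (assumed exponent)
```

The logarithmic, reciprocal, and Box-Cox forms need strictly positive inputs (the reciprocal only needs non-zero), so a shift is often applied first when the feature contains zeros.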
(i) One Hot Encoding {Nominal}.
(ii) One Hot Encoding with Many Categories in a Feature {Nominal}.
(iii) Count or Frequency Encoding {Nominal}.
(iv) Mean Encoding {Nominal}.
(v) Probability Ratio Encoding {Nominal}.
(vi) Ordinal Number Encoding {Ordinal}.
(vii) Target Guided Ordinal Encoding {Ordinal}.
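Most of the encodings listed above can be sketched in a few lines each. The helper names and the toy `colors` / `target` data are hypothetical, and the sketch assumes a binary 0/1 target. The probability ratio is taken as P(target = 1) / P(target = 0) per category, which is undefined when a category has only positive examples.

```python
from collections import Counter, defaultdict


def one_hot(values, categories=None):
    """One column per category; pass `categories` to limit the columns."""
    categories = categories or sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def top_k_categories(values, k):
    """The k most frequent categories, for one-hot with many categories."""
    return [c for c, _ in Counter(values).most_common(k)]


def frequency_encode(values):
    """Replace each category with its count in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def target_means(values, target):
    """Mean of the target per category."""
    sums, counts = defaultdict(float), Counter()
    for v, t in zip(values, target):
        sums[v] += t
        counts[v] += 1
    return {c: sums[c] / counts[c] for c in counts}


def mean_encode(values, target):
    means = target_means(values, target)
    return [means[v] for v in values]


def probability_ratio_encode(values, target):
    """P(1) / P(0) per category; assumes no category has P(0) == 0."""
    means = target_means(values, target)
    return [means[v] / (1 - means[v]) for v in values]


def target_guided_ordinal(values, target):
    """Rank categories by target mean, then replace each with its rank."""
    means = target_means(values, target)
    rank = {c: i for i, c in enumerate(sorted(means, key=means.get))}
    return [rank[v] for v in values]


colors = ["red", "blue", "red", "green", "blue", "red"]
target = [1, 0, 1, 0, 1, 0]
```

Encodings that use the target (mean, probability ratio, target guided) should be fitted on training data only, otherwise target information leaks into the features.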
(i) Variance Threshold (Unsupervised).
(ii) Correlation.
(iii) Chi-Square.
(iv) Genetic Algorithm.
(v) K Neighbour.
(vi) Feature Importance {Extra Trees Classifier}.
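The first two selection methods in this list can be sketched without any library: drop near-constant features, then drop features that are highly correlated with one already kept. The thresholds, helper names, and toy columns are assumptions for illustration; the other methods in the list are not covered by this sketch.

```python
import statistics


def variance_threshold(columns, threshold=0.0):
    """Keep features whose variance exceeds the threshold."""
    return [name for name, vals in columns.items()
            if statistics.pvariance(vals) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def drop_correlated(columns, limit=0.9):
    """Keep a feature only if |r| < limit against every feature kept so far."""
    kept = []
    for name, vals in columns.items():
        if all(abs(pearson(vals, columns[k])) < limit for k in kept):
            kept.append(name)
    return kept


columns = {  # hypothetical features
    "constant": [1, 1, 1, 1, 1],
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "noise": [3, 1, 4, 1, 5],
}

# Remove constant features first; Pearson is undefined for zero variance.
non_constant = {k: columns[k] for k in variance_threshold(columns)}
selected = drop_correlated(non_constant)
```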
i. Linear Regression -- Handle Outlier(Y) -- Required Feature Transformation(Y).
ii. Ridge and Lasso Regression -- Handle Outlier(Y) -- Required Feature Transformation(Y).
iii. Decision Tree Regressor -- Handle Outlier(N)
iv. Random Forest Regressor -- Handle Outlier(N) -- Required Feature Transformation(N).
v. XGBoost Regressor -- Handle Outlier(N)
vi. AdaBoost Regressor
vii. ANN -- Required Feature Transformation(Y)
viii. RNN -- Required Feature Transformation(Y)
ix. KNN -- Handle Outlier(Y) -- Required Feature Transformation(Y)
x. SVM -- Handle Outlier(N)
xi. Gradient Boosting -- Handle Outlier(N)
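As an illustration of why linear regression is marked as needing outlier handling, the sketch below fits an ordinary least-squares line to a toy dataset with and without one extreme target value. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


xs = [1, 2, 3, 4, 5]
ys_clean = [2, 4, 6, 8, 10]       # exactly y = 2x
ys_outlier = [2, 4, 6, 8, 100]    # last target replaced by an outlier

clean_slope, clean_intercept = fit_line(xs, ys_clean)
outlier_slope, outlier_intercept = fit_line(xs, ys_outlier)
```

A single extreme point changes the fitted slope by an order of magnitude here, because squared error weights large residuals heavily.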
i. Logistic Regression -- Handle Outlier(Y) -- Required Feature Transformation(Y)
ii. Decision Tree Classifier -- Handle Outlier(N)
iii. Random Forest Classifier -- Handle Outlier(N) -- Required Feature Transformation(N).
iv. XGBoost Classifier -- Handle Outlier(N)
v. AdaBoost Classifier
vi. ANN -- Required Feature Transformation(Y)
vii. CNN -- Required Feature Transformation(Y)
viii. Naive Bayes Classifier -- Handle Outlier(N)
ix. KNN -- Handle Outlier(Y) -- Required Feature Transformation(Y)
x. SVM -- Handle Outlier(N)
xi. Gradient Boosting -- Handle Outlier(N)
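The KNN entry is marked as requiring feature transformation because it relies on distances. The sketch below, with hypothetical (age, income) data and helper names, shows a 1-nearest-neighbour prediction changing once both features are min-max scaled.

```python
import math


def nearest_label(train, labels, query):
    """1-nearest-neighbour prediction using Euclidean distance."""
    distances = [math.dist(row, query) for row in train]
    return labels[distances.index(min(distances))]


def min_max_scaler(rows):
    """Build a function that scales each column to [0, 1] using `rows`."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]

    def scale(row):
        return tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))

    return scale


# Hypothetical (age, income) rows; income has a far larger numeric range.
train = [(25, 50000), (30, 52000), (60, 51000), (65, 49000)]
labels = ["young", "young", "old", "old"]
query = (62, 51900)

# Unscaled: income dominates the distance, so age is effectively ignored.
raw_prediction = nearest_label(train, labels, query)

# Scaled: both features contribute comparably.
scale = min_max_scaler(train)
scaled_prediction = nearest_label([scale(r) for r in train], labels, scale(query))
```

Tree-based models split on one feature at a time, which is why the list marks them as not requiring this step.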
i. KMeans -- Handle Outlier(Y) -- Required Feature Transformation(Y)
ii. DBSCAN -- Handle Outlier(Y)
iii. Hierarchical Clustering -- Handle Outlier(Y) -- Required Feature Transformation(Y)
iv. KNN Clustering
v. PCA -- Handle Outlier(Y)
vi. LDA
vii. Neural Network -- Handle Outlier(Y)
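To illustrate why KMeans is marked as needing outlier handling, here is a small one-dimensional sketch of the usual assign-then-update iteration. The function `kmeans_1d`, the starting centroids, and the data are hypothetical; real implementations work in many dimensions and choose starting centroids more carefully.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


clean = [1, 2, 3, 10, 11, 12]     # two well-separated groups
with_outlier = clean + [100]      # same data plus one extreme point

clean_centroids = kmeans_1d(clean, [1, 12])
outlier_centroids = kmeans_1d(with_outlier, [1, 12])
```

With the outlier present, one centroid is pulled onto the extreme point and the two genuine groups end up merged under the other centroid.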