predictor pre-selection
siemdejong committed Dec 16, 2022
1 parent 2fced3b commit dccd15c
Showing 2 changed files with 29 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -211,7 +211,7 @@ Yet to be adapted to this study.
- [ ] Statistical analysis methods
- [ ] Diagram of analytical process
- [ ] handling of predictors
- [ ] Pre-selection of predictors prior to model building (results for exp/pca/logistic)
- [x] Pre-selection of predictors prior to model building (results for exp/pca/logistic)
- [ ] rescaling/transformation on predictors (LDS + reweighting)
- [ ] type of model, building model + predictor selection + internal validation
- [x] model ensembling techniques (if used)
30 changes: 28 additions & 2 deletions skinstression/chapters/methods.tex
@@ -93,7 +93,7 @@ \section{Outcome}
The measurement is done mechanically by an experimentalist.
The mechanical measurement itself is blind to clinical information.

\section{Predictors}
\section{Predictors}\label{sec:skin_predictors}

% --------------------------------------------------
% SEARCHING FOR A SIMPLE SKIN STRAIN-STRESS MODEL
@@ -152,7 +152,7 @@ \subsubsection{Exponential}
\subsubsection{Principal component analysis}
In an earlier study (ref A.\ Soylu), principal component analysis (PCA) is used to reduce the dimensionality of the strain-stress data.
In summary, after PCA, every measurement $Y$ can be approximated by
\begin{equation}
\begin{equation}\label{eq:pca}
Y \approx Y_\mathrm{PCA} = \mathbf{A} \mathbf{V} + \bar{Y},
\end{equation}
where $\mathbf{A}$ and $\mathbf{V}$ are matrices containing respectively the eigenvalues and eigenvectors of the measurement data.
@@ -195,6 +195,32 @@ \section{Missing data}

\section{Statistical analysis methods}

\subsection{Predictor pre-selection}
As discussed in \cref{sec:skin_predictors}, there are three candidates to be used as neural network predictors.
These candidates are tested against the raw strain-stress curves.

\subsubsection{Exponential and logistic curve}
The exponential and logistic models are fitted to all raw strain-stress curves.
The goodness of fit is assessed by eye.\marginnote{Whyyyy by eyeeee, simply calculate r2 also for exp :) yo}
A fit is considered good if it passes reasonably through all data points.
Moreover, the exponential regime of the fit should describe the leg part of the curve.
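A quantitative complement to visual inspection, as the margin note suggests, could be the coefficient of determination.
As a sketch, assuming a curve consists of $n$ measured stresses $\sigma_i$ with fitted values $\hat{\sigma}_i$ and mean measured stress $\bar{\sigma}$ (notation introduced here for illustration only),
\begin{equation}
    R^2 = 1 - \frac{\sum_{i=1}^{n} \left(\sigma_i - \hat{\sigma}_i\right)^2}{\sum_{i=1}^{n} \left(\sigma_i - \bar{\sigma}\right)^2},
\end{equation}
where values close to 1 indicate that the fit explains most of the variation in the measured stress.
Because both models are nonlinear, $R^2$ should be interpreted with care and would not replace the check on the leg part of the curve.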

\subsubsection{Principal component analysis}
PCA requires that all curves are sampled at the same points along at least one axis.
The first step to achieve this is to exclude all stretch values above the maximum stretch of the shortest curve.
\textcite{Soylu2022} linearly interpolated the curves and restricted both stretch and stress to the minimum peak value.
PCA on two variables, however, requires only one shared set of points.
Moreover, results of \citeauthor{Soylu2022} show kinks in the PCA reconstructions near the end of the curves, which could originate from a limited number of data points or from the linear interpolation.
Therefore, in this study, a non-uniform, univariate, interpolating spline was fitted to all points of each curve and the stress was calculated from the spline at the stretch values of the curve with the lowest maximum stretch.
After PCA on the complete dataset, the explained variance per component was calculated and used to choose an appropriate number of principal components.
From these principal components, the curves were reconstructed using \cref{eq:pca}.
The goodness of fit was determined by eye.
A fit is considered good if it passes reasonably through all data points and has few inflection points.
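The resampling and component selection described above can be summarised as follows; the symbols are introduced here for illustration and are assumptions, not notation from the original analysis.
Let curve $i$ have stretch values $\lambda_{ij}$ and let $s_i$ be its interpolating spline.
The shared upper stretch limit is
\begin{equation}
    \lambda_\mathrm{max} = \min_i \max_j \lambda_{ij},
\end{equation}
and every curve is evaluated as $s_i(\lambda)$ at the stretch values of the curve that attains this minimum.
If $\ell_k$ denotes the variance along the $k$-th principal component, the fraction of variance explained by the first $K$ of $p$ components is
\begin{equation}
    \frac{\sum_{k=1}^{K} \ell_k}{\sum_{k=1}^{p} \ell_k},
\end{equation}
and $K$ can be chosen as the smallest number for which this fraction is judged sufficient.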

Fitting PCA on one subset and using it to reconstruct another subset is only feasible if PCA on the full dataset works reasonably well.
This would be useful if PCA were used to construct predictors, because using PCA results of the full dataset introduces information leakage from the test sets to the training set: the components then describe data from both subsets.
This is unlike Ref.~\cite{Soylu2022}, where information leakage was not considered.\marginnote{Where to put PCA bias study?}
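A leakage-free variant of \cref{eq:pca} can be sketched as follows, assuming the rows of $\mathbf{V}$ are orthonormal principal components; the subscripts are introduced here for illustration.
The components $\mathbf{V}_\mathrm{train}$ and mean $\bar{Y}_\mathrm{train}$ are computed from the training subset only, and a test measurement is reconstructed as
\begin{equation}
    Y_\mathrm{test} \approx \mathbf{A}_\mathrm{test} \mathbf{V}_\mathrm{train} + \bar{Y}_\mathrm{train},
    \qquad
    \mathbf{A}_\mathrm{test} = \left(Y_\mathrm{test} - \bar{Y}_\mathrm{train}\right) \mathbf{V}_\mathrm{train}^\top,
\end{equation}
where $\mathbf{A}_\mathrm{test}$ holds the coefficients of the test measurement on the training components, so that no quantity derived from the test set enters the components.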

\subsection{Convolutional neural network}
The basis of the model originates from Liang \emph{et al.} \cite{Liang2017} and is adapted by Soylu \cite{Soylu2022}.
The model, a convolutional neural network, consists of five blocks.