diff --git a/Stage-4/report/lgg-analysis-report.md b/Stage-4/report/lgg-analysis-report.md
index 9208b4d..9f40520 100644
--- a/Stage-4/report/lgg-analysis-report.md
+++ b/Stage-4/report/lgg-analysis-report.md
@@ -6,11 +6,26 @@
+## Table of Contents
+1. [Introduction](#1-introduction-to-low-grade-glioma)
+2. [Dataset and Data Preprocessing](#2-description-of-dataset-and-data-preprocessing-steps)
+3. [Methodology for Biomarker Discovery](#3-methodology-for-biomarker-discovery)
+4. [Methodology for Machine Learning Analysis](#4-methodology-for-machine-learning)
+5. [Results and Interpretation](#5-result-and-interpretation-of-model-performance)
+6. [Conclusion and Future Directions for Research](#6-conclusion-and-future-directions-for-research)
+7. [References](#references)
## 1. Introduction to Low-Grade Glioma
Low-Grade Gliomas (LGGs) are slow-growing brain tumors classified as Grade II gliomas by the World Health Organization (Ravanpay *et al*., 2018). Despite their slower growth, they can infiltrate brain tissue and progress to more aggressive forms. A key biomarker in LGG is the IDH mutation, which is associated with a better prognosis, while IDH-wildtype tumors tend to behave more aggressively (Solomou *et al*., 2023).
### 1.1 Project Aim
@@ -37,7 +52,7 @@ The analysis used the `TCGAanalyze_DEA` function from the TCGAbiolinks R package
@@ -75,20 +90,17 @@ After performing DGE, genes were filtered by selecting those with a LogFC > 1 an
A random forest classification model was built to classify mutation status (mutant or wildtype) using the feature-selected training dataset of 123 genes and 360 samples. At each split, 100 randomly chosen genes were considered as candidates (`mtry = 100`). The model was then tested on an independent set of 153 samples.
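To illustrate what `mtry` controls, the Python sketch below performs a single random-forest node split: it draws `mtry` genes at random and keeps the gene/threshold pair with the lowest weighted Gini impurity. It is a hypothetical illustration, not the R code used for this model; the function names and toy expression values are invented. A full forest repeats this at every node of many trees, each grown on a bootstrap sample. With `mtry = 100` of 123 genes, each split in the model above searches roughly 81% of the genes.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y, mtry, rng):
    """Find the best (gene_index, threshold, impurity) split, searching only
    `mtry` randomly chosen genes (hypothetical helper for illustration).

    X: list of samples, each a list of gene-expression values.
    y: list of class labels, e.g. "mutant" / "wildtype".
    """
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), mtry)  # random feature subset
    best = None
    for j in candidates:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            # Impurity of the two children, weighted by their sizes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Toy data: 6 samples x 4 genes; only gene 2 separates the two classes.
X = [
    [5.0, 1.2, 0.1, 3.3],
    [4.8, 0.9, 0.2, 3.1],
    [5.1, 1.1, 0.3, 2.9],
    [5.0, 1.0, 2.1, 3.0],
    [4.9, 1.3, 2.4, 3.2],
    [5.2, 0.8, 2.2, 3.4],
]
y = ["mutant"] * 3 + ["wildtype"] * 3

# With mtry = 4 every gene is searched, so gene 2 (impurity 0.0) is found.
print(best_split(X, y, mtry=4, rng=random.Random(0)))
```

With a smaller `mtry` the informative gene may not be among the candidates at a given node, which is what decorrelates the trees in a forest.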
### 4.2 KNN for Predicting IDH Status
-### 4.2.1 Feature extraction
We built a K-Nearest Neighbors (KNN) model to predict IDH status (mutant or wildtype) from gene expression. After preprocessing, cleaning, and filtering the data, we applied the `topPreds` function to select the 1000 predictors (genes) with the highest standard deviation, which were then used to train the model.
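The `topPreds` function itself is not shown in this report. The Python sketch below is a hypothetical re-creation of the behaviour described (rank genes by standard deviation and keep the top N); the function name `top_preds`, the gene names and the values are invented for illustration. In the analysis above N was 1000.

```python
import statistics

def top_preds(expr, n):
    """Return the names of the `n` genes with the highest standard deviation.

    expr: dict mapping gene name -> list of expression values across samples.
    Uses the sample standard deviation (n - 1 denominator), as R's sd() does.
    """
    sds = {gene: statistics.stdev(values) for gene, values in expr.items()}
    # Sort genes by standard deviation, largest first, and keep the top n
    ranked = sorted(sds, key=sds.get, reverse=True)
    return ranked[:n]

expr = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],  # nearly constant across samples
    "GENE_B": [0.0, 5.0, 0.5, 6.0],  # highly variable
    "GENE_C": [2.0, 3.0, 2.5, 3.5],  # moderately variable
}
print(top_preds(expr, 2))  # ['GENE_B', 'GENE_C']
```

Filtering on variance alone is unsupervised: it keeps genes that vary across samples without looking at the IDH labels.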
-### 4.2.2 Model Training and Testing
+### 4.2.1 Model Training and Testing
The data was split 70:30 into training and test sets, giving 375 samples for training and 159 for testing. Cross-validation (configured in `ctrl.lgg`) was used during training to evaluate each candidate k on held-out folds, reducing the risk of overfitting to a single split. The cross-validated optimum, reported by `knn.lgg$bestTune`, was `k = 1`, meaning each test sample is assigned the class of its single nearest training sample.
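The split-then-tune workflow can be sketched in plain Python. This is only an illustration under stated assumptions: the analysis itself was done in R (the names `ctrl.lgg` and `knn.lgg$bestTune` suggest caret-style tuning, which is not reproduced here), and the helper names, the 5-fold setting, the candidate k values and the toy data below are all hypothetical. Cross-validation runs only on the training portion, so the 30% test set stays untouched until the final accuracy check.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def split_70_30(X, y, rng):
    """Shuffle the samples, then put 70% in training and 30% in testing."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_train = round(0.7 * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def cv_accuracy(X, y, k, folds):
    """Mean k-NN accuracy over `folds`-fold cross-validation on (X, y).
    X is assumed already shuffled, so striding by `folds` gives random folds."""
    n = len(X)
    accs = []
    for f in range(folds):
        held_out = set(range(f, n, folds))
        tr_X = [X[i] for i in range(n) if i not in held_out]
        tr_y = [y[i] for i in range(n) if i not in held_out]
        correct = sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / folds

def tune_k(X, y, k_values, folds=5):
    """Pick the k with the best cross-validated accuracy (first k wins ties)."""
    return max(k_values, key=lambda k: cv_accuracy(X, y, k, folds))

# Toy data: two well-separated clusters standing in for mutant / wildtype.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
     + [[rng.gauss(20, 1), rng.gauss(20, 1)] for _ in range(20)])
y = ["mutant"] * 20 + ["wildtype"] * 20

train_X, train_y, test_X, test_y = split_70_30(X, y, rng)
best_k = tune_k(train_X, train_y, k_values=[1, 3, 5, 7])
preds = [knn_predict(train_X, train_y, x, best_k) for x in test_X]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(len(train_X), len(test_X), best_k, accuracy)
```

On real expression data the features would normally be scaled first, since Euclidean distance is dominated by genes with large absolute values.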
@@ -98,19 +110,19 @@ We built a K-Nearest Neighbors (KNN) model to predict IDH status (mutant or wild
@@ -122,7 +134,7 @@ The model achieved a prediction accuracy of 91.5%. The model accurately predicte