diff --git a/README.md b/README.md
index e5ade85f..e34d3d40 100644
--- a/README.md
+++ b/README.md
@@ -3,18 +3,18 @@
   <a href='https://arxiv.org/abs/2409.10566'><img src=https://img.shields.io/badge/arXiv-2409.10566-b31b1b.svg></a>
   <a href='https://arxiv.org/pdf/2504.00294'><img src=https://img.shields.io/badge/arXiv-2504.00294-b31b1b.svg></a>
   <a href='https://huggingface.co/datasets/microsoft/Eureka-Bench-Logs/tree/main'><img src=https://huggingface.co/front/assets/huggingface_logo-noborder.svg width="16">Eureka Evaluation Logs</a>
-  <a href='https://microsoft.github.io/eureka-ml-insights'><img src=docs/figures/github.png width="16">Project Website</a>
+  <a href='https://microsoft.github.io/eureka-ml-insights'><img src=readme_docs/figures/github.png width="16">Project Website</a>
 </p>
 
 This repository contains the code for the Eureka ML Insights framework. The framework is designed to help researchers and practitioners run reproducible evaluations of generative models using a variety of benchmarks and metrics efficiently. The framework allows the user to define custom pipelines for data processing, inference, and evaluation, and provides a set of pre-defined evaluation pipelines for key benchmarks.
 
 ## 📰 News
 
-- **[2025/5/20]**: We have uploaded logs from all experiment reported in our papers on [HuggingFace](https://huggingface.co/datasets/microsoft/Eureka-Bench-Logs/tree/main). 
-- **[2025/4/29]**: New blog post out [Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead](https://www.microsoft.com/en-us/research/articles/eureka-inference-time-scaling-insights-where-we-stand-and-what-lies-ahead/)
-- **[2025/3/31]**: ✨ We have a new paper out [Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead](https://arxiv.org/abs/2504.00294)
-- **[2024/9/17]**: New blog post out [Eureka: Evaluating and understanding progress in AI](https://aka.ms/eureka-ml-insights-blog)
-- **[2024/9/17]**: ✨ New paper out [Eureka: Evaluating and Understanding Large Foundation Models](https://arxiv.org/abs/2409.10566)
+- **[2025/5/20]**: <img src=https://huggingface.co/front/assets/huggingface_logo-noborder.svg width="16"> We have uploaded logs from all experiment reported in our papers on [HuggingFace](https://huggingface.co/datasets/microsoft/Eureka-Bench-Logs/tree/main). 
+- **[2025/4/29]**: <img src=readme_docs/figures/msr_blog.png width="16"> New blog post out [Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead](https://www.microsoft.com/en-us/research/articles/eureka-inference-time-scaling-insights-where-we-stand-and-what-lies-ahead/)
+- **[2025/3/31]**: <img src=readme_docs/figures/arxiv_logo.svg width="16"> We have a new technical report out [Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead](https://arxiv.org/abs/2504.00294)
+- **[2024/9/17]**: <img src=readme_docs/figures/msr_blog.png width="16"> New blog post out [Eureka: Evaluating and understanding progress in AI](https://aka.ms/eureka-ml-insights-blog)
+- **[2024/9/17]**: <img src=readme_docs/figures/arxiv_logo.svg width="16"> New technical report out [Eureka: Evaluating and Understanding Large Foundation Models](https://arxiv.org/abs/2409.10566)
 ## Table of Contents
 - [Eureka ML Insights Framework](#eureka-ml-insights-framework)
   - [📰 News](#-news)
@@ -98,7 +98,7 @@ The results of the experiment will be saved in a directory under `logs/FlenQA_Ex
 For other available experiment pipelines and model configurations, see the `eureka_ml_insights/user_configs` and `eureka_ml_insights/configs` directories, respectively. In [model_configs.py](eureka_ml_insights/configs/model_configs.py) you can configure the model classes to use your API keys, Key Vault urls, endpoints, and other model-specific configurations.
 
 ## 🗺️ Overview of Experiment Pipelines
-![Components](./docs/figures/transparent_uml.png)
+![Components](./readme_docs/figures/transparent_uml.png)
 Experiment pipelines define the sequence of components that are run to process data, run inference, and evaluate the model outputs. You can find examples of experiment pipeline configurations in the `user_configs` directory. To create a new experiment configuration, you need to define a class that inherits from `ExperimentConfig` and implements the `configure_pipeline` method. In the `configure_pipeline` method you define the Pipeline config (arrangement of Components) for your Experiment. Once your class is ready, add it to `user_configs/__init__.py` import list.
 
 
diff --git a/docs/figures/Benchmarks.png b/docs/figures/Benchmarks.png
deleted file mode 100644
index f191fd9e..00000000
Binary files a/docs/figures/Benchmarks.png and /dev/null differ
diff --git a/docs/figures/huggingface_logo-noborder.svg b/docs/figures/huggingface_logo-noborder.svg
deleted file mode 100644
index 43c5d3c0..00000000
--- a/docs/figures/huggingface_logo-noborder.svg
+++ /dev/null
@@ -1,37 +0,0 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="95" height="88" fill="none">
-	<path fill="#FFD21E" d="M47.21 76.5a34.75 34.75 0 1 0 0-69.5 34.75 34.75 0 0 0 0 69.5Z" />
-	<path
-		fill="#FF9D0B"
-		d="M81.96 41.75a34.75 34.75 0 1 0-69.5 0 34.75 34.75 0 0 0 69.5 0Zm-73.5 0a38.75 38.75 0 1 1 77.5 0 38.75 38.75 0 0 1-77.5 0Z"
-	/>
-	<path
-		fill="#3A3B45"
-		d="M58.5 32.3c1.28.44 1.78 3.06 3.07 2.38a5 5 0 1 0-6.76-2.07c.61 1.15 2.55-.72 3.7-.32ZM34.95 32.3c-1.28.44-1.79 3.06-3.07 2.38a5 5 0 1 1 6.76-2.07c-.61 1.15-2.56-.72-3.7-.32Z"
-	/>
-	<path
-		fill="#FF323D"
-		d="M46.96 56.29c9.83 0 13-8.76 13-13.26 0-2.34-1.57-1.6-4.09-.36-2.33 1.15-5.46 2.74-8.9 2.74-7.19 0-13-6.88-13-2.38s3.16 13.26 13 13.26Z"
-	/>
-	<path
-		fill="#3A3B45"
-		fill-rule="evenodd"
-		d="M39.43 54a8.7 8.7 0 0 1 5.3-4.49c.4-.12.81.57 1.24 1.28.4.68.82 1.37 1.24 1.37.45 0 .9-.68 1.33-1.35.45-.7.89-1.38 1.32-1.25a8.61 8.61 0 0 1 5 4.17c3.73-2.94 5.1-7.74 5.1-10.7 0-2.34-1.57-1.6-4.09-.36l-.14.07c-2.31 1.15-5.39 2.67-8.77 2.67s-6.45-1.52-8.77-2.67c-2.6-1.29-4.23-2.1-4.23.29 0 3.05 1.46 8.06 5.47 10.97Z"
-		clip-rule="evenodd"
-	/>
-	<path
-		fill="#FF9D0B"
-		d="M70.71 37a3.25 3.25 0 1 0 0-6.5 3.25 3.25 0 0 0 0 6.5ZM24.21 37a3.25 3.25 0 1 0 0-6.5 3.25 3.25 0 0 0 0 6.5ZM17.52 48c-1.62 0-3.06.66-4.07 1.87a5.97 5.97 0 0 0-1.33 3.76 7.1 7.1 0 0 0-1.94-.3c-1.55 0-2.95.59-3.94 1.66a5.8 5.8 0 0 0-.8 7 5.3 5.3 0 0 0-1.79 2.82c-.24.9-.48 2.8.8 4.74a5.22 5.22 0 0 0-.37 5.02c1.02 2.32 3.57 4.14 8.52 6.1 3.07 1.22 5.89 2 5.91 2.01a44.33 44.33 0 0 0 10.93 1.6c5.86 0 10.05-1.8 12.46-5.34 3.88-5.69 3.33-10.9-1.7-15.92-2.77-2.78-4.62-6.87-5-7.77-.78-2.66-2.84-5.62-6.25-5.62a5.7 5.7 0 0 0-4.6 2.46c-1-1.26-1.98-2.25-2.86-2.82A7.4 7.4 0 0 0 17.52 48Zm0 4c.51 0 1.14.22 1.82.65 2.14 1.36 6.25 8.43 7.76 11.18.5.92 1.37 1.31 2.14 1.31 1.55 0 2.75-1.53.15-3.48-3.92-2.93-2.55-7.72-.68-8.01.08-.02.17-.02.24-.02 1.7 0 2.45 2.93 2.45 2.93s2.2 5.52 5.98 9.3c3.77 3.77 3.97 6.8 1.22 10.83-1.88 2.75-5.47 3.58-9.16 3.58-3.81 0-7.73-.9-9.92-1.46-.11-.03-13.45-3.8-11.76-7 .28-.54.75-.76 1.34-.76 2.38 0 6.7 3.54 8.57 3.54.41 0 .7-.17.83-.6.79-2.85-12.06-4.05-10.98-8.17.2-.73.71-1.02 1.44-1.02 3.14 0 10.2 5.53 11.68 5.53.11 0 .2-.03.24-.1.74-1.2.33-2.04-4.9-5.2-5.21-3.16-8.88-5.06-6.8-7.33.24-.26.58-.38 1-.38 3.17 0 10.66 6.82 10.66 6.82s2.02 2.1 3.25 2.1c.28 0 .52-.1.68-.38.86-1.46-8.06-8.22-8.56-11.01-.34-1.9.24-2.85 1.31-2.85Z"
-	/>
-	<path
-		fill="#FFD21E"
-		d="M38.6 76.69c2.75-4.04 2.55-7.07-1.22-10.84-3.78-3.77-5.98-9.3-5.98-9.3s-.82-3.2-2.69-2.9c-1.87.3-3.24 5.08.68 8.01 3.91 2.93-.78 4.92-2.29 2.17-1.5-2.75-5.62-9.82-7.76-11.18-2.13-1.35-3.63-.6-3.13 2.2.5 2.79 9.43 9.55 8.56 11-.87 1.47-3.93-1.71-3.93-1.71s-9.57-8.71-11.66-6.44c-2.08 2.27 1.59 4.17 6.8 7.33 5.23 3.16 5.64 4 4.9 5.2-.75 1.2-12.28-8.53-13.36-4.4-1.08 4.11 11.77 5.3 10.98 8.15-.8 2.85-9.06-5.38-10.74-2.18-1.7 3.21 11.65 6.98 11.76 7.01 4.3 1.12 15.25 3.49 19.08-2.12Z"
-	/>
-	<path
-		fill="#FF9D0B"
-		d="M77.4 48c1.62 0 3.07.66 4.07 1.87a5.97 5.97 0 0 1 1.33 3.76 7.1 7.1 0 0 1 1.95-.3c1.55 0 2.95.59 3.94 1.66a5.8 5.8 0 0 1 .8 7 5.3 5.3 0 0 1 1.78 2.82c.24.9.48 2.8-.8 4.74a5.22 5.22 0 0 1 .37 5.02c-1.02 2.32-3.57 4.14-8.51 6.1-3.08 1.22-5.9 2-5.92 2.01a44.33 44.33 0 0 1-10.93 1.6c-5.86 0-10.05-1.8-12.46-5.34-3.88-5.69-3.33-10.9 1.7-15.92 2.78-2.78 4.63-6.87 5.01-7.77.78-2.66 2.83-5.62 6.24-5.62a5.7 5.7 0 0 1 4.6 2.46c1-1.26 1.98-2.25 2.87-2.82A7.4 7.4 0 0 1 77.4 48Zm0 4c-.51 0-1.13.22-1.82.65-2.13 1.36-6.25 8.43-7.76 11.18a2.43 2.43 0 0 1-2.14 1.31c-1.54 0-2.75-1.53-.14-3.48 3.91-2.93 2.54-7.72.67-8.01a1.54 1.54 0 0 0-.24-.02c-1.7 0-2.45 2.93-2.45 2.93s-2.2 5.52-5.97 9.3c-3.78 3.77-3.98 6.8-1.22 10.83 1.87 2.75 5.47 3.58 9.15 3.58 3.82 0 7.73-.9 9.93-1.46.1-.03 13.45-3.8 11.76-7-.29-.54-.75-.76-1.34-.76-2.38 0-6.71 3.54-8.57 3.54-.42 0-.71-.17-.83-.6-.8-2.85 12.05-4.05 10.97-8.17-.19-.73-.7-1.02-1.44-1.02-3.14 0-10.2 5.53-11.68 5.53-.1 0-.19-.03-.23-.1-.74-1.2-.34-2.04 4.88-5.2 5.23-3.16 8.9-5.06 6.8-7.33-.23-.26-.57-.38-.98-.38-3.18 0-10.67 6.82-10.67 6.82s-2.02 2.1-3.24 2.1a.74.74 0 0 1-.68-.38c-.87-1.46 8.05-8.22 8.55-11.01.34-1.9-.24-2.85-1.31-2.85Z"
-	/>
-	<path
-		fill="#FFD21E"
-		d="M56.33 76.69c-2.75-4.04-2.56-7.07 1.22-10.84 3.77-3.77 5.97-9.3 5.97-9.3s.82-3.2 2.7-2.9c1.86.3 3.23 5.08-.68 8.01-3.92 2.93.78 4.92 2.28 2.17 1.51-2.75 5.63-9.82 7.76-11.18 2.13-1.35 3.64-.6 3.13 2.2-.5 2.79-9.42 9.55-8.55 11 .86 1.47 3.92-1.71 3.92-1.71s9.58-8.71 11.66-6.44c2.08 2.27-1.58 4.17-6.8 7.33-5.23 3.16-5.63 4-4.9 5.2.75 1.2 12.28-8.53 13.36-4.4 1.08 4.11-11.76 5.3-10.97 8.15.8 2.85 9.05-5.38 10.74-2.18 1.69 3.21-11.65 6.98-11.76 7.01-4.31 1.12-15.26 3.49-19.08-2.12Z"
-	/>
-</svg>
diff --git a/eureka_ml_insights/configs/config.py b/eureka_ml_insights/configs/config.py
index 75db4c31..8386bbdd 100644
--- a/eureka_ml_insights/configs/config.py
+++ b/eureka_ml_insights/configs/config.py
@@ -5,23 +5,34 @@
 UtilityClassConfigType = TypeVar("UtilityClassConfigType", bound=Type["UtilityClassConfig"])
 ComponentConfigType = TypeVar("ComponentConfigType", bound=Type["ComponentConfig"])
 
-""" Config classes for utility classes """
+"""Provides configuration classes for utility classes.
+
+These classes define configurations for various utility classes utilized in the pipeline.
+"""
 
 
 @dataclass
 class UtilityClassConfig:
-    """Base class for all utility class configs
-    Args:
-        class_name (Any): The utility class to be used with this config
-        init_args (dict): The arguments to be passed to the utility class constructor
+    """
+    Base class for all utility class configs.
+
+    Attributes:
+        class_name (Any): The utility class to be used with this config.
+        init_args (dict): Arguments to be passed to the utility class constructor.
     """
 
     class_name: Any = None
     init_args: dict = field(default_factory=dict)
 
     def __repr__(self):
-        # remove "secret_key_params" from the init_args representation for security reasons
-        # but keep it in the actual config
+        """
+        Return a string representation of the configuration.
+
+        Omits 'secret_key_params' and 'api_key' for security reasons.
+
+        Returns:
+            str: A string representation of the configuration.
+        """
         init_args_copy = copy.deepcopy(self.init_args)
         init_args_copy.pop("secret_key_params", None)
         init_args_copy.pop("api_key", None)
@@ -30,33 +41,74 @@ def __repr__(self):
 
 @dataclass(repr=False)
 class DataSetConfig(UtilityClassConfig):
-    pass
+    """
+    Configuration class for a dataset.
+
+    Inherits from:
+        UtilityClassConfig
+
+    Attributes:
+        class_name (Any): The utility class to be used with this config.
+        init_args (dict): Arguments to be passed to the utility class constructor.
+    """
 
 
 @dataclass(repr=False)
 class ModelConfig(UtilityClassConfig):
-    pass
+    """
+    Configuration class for a model.
+
+    Inherits from:
+        UtilityClassConfig
+
+    Attributes:
+        class_name (Any): The utility class to be used with this config.
+        init_args (dict): Arguments to be passed to the utility class constructor.
+    """
 
 
 @dataclass(repr=False)
 class MetricConfig(UtilityClassConfig):
-    pass
+    """
+    Configuration class for a metric.
+
+    Inherits from:
+        UtilityClassConfig
+
+    Attributes:
+        class_name (Any): The utility class to be used with this config.
+        init_args (dict): Arguments to be passed to the utility class constructor.
+    """
 
 
 @dataclass(repr=False)
 class AggregatorConfig(UtilityClassConfig):
-    pass
+    """
+    Configuration class for an aggregator.
 
+    Inherits from:
+        UtilityClassConfig
 
-""" Config classes for component classes """
+    Attributes:
+        class_name (Any): The utility class to be used with this config.
+        init_args (dict): Arguments to be passed to the utility class constructor.
+    """
+
+
+"""Provides configuration classes for component classes.
+
+This section includes classes that define various component configurations used in the pipeline.
+"""
 
 
 @dataclass
 class ComponentConfig:
-    """Base class for all component configs
-    Args:
-        component_type (Any): The component class to be used with this config
-        output_dir (str): The directory to save the output files of this component
+    """
+    Base class for all component configurations.
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): The directory to save the output files of this component.
     """
 
     component_type: Any = None
@@ -65,10 +117,17 @@ class ComponentConfig:
 
 @dataclass
 class DataProcessingConfig(ComponentConfig):
-    """Config class for the data processing component
-    Args:
-        data_reader_config (UtilityClassConfig): The data reader config to be used with this component
-        output_data_columns (list): List of columns (subset of input columns) to keep in the transformed data output file
+    """
+    Config class for the data processing component.
+
+    Inherits from:
+        ComponentConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): The directory to save the output files of this component.
+        data_reader_config (UtilityClassConfigType): The data reader config to be used with this component.
+        output_data_columns (List[str]): List of columns (subset of input columns) to keep in the transformed data output file.
     """
 
     data_reader_config: UtilityClassConfigType = None
@@ -77,10 +136,19 @@ class DataProcessingConfig(ComponentConfig):
 
 @dataclass
 class PromptProcessingConfig(DataProcessingConfig):
-    """Config class for the prompt processing component
-    Args:
-        prompt_template_path (str): The path to the prompt template jinja file
-        ignore_failure (bool): Whether to ignore the failures in the prompt processing and move on
+    """
+    Config class for the prompt processing component.
+
+    Inherits from:
+        DataProcessingConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): Directory where output files are saved.
+        data_reader_config (UtilityClassConfigType): The data reader config used with this component.
+        output_data_columns (List[str]): List of columns to keep in the transformed data output file.
+        prompt_template_path (str): The path to the prompt template jinja file.
+        ignore_failure (bool): Whether to ignore failures in prompt processing and move on.
     """
 
     prompt_template_path: str = None
@@ -89,10 +157,19 @@ class PromptProcessingConfig(DataProcessingConfig):
 
 @dataclass
 class DataJoinConfig(DataProcessingConfig):
-    """Config class for the data join component
-    Args:
-        other_data_reader_config (UtilityClassConfig): The data reader config for the dataset to be joined with the main dataset
-        pandas_merge_args (dict): Arguments to be passed to pandas merge function
+    """
+    Config class for the data join component.
+
+    Inherits from:
+        DataProcessingConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): Directory where output files are saved.
+        data_reader_config (UtilityClassConfigType): The data reader config used with this component.
+        output_data_columns (List[str]): List of columns to keep in the transformed data output file.
+        other_data_reader_config (UtilityClassConfigType): The data reader config for the dataset to be joined with the main dataset.
+        pandas_merge_args (dict): Arguments passed to pandas merge function.
     """
 
     other_data_reader_config: UtilityClassConfigType = None
@@ -101,25 +178,43 @@ class DataJoinConfig(DataProcessingConfig):
 
 @dataclass
 class DataUnionConfig(DataProcessingConfig):
-    """Config class for the data union component
-    Args:
-        other_data_reader_config (UtilityClassConfig): The data reader config for the dataset to be joined with the main dataset
-        output_data_columns (list): List of columns (subset of input columns) to keep in the transformed data output file
-        dedupe_cols (list): List of columns to deduplicate the concatenated data frame
+    """
+    Config class for the data union component.
+
+    Inherits from:
+        DataProcessingConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): Directory where output files are saved.
+        data_reader_config (UtilityClassConfigType): The data reader config used with this component.
+        output_data_columns (List[str]): List of columns (subset of input columns) to keep in the transformed data output file.
+        other_data_reader_config (UtilityClassConfigType): The data reader config for the dataset to be joined with the main dataset.
+        dedupe_cols (List[str]): Columns used to deduplicate the concatenated data frame.
     """
 
     other_data_reader_config: UtilityClassConfigType = None
-    output_data_columns: List[str] = None
     dedupe_cols: List[str] = None
 
 
 @dataclass
 class InferenceConfig(ComponentConfig):
-    """Config class for the inference component
-    Args:
-        data_loader_config (UtilityClassConfig): The data loader config to be used with this component
-        model_config (UtilityClassConfig): The model config to be used with this component
-        resume_from (str): Optional. Path to the file where previous inference results are stored
+    """
+    Config class for the inference component.
+
+    Inherits from:
+        ComponentConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): Directory where output files are saved.
+        data_loader_config (UtilityClassConfigType): The data loader config used with this component.
+        model_config (UtilityClassConfigType): The model config to be used with this component.
+        resume_from (str): Path to the file where previous inference results are stored.
+        new_columns (List[str]): New columns generated by this component that are not present in the resume_from file.
+        requests_per_minute (int): Number of requests allowed per minute.
+        max_concurrent (int): Maximum number of concurrent requests. Defaults to 1.
+        chat_mode (bool): Whether the inference is in chat mode.
     """
 
     data_loader_config: UtilityClassConfigType = None
@@ -133,12 +228,19 @@ class InferenceConfig(ComponentConfig):
 
 @dataclass
 class EvalReportingConfig(ComponentConfig):
-    """Config class for the evaluation reporting component
-    Args:
-        data_reader_config (UtilityClassConfig): The data reader config to configure the data reader for this component
-        metric_config (UtilityClassConfig): The metric config
-        aggregator_configs (list): List of aggregator configs
-        visualizer_configs (list): List of visualizer configs
+    """
+    Config class for the evaluation reporting component.
+
+    Inherits from:
+        ComponentConfig
+
+    Attributes:
+        component_type (Any): The component class to be used with this config.
+        output_dir (str): Directory where output files are saved.
+        data_reader_config (UtilityClassConfigType): The data reader config used for this component.
+        metric_config (UtilityClassConfigType): The metric config used for evaluation.
+        aggregator_configs (List[UtilityClassConfigType]): Aggregator configurations.
+        visualizer_configs (List[UtilityClassConfigType]): Visualizer configurations.
     """
 
     data_reader_config: UtilityClassConfigType = None
@@ -147,21 +249,32 @@ class EvalReportingConfig(ComponentConfig):
     visualizer_configs: List[UtilityClassConfigType] = field(default_factory=list)
 
 
-""" Config class for the pipeline class """
+"""Provides a configuration class for the pipeline.
+
+This section includes a configuration class for the pipeline.
+"""
 
 
 @dataclass
 class PipelineConfig:
-    """Config class for the pipeline class
-    Args:
-        component_configs (list): List of ComponentConfigs
-        log_dir (str): The directory to save the logs of the pipeline
+    """
+    Configuration class for the pipeline.
+
+    Attributes:
+        component_configs (list[ComponentConfigType]): List of component configurations.
+        log_dir (str): Directory to save the pipeline logs.
     """
 
     component_configs: list[ComponentConfigType] = field(default_factory=list)
     log_dir: str = None
 
     def __repr__(self):
+        """
+        Return a string representation of the pipeline configuration.
+
+        Returns:
+            str: A string containing all component configurations.
+        """
         res = ""
         for comp in self.component_configs:
             res += str(comp) + "\n"
diff --git a/eureka_ml_insights/configs/experiment_config.py b/eureka_ml_insights/configs/experiment_config.py
index 3d6dafc5..9679c6e0 100644
--- a/eureka_ml_insights/configs/experiment_config.py
+++ b/eureka_ml_insights/configs/experiment_config.py
@@ -7,15 +7,16 @@
 
 
 def create_logdir(exp_dir: str, exp_subdir: Optional[str] = None):
-    """
-    Create a unique log directory for the experiment.
+    """Creates a unique log directory for the experiment.
+
     Args:
-        exp_dir: The name of the experiment directory directly under /logs.
-        exp_subdir: The name of the experiment subdirectory under the experiment directory.
+        exp_dir (str): The name of the experiment directory directly under "/logs".
+        exp_subdir (Optional[str], optional): The name of the experiment subdirectory under
+            the experiment directory. Defaults to None.
+
     Returns:
-        log_dir: The unique log directory for the experiment.
+        str: The unique log directory for the experiment.
     """
-    # generate a unique log dir based on time and date
     date = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S.%f")
     if exp_subdir is None:
         log_dir = os.path.join("logs", f"{exp_dir}", f"{date}")
@@ -26,12 +27,18 @@ def create_logdir(exp_dir: str, exp_subdir: Optional[str] = None):
 
 
 class ExperimentConfig(ABC):
-    """
-    Abstract class for the experiment piplien configuration class.
+    """Abstract class for the experiment pipeline configuration.
+
     Child classes should implement the configure_pipeline method.
     """
+
     def __init__(self, exp_logdir: Optional[str] = None, **kwargs):
+        """Initializes the ExperimentConfig.
 
+        Args:
+            exp_logdir (Optional[str], optional): The experiment log directory. Defaults to None.
+            **kwargs: Additional keyword arguments to configure the pipeline.
+        """
         dir_name = self.__class__.__name__
         self.log_dir = create_logdir(dir_name, exp_logdir)
         self.pipeline_config = self.configure_pipeline(**kwargs)
@@ -41,4 +48,15 @@ def __init__(self, exp_logdir: Optional[str] = None, **kwargs):
 
     @abstractmethod
     def configure_pipeline(self, **kwargs) -> PipelineConfig:
+        """Configures and returns the pipeline.
+
+        Args:
+            **kwargs: Keyword arguments for pipeline configuration.
+
+        Returns:
+            PipelineConfig: The configured pipeline.
+
+        Raises:
+            NotImplementedError: If the method is not implemented in a subclass.
+        """
         raise NotImplementedError("configure_pipeline method must be implemented in the subclass")
diff --git a/eureka_ml_insights/configs/model_configs.py b/eureka_ml_insights/configs/model_configs.py
index 0f1cc3ce..24ce86bc 100644
--- a/eureka_ml_insights/configs/model_configs.py
+++ b/eureka_ml_insights/configs/model_configs.py
@@ -1,11 +1,16 @@
-""" This module contains config objects for the models used in the experiments. To use these configs, make sure to
-replace the placeholders with your own keys.json file, secret key names, and endpint URLs where applicable. 
-You can also add your custom models here by following the same pattern as the existing configs. """
+"""Module containing config objects for the models used in the experiments.
+
+This module includes config objects for various models, specifying parameters such
+as model names, keys, and endpoints. To use these configs, replace the placeholders
+with your own keys.json file, secret key names, and endpoint URLs where applicable.
+You can also add your custom models by following the same pattern as the existing configs.
+"""
 
 from eureka_ml_insights.models import (
     AzureOpenAIOModel,
     ClaudeModel,
     ClaudeReasoningModel,
+    DeepseekR1ServerlessAzureRestEndpointModel,
     DirectOpenAIModel,
     DirectOpenAIOModel,
     GeminiModel,
@@ -13,15 +18,13 @@
     LLaVAHuggingFaceModel,
     LLaVAModel,
     LocalVLLMModel,
-    Phi4HFModel,
     MistralServerlessAzureRestEndpointModel,
-    DeepseekR1ServerlessAzureRestEndpointModel,
+    Phi4HFModel,
     RestEndpointModel,
-    TogetherModel,
     TestModel,
+    TogetherModel,
     OfflineFileModel
 )
-from eureka_ml_insights.models.models import AzureOpenAIModel
 
 from .config import ModelConfig
 
@@ -59,7 +62,7 @@
         "temperature": 1.0,
         # high max token limit for deep seek
         # otherwise the answers may be cut in the middle
-        "max_tokens": 65536
+        "max_tokens": 65536,
     },
 )
 
@@ -71,7 +74,7 @@
         "temperature": 0.6,
         # high max token limit for deep seek
         # otherwise the answers may be cut in the middle
-        "max_tokens": 65536
+        "max_tokens": 65536,
     },
 )
 
@@ -98,6 +101,16 @@
         "secret_key_params": OPENAI_SECRET_KEY_PARAMS,
     },
 )
+TRAPI_O1_CONFIG = ModelConfig(
+    AzureOpenAIOModel,
+    {
+        "url": "https://trapi.research.microsoft.com/msraif/shared",
+        # o1 models only work with 2024-12-01-preview api version
+        "api_version": "2024-12-01-preview",
+        "model_name": "o1_2024-12-17",
+        "auth_scope": "api://trapi/.default",
+    },
+)
 
 OAI_O1_CONFIG = ModelConfig(
     DirectOpenAIOModel,
@@ -184,7 +197,7 @@
     {
         "model_name": "gemini-2.0-flash-thinking-exp-01-21",
         "secret_key_params": GEMINI_SECRET_KEY_PARAMS,
-	    "max_tokens": 32768
+        "max_tokens": 32768,
     },
 )
 
@@ -234,9 +247,9 @@
         "model_name": "claude-3-7-sonnet-20250219",
         "thinking_enabled": True,
         "thinking_budget": 30720,
-        "max_tokens": 32768, # This number should always be higher than the thinking budget
-        "temperature": 1.0, # As of 03/08/2025, thinking only works with temperature 1.0
-        "timeout": 600, # We set a timeout of 10 minutes for thinking
+        "max_tokens": 32768,  # This number should always be higher than the thinking budget
+        "temperature": 1.0,  # As of 03/08/2025, thinking only works with temperature 1.0
+        "timeout": 600,  # We set a timeout of 10 minutes for thinking
     },
 )
 
diff --git a/eureka_ml_insights/core/data_join.py b/eureka_ml_insights/core/data_join.py
index 3b26e7ee..d51ed67c 100644
--- a/eureka_ml_insights/core/data_join.py
+++ b/eureka_ml_insights/core/data_join.py
@@ -1,3 +1,10 @@
+"""Module for joining datasets using pandas merge functionality.
+
+This module defines the DataJoin class, which inherits from DataProcessing, and provides methods
+to merge two pandas DataFrames with flexible join types. The module also includes a factory
+class method for construction from a configuration object.
+"""
+
 from typing import List, Optional
 
 import pandas as pd
@@ -6,7 +13,7 @@
 
 
 class DataJoin(DataProcessing):
-    """This component is used to join two datasets using pandas merge functionality."""
+    """Joins two datasets using pandas merge functionality."""
 
     def __init__(
         self,
@@ -16,14 +23,15 @@ def __init__(
         pandas_merge_args: dict,
         output_data_columns: Optional[List[str]] = None,
     ) -> None:
-        """
-        args:
-            data_reader_config: DataReaderConfig
-            output_dir: str directory to save the output files of this component.
-            other_data_reader_config: DataReaderConfig
-            pandas_merge_args: dict arguments to be passed to pandas merge function.
-            output_data_columns: Optional[List[str]] list of columns (subset of input columns)
-                                      to keep in the transformed data output file.
+        """Initializes the DataJoin component.
+
+        Args:
+            data_reader_config: Configuration object for reading the first dataset.
+            output_dir (str): Directory to save the output files of this component.
+            other_data_reader_config: Configuration object for reading the second dataset.
+            pandas_merge_args (dict): Arguments passed to pandas merge function.
+            output_data_columns (List[str], optional): List of columns (subset of input columns)
+                to keep in the transformed data output file.
         """
         super().__init__(data_reader_config, output_dir, output_data_columns)
         self.other_data_reader = other_data_reader_config.class_name(**other_data_reader_config.init_args)
@@ -31,12 +39,18 @@ def __init__(
         allowed_join_types = {"inner", "outer", "left", "right", "cross"}
         join_type = self.pandas_merge_args["how"]
         if join_type.lower() not in allowed_join_types:
-            raise ValueError(
-                f"Invalid join type '{join_type}'. Expected one of: {', '.join(allowed_join_types)}."
-            )
-        
+            raise ValueError(f"Invalid join type '{join_type}'. Expected one of: {', '.join(allowed_join_types)}.")
+
     @classmethod
-    def from_config(cls, config):        
+    def from_config(cls, config):
+        """Creates a DataJoin instance from a configuration object.
+
+        Args:
+            config: A configuration object containing all required initialization attributes.
+
+        Returns:
+            DataJoin: An instance of the DataJoin class.
+        """
         return cls(
             config.data_reader_config,
             config.output_dir,
@@ -46,33 +60,34 @@ def from_config(cls, config):
         )
 
     def run(self):
+        """Executes the data join operation and writes the result to the output directory."""
         df = self.data_reader.load_dataset()
         other_df = self.other_data_reader.load_dataset()
         if len(df.columns) > 0 and len(other_df.columns) > 0:
             joined_df = pd.merge(df, other_df, **self.pandas_merge_args)
         # handling corner cases when one of the data frames is empty
         elif len(df.columns) == 0:
-            if (self.pandas_merge_args["how"] == 'inner'):
+            if self.pandas_merge_args["how"] == "inner":
                 joined_df = df
-            elif (self.pandas_merge_args["how"] == 'outer'):
+            elif self.pandas_merge_args["how"] == "outer":
                 joined_df = other_df
-            elif (self.pandas_merge_args["how"] == 'left'):
+            elif self.pandas_merge_args["how"] == "left":
                 joined_df = df
-            elif (self.pandas_merge_args["how"] == 'right'):
+            elif self.pandas_merge_args["how"] == "right":
                 joined_df = other_df
-            elif (self.pandas_merge_args["how"] == 'cross'):
+            elif self.pandas_merge_args["how"] == "cross":
                 joined_df = df
         elif len(other_df.columns) == 0:
-            if (self.pandas_merge_args["how"] == 'inner'):
+            if self.pandas_merge_args["how"] == "inner":
                 joined_df = other_df
-            elif (self.pandas_merge_args["how"] == 'outer'):
+            elif self.pandas_merge_args["how"] == "outer":
                 joined_df = df
-            elif (self.pandas_merge_args["how"] == 'left'):
+            elif self.pandas_merge_args["how"] == "left":
                 joined_df = df
-            elif (self.pandas_merge_args["how"] == 'right'):
+            elif self.pandas_merge_args["how"] == "right":
                 joined_df = other_df
-            elif (self.pandas_merge_args["how"] == 'cross'):
+            elif self.pandas_merge_args["how"] == "cross":
                 joined_df = other_df
-        if (len(joined_df.columns) > 0):
+        if len(joined_df.columns) > 0:
             joined_df = self.get_desired_columns(joined_df)
-        self.write_output(joined_df)
\ No newline at end of file
+        self.write_output(joined_df)
diff --git a/eureka_ml_insights/core/data_processing.py b/eureka_ml_insights/core/data_processing.py
index 5ae94519..1d87558a 100644
--- a/eureka_ml_insights/core/data_processing.py
+++ b/eureka_ml_insights/core/data_processing.py
@@ -6,6 +6,11 @@
 
 from eureka_ml_insights.data_utils import NumpyEncoder
 
+"""
+This module defines a data processing pipeline component, a utility function for computing MD5 hashes,
+and integrates reserved name handling for data outputs.
+"""
+
 from .pipeline import Component
 from .reserved_names import (
     INFERENCE_RESERVED_NAMES,
@@ -14,15 +19,31 @@
 
 
 def compute_hash(val: str) -> str:
-    """
-    Hashes the provided value using MD5.
+    """Compute the MD5 hash of a given string.
+
+    Args:
+        val (str): The string to be hashed.
+
+    Returns:
+        str: The MD5 hash of the input string.
     """
     return md5(val.encode("utf-8")).hexdigest()
 
 
 class DataProcessing(Component):
+    """Implements data reading, transformation, and output writing for a pipeline component."""
+
     @classmethod
     def from_config(cls, config):
+        """Create a DataProcessing instance from a configuration object.
+
+        Args:
+            config: A configuration object with data_reader_config,
+                output_dir, and output_data_columns.
+
+        Returns:
+            DataProcessing: An instance of DataProcessing.
+        """
         return cls(
             config.data_reader_config,
             config.output_dir,
@@ -35,19 +56,25 @@ def __init__(
         output_dir: str,
         output_data_columns: Optional[List[str]] = None,
     ) -> None:
-        """
-        args:
-            data_reader_config: DataReaderConfig
-            output_dir: str directory to save the output files of this component.
-            output_data_columns: Optional[List[str]] list of columns (subset of input columns)
-                                 to keep in the transformed data output file. The columns reserved for the Eureka framework
-                                 will automatically be added to the output_data_columns if not provided.
+        """Initialize the DataProcessing component.
+
+        Args:
+            data_reader_config (DataReaderConfig): The configuration for reading data.
+            output_dir (str): Directory to save the output files of this component.
+            output_data_columns (Optional[List[str]], optional): A list of columns (subset of input columns)
+                to keep in the transformed data output file. The columns reserved for the Eureka framework
+                will automatically be added to the output_data_columns if not provided.
         """
         super().__init__(output_dir)
         self.data_reader = data_reader_config.class_name(**data_reader_config.init_args)
         self.output_data_columns = output_data_columns
 
     def write_output(self, df):
+        """Write the transformed DataFrame to a JSONL file.
+
+        Args:
+            df (pandas.DataFrame): The DataFrame containing the data to write.
+        """
         logging.info(f"About to save transformed_data_file with columns: {df.columns}.")
         transformed_data_file = os.path.join(self.output_dir, "transformed_data.jsonl")
 
@@ -57,21 +84,30 @@ def write_output(self, df):
                 writer.write(json.dumps(content, ensure_ascii=False, separators=(",", ":"), cls=NumpyEncoder) + "\n")
 
     def get_desired_columns(self, df):
+        """Get the desired columns from the DataFrame, including necessary reserved columns.
+
+        Args:
+            df (pandas.DataFrame): The DataFrame from which columns will be selected.
+
+        Returns:
+            pandas.DataFrame: The DataFrame containing only the desired columns.
+        """
         if self.output_data_columns is None:
             self.output_data_columns = df.columns
         self.output_data_columns = list(self.output_data_columns)
-        # if the data was multiplied, keep the columns that are needed to identify datapoint and replicates
-        # (just in case the user forgot to specify these columns in output_data_columns)
         cols_to_keep = set(INFERENCE_RESERVED_NAMES + PROMPT_PROC_RESERVED_NAMES)
         self.output_data_columns.extend([col for col in cols_to_keep if col in df.columns])
         self.output_data_columns = list(set(self.output_data_columns))
         return df[self.output_data_columns]
 
     def run(self) -> None:
-        # data reader loads data into a pandas dataframe and applies any transformations
+        """Run the data processing steps.
+
+        Loads the dataset, applies transformations (if any),
+        selects desired columns, and writes the output as a JSONL file.
+        """
         input_df = self.data_reader.load_dataset()
         logging.info(f"input has: {len(input_df)} rows, and the columns are: {input_df.columns}.")
-        # if input_df is not empty, select the desired columns
         if not input_df.empty:
             input_df = self.get_desired_columns(input_df)
         self.write_output(input_df)
diff --git a/eureka_ml_insights/core/data_union.py b/eureka_ml_insights/core/data_union.py
index 6936d44a..95eb2f1a 100644
--- a/eureka_ml_insights/core/data_union.py
+++ b/eureka_ml_insights/core/data_union.py
@@ -1,3 +1,8 @@
+"""Module for data union functionality.
+
+This module provides a DataUnion class which concatenates two datasets using pandas concat functionality.
+"""
+
 from typing import List, Optional
 
 import pandas as pd
@@ -6,7 +11,7 @@
 
 
 class DataUnion(DataProcessing):
-    """This component is used to concatenate two datasets using pandas concat functionality."""
+    """Concatenates two datasets using pandas concat functionality."""
 
     def __init__(
         self,
@@ -16,14 +21,15 @@ def __init__(
         output_data_columns: Optional[List[str]] = None,
         dedupe_cols: Optional[List[str]] = None,
     ) -> None:
-        """
-        args:
-            data_reader_config: DataReaderConfig
-            output_dir: str directory to save the output files of this component.
-            other_data_reader_config: DataReaderConfig
-            output_data_columns: Optional[List[str]] list of columns (subset of input columns)
-                                      to keep in the transformed data output file.
-            dedupe_cols: Optional[List[str]] list of columns to deduplicate the concatenated data frame.
+        """Initializes DataUnion.
+
+        Args:
+            data_reader_config: DataReaderConfig object for reading the primary dataset.
+            output_dir (str): Directory to save the output files of this component.
+            other_data_reader_config: DataReaderConfig object for reading the secondary dataset.
+            output_data_columns (Optional[List[str]]): List of columns (subset of input columns) to keep
+                in the transformed data output file.
+            dedupe_cols (Optional[List[str]]): List of columns to deduplicate the concatenated DataFrame.
         """
         super().__init__(data_reader_config, output_dir, output_data_columns)
         self.other_data_reader = other_data_reader_config.class_name(**other_data_reader_config.init_args)
@@ -31,6 +37,14 @@ def __init__(
 
     @classmethod
     def from_config(cls, config):
+        """Creates a DataUnion instance from a configuration object.
+
+        Args:
+            config: Configuration object containing initialization parameters.
+
+        Returns:
+            DataUnion: The newly created DataUnion instance.
+        """
         return cls(
             config.data_reader_config,
             config.output_dir,
@@ -40,6 +54,15 @@ def from_config(cls, config):
         )
 
     def run(self):
+        """Concatenates two DataFrames, optionally deduplicates them, and writes the result.
+
+        This method:
+         - Loads two datasets using the configured data readers.
+         - Concatenates them with pandas.concat.
+         - Optionally drops duplicates based on dedupe_cols.
+         - Filters columns to output_data_columns if provided.
+         - Writes the final DataFrame to the specified output directory.
+        """
         df = self.data_reader.load_dataset()
         other_df = self.other_data_reader.load_dataset()
         if len(df.columns) > 0 and len(other_df.columns) > 0:
diff --git a/eureka_ml_insights/core/eval_reporting.py b/eureka_ml_insights/core/eval_reporting.py
index 7e2eaa93..3f8c96f1 100644
--- a/eureka_ml_insights/core/eval_reporting.py
+++ b/eureka_ml_insights/core/eval_reporting.py
@@ -1,4 +1,7 @@
 # report component has data reader, list of metrics, list of visualizers, and list of writers
+"""This module provides functionality to read data, compute metrics, visualize results, and write reports
+through the EvalReporting class and its supporting components."""
+
 import json
 import os
 
@@ -9,12 +12,26 @@
 
 
 class EvalReporting(Component):
-    """This class reads the data from a given dataset, calculates the metrics and writes its results,
-    and passes the results to reporter."""
+    """Reads data from a given dataset, computes metrics, writes results, and passes results to a reporter.
+
+    Attributes:
+        data_reader: An instance of a data reader that loads the dataset.
+        metric: An instance of a metric calculator to evaluate the data, if specified.
+        reporter: A Reporter object responsible for generating the final report.
+    """
 
     def __init__(
         self, data_reader_config, output_dir, metric_config=None, aggregator_configs=None, visualizer_configs=None
     ):
+        """Initializes an EvalReporting instance.
+
+        Args:
+            data_reader_config (object): Configuration for the data reader, including class and initialization arguments.
+            output_dir (str): Directory where output files will be written.
+            metric_config (object, optional): Configuration for the metric calculator.
+            aggregator_configs (list, optional): A list of aggregator configurations.
+            visualizer_configs (list, optional): A list of visualizer configurations.
+        """
         super().__init__(output_dir)
         self.data_reader = data_reader_config.class_name(**data_reader_config.init_args)
         self.metric = None
@@ -24,6 +41,14 @@ def __init__(
 
     @classmethod
     def from_config(cls, config):
+        """Creates an EvalReporting instance from a configuration object.
+
+        Args:
+            config (object): An object containing all necessary configuration for EvalReporting.
+
+        Returns:
+            EvalReporting: An initialized instance of EvalReporting.
+        """
         return cls(
             data_reader_config=config.data_reader_config,
             output_dir=config.output_dir,
@@ -33,11 +58,16 @@ def from_config(cls, config):
         )
 
     def run(self):
+        """Executes the reporting pipeline.
+
+        Loads the dataset, evaluates it with the metric (if provided), writes the metric
+        results to a file (if metric_config is specified), and then generates a report.
+        """
         df = self.data_reader.load_dataset()
         metric_result = df
         if self.metric:
             metric_result = self.metric.evaluate(df)
-            # write results in the output directory in a file names metric_resutls.jsonl
+            # write results in the output directory in a file names metric_results.jsonl
             metric_results_file = os.path.join(self.output_dir, "metric_results.jsonl")
             with open(metric_results_file, "w", encoding="utf-8") as writer:
                 for _, row in metric_result.iterrows():
diff --git a/eureka_ml_insights/core/inference.py b/eureka_ml_insights/core/inference.py
index da83f2bd..e82e4f1e 100644
--- a/eureka_ml_insights/core/inference.py
+++ b/eureka_ml_insights/core/inference.py
@@ -1,3 +1,8 @@
+"""This module provides the Inference class, which extends the Component class to handle inference
+processes for a given dataset and model configuration. It supports resuming from existing inference
+results, applying rate limiting, and running parallel inferences.
+"""
+
 import logging
 import os
 import threading
@@ -18,6 +23,8 @@
 
 
 class Inference(Component):
+    """Handles inference processes for a given dataset and model configuration."""
+
     def __init__(
         self,
         model_config: ModelConfig,
@@ -29,17 +36,20 @@ def __init__(
         max_concurrent=1,
         chat_mode=False,
     ):
-        """
-        Initialize the Inference component.
-        args:
-            model_config (dict): ModelConfig object.
-            data_config (dict): DataSetConfig object.
+        """Initializes the Inference component.
+
+        Args:
+            model_config (ModelConfig): Object specifying the model class and initialization arguments.
+            data_config (DataSetConfig): Object specifying the data loader class and initialization arguments.
             output_dir (str): Directory to save the inference results.
-            resume_from (str): optional. Path to the file where previous inference results are stored.
-            new_columns (list): optional. List of new columns to be added to resume_from data to match the current inference response.
-            requests_per_minute (int): optional. Number of inference requests to be made per minute, used for rate limiting. If not provided, rate limiting will not be applied.
-            max_concurrent (int): optional. Maximum number of concurrent inferences to run. Default is 1.
-            chat_mode (bool): optional. If True, the model will be used in chat mode, where a history of messages will be maintained in "previous_messages" column.
+            resume_from (Optional[str]): Path to a file containing previous inference results to resume from.
+            new_columns (Optional[list]): List of new columns to match the current inference response when resuming from old results.
+            requests_per_minute (Optional[int]): Number of inference requests per minute for rate limiting. If None, no rate limiting is applied.
+            max_concurrent (int): Maximum number of concurrent inferences. Defaults to 1.
+            chat_mode (bool): If True, runs in chat mode with a maintained history of messages in the "previous_messages" column.
+
+        Raises:
+            FileNotFoundError: If the provided resume_from file is not found.
         """
         super().__init__(output_dir)
         self.model: Model = model_config.class_name(**model_config.init_args)
@@ -67,6 +77,16 @@ def __init__(
 
     @classmethod
     def from_config(cls, config):
+        """Creates an Inference instance from a provided configuration object.
+
+        Args:
+            config: A configuration object containing attributes:
+                model_config, data_loader_config, output_dir, resume_from,
+                new_columns, requests_per_minute, max_concurrent, and chat_mode.
+
+        Returns:
+            Inference: An instance of the Inference class.
+        """
         return cls(
             config.model_config,
             config.data_loader_config,
@@ -79,40 +99,40 @@ def from_config(cls, config):
         )
 
     def fetch_previous_inference_results(self):
-        """This method loads the contents of the resume_from file and validates if it
-        contains the required columns and keys in alignment with the current model configuration."""
+        """Loads the contents from the resume_from file and validates alignment with the current
+        model configuration.
+
+        Returns:
+            Tuple[pd.DataFrame, int]: A tuple containing:
+                - The DataFrame with previous inference results.
+                - The highest uid value that was processed.
 
+        Raises:
+            ValueError: If the 'model_output' or 'is_valid' columns are not found in the resume_from file.
+            ValueError: If the sample inference call returns invalid results.
+            ValueError: If there is a mismatch between the columns in the resume_from file and the model's expected keys.
+        """
         logging.info(f"Resuming inference from {self.resume_from}")
-        # fetch previous results from the provided resume_from file.
         pre_inf_results_df = DataReader(self.resume_from, format=".jsonl").load_dataset()
 
-        # add new columns listed by the user to the previous inference results, in case we know that the current model will
-        # generate new columns that were not present in the previous results
         if self.new_columns:
             for col in self.new_columns:
                 if col not in pre_inf_results_df.columns:
                     pre_inf_results_df[col] = None
 
-        # validate the resume_from contents both stand-alone and against the current model response keys
         with self.data_loader as loader:
             _, sample_model_input, sample_model_kwargs = loader.get_sample_model_input()
             sample_data_keys = loader.reader.read().keys()
 
-            # verify that "model_output" and "is_valid" columns are present
             if "model_output" not in pre_inf_results_df.columns or "is_valid" not in pre_inf_results_df.columns:
                 raise ValueError("Columns 'model_output' and 'is_valid' are required in the resume_from file.")
 
-            # perform a sample inference call to get the model output keys and validate the resume_from contents
             sample_response_dict = self.model.generate(*sample_model_input, **sample_model_kwargs)
             if not sample_response_dict["is_valid"]:
                 raise ValueError(
                     "Sample inference call for resume_from returned invalid results, please check the model configuration."
                 )
-            # check if the inference response dictionary contains the same keys as the resume_from file
             eventual_keys = set(sample_response_dict.keys()) | set(sample_data_keys)
-
-            # in case of resuming from a file that was generated by an older version of the model,
-            # we let the discrepancy in the reserved keys slide and later set the missing keys to None
             match_keys = set(pre_inf_results_df.columns) | set(INFERENCE_RESERVED_NAMES)
 
             if set(eventual_keys) != match_keys:
@@ -122,21 +142,32 @@ def fetch_previous_inference_results(self):
                     f"Problemtaic columns: {diff}"
                 )
 
-        # find the last uid that was inferenced
         last_uid = pre_inf_results_df["uid"].astype(int).max()
         logging.info(f"Last uid inferenced: {last_uid}")
         return pre_inf_results_df, last_uid
 
     def validate_response_dict(self, response_dict):
-        # Validate that the response dictionary contains the required fields
-        # "model_output" and "is_valid" are mandatory fields to be returned by any model
+        """Validates that the response dictionary contains the mandatory fields 'model_output'
+        and 'is_valid'.
+
+        Args:
+            response_dict (dict): The response dictionary returned by the model.
+
+        Raises:
+            ValueError: If 'model_output' or 'is_valid' is missing.
+        """
         if "model_output" not in response_dict or "is_valid" not in response_dict:
             raise ValueError("Response dictionary must contain 'model_output' and 'is_valid' keys.")
 
     def retrieve_exisiting_result(self, data, pre_inf_results_df):
-        """Finds the previous result for the given data point from the pre_inf_results_df and returns it if it is valid
-        data: dict, data point to be inferenced
-        pre_inf_results_df: pd.DataFrame, previous inference results
+        """Retrieves a valid previous inference result for a given data record.
+
+        Args:
+            data (dict): The data record to be inferenced.
+            pre_inf_results_df (pd.DataFrame): The DataFrame containing previous inference results.
+
+        Returns:
+            dict or None: The updated data record with previous model outputs if valid, otherwise None.
         """
         prev_results = pre_inf_results_df[pre_inf_results_df.uid == data["uid"]]
         if prev_results.empty:
@@ -167,7 +198,6 @@ def retrieve_exisiting_result(self, data, pre_inf_results_df):
                 prev_model_tokens,
                 prev_model_time,
             )
-            # add remaining pre_inf_results_df columns to the data point, making sure to update the previous_messages column
             for col in pre_inf_results_df.columns:
                 if col not in data or col == "previous_messages":
                     data[col] = prev_results[col].values[0]
@@ -175,8 +205,14 @@ def retrieve_exisiting_result(self, data, pre_inf_results_df):
             return data
 
     def run(self):
+        """Executes the inference process, optionally resuming from previous results, and writes
+        the final results to the output file.
+
+        This method manages parallel execution with a ThreadPoolExecutor, performs rate limiting
+        if specified, and tracks progress with a tqdm progress bar.
+        """
         with self.appender as appender:
-            pass  # create the output file. This will guarantee that an empty inference_result.jsonl file is created even if no inference is run.
+            pass
         if self.resume_from:
             self.pre_inf_results_df, self.last_uid = self.fetch_previous_inference_results()
         with self.data_loader as loader, ThreadPoolExecutor(max_workers=self.max_concurrent) as executor:
@@ -187,13 +223,28 @@ def run(self):
                     self._append_threadsafe(result)
 
     def _append_threadsafe(self, data):
+        """Appends inference results to the output file in a thread-safe manner.
+
+        Args:
+            data (dict): The inference result data to be appended.
+        """
         with self.writer_lock:
             with self.appender as appender:
                 appender.write(data)
 
     def _run_single(self, record: tuple[dict, tuple, dict]):
-        """Runs model.generate() with respect to a single element of the dataloader."""
+        """Runs the model's generate method for a single data record from the data loader.
 
+        Args:
+            record (Tuple[dict, tuple, dict]): A tuple containing:
+                - The data record (dict).
+                - The model's positional arguments (tuple).
+                - The model's keyword arguments (dict).
+
+        Returns:
+            dict or None: The updated data record with inference results, or None if inference was
+            skipped or the record is invalid.
+        """
         data, model_args, model_kwargs = record
         if self.chat_mode and data.get("is_valid", True) is False:
             return None
@@ -202,14 +253,11 @@ def _run_single(self, record: tuple[dict, tuple, dict]):
             if prev_result:
                 return prev_result
 
-        # Rate limiter -- only for sequential inference
         if self.requests_per_minute and self.max_concurrent == 1:
             while len(self.request_times) >= self.requests_per_minute:
-                # remove the oldest request time if it is older than the rate limit period
                 if time.time() - self.request_times[0] > self.period:
                     self.request_times.popleft()
                 else:
-                    # rate limit is reached, wait for a second
                     time.sleep(1)
             self.request_times.append(time.time())
 
diff --git a/eureka_ml_insights/core/pipeline.py b/eureka_ml_insights/core/pipeline.py
index 22f22a31..944ab944 100644
--- a/eureka_ml_insights/core/pipeline.py
+++ b/eureka_ml_insights/core/pipeline.py
@@ -1,3 +1,7 @@
+"""This module defines an abstract base class for pipeline components and a pipeline class
+that orchestrates the execution of these components sequentially.
+"""
+
 import os
 import pathlib
 from abc import ABC, abstractmethod
@@ -10,7 +14,26 @@
 
 
 class Component(ABC):
+    """Abstract base class for pipeline components.
+
+    This class provides the basic interface for all components in the pipeline,
+    ensuring they define a 'run' method and can be instantiated from a config.
+
+    Args:
+        output_dir (PathLike): The directory where the component's output should be stored.
+        **_ (Dict[str, Any]): Additional keyword arguments.
+    """
+
     def __init__(self, output_dir: PathLike, **_: Dict[str, Any]):
+        """Initializes the component.
+
+        Args:
+            output_dir (PathLike): The output directory for the component.
+            **_ (Dict[str, Any]): Additional parameters.
+
+        Raises:
+            FileExistsError: If the output directory already exists.
+        """
         self.output_dir = output_dir
         # create the component output directory
         if not os.path.exists(self.output_dir):
@@ -20,20 +43,50 @@ def __init__(self, output_dir: PathLike, **_: Dict[str, Any]):
 
     @abstractmethod
     def run(self) -> None:
+        """Executes the component's main functionality.
+
+        Raises:
+            NotImplementedError: This method must be implemented in subclasses.
+        """
         raise NotImplementedError
 
     @classmethod
     def from_config(cls: Type[T], config: ComponentConfig) -> T:
+        """Creates an instance of the component from the provided configuration.
+
+        Args:
+            config (ComponentConfig): The configuration for the component.
+
+        Returns:
+            T: An instance of the component.
+
+        Raises:
+            NotImplementedError: This method must be implemented in subclasses.
+        """
         raise NotImplementedError
 
 
 class Pipeline:
+    """A pipeline that executes a list of components in sequence.
+
+    Args:
+        component_configs (List[ComponentConfig]): A list of component configurations.
+        log_dir (str): The directory for pipeline logs.
+    """
+
     def __init__(self, component_configs: List[ComponentConfig], log_dir: str):
+        """Initializes the pipeline with the given component configurations and log directory.
+
+        Args:
+            component_configs (List[ComponentConfig]): A list of configurations for pipeline components.
+            log_dir (str): The directory where pipeline logs will be stored.
+        """
         self.log_dir = log_dir
         self.components: List[Component] = []
         for config in component_configs:
             self.components.append(config.component_type.from_config(config))
 
     def run(self) -> None:
+        """Runs all the components in the pipeline in order."""
         for component in self.components:
             component.run()
diff --git a/eureka_ml_insights/core/prompt_processing.py b/eureka_ml_insights/core/prompt_processing.py
index 01eede26..2e02cda5 100644
--- a/eureka_ml_insights/core/prompt_processing.py
+++ b/eureka_ml_insights/core/prompt_processing.py
@@ -1,3 +1,7 @@
+"""This module provides functionalities for prompt processing, including a function to compute an MD5 hash and
+a class that extends DataProcessing to handle prompt generation workflows.
+"""
+
 import logging
 import os
 from hashlib import md5
@@ -10,15 +14,30 @@
 
 
 def compute_hash(val: str) -> str:
-    """
-    Hashes the provided value using MD5.
+    """Compute the MD5 hash of a given string.
+
+    Args:
+        val (str): The value to be hashed.
+
+    Returns:
+        str: The MD5 hash of the given value.
     """
     return md5(val.encode("utf-8")).hexdigest()
 
 
 class PromptProcessing(DataProcessing):
+    """Handles the prompt generation workflow by extending DataProcessing."""
+
     @classmethod
     def from_config(cls, config):
+        """Create a PromptProcessing instance from a configuration object.
+
+        Args:
+            config: The configuration object.
+
+        Returns:
+            PromptProcessing: A new PromptProcessing instance.
+        """
         return cls(
             config.data_reader_config,
             config.output_dir,
@@ -35,14 +54,15 @@ def __init__(
         prompt_template_path: Optional[str] = None,
         ignore_failure: bool = False,
     ) -> None:
-        """
-        args:
-            data_reader_config: DataReaderConfig
-            prompt_template_path: str path to the prompt template .jinja file.
-            output_dir: str directory to save the output files of this component.
-            output_data_columns: Optional[List[str]] list of columns (subset of input columns)
-                                      to keep in the transformed data output file.
-            ignore_failure: bool whether to ignore failure in prompt generation or not.
+        """Initialize the PromptProcessing object.
+
+        Args:
+            data_reader_config: DataReaderConfig object that specifies the data reading configuration.
+            output_dir (str): Directory to save the output files of this component.
+            output_data_columns (Optional[List[str]]): A list of columns (subset of input columns) to keep in
+                the transformed data output file.
+            prompt_template_path (Optional[str]): Path to the prompt template .jinja file.
+            ignore_failure (bool): Whether to ignore failure in prompt generation or not.
         """
         super().__init__(data_reader_config, output_dir, output_data_columns)
         self.ignore_failure = ignore_failure
@@ -53,6 +73,14 @@ def __init__(
             self.prompt_data_processor = JinjaPromptTemplate(prompt_template_path)
 
     def run(self) -> None:
+        """Execute the prompt processing workflow.
+
+        Loads input data, optionally uses a Jinja template to create prompts, and writes
+        output data with generated prompts and their hashes to the output directory.
+
+        Raises:
+            Exception: If an error occurs during prompt creation and ignore_failure is False.
+        """
         # data reader loads data into a pandas dataframe and applies any transformations
         input_df = self.data_reader.load_dataset()
         logging.info(f"input has: {len(input_df)} rows, and the columns are: {input_df.columns}.")
diff --git a/eureka_ml_insights/core/reserved_names.py b/eureka_ml_insights/core/reserved_names.py
index 4dd2b4d9..81544eee 100644
--- a/eureka_ml_insights/core/reserved_names.py
+++ b/eureka_ml_insights/core/reserved_names.py
@@ -1,3 +1,20 @@
+"""This module defines reserved column names used by Eureka for inference and prompt processing.
+
+The variables in this module list the columns that may be removed or overwritten by Eureka if present
+in the data.
+"""
+
 # if your data has any of these columns, they may be removed or overwritten by Eureka
 INFERENCE_RESERVED_NAMES = ["model_output", "is_valid", "response_time", "n_output_tokens"]
-PROMPT_PROC_RESERVED_NAMES = ["prompt_hash", "prompt", "uid", "data_point_id", "data_repeat_id", "__hf_task", "__hf_split"]
+"""list of str: Reserved column names used for inference outputs and status."""
+
+PROMPT_PROC_RESERVED_NAMES = [
+    "prompt_hash",
+    "prompt",
+    "uid",
+    "data_point_id",
+    "data_repeat_id",
+    "__hf_task",
+    "__hf_split",
+]
+"""list of str: Reserved column names used for prompt processing."""
diff --git a/eureka_ml_insights/data_utils/aime_utils.py b/eureka_ml_insights/data_utils/aime_utils.py
index d36742cd..9372829d 100644
--- a/eureka_ml_insights/data_utils/aime_utils.py
+++ b/eureka_ml_insights/data_utils/aime_utils.py
@@ -5,24 +5,44 @@
 
 from .transform import DFTransformBase
 
+"""This module provides classes and functions that extract numeric answers from textual model 
+output for AIME questions."""
+
 
 @dataclass
 class AIMEExtractAnswer(DFTransformBase):
+    """A data transformation class that extracts an AIME answer from the model's output.
+
+    Attributes:
+        model_output_column (str): The name of the column in the DataFrame containing the model output.
+        model_answer_column (str): The name of the column in the DataFrame where the extracted answer
+            will be stored.
+    """
+
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the given DataFrame by extracting the AIME answer from the model's output column.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame to be transformed.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted answers in the specified column.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(self.parse_output_answer)
         return df
 
     @staticmethod
     def parse_output_answer(response):
-        """
-        Parse the input string to extract answer of a given AIME question.
-        Parameters:
-            response (str): Input string containing answer X in the form of "Final Answer: X".
+        """Parses the input string to extract the answer of a given AIME question.
+
+        Args:
+            response (str): Input string containing the answer in the form of "Final Answer: X".
+
         Returns:
-            numerical_value (float): A numeric value representing the model's answer.
+            float: A numeric value representing the model's answer.
         """
         numerical_value = None
 
diff --git a/eureka_ml_insights/data_utils/ba_calendar_utils.py b/eureka_ml_insights/data_utils/ba_calendar_utils.py
index 3d24d62e..28b117af 100644
--- a/eureka_ml_insights/data_utils/ba_calendar_utils.py
+++ b/eureka_ml_insights/data_utils/ba_calendar_utils.py
@@ -1,3 +1,8 @@
+"""
+This module provides the BA_Calendar_ExtractAnswer class, which transforms a DataFrame
+to extract answers from a model output column for BA Calendar problems.
+"""
+
 import re
 from dataclasses import dataclass
 
@@ -8,31 +13,51 @@
 
 @dataclass
 class BA_Calendar_ExtractAnswer(DFTransformBase):
+    """Extracts answers from a model output column in a DataFrame.
+
+    This class extends DFTransformBase, providing a transformation method to parse
+    and extract BA Calendar problem answers from a specified output column.
+
+    Attributes:
+        model_output_column (str): The name of the column containing the model output.
+        model_answer_column (str): The name of the column that will store the parsed answers.
+    """
+
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the given DataFrame by parsing the model output column to extract answers.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame containing the model output column.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with a new column containing the parsed answers.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(self.parse_output_answer)
         return df
 
     @staticmethod
     def parse_output_answer(response):
-        """
-        Parse the input string to extract answer of a given BA Calendar problems.
-        Parameters:
-            response (str): Input string containing answer X in the form of "Final Answer: X".
-        Returns: 
-            numerical_value (float): A numeric value representing the model's answer.
+        """Parses the input string to extract an answer for BA Calendar problems.
+
+        Args:
+            response (str): The input string containing the answer in the form of "Final Answer: X".
+
+        Returns:
+            str: A string representing the extracted answer. If no valid answer is found,
+            returns an empty string or "No common time slot available" if applicable.
         """
         answer = ""
 
         if response is None:
             return ""
-        
+
         response = response.replace("**", "").replace("\n", "")
 
         match = re.findall(r"Final Answer:\s*(\w+ \d{2}:\d{2}-\d{2}:\d{2})", response)
-        
+
         if match:
             answer = match[len(match) - 1]
         elif "No common time slot available".lower() in response.lower():
diff --git a/eureka_ml_insights/data_utils/data.py b/eureka_ml_insights/data_utils/data.py
index 297a5250..b506b94a 100644
--- a/eureka_ml_insights/data_utils/data.py
+++ b/eureka_ml_insights/data_utils/data.py
@@ -1,3 +1,7 @@
+"""Module containing data loader classes, reader/writer classes, and supporting utilities
+for local or remote data ingestion and consumption.
+"""
+
 import json
 import logging
 import os
@@ -23,22 +27,48 @@
 
 
 class AzureStorageLogger:
+    """Provides a logger for Azure Blob Storage operations."""
+
     def get_logger(self, level=logging.WARNING):
+        """Returns a logger for Azure Blob Storage.
+
+        Args:
+            level: The logging level to set for the logger.
+
+        Returns:
+            logging.Logger: The configured logger instance.
+        """
         logger = logging.getLogger("azure.storage")
         logger.setLevel(level)
         return logger
 
 
 class DataReaderBase(ABC):
+    """Base class for data readers."""
+
     @abstractmethod
     def read(self):
+        """Abstract method to read data from a source.
+
+        Raises:
+            NotImplementedError: If not implemented in a subclass.
+        """
         raise NotImplementedError
 
 
 class DataLoader:
-    """Dataloaders are used to feed data to models in inference time from LOCAL data sources."""
+    """Loads data from local sources for model inference.
+
+    DataLoader is used to feed data to models at inference time from local data sources.
+    """
 
     def __init__(self, path, total_lines=None):
+        """Initializes DataLoader.
+
+        Args:
+            path (str): The path to the data file.
+            total_lines (int, optional): The total number of lines in the data file. Defaults to None.
+        """
         self.path = path
         self.total_lines = total_lines
 
@@ -61,19 +91,34 @@ def __iter__(self):
             yield self.prepare_model_input(data)
 
     def prepare_model_input(self, row):
+        """Prepares a single row of data for model input.
+
+        Args:
+            row (dict): A single data row from the JSON lines file.
+
+        Returns:
+            tuple: A tuple containing the data row, model arguments, and model keyword arguments.
+        """
         query_text = row["prompt"]
         model_args = (query_text,)
         model_kwargs = {}
         return row, model_args, model_kwargs
 
     def get_sample_model_input(self):
-        """Get a sample data row and model_args from the jsonlines reader."""
+        """Gets a sample data row and model arguments from the JSON lines reader.
+
+        Returns:
+            tuple: A tuple containing the sample data row, model arguments, and model keyword arguments.
+        """
         row = next(self.reader.iter(skip_empty=True, skip_invalid=True))
         return self.prepare_model_input(row)
 
 
 class MMDataLoader(DataLoader):
-    """This data loader class is a base class for those that allow for loading images that are referenced in the local dataset."""
+    """Base class for loading images from a local dataset for multimodal tasks.
+
+    MMDataLoader supports loading images that are referenced in the local dataset in addition to textual data.
+    """
 
     def __init__(
         self,
@@ -85,43 +130,47 @@ def __init__(
         image_column_search_regex: str = "image",
         misc_columns: List[str] = None,
     ):
+        """Initializes an MMDataLoader.
+
+        Args:
+            path (str): Local path to the dataset JSONL file.
+            mm_data_path_prefix (str, optional): Local path prefix to prepend to the image file names in the JSONL file.
+            total_lines (int, optional): The total number of lines in the dataset file. Defaults to None.
+            load_images (bool, optional): Whether images should be loaded. Defaults to True.
+            image_column_names (List[str], optional): Names of columns containing images. Defaults to None.
+            image_column_search_regex (str, optional): Regex pattern used to identify image columns. Defaults to "image".
+            misc_columns (List[str], optional): Names of other columns to include in model inputs. Defaults to None.
+        """
         super().__init__(path, total_lines)
         self.mm_data_path_prefix = mm_data_path_prefix
         self.load_images = load_images
         self.image_column_names = image_column_names
         self.image_column_search_regex = image_column_search_regex
         self.misc_columns = misc_columns
-        """
-        Initializes an MMDataLoader.
-        args:
-            path: str, local path to the dataset jsonl file.
-            mm_data_path_prefix: str, local path prefix that will be prepended to the mm data location (e.g. image file name) stored in the jsonl file.
-            total_lines: option int, the number of lines of the dataset file.
-            load_images: optional bool, specify if images should be loaded.
-            image_column_names: optional List of str, names of columns that have images in them.
-            image_column_search_regex: optional Regex str, to search for which columns have images in them.
-            misc_columns: optional List of str, names of other columns from the data to include in the model inputs.
-        """
 
     def prepare_model_input(self, row):
-        # Given a row from the jsonl file, prepare the data for the model.
+        """Prepares a single row of data (including images if required) for model input.
+
+        Args:
+            row (dict): A data record from the JSON lines file.
+
+        Returns:
+            tuple: A tuple containing the data row, model arguments, and model keyword arguments.
+        """
         query_text = row["prompt"]
         model_args = (query_text,)
 
-        # if images are present load them
         if self.load_images:
-            # if the user passed in a list of image column names when creating the class, use it
             if self.image_column_names:
                 image_column_names = self.image_column_names
-            # otherwise search for the image columns
             else:
                 image_column_names = self._search_for_image_columns(row)
 
-            # if found load them from disk and add to model inputs
             if image_column_names:
                 images = self._gather_image_file_names(row, image_column_names)
                 query_images = self._load_images(images)
                 model_args = (query_text, query_images)
+
         model_kwargs = {}
         if self.misc_columns:
             for column in self.misc_columns:
@@ -129,28 +178,23 @@ def prepare_model_input(self, row):
         return row, model_args, model_kwargs
 
     def _search_for_image_columns(self, data_row) -> list:
+        """Searches for columns that contain images in a data row.
+
+        Args:
+            data_row (dict): A data record from the JSONL file.
+
+        Returns:
+            list: A list of column names that contain images.
         """
-        Search for columns that contain images for a datarow.
-        args:
-            data_row: dict, a record (row) from the jsonl
-        returns:
-            image_columns: List of str, column names.
-        """
-        # look in the data to see if it was stored
         if "image_column_names" in data_row:
-            # fetch which columns are images
             image_column_names = data_row["image_column_names"]
-        # if not stored try to search for the columns
         else:
             if self.image_column_search_regex:
-
                 image_column_names = []
-
                 for col_name in data_row.keys():
                     match_obj = re.match(self.image_column_search_regex, col_name)
                     if match_obj:
                         image_column_names.append(col_name)
-
             else:
                 log.warning(
                     "No image search string was provided, and no image columns were found. Thus no images will be loaded."
@@ -160,22 +204,21 @@ def _search_for_image_columns(self, data_row) -> list:
         return image_column_names
 
     def _gather_image_file_names(self, data, image_column_names: List[str] | str) -> list:
+        """Gets all image file names from a data record.
+
+        Args:
+            data (dict): A data record from the JSONL file.
+            image_column_names (List[str] | str): Names of columns that contain images.
+
+        Returns:
+            list: A list of image file paths (or a single path if not a list).
         """
-        Get all image file names from the data dict and return as a list.
-        args:
-            image_column_names: List of str or str, names of column(s) that have images in them.
-        returns:
-            images: List of str or str, images path(s).
-        """
-        # if is not a list, get the single image file name with the column name
         if not isinstance(image_column_names, list):
             images = data[image_column_names]
         else:
             if len(image_column_names) == 1:
-                # some datasets store multiple images in one column as a list
                 images = data[image_column_names[0]]
             else:
-                # some datasets store multiple images in multiple columns
                 images = [
                     data[image_column_name] for image_column_name in image_column_names if (data[image_column_name])
                 ]
@@ -186,14 +229,14 @@ def _gather_image_file_names(self, data, image_column_names: List[str] | str) ->
         return images
 
     def _load_images(self, images: List[str] | str) -> list:
+        """Loads image files from local paths.
+
+        Args:
+            images (List[str] | str): One or more image file paths.
+
+        Returns:
+            list: A list of loaded PIL Images.
         """
-        Load images files with load_image.
-        args:
-            images: List of str or str, images path(s).
-        returns:
-            query_images: List of PIL Images.
-        """
-        # if is not a list, make it a list
         if not isinstance(images, list):
             images = [images]
 
@@ -205,26 +248,33 @@ def _load_images(self, images: List[str] | str) -> list:
         return query_images
 
     def load_image(self, image_file_name):
+        """Loads an image file from a local path.
+
+        Args:
+            image_file_name (str): The image file path.
+
+        Returns:
+            PIL.Image.Image: The loaded image as a PIL Image.
         """
-        Load image file from local path.
-        args:
-            image_file_name: str, images path.
-        returns:
-            query_image: PIL Image.
-        """
-        # prepend the local path prefix
         full_image_file_path = os.path.join(self.mm_data_path_prefix, image_file_name)
         query_image = Image.open(full_image_file_path).convert("RGB")
         return query_image
 
 
 class AzureDataAuthenticator:
+    """Handles Azure authentication parameters for blob storage."""
+
     def get_query_string(self, query_string=None, secret_key_params=None):
-        """
-        One of the two arguments must be provided.
-        args:
-            query_string: str, query string to authenticate with Azure Blob Storage.
-            secret_key_params: dict, dictionary containing the paramters to call get_secret with.
+        """Obtains an Azure query string for authentication.
+
+        One of the two arguments (query_string or secret_key_params) must be provided.
+
+        Args:
+            query_string (str, optional): The query string to authenticate with Azure Blob Storage. Defaults to None.
+            secret_key_params (dict, optional): Parameters for retrieving the query string from secrets. Defaults to None.
+
+        Raises:
+            ValueError: If neither query_string nor secret_key_params are provided.
         """
         self.query_string = query_string
         self.secret_key_params = secret_key_params
@@ -235,7 +285,10 @@ def get_query_string(self, query_string=None, secret_key_params=None):
 
 
 class AzureMMDataLoader(MMDataLoader):
-    """This data loader allows for loading images that are referenced in the local dataset from Azure Blob Storage."""
+    """Allows for image loading from Azure Blob Storage based on references in a local dataset.
+
+    AzureMMDataLoader extends MMDataLoader for loading images from an Azure Blob Storage container.
+    """
 
     def __init__(
         self,
@@ -247,16 +300,16 @@ def __init__(
         image_column_search_regex="image",
         misc_columns=None,
     ):
-        """
-        Initializes an AzureMMDataLoader.
-        args:
-            path: str, The Azure storage account URL.
-            account_url: str, The Azure storage account URL.
-            blob_container: str, Azure storage container name.
-            total_lines: option int, the number of lines of the dataset file.
-            image_column_names: optional List of str, names of columns that have images in them.
-            image_column_search_regex: optional Regex str, to search for which columns have images in them.
-            misc_columns: optional List of str, names of other columns from the data to include in the model inputs.
+        """Initializes AzureMMDataLoader.
+
+        Args:
+            path (str): The local path or reference to the dataset JSONL file.
+            account_url (str): The Azure storage account URL.
+            blob_container (str): The Azure storage container name.
+            total_lines (int, optional): The number of lines in the dataset file. Defaults to None.
+            image_column_names (list, optional): The names of columns containing image references. Defaults to None.
+            image_column_search_regex (str, optional): The regex string used to find image columns. Defaults to "image".
+            misc_columns (list, optional): The names of other columns to include in model inputs. Defaults to None.
         """
         super().__init__(
             path,
@@ -276,15 +329,30 @@ def __init__(
         )
 
     def load_image(self, image_file_name):
+        """Loads an image from Azure Blob Storage.
+
+        Args:
+            image_file_name (str): The name of the image file in the blob storage.
+
+        Returns:
+            PIL.Image.Image: The loaded image as a PIL Image.
+        """
         image_bytes = self.container_client.download_blob(image_file_name).readall()
         query_image = Image.open(BytesIO(image_bytes)).convert("RGB")
         return query_image
 
 
 class JsonLinesWriter:
+    """Writes data to a JSON lines file."""
+
     def __init__(self, out_path, mode="w"):
+        """Initializes a JsonLinesWriter.
+
+        Args:
+            out_path (str): The output file path.
+            mode (str, optional): The file open mode. Defaults to "w".
+        """
         self.out_path = out_path
-        # if the directory does not exist, create it
         directory = os.path.dirname(out_path)
         if not os.path.exists(directory):
             os.makedirs(directory)
@@ -300,16 +368,27 @@ def __exit__(self, exc_type, exc_value, traceback):
 
 
 class JsonReader(DataReaderBase):
-    """
-    This is a DataReader that loads a json or jsonl data file from a local path.
-    """
+    """Reads JSON or JSONL data from a local file."""
 
     def __init__(self, path):
+        """Initializes a JsonReader.
+
+        Args:
+            path (str): The local path to the JSON or JSONL file.
+        """
         self.path = path
         _, ext = os.path.splitext(self.path)
         self.format = ext.lower()
 
     def read(self):
+        """Reads the data from the file.
+
+        Returns:
+            list or dict: The data read from the file.
+
+        Raises:
+            ValueError: If the file format is not .json or .jsonl.
+        """
         if self.format == ".json":
             with open(self.path, mode="r") as reader:
                 data = json.load(reader)
@@ -322,26 +401,25 @@ def read(self):
 
 
 class AzureBlobReader:
-    """Reads an Azure storage blob from a full URL to a str"""
+    """Provides functionality to read data from an Azure Storage blob via a full blob URL."""
 
     def read_azure_blob(self, blob_url) -> str:
-        """
-        Reads an Azure storage blob..
-        args:
-            blob_url: str, The Azure storage blob full URL.
+        """Reads content from an Azure Storage blob.
+
+        Args:
+            blob_url (str): The full URL of the Azure blob.
+
+        Returns:
+            str: The content of the blob as a string.
         """
         blob_client = BlobClient.from_blob_url(blob_url, credential=DefaultAzureCredential(), logger=self.logger)
-        # real all the bytes from the blob
         file = blob_client.download_blob().readall()
         file = file.decode("utf-8")
         return file
 
 
 class AzureJsonReader(JsonReader, AzureBlobReader):
-    """
-    This is a Azure storage blob DataReader that loads a json data
-    file hosted on a blob and returns the contents as a dict.
-    """
+    """Reads JSON or JSONL data from an Azure Storage blob and returns the content as a dict."""
 
     def __init__(
         self,
@@ -349,18 +427,26 @@ def __init__(
         blob_container: str,
         blob_name: str,
     ):
-        """
-        Initializes an AzureJsonReader.
-        args:
-            account_url: str, The Azure storage account URL.
-            blob_container: str, Azure storage container name.
-            blob_name: str, Azure storage blob name.
+        """Initializes an AzureJsonReader.
+
+        Args:
+            account_url (str): The Azure storage account URL.
+            blob_container (str): The Azure storage container name.
+            blob_name (str): The Azure storage blob name.
         """
         self.blob_url = f"{account_url}/{blob_container}/{blob_name}"
         super().__init__(self.blob_url)
         self.logger = AzureStorageLogger().get_logger()
 
     def read(self) -> dict:
+        """Reads the data from the Azure Storage blob.
+
+        Returns:
+            dict or jsonlines.Reader: The data loaded from the blob.
+
+        Raises:
+            ValueError: If the file format is not .json or .jsonl.
+        """
         file = super().read_azure_blob(self.blob_url)
         if self.format == ".json":
             data = json.loads(file)
@@ -372,17 +458,15 @@ def read(self) -> dict:
 
 
 class HFJsonReader(JsonReader):
-    """
-    This is a DataReader that loads a json or jsonl data file from HuggingFace.
-    """
+    """Reads JSON or JSONL data from Hugging Face repositories."""
 
     def __init__(self, repo_id, repo_type, filename):
-        """
-        Initializes an HFJsonReader.
-        args:
-            repo_id: str, The HF repo id.
-            repo_type: str, The HF repo_type.
-            filename: str, The HF filename.
+        """Initializes an HFJsonReader.
+
+        Args:
+            repo_id (str): The Hugging Face repository ID.
+            repo_type (str): The type of the repository (e.g., dataset).
+            filename (str): The name of the file in the repository.
         """
         from huggingface_hub import hf_hub_download
 
@@ -391,19 +475,38 @@ def __init__(self, repo_id, repo_type, filename):
 
 
 class Writer:
+    """Base class for writing data to files."""
+
     def __init__(self, out_path):
+        """Initializes a Writer.
+
+        Args:
+            out_path (str): The output file path.
+        """
         self.out_path = out_path
-        # if the directory does not exist, create it
         directory = os.path.dirname(out_path)
         if not os.path.exists(directory):
             os.makedirs(directory)
 
     def write(self, data):
-        pass
+        """Writes data to a file.
+
+        This method should be implemented by subclasses.
+
+        Args:
+            data: The data to be written.
+        """
 
 
 class TXTWriter(Writer):
+    """Writes data to a text file."""
+
     def write(self, data):
+        """Writes data to a text file.
+
+        Args:
+            data (list, dict, or str): The data to be written.
+        """
         with open(self.out_path, "w") as f:
             if isinstance(data, list):
                 for item in data:
@@ -416,7 +519,10 @@ def write(self, data):
 
 
 class DataReader:
-    """This is the base DataReader that loads a jsonl data file from a local path"""
+    """Base class that loads data from a local file in various formats (Parquet, CSV, JSONL).
+
+    It can optionally apply a transform to the loaded data.
+    """
 
     def __init__(
         self,
@@ -425,13 +531,13 @@ def __init__(
         transform: Optional[DFTransformBase] = None,
         **kwargs,
     ):
-        """
-        Initializes an DataReader.
-        args:
-            path: str, local path to the dataset file.
-            format: optional str, to specify file format (parquet, csv, and jsonl).
-            transform: optional Transform, to apply after loading.
-            kwargs: addtional arguments.
+        """Initializes a DataReader.
+
+        Args:
+            path (str): The local path to the dataset file.
+            format (str, optional): The file format (e.g., .parquet, .csv, .jsonl). Defaults to None.
+            transform (DFTransformBase, optional): A transform to apply after loading data. Defaults to None.
+            **kwargs: Additional keyword arguments for the data reading process.
         """
         self.path = path
         _, ext = os.path.splitext(self.path)
@@ -443,6 +549,11 @@ def __init__(
         self.kwargs = kwargs
 
     def load_dataset(self) -> pd.DataFrame:
+        """Loads the dataset into a pandas DataFrame.
+
+        Returns:
+            pd.DataFrame: The loaded and transformed dataset.
+        """
         df = self._load_dataset()
         df = df.reset_index(drop=True)
         log.info(f"Loaded dataset with shape: {df.shape}")
@@ -453,10 +564,15 @@ def load_dataset(self) -> pd.DataFrame:
         return df
 
     def _load_dataset(self) -> pd.DataFrame:
+        """Loads the dataset from the specified path based on the file format.
+
+        Returns:
+            pd.DataFrame: The dataset as a DataFrame.
+        """
         if self.format == ".parquet":
             log.info(f"Loading Parquet Data From {self.path}.")
             df = pd.read_parquet(self.path, **self.kwargs)
-        elif self.format == ".csv":  # TODO: remove
+        elif self.format == ".csv":
             log.info(f"Loading CSV Data From {self.path}.")
             df = pd.read_csv(self.path, **self.kwargs)
         elif self.format == ".jsonl":
@@ -469,7 +585,7 @@ def _load_dataset(self) -> pd.DataFrame:
 
 
 class AzureDataReader(DataReader, AzureBlobReader):
-    """This is a Azure storage blob DataReader that loads a jsonl data file hosted on a blob."""
+    """Loads JSONL data from an Azure Blob Storage URL into a pandas DataFrame."""
 
     def __init__(
         self,
@@ -480,21 +596,29 @@ def __init__(
         transform: Optional[DFTransformBase] = None,
         **kwargs,
     ):
-        """
-        Initializes an AzureDataReader.
-        args:
-            account_url: str, The Azure storage account URL.
-            blob_container: str ,Azure storage container name.
-            blob_name: str, Azure storage blob name.
-            format: optional str, specifies file format (only jsonl currently supported).
-            transform: optional Transform, to apply after loading.
-            kwargs: addtional arguments.
+        """Initializes an AzureDataReader.
+
+        Args:
+            account_url (str): The Azure storage account URL.
+            blob_container (str): The Azure storage container name.
+            blob_name (str): The Azure storage blob name.
+            format (str, optional): The file format (only .jsonl is currently supported). Defaults to None.
+            transform (DFTransformBase, optional): A transform to apply after loading data. Defaults to None.
+            **kwargs: Additional keyword arguments for the data reading process.
         """
         self.blob_url = f"{account_url}/{blob_container}/{blob_name}"
         super().__init__(self.blob_url, format, transform, **kwargs)
         self.logger = AzureStorageLogger().get_logger()
 
     def _load_dataset(self) -> pd.DataFrame:
+        """Loads JSONL data from the Azure Blob Storage specified by the blob URL.
+
+        Returns:
+            pd.DataFrame: The loaded dataset as a DataFrame.
+
+        Raises:
+            ValueError: If the file format is not .jsonl.
+        """
         file = super().read_azure_blob(self.blob_url)
         if self.format == ".jsonl":
             jlr = jsonlines.Reader(file.splitlines(), loads=json.loads)
@@ -505,8 +629,11 @@ def _load_dataset(self) -> pd.DataFrame:
 
 
 class HFDataReader(DataReader):
-    """This is a HuggingFace DataReader that downloads data hosted on HuggingFace to infer on
-    using HF load_dataset method."""
+    """DataReader that leverages Hugging Face datasets for loading data.
+
+    It can download data from Hugging Face or load from a local HF dataset directory,
+    then convert the data to a pandas DataFrame.
+    """
 
     def __init__(
         self,
@@ -518,16 +645,16 @@ def __init__(
         load_data_from_disk: bool = False,
         **kwargs,
     ):
-        """
-        Initializes a HFDataReader.
-        args:
-            path: str, Huggingface specific path.
-            split: optional str or List of str, names of splits (e.g., val, test,...).
-            tasks: optional str or List of str, names of tasks (e.g., Math, Art,...).
-                (This is passed into load_dataset parameter 'name' — dataset configuration name.)
-            transform: optional list of Transforms, to apply after loading.
-            cache_dir: optional str, local cache path.
-            load_data_from_disk: optional bool, if True, load the Huggingface dataset from specified local path.
+        """Initializes an HFDataReader.
+
+        Args:
+            path (str): The Hugging Face dataset path or repository ID.
+            split (str or list, optional): The split(s) of the dataset to load. Defaults to "test".
+            tasks (str or list, optional): The tasks or dataset configurations to load. Defaults to None.
+            transform (DFTransformBase, optional): A transform to apply after loading. Defaults to None.
+            cache_dir (str, optional): The local cache directory for the dataset. Defaults to None.
+            load_data_from_disk (bool, optional): Whether to load the dataset from a local directory. Defaults to False.
+            **kwargs: Additional keyword arguments for dataset loading.
         """
         super().__init__(path=path, transform=transform, **kwargs)
         self.split = split
@@ -536,27 +663,22 @@ def __init__(
         self.load_data_from_disk = load_data_from_disk
 
     def _save_base64_to_image_file(self, image_base64: dict, cache_path: str) -> str:
-        """
-        Saves a base64 encoded image to a local cache path and returns the path to be stored.
-        args:
-            image_base64: dict, that contains the byte string and file name.
-            cache_path: str, that is the directory to save the image.
-        returns:
-            file_path: str, full path to saved image.
+        """Saves a base64-encoded image to a local file.
+
+        Args:
+            image_base64 (dict): A dictionary containing the byte string and file name of the image.
+            cache_path (str): The local directory path where the image should be saved.
+
+        Returns:
+            str: The full path to the saved image file.
         """
         file_path = ""
 
         if image_base64:
-            # create path to save image
             file_path = os.path.join(cache_path, image_base64["path"])
-
-            # only do this if the image doesn't already exist
             if not os.path.exists(file_path):
-                # base64 string to binary image data
                 buffered = BytesIO(image_base64["bytes"])
                 query_image = Image.open(buffered).convert("RGB")
-
-                # save image and make the dir path if needed (need for paths with nested new dirs)
                 dir_path = os.path.dirname(file_path)
                 os.makedirs(dir_path, exist_ok=True)
                 query_image.save(file_path)
@@ -564,61 +686,52 @@ def _save_base64_to_image_file(self, image_base64: dict, cache_path: str) -> str
         return file_path
 
     def _save_images(self, df: pd.DataFrame, cache_path: str, image_columns) -> pd.DataFrame:
-        """
-        Saves all base64 encoded image columns to a local cache path and updates the data frame.
-        args:
-            df: Panda dataframe, to save images for.
-            cache_path: str, that is the directory to save the image.
-            image_columns: List of str, with names of columns that have images in them.
-        returns:
-            df: Pandas dataframe, with cached image path updated in the column
-        """
+        """Saves base64-encoded images from the dataframe columns to the local cache directory.
 
-        tqdm.pandas()
+        Args:
+            df (pd.DataFrame): The DataFrame containing base64-encoded images.
+            cache_path (str): The directory path where images are saved.
+            image_columns (list): The names of columns containing base64-encoded images.
 
+        Returns:
+            pd.DataFrame: The updated DataFrame with replaced image paths.
+        """
+        tqdm.pandas()
         for column in tqdm(image_columns, desc="Image Saving Progress:"):
             df[column] = df[column].progress_apply(self._save_base64_to_image_file, args=(cache_path,))
-
         return df
 
     def _hf_to_dataframe(self, hf_dataset):
-        """
-        Converts a huggingface dataset object to a Pandas dataframe.
-        If images are present in the dataset (as base64 encoded images),
-        those images will be cached to the local path where the dataset files reside.
-        args:
-            hf_dataset: huggingface dataset, object to convert.
-        returns:
-            df: Pandas dataframe, converted from a huggingface dataset.
-        """
+        """Converts a Hugging Face dataset object to a pandas DataFrame.
 
+        If the dataset contains base64-encoded images, they are saved to disk and replaced with file paths.
+
+        Args:
+            hf_dataset: The Hugging Face dataset object.
+
+        Returns:
+            pd.DataFrame: The resulting pandas DataFrame.
+        """
         df = hf_dataset.to_pandas()
 
         if hasattr(hf_dataset, "features"):
-            # find which columns contain images
             image_columns = [
                 col
                 for col in hf_dataset.features
                 if hasattr(hf_dataset.features[col], "dtype") and hf_dataset.features[col].dtype == "PIL.Image.Image"
             ]
-
             if image_columns:
-
-                # get the dir where the dataset is cached
                 cache_path = os.path.dirname(hf_dataset.cache_files[0]["filename"])
                 df = self._save_images(df, cache_path, image_columns)
-                # store the names of the images columns in the data frame for later retrieval
                 df["image_column_names"] = df.apply(lambda x: image_columns, axis=1)
 
         return df
 
     def _load_dataset(self) -> pd.DataFrame:
-        """
-        Loads a set of huggingface datasets specified as a list of splits or tasks (as provided to the class init).
-        Each dataset is loaded, processed to a Pandas dataframe and then merged to a single dataframe
-        Each dataframe has the task and split name added to a column before the merge
-        returns:
-            pd.concat(df_frames): Pandas dataframe, concatenated Pandas dataframe.
+        """Loads one or multiple Hugging Face datasets, converts them to pandas DataFrames, and merges them.
+
+        Returns:
+            pd.DataFrame: The combined dataset as a single DataFrame.
         """
         if not isinstance(self.split, list):
             self.split = [self.split]
diff --git a/eureka_ml_insights/data_utils/dna_utils.py b/eureka_ml_insights/data_utils/dna_utils.py
index 7b7b7cca..ee3c50f8 100644
--- a/eureka_ml_insights/data_utils/dna_utils.py
+++ b/eureka_ml_insights/data_utils/dna_utils.py
@@ -1,3 +1,7 @@
+"""This module provides a transform class to parse DNA labels from model outputs, and associated
+helper functions to parse the output labels.
+"""
+
 from dataclasses import dataclass
 
 import pandas as pd
@@ -7,12 +11,29 @@
 
 @dataclass
 class DNAParseLabel(DFTransformBase):
+    """Transformer to parse DNA label columns from model output.
+
+    Attributes:
+        model_output_column (str): The column name containing the model output.
+        model_action_label_column (str): The column name to store the parsed action label.
+        model_harmless_label_column (str): The column name to store the parsed harmless label.
+        use_updated_metric (bool): A flag indicating whether to use the updated parsing logic.
+    """
+
     model_output_column: str
     model_action_label_column: str
     model_harmless_label_column: str
     use_updated_metric: bool
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms a DataFrame by parsing model output columns into action labels.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame containing model output columns.
+
+        Returns:
+            pd.DataFrame: A DataFrame with updated action label and harmless label columns.
+        """
         if self.use_updated_metric:
             df[self.model_action_label_column] = df[self.model_output_column].apply(parse_output_label_updated)
             df[self.model_harmless_label_column] = (df[self.model_action_label_column].isin([0, 1, 2, 3, 4, 5])).astype(
@@ -25,12 +46,13 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 
 def parse_output_label(s: str):
-    """
-    Parse the output to extract the model predicted label.
-    Parameters:
-    s (str): Model output with label as <answer>index</answer>
+    """Parses the output string to extract the model-predicted label.
+
+    Args:
+        s (str): The model output string containing labels in <answer>index</answer> format.
+
     Returns:
-    label (int): extracted label index
+        int: The extracted label index. Returns -1 if parsing fails.
     """
     if not s:
         return -1
@@ -51,13 +73,15 @@ def parse_output_label(s: str):
 
 
 def parse_output_label_updated(s: str):
-    """
-    Parse the output to extract the model predicted label.
-    Addresses edge cases in parsing. Integers <0, >6 and > 9 are giving invalid (-1) label.
-    Parameters:
-    s (str): Model output with label as <answer>index</answer>
+    """Parses the output string to extract the model-predicted label using updated logic.
+
+    Addresses edge cases by treating integers <0 or >6 as invalid (-1).
+
+    Args:
+        s (str): The model output string containing labels in <answer>index</answer> format.
+
     Returns:
-    label (int): extracted label index
+        int: The extracted label index. Returns -1 for invalid or failed parsing.
     """
     if not s:
         return -1
diff --git a/eureka_ml_insights/data_utils/encoders.py b/eureka_ml_insights/data_utils/encoders.py
index a8ab9d2b..b0252ccd 100644
--- a/eureka_ml_insights/data_utils/encoders.py
+++ b/eureka_ml_insights/data_utils/encoders.py
@@ -1,11 +1,32 @@
-import json
+"""This module provides classes and functions for JSON encoding of numpy data types."""
+
 import base64
+import json
+
 import numpy as np
 
+
 class NumpyEncoder(json.JSONEncoder):
-    """Special json encoder for numpy types"""
+    """Encodes numpy data types into JSON-compatible formats.
+
+    Extends the standard json.JSONEncoder to handle numeric, array, and bytes
+    objects from numpy.
+    """
 
     def default(self, obj):
+        """Returns a JSON-serializable version of the given object.
+
+        If the object is a numpy integer or float, it is converted to the
+        appropriate Python type (int or float). If the object is a numpy array,
+        it is converted to a list. If the object is a bytes object, it is
+        Base64-encoded. Otherwise, the default json.JSONEncoder method is called.
+
+        Args:
+            obj (Any): The object to serialize.
+
+        Returns:
+            Any: A JSON-serializable representation of the given object.
+        """
         if isinstance(
             obj,
             (
@@ -23,10 +44,10 @@ def default(self, obj):
             ),
         ):
             return int(obj)
-        elif isinstance(obj, (np.float_, np.float16, np.float32, np.float64)):
+        elif isinstance(obj, (np.float16, np.float32, np.float64)):
             return float(obj)
         elif isinstance(obj, (np.ndarray,)):
             return obj.tolist()
         elif isinstance(obj, bytes):
             return base64.b64encode(obj).decode("ascii")
-        return json.JSONEncoder.default(self, obj)
\ No newline at end of file
+        return json.JSONEncoder.default(self, obj)
diff --git a/eureka_ml_insights/data_utils/flenqa_utils.py b/eureka_ml_insights/data_utils/flenqa_utils.py
index 6ede6059..3dfa6f87 100644
--- a/eureka_ml_insights/data_utils/flenqa_utils.py
+++ b/eureka_ml_insights/data_utils/flenqa_utils.py
@@ -11,12 +11,24 @@
 @dataclass
 class FlenQAOutputProcessor(DFTransformBase):
     """
-    This class is for processing the output of models for the FLenQA dataset.
+    Handles processing the output of models for the FLenQA dataset.
+
+    Attributes:
+        chain_of_thought (bool): If True, the chain of thought is used in the analysis.
     """
 
     chain_of_thought: bool = False
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Transforms a given DataFrame for the FLenQA dataset.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame containing model output.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with additional columns.
+        """
         df["ground_truth"] = df["ground_truth"].apply(lambda x: x.lower())
         processed_columns = df.apply(
             lambda row: response_analysis(
@@ -35,14 +47,16 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 def normalize_answer(s):
     """
-    Lower text and remove punctuation, articles and extra white spaces
-    Code taken from: https://github.com/alonj/Same-Task-More-Tokens
+    Lowers the text, removes punctuation, articles, and extra white spaces.
+
+    This code is adapted from:
+    https://github.com/alonj/Same-Task-More-Tokens
 
     Args:
-    s: string to normalize
+        s (str): The string to normalize.
 
     Returns:
-    normalized string
+        str: The normalized string.
     """
     s = str(s).lower()
     s = s.replace("".join(list(set(string.punctuation))), "")
@@ -53,13 +67,16 @@ def normalize_answer(s):
 
 def response_category(ans):
     """
-    Categorize the answer as true, false or other/refused
-    Code taken from: https://github.com/alonj/Same-Task-More-Tokens
+    Categorizes the answer as "true," "false," or "none."
+
+    This code is adapted from:
+    https://github.com/alonj/Same-Task-More-Tokens
+
     Args:
-    ans: string to categorize
+        ans (str): The string to categorize.
 
     Returns:
-    string category
+        str: The category of the answer.
     """
     if isinstance(ans, (bool, np.bool_)):
         return normalize_answer(str(ans))
@@ -77,19 +94,21 @@ def response_category(ans):
 
 def response_analysis(response, dataset, facts, statement, is_valid, chain_of_thought=False):
     """
-    Analyze the model response, calculate chain of thought coverage, response category and early response.
-    Code taken and adapted from: https://github.com/alonj/Same-Task-More-Tokens
+    Analyzes the model response by calculating chain of thought coverage, response category, and early response.
+
+    This code is taken and adapted from:
+    https://github.com/alonj/Same-Task-More-Tokens
 
     Args:
-        response: model output to analyze
-        dataset: dataset name which is a column in the original flenQA dataset
-        facts: facts column from the original flenQA dataset
-        statement: statement column from the original flenQA dataset
-        is_valid: boolean column output from inference component indicating if the response is valid
-        chain_of_thought: boolean indicating if a chain of thought prompt was used.
+        response (str): The model output to analyze.
+        dataset (str): The dataset name which is a column in the original flenQA dataset.
+        facts (list): The facts column from the original flenQA dataset.
+        statement (list): The statement column from the original flenQA dataset.
+        is_valid (bool): Indicates if the response is valid.
+        chain_of_thought (bool): Indicates if a chain of thought prompt was used. Defaults to False.
 
     Returns:
-        Pandas series containing analysis results.
+        pd.Series: A pandas Series containing analysis results.
     """
     column_names = ["cot_coverage", "categorical_response", "early_response", "is_valid"]
     if not is_valid:
diff --git a/eureka_ml_insights/data_utils/gsm8k_utils.py b/eureka_ml_insights/data_utils/gsm8k_utils.py
index f24932da..944bc69f 100644
--- a/eureka_ml_insights/data_utils/gsm8k_utils.py
+++ b/eureka_ml_insights/data_utils/gsm8k_utils.py
@@ -1,3 +1,5 @@
+"""Module for extracting GSM8K answers from model output and storing them in a DataFrame."""
+
 from dataclasses import dataclass
 
 import pandas as pd
@@ -7,21 +9,41 @@
 
 @dataclass
 class GSM8KExtractAnswer(DFTransformBase):
+    """Extracts numeric answers from GSM8K model output columns in a DataFrame.
+
+    Attributes:
+        model_output_column (str): The name of the column containing the raw model output.
+        model_answer_column (str): The name of the column to store the extracted numeric answer.
+    """
+
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the input DataFrame by extracting numeric answers from model output.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame containing model output data.
+
+        Returns:
+            pd.DataFrame: A DataFrame with an additional column storing the extracted numeric answers.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(self.parse_output_answer)
         return df
 
     @staticmethod
     def parse_output_answer(response: str) -> float:
-        """
-        Parse the input string to extract answer of a given GSM8K question.
-        Parameters:
-            response (str): Input string containing answer X
+        """Extracts a numeric answer from a given GSM8K response string.
+
+        This function looks for the answer text after "####", strips unwanted
+        characters, and attempts to convert it to a float. If the conversion fails,
+        it returns None.
+
+        Args:
+            response (str): The raw string response containing the answer.
+
         Returns:
-            numerical_value (float): A numeric value representing the model's answer.
+            float: The numeric value of the answer, or None if parsing fails.
         """
 
         def str_to_float(hyp):
diff --git a/eureka_ml_insights/data_utils/kitab_utils.py b/eureka_ml_insights/data_utils/kitab_utils.py
index 61281ac3..07f4d56b 100644
--- a/eureka_ml_insights/data_utils/kitab_utils.py
+++ b/eureka_ml_insights/data_utils/kitab_utils.py
@@ -2,6 +2,16 @@
 # Some code in this file is copied from the original source repository and then adapted to fit this repository.
 # The original license for this code is Community Data License Agreement - Permissive - Version 2.0
 
+"""
+Module for parsing and transforming book data from the Kitab dataset.
+
+This module provides classes and functions to parse model outputs, extract book
+information, and prepare context data. It includes utilities to handle specific
+modifications and standard output marker additions for more consistent data
+parsing. Additionally, it provides a method to retrieve GPT-4 preprocessed names
+from a specified dataset.
+"""
+
 import ast
 import re
 import urllib.request
@@ -14,23 +24,56 @@
 
 @dataclass
 class KitabExtractBooks(DFTransformBase):
+    """
+    Class for extracting books from a DataFrame by parsing model outputs.
+
+    Attributes:
+        model_output_column (str): The column name containing model output.
+        model_books_column (str): The column name to store extracted book data.
+    """
+
     model_output_column: str
     model_books_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Transforms the DataFrame by parsing the specified model output column
+        to extract book information.
+
+        Args:
+            df (pd.DataFrame): Input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted book data.
+        """
         df[self.model_books_column] = df[self.model_output_column].apply(parse_output_reason)
         return df
 
 
 @dataclass
 class KitabExtractBooksAddMarker(DFTransformBase):
+    """
+    Class for extracting books from a DataFrame by parsing model outputs, with an
+    added step for injecting an 'Output:' marker if needed.
+
+    Attributes:
+        model_output_column (str): The column name containing model output.
+        model_books_column (str): The column name to store extracted book data.
+    """
+
     model_output_column: str
     model_books_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
         """
-        Some models (e.g. GPT3.5) sometimes does not follow instructions and do not add the "Output:\n" marker in its format
-        add_output_marker adds a "Output:\n" string to the model output prior to parsing, if it detects a list start, i.e. "1."
+        Transforms the DataFrame by adding an output marker if needed and then
+        parsing the specified model output column to extract book information.
+
+        Args:
+            df (pd.DataFrame): Input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted book data.
         """
         df[self.model_books_column] = df[self.model_output_column].apply(add_output_marker)
         df[self.model_books_column] = df[self.model_books_column].apply(parse_output_reason)
@@ -39,19 +82,46 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class PrepareContext(DFTransformBase):
+    """
+    Class for preparing context data for a set of books.
+
+    Attributes:
+        all_books_column (str): The column name containing all books data.
+        all_books_context_column (str): The column name to store the prepared
+            context data.
+    """
+
     all_books_column: str
     all_books_context_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Transforms the DataFrame by preparing the context for all books in the
+        specified column.
+
+        Args:
+            df (pd.DataFrame): Input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with prepared context data.
+        """
         df[self.all_books_context_column] = df[self.all_books_column].apply(prepare_all_books_context)
         return df
 
 
 def prepare_all_books_context(s):
     """
-    Parse the array of all books such that it is presented to the model in the following format:
-    Title (year)    \n
-    Title (year)
+    Parses the array of all books and presents it in a specific text format.
+
+    This function reads the string representation of a list of books, and
+    returns a string in which each book is followed by four spaces and a
+    newline.
+
+    Args:
+        s (str): The string representing a list of books.
+
+    Returns:
+        str: A formatted string of book titles, each on its own line.
     """
     book_array = ast.literal_eval(s)
     context = ""
@@ -61,6 +131,18 @@ def prepare_all_books_context(s):
 
 
 def add_output_marker(s):
+    """
+    Adds an 'Output:\\n' marker to the given string if needed.
+
+    If the string does not contain 'Output:\\n', and it starts with '1.', this
+    function will prepend 'Output:\\n' to the string.
+
+    Args:
+        s (str): The input string.
+
+    Returns:
+        str: The possibly modified string with an output marker.
+    """
     if s == None:
         s = "Output:\n"
     # Look for the last occurrence of 'Output:\n' with an arbitrary number of spaces
@@ -74,15 +156,21 @@ def add_output_marker(s):
     return s
 
 
-# with all books
 def parse_output_reason(s):
     """
-    Parse the input string to extract titles and reasons from the 'Output:' section.
-    Parameters:
-    s (str): Input string containing information with 'Output:' section.
+    Parses the input string to extract titles and reasons from the 'Output:' section.
+
+    The function looks for the substring 'Output:\\n', and if found, processes
+    the subsequent text to extract pairs of reason and title. The extracted data
+    is returned as a dictionary with keys 'titles' and 'reasons'.
+
+    Args:
+        s (str): The input string containing information with an 'Output:' section.
+
     Returns:
-    dict: A dictionary containing extracted titles and reasons.
-          Example: {'titles': ['Title 1', 'Title 2'], 'reasons': ['Reason 1', 'Reason 2']}
+        dict: A dictionary with keys 'titles' and 'reasons', each containing a list
+            of extracted values. Example:
+            {'titles': ['Title 1', 'Title 2'], 'reasons': ['Reason 1', 'Reason 2']}
     """
     if s == None:
         return {"titles": [], "reasons": []}
@@ -101,8 +189,6 @@ def parse_output_reason(s):
 
     # regex for extracting reason and title
     reason_pattern = r"Reason: (.*?). Title:"
-
-    # Adjust the title pattern to make year optional
     title_pattern = r"Title: (.*?)\s*(?:\(\d{4}\))?$"
 
     reasons = re.findall(reason_pattern, s, re.MULTILINE)
@@ -113,7 +199,18 @@ def parse_output_reason(s):
 
 def get_gpt4_preprocessed_names(original_path, download_path):
     """
-    Download all book titles that have a human name as pre processed by gpt4
+    Downloads book titles preprocessed by GPT-4 that contain human names.
+
+    This function retrieves a CSV file from the provided URL, reads the data, and
+    collects book titles from a JSON-like column containing a dictionary with
+    a list of 'titles'.
+
+    Args:
+        original_path (str): The URL or path to download the CSV from.
+        download_path (str): The local path to store the downloaded CSV.
+
+    Returns:
+        list: A list of book titles extracted from the downloaded CSV.
     """
     urllib.request.urlretrieve(original_path, download_path)
     gpt4_names = []
diff --git a/eureka_ml_insights/data_utils/mathvision_utils.py b/eureka_ml_insights/data_utils/mathvision_utils.py
index 48e5df9f..26b0d8c2 100644
--- a/eureka_ml_insights/data_utils/mathvision_utils.py
+++ b/eureka_ml_insights/data_utils/mathvision_utils.py
@@ -1,142 +1,157 @@
-"""Evaluates output of models for Math-V dataset; following https://github.com/mathllm/MATH-V/tree/main/evaluation"""
+"""Evaluates the output of models for the Math-V dataset.
 
+Follows the evaluation script from:
+https://github.com/mathllm/MATH-V/tree/main/evaluation
+"""
+
+import re
 from dataclasses import dataclass
-from latex2sympy2 import latex2sympy
+
+import latex2sympy
 import pandas as pd
-import re
 
 from .transform import DFTransformBase
 
 
 @dataclass
 class MathVisionOutputEvaluator(DFTransformBase):
+    """Evaluates the output of models for the Math-V dataset.
+
+    Follows the evaluation script from the Math-V repo.
+
+    Attributes:
+        score_column_name (str): The name of the column in the DataFrame where
+            the score will be stored.
     """
-    This class is for evaluating the output of models for the Math-V dataset,
-    following the evaluation script from the Math-V repo.
-    """
+
     score_column_name: str = "score"
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        df[self.score_column_name] = df.apply(lambda row: evaluate(row["model_output"], row["answer"], row["options"]), axis=1)
+        """Transforms the input DataFrame by evaluating model outputs.
+
+        Applies the evaluation function to each row in the DataFrame to compute
+        a score, which is stored in the specified score_column_name.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame containing columns "model_output",
+                "answer", and "options".
+
+        Returns:
+            pd.DataFrame: The DataFrame with an additional column for the score.
+        """
+        df[self.score_column_name] = df.apply(
+            lambda row: evaluate(row["model_output"], row["answer"], row["options"]), axis=1
+        )
         return df
 
 
 def is_number(value):
+    """Checks if a given value can be converted to a float.
+
+    Args:
+        value (object): The value to check.
+
+    Returns:
+        bool: True if the value can be converted to float, otherwise False.
+    """
     try:
         float(value)
         return True
     except ValueError:
         return False
-    
+
 
 def eval_tuple(s):
-    """
-    Evaluates the mathematical expressions within tuples or lists represented as strings.
-    
+    """Evaluates mathematical expressions within tuples or lists represented as strings.
+
+    Splits the string by commas to evaluate each element using latex2sympy, rounding
+    the result to 2 decimal places when applicable.
+
     Args:
-        s (str): The string representation of a tuple or list.
-                 E.g., "(a,b,c,...)" or "[a,b,c,...]"
-    
+        s (str): The string representation of a tuple or list, e.g., "(2*3, 5+2)" or "[2*3, 5+2]".
+
     Returns:
         str: A string representation of the tuple or list with evaluated expressions.
-             Returns the original string if it doesn't match the expected format or if an error occurs.
-    
+        Returns the original string if it doesn't match the expected format or if an error occurs.
+
     Example:
-        eval_tuple("(2*3, 5+2)") -> "(6,7)"
-    
-    Note:
-        This function relies on the latex2sympy function which is assumed to be defined elsewhere in the code.
+        >>> eval_tuple("(2*3, 5+2)")
+        '(6,7)'
     """
-    # Split the string by commas to get individual elements
-    sl = s[1:-1].split(',')
-    
+    sl = s[1:-1].split(",")
+
     try:
-        # Check if string is a tuple representation and has more than one element
-        if s[0] == '(' and s[-1] == ')' and len(sl) > 1:
-            # Evaluate each element using latex2sympy and round the result to 2 decimal places
-            # Skip evaluation if element is 'infty', 'a', or '-a'
-            s = ','.join([str(round(eval(str(latex2sympy(sub))),2)) 
-                          if 'infty' not in sub and sub not in ['a', '-a'] else sub for sub in sl])
+        if s[0] == "(" and s[-1] == ")" and len(sl) > 1:
+            s = ",".join(
+                [
+                    str(round(eval(str(latex2sympy(sub))), 2)) if "infty" not in sub and sub not in ["a", "-a"] else sub
+                    for sub in sl
+                ]
+            )
             return f"({s})"
-        
-        # Check if string is a list representation and has more than one element
-        elif s[0] == '[' and s[-1] == ']' and len(sl) > 1:
-            # Same evaluation process as for tuples
-            s = ','.join([str(round(eval(str(latex2sympy(sub))),2)) 
-                          if 'infty' not in sub and sub not in ['a', '-a'] else sub for sub in sl])
+        elif s[0] == "[" and s[-1] == "]" and len(sl) > 1:
+            s = ",".join(
+                [
+                    str(round(eval(str(latex2sympy(sub))), 2)) if "infty" not in sub and sub not in ["a", "-a"] else sub
+                    for sub in sl
+                ]
+            )
             return f"[{s}]"
-    
-    except Exception:  # Catch any exceptions and return the original string
+    except Exception:
         return s
-    
-    # Return original string if it doesn't match tuple or list format
     return s
-    
+
 
 def is_equal(asw: str, gt_asw: str) -> bool:
-    """
-    Judge if `asw` is equivalent to `gt_asw`.
+    """Determines whether two answer strings are equivalent.
 
-    This function checks if the given answers are equivalent, considering
-    various scenarios such as tuples, lists separated by commas, and
-    mathematical equivalence in LaTeX format.
+    Checks for equivalence by considering tuple/list representations, LaTeX
+    expressions, and mathematical evaluations.
 
     Args:
         asw (str): The answer string to be checked.
-        gt_asw (str): The ground truth answer string to be matched against.
+        gt_asw (str): The ground truth answer string.
 
     Returns:
-        bool: True if the answers are equivalent, otherwise False.
-
+        bool: True if the answers are considered equivalent, otherwise False.
     """
-
-    # return gt_asw == asw
-
-    # Check for empty strings after removing spaces and return False if any of them is empty.
     asw = asw.lower()
     gt_asw = gt_asw.lower()
-    
-    if asw.replace(' ', '') == '' or gt_asw.replace(' ', '') == '':
+
+    if asw.replace(" ", "") == "" or gt_asw.replace(" ", "") == "":
         return False
 
     if gt_asw.strip() == asw.strip():
         return True
-   
-    # Convert the string to a tuple format.
+
     asw = eval_tuple(asw)
     gt_asw = eval_tuple(gt_asw)
 
-    # Check for simple tuple containment. Return True if one tuple is contained in the other.
     if gt_asw == asw:
         return True
 
     try:
-        # Convert LaTeX format to a sympy expression and evaluate both expressions.
-        # If the evaluated results are close enough (up to 2 decimal places), return True.
         if round(eval(str(latex2sympy(gt_asw))), 2) == round(eval(str(latex2sympy(asw))), 2):
             return True
-
         else:
             return False
     except:
-        # If any error occurs during comparison, return False.
         return False
-    
+
 
 def in_area(id: str, area: str) -> bool:
-    """Determine if a given ID falls within a specified area.
+    """Checks if a given ID falls within a specified area.
 
-    This function checks if a provided ID contains the specified area string
-    or if the ID matches the pattern for a test CSV related to that area.
+    The function returns True if the area is 'all' or if the ID contains the
+    specified area string or matches the pattern for a test CSV of that area.
 
     Args:
-        id (str): The ID to be checked.
-        area (str): The area string or 'all'. If 'all', the function always
-                    returns True.
+        id (str): The ID to check.
+        area (str): The area of interest or 'all'.
 
     Returns:
-        bool: True if the ID is within the specified area or the area is 'all',
-              False otherwise.
+        bool: True if the ID matches the specified area or if the area is 'all',
+            otherwise False.
 
     Examples:
         >>> in_area("abstract_algebra_test.csv_1", "algebra")
@@ -147,21 +162,27 @@ def in_area(id: str, area: str) -> bool:
         False
     """
 
-    # If the area is 'all', always return True
-    if area == 'all':
-        return True
-    
-    # Check if the ID contains the specified area or if it matches the pattern 
-    # for a test CSV related to that area
-    if f'/{area}/' in id or f'{area}_test.csv' in id:
+    if area == "all":
         return True
 
-    # If none of the above conditions are met, return False
+    if f"/{area}/" in id or f"{area}_test.csv" in id:
+        return True
     else:
         return False
 
 
 def extract_nums(s):
+    """Extracts numeric values from a string.
+
+    Removes commas and uses a regex to find numeric substrings. Attempts to
+    evaluate them as floats or integers for return.
+
+    Args:
+        s (str): The string from which to extract numbers.
+
+    Returns:
+        list: A list of numerical values (floats) extracted from the string.
+    """
     s = s.replace(",", "")
     nums = re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", s)
     return_list = []
@@ -172,13 +193,43 @@ def extract_nums(s):
             pass
     return return_list
 
+
 def find_formula(step):
+    """Finds a formula enclosed within '<<' and '>>' in a string.
+
+    Asserts that the string contains exactly one pair of '<<' and '>>', then
+    extracts the substring between them.
+
+    Args:
+        step (str): The string containing the formula.
+
+    Returns:
+        str: The substring between the '<<' and '>>'.
+
+    Raises:
+        AssertionError: If the string doesn't contain exactly one matching
+            pair of '<<' and '>>'.
+    """
     assert step.count("<<") == step.count(">>") == 1
-    left, right = step.find("<<")+2, step.find(">>")
-    return step[left: right]
+    left, right = step.find("<<") + 2, step.find(">>")
+    return step[left:right]
 
 
 def extract_answer(completion):
+    """Extracts a numeric answer from a string using the pattern '#### XY'.
+
+    Uses a regex to match the pattern. If a match is found, commas are removed.
+    If no match is found, an AssertionError is raised.
+
+    Args:
+        completion (str): The string from which to extract the answer.
+
+    Returns:
+        str: The matched numeric substring without commas.
+
+    Raises:
+        AssertionError: If no suitable match is found.
+    """
     ANS_RE = re.compile(r"#### (\-?[0-9\.\,]+)")
     match = ANS_RE.search(completion)
     if match:
@@ -190,53 +241,61 @@ def extract_answer(completion):
 
 
 def delete_extra_zero(n):
+    """Removes unnecessary trailing zeros from a numeric string.
+
+    Tries to convert the input to a float. If successful, trailing zeros
+    (including trailing decimal points) are removed. Otherwise, the original
+    string is returned.
+
+    Args:
+        n (str): The numeric string to process.
+
+    Returns:
+        str: A string with trailing zeros removed or the original string if
+            conversion fails.
+    """
     try:
-        n = float(n)  # Try to convert the input to a float
-    except ValueError:  # If conversion fails
-        print("None {}".format(n))  # Print the error message
-        return n  # Return the original string
-        
-    # If n is an integer after conversion, return its string representation
+        n = float(n)
+    except ValueError:
+        print("None {}".format(n))
+        return n
+
     if isinstance(n, int):
         return str(n)
-    
-    # If n is a float after conversion
+
     if isinstance(n, float):
-        n = str(n).rstrip('0')  # Remove trailing zeros after the decimal point
-        # If number ends with a dot after removing zeros, convert to int
-        # Otherwise, keep it as float and return its string representation
-        n = int(n.rstrip('.')) if n.endswith('.') else float(n)
+        n = str(n).rstrip("0")
+        n = int(n.rstrip(".")) if n.endswith(".") else float(n)
         return str(n)
 
 
 def _fix_fracs(string):
-    # Split the string based on occurrences of '\frac'.
+    """Fixes fraction formatting in a LaTeX string containing '\frac'.
+
+    Ensures that numerators and denominators are properly enclosed in braces.
+
+    Args:
+        string (str): The LaTeX string to be processed.
+
+    Returns:
+        str: The LaTeX string with corrected fraction notation.
+    """
     substrs = string.split("\\frac")
     new_str = substrs[0]
 
-    # Check if there are any occurrences of '\frac' in the string.
     if len(substrs) > 1:
-        # Exclude the part of the string before the first '\frac'.
         substrs = substrs[1:]
-
         for substr in substrs:
             new_str += "\\frac"
-            # If the current substring already starts with a brace, 
-            # it's likely formatted correctly.
             if len(substr) > 0 and substr[0] == "{":
                 new_str += substr
             else:
-                # Ensure that the substring has at least 2 characters 
-                # for numerator and denominator.
                 try:
                     assert len(substr) >= 2
                 except:
                     return string
-
-                a = substr[0]  # Potential numerator.
-                b = substr[1]  # Potential denominator.
-
-                # Check if the denominator (b) is already braced.
+                a = substr[0]
+                b = substr[1]
                 if b != "{":
                     if len(substr) > 2:
                         post_substr = substr[2:]
@@ -249,165 +308,154 @@ def _fix_fracs(string):
                         new_str += "{" + a + "}" + b + post_substr
                     else:
                         new_str += "{" + a + "}" + b
-
-    # Update the string to the newly formatted version.
     string = new_str
     return string
 
 
 def _fix_a_slash_b(string):
-    # Check if the string contains exactly one slash, which may indicate it's a fraction.
+    """Converts a simple fraction of the form 'a/b' to LaTeX '\\frac{a}{b}' if possible.
+
+    Args:
+        string (str): The string to process, which may contain a fraction.
+
+    Returns:
+        str: A LaTeX fraction representation if applicable, otherwise the
+            original string.
+    """
     if len(string.split("/")) != 2:
         return string
 
-    # Split the string by slash to extract potential numerator and denominator.
     a, b = string.split("/")
-
     try:
-        # Try to convert the parts to integers.
         a = int(a)
         b = int(b)
-
-        # Check if the string is in the expected format after conversion.
         assert string == "{}/{}".format(a, b)
-
-        # Convert the fraction to LaTeX representation.
         new_string = "\\frac{" + str(a) + "}{" + str(b) + "}"
         return new_string
-
-    # Handle exceptions for non-integer fractions or other unexpected formats.
     except:
         return string
 
 
 def _remove_right_units(string):
-    # Split the string using "\\text{ " as the delimiter.
+    """Removes the last occurrence of '\\text{' and everything following it from a LaTeX string.
+
+    Splits the string on '\\text{ ' and returns the portion before the final segment.
+
+    Args:
+        string (str): The LaTeX string potentially containing units notation.
+
+    Returns:
+        str: The string with the trailing '\\text{' part removed.
+    """
     splits = string.split("\\text{ ")
-    
-    # Return the part of the string before the last occurrence of "\\text{ ".
     return splits[0]
 
 
 def _fix_sqrt(string):
-    # Check if "\sqrt" is not in the string. If not, return the string as is.
+    """Ensures that any '\sqrt' in a LaTeX string has its argument enclosed in braces.
+
+    Args:
+        string (str): The LaTeX string to process.
+
+    Returns:
+        str: The LaTeX string with corrected '\sqrt' usage.
+    """
     if "\\sqrt" not in string:
         return string
 
-    # Split the string based on the "\sqrt" substring.
     splits = string.split("\\sqrt")
-    
-    # The initial portion of the string before the first occurrence of "\sqrt".
     new_string = splits[0]
-
-    # Loop through each split portion (after the initial one).
     for split in splits[1:]:
-        # If the split portion is non-empty and the first character isn't a '{',
-        # then it means the argument of the sqrt is not enclosed in braces.
         if len(split) > 0 and split[0] != "{":
             a = split[0]
-            # Add braces around the first character and append the rest of the split portion.
             new_substr = "\\sqrt{" + a + "}" + split[1:]
         else:
-            # If the split portion starts with a '{', then it's already correct.
             new_substr = "\\sqrt" + split
-        # Add the new substring to our result string.
         new_string += new_substr
 
     return new_string
 
 
 def _strip_string(string):
-    # Remove linebreaks
-    string = string.replace("\n", "")
+    """Strips and fixes LaTeX formatting from a string.
 
-    # Remove inverse spaces
-    string = string.replace("\\!", "")
+    Removes linebreaks, extra spaces, certain LaTeX commands, and converts certain
+    fraction notations. Also handles sqrt usage and potential multiple representations
+    (e.g., 0.5 to \\frac{1}{2}).
 
-    # Replace double backslashes with a single backslash
-    string = string.replace("\\\\", "\\")
+    Args:
+        string (str): The string to clean.
 
-    # Replace tfrac and dfrac with frac
+    Returns:
+        str: The cleaned and transformed string.
+    """
+    string = string.replace("\n", "")
+    string = string.replace("\\!", "")
+    string = string.replace("\\\\", "\\")
     string = string.replace("tfrac", "frac")
     string = string.replace("dfrac", "frac")
-
-    # Remove \left and \right LaTeX commands
     string = string.replace("\\left", "")
     string = string.replace("\\right", "")
-
-    # Remove degree notation
     string = string.replace("^{\\circ}", "")
     string = string.replace("^\\circ", "")
-
-    # Remove dollar signs (potentially used for inline math in LaTeX)
     string = string.replace("\\$", "")
     string = string.replace("$", "")
-
-    # Remove units (assumed to be on the right). Note: The function _remove_right_units is not provided.
     string = _remove_right_units(string)
-
-    # Remove percentage notations
     string = string.replace("\\%", "")
     string = string.replace("\%", "")
-
-    # Handle floating numbers starting with "."
     string = string.replace(" .", " 0.")
     string = string.replace("{.", "{0.")
     if len(string) == 0:
         return string
     if string[0] == ".":
         string = "0" + string
-
-    # If there are equalities or approximations, only consider the value after them
     if len(string.split("=")) == 2:
         string = string.split("=")[-1]
     if len(string.split("\\approx")) == 2:
         string = string.split("\\approx")[-1]
-
-    # Fix sqrt values not wrapped in curly braces. Note: The function _fix_sqrt is not provided.
-    if 'sqrt' in string:
+    if "sqrt" in string:
         string = _fix_sqrt(string)
-
-    # Remove all spaces
     string = string.replace(" ", "")
-
-    # Transform certain fraction notations to the desired format. Note: The function _fix_fracs is not provided.
-    if 'sqrt' in string:
+    if "sqrt" in string:
         string = _fix_fracs(string)
-
-    # Convert 0.5 to its fraction representation
     if string == "0.5":
         string = "\\frac{1}{2}"
-
-    # Fix fractions represented with a slash. Note: The function _fix_a_slash_b is not provided.
     string = _fix_a_slash_b(string)
-
     return string
 
 
 def find_math_answer(s: str) -> str:
+    """Extracts and cleans the final mathematical answer from a LaTeX string.
+
+    Attempts to locate the substring after certain patterns (e.g., '\\boxed{}'),
+    then strips and fixes LaTeX formatting.
+
+    Args:
+        s (str): The LaTeX string from which to extract the math answer.
+
+    Returns:
+        str: The cleaned math answer.
+    """
     s = s.lower()
-    if '{}' in s:
-        s = s.replace('{}', '')
+    if "{}" in s:
+        s = s.replace("{}", "")
 
     try:
-        pattern = re.compile('oxed{(.*)}', flags=re.S)
+        pattern = re.compile("oxed{(.*)}", flags=re.S)
         ans = pattern.findall(s)[-1]
-    except:     
-        ans = s  # If the pattern is not found, consider the entire string as the answer.
+    except:
+        ans = s
 
-    # If there's a closing bracket without an opening bracket before it, consider everything before it.
-    if ans.find('}') != -1 and (ans.find('{') == -1 or  ans.find('}') < ans.find('{')):
-        ans = ans.split('}')[0]
+    if ans.find("}") != -1 and (ans.find("{") == -1 or ans.find("}") < ans.find("{")):
+        ans = ans.split("}")[0]
 
-    # Extract the value after the equals sign or approx symbol.
-    ans = ans.split('=')[-1]
-    ans = ans.split('\\approx')[-1]
+    ans = ans.split("=")[-1]
+    ans = ans.split("\\approx")[-1]
 
-    # Clean the string from various LaTeX formatting.
-    ans = ans.replace(" ", "").replace("\\,", "").replace('∞', '\\infty')
+    ans = ans.replace(" ", "").replace("\\,", "").replace("∞", "\\infty")
     ans = ans.replace("+\infty", "\infty").replace("\\\\", "\\").replace("\n", "")
-    ans = ans.replace('\\text', '').replace('\\mbox', '').replace('bmatrix', 'pmatrix')
-    ans = ans.replace("\\left", "").replace('\\right', '').replace("^{\\circ}", "")
+    ans = ans.replace("\\text", "").replace("\\mbox", "").replace("bmatrix", "pmatrix")
+    ans = ans.replace("\\left", "").replace("\\right", "").replace("^{\\circ}", "")
     ans = ans.replace("^\\circ", "").replace("{m}^3", "").replace("m^3", "")
     ans = ans.replace("{units}", "").replace("units", "").replace("{km}", "").replace("km", "")
 
@@ -415,9 +463,24 @@ def find_math_answer(s: str) -> str:
 
 
 def evaluate(model_output, answer, options):
+    """Evaluates the model output against a ground truth answer or set of options.
+
+    If the model output is empty, returns False. Otherwise, attempts to parse
+    the final answer and compares it to the ground truth. If multiple choice
+    options exist, uses the letter in 'answer' to select the correct option.
+
+    Args:
+        model_output (str): The raw text output from the model.
+        answer (str): The ground truth answer (can be a letter if options are provided).
+        options (list): A list of potential multiple choice answers.
+
+    Returns:
+        bool: True if the model output is equivalent to the ground truth answer,
+            otherwise False.
+    """
     if not model_output or model_output == "":
         return False
-    
+
     gt_answer = answer if isinstance(answer, str) else str(answer)
     if len(options) > 0:
         gt_answer_value = options[ord(gt_answer) - ord("A")]
@@ -425,25 +488,46 @@ def evaluate(model_output, answer, options):
         gt_answer_value = ""
 
     model_output = model_output.strip()
-    for c in 'ABCDE':
-        if model_output.endswith(f" {c}.") or model_output.endswith(f" ({c}).") or model_output.startswith(f"{c}\n") or model_output.startswith(f"({c})\n") or model_output.startswith(f"({c}) {c}\n"):
+    for c in "ABCDE":
+        if (
+            model_output.endswith(f" {c}.")
+            or model_output.endswith(f" ({c}).")
+            or model_output.startswith(f"{c}\n")
+            or model_output.startswith(f"({c})\n")
+            or model_output.startswith(f"({c}) {c}\n")
+        ):
             model_output = c
-    if is_number(model_output.split('is ')[-1].rstrip('.')):
-        model_output = model_output.split('is ')[-1].rstrip('.')
-    if 'oxed{' not in model_output:
-        for flag in ['the final answer is', 'the answer is', 'the correct answer is', 'the answer should be']:
+    if is_number(model_output.split("is ")[-1].rstrip(".")):
+        model_output = model_output.split("is ")[-1].rstrip(".")
+    if "oxed{" not in model_output:
+        for flag in ["the final answer is", "the answer is", "the correct answer is", "the answer should be"]:
             raw_model_output = model_output
             model_output = model_output.split(flag)[-1].strip()
             if flag in raw_model_output:
-                model_output = model_output.split('\n')[0].split('. ')[0]
-            flag = flag.replace('the', 'The')
+                model_output = model_output.split("\n")[0].split(". ")[0]
+            flag = flag.replace("the", "The")
             raw_model_output = model_output
             model_output = model_output.split(flag)[-1].strip()
             if flag in raw_model_output:
-                model_output = model_output.split('\n')[0].split('. ')[0]
-    elif model_output.count('oxed{') > 1:
-        model_output = '\\boxed{' + model_output.split('oxed{')[-1]
-            
-        model_output = find_math_answer(model_output).replace('(a)', 'a').replace('(b)', 'b').replace('(c)', 'c').replace('(d)', 'd').replace('(e)', 'e').replace('{a}', 'a').replace('{b}', 'b').replace('{c}', 'c').replace('{d}', 'd').replace('{e}', 'e').rstrip('.').lstrip(':').strip()
+                model_output = model_output.split("\n")[0].split(". ")[0]
+    elif model_output.count("oxed{") > 1:
+        model_output = "\\boxed{" + model_output.split("oxed{")[-1]
+
+        model_output = (
+            find_math_answer(model_output)
+            .replace("(a)", "a")
+            .replace("(b)", "b")
+            .replace("(c)", "c")
+            .replace("(d)", "d")
+            .replace("(e)", "e")
+            .replace("{a}", "a")
+            .replace("{b}", "b")
+            .replace("{c}", "c")
+            .replace("{d}", "d")
+            .replace("{e}", "e")
+            .rstrip(".")
+            .lstrip(":")
+            .strip()
+        )
 
     return is_equal(gt_answer, model_output) or is_equal(gt_answer_value, model_output)
diff --git a/eureka_ml_insights/data_utils/mmmu_utils.py b/eureka_ml_insights/data_utils/mmmu_utils.py
index a3732f9e..bb884847 100644
--- a/eureka_ml_insights/data_utils/mmmu_utils.py
+++ b/eureka_ml_insights/data_utils/mmmu_utils.py
@@ -1,3 +1,5 @@
+"""This module defines MMMU categories, tasks, and a class that transforms DataFrames to include MMMU prompts."""
+
 from dataclasses import dataclass
 
 import pandas as pd
@@ -40,17 +42,28 @@
 
 @dataclass
 class CreateMMMUPrompts(DFTransformBase):
-    """
-    Create prompts in the MMMU format for mutiple-choice and open-ended questions
-    The original code is located in https://github.com/MMMU-Benchmark/MMMU/tree/main/eval/utils and has
-    small modifications to work in this framework.
+    """Creates prompts in the MMMU format for multiple-choice and open-ended questions.
+
+    The original code is located at:
+    https://github.com/MMMU-Benchmark/MMMU/tree/main/eval/utils
+
+    Small modifications have been made to work in this framework.
     """
 
     def __init__(self):
+        """Initializes the CreateMMMUPrompts class with formats for multiple-choice and short answers."""
         self.multi_choice_example_format = "{}\n{}\nAnswer with the option's letter from the given choices directly."
         self.short_ans_example_format = "{}\nAnswer the question using a single word or phrase."
 
     def _create_prompt(self, sample):
+        """Creates a prompt string from a single sample.
+
+        Args:
+            sample (dict): A dictionary containing at least the keys 'question_type', 'question', and 'options'.
+
+        Returns:
+            str: The constructed prompt string based on the sample's question type.
+        """
         question = sample["question"]
         options = sample["options"]
         example = ""
@@ -71,6 +84,14 @@ def _create_prompt(self, sample):
         return prompt
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        df["prompt"] = df.apply(self._create_prompt, axis=1)
+        """Transforms a DataFrame by creating MMMU prompts for each row.
 
+        Args:
+            df (pd.DataFrame): The input DataFrame. The DataFrame must have columns
+                'question_type', 'question', and 'options', among others.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame containing a new 'prompt' column.
+        """
+        df["prompt"] = df.apply(self._create_prompt, axis=1)
         return df
diff --git a/eureka_ml_insights/data_utils/nphard_sat_utils.py b/eureka_ml_insights/data_utils/nphard_sat_utils.py
index 6f9997bd..321927d9 100644
--- a/eureka_ml_insights/data_utils/nphard_sat_utils.py
+++ b/eureka_ml_insights/data_utils/nphard_sat_utils.py
@@ -1,3 +1,10 @@
+"""
+This module provides utilities to parse model outputs for SAT solutions and convert
+them to binary representations. It includes a data transform class that extracts the
+SAT assignment from the model output and several helper functions for parsing and
+converting the solution.
+"""
+
 import ast
 import logging
 import re
@@ -11,54 +18,73 @@
 
 @dataclass
 class NPHARDSATExtractAnswer(DFTransformBase):
-    """Class to extract and transform the SAT path from model output."""
+    """
+    Extracts and transforms the SAT path from model output.
+
+    Attributes:
+        model_output_column (str): The name of the column containing the model's raw output.
+        model_answer_column (str): The name of the column where the transformed answer is stored.
+    """
 
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        """Extracts the SAT assignment from the model output and stores it in the model_answer_column."""
+        """
+        Extracts the SAT assignment from the model output and stores it in the specified column.
+
+        Args:
+            df (pd.DataFrame): The input dataframe.
+
+        Returns:
+            pd.DataFrame: The dataframe with the extracted SAT assignment stored in the
+            'model_answer_column' column.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(parse_assignment_from_model_output)
         return df
 
 
 def extract_final_answer(model_output):
-    """Return the text inside the *last* <final_answer> … </final_answer> block."""
+    """
+    Retrieves the text within the last <final_answer>...</final_answer> block from the model output.
+
+    Args:
+        model_output (str): The raw output from the model, which may contain <final_answer> blocks.
+
+    Returns:
+        Optional[str]: The content within the final <final_answer>...</final_answer> block,
+        or None if no block is found.
+    """
     open_tag = "<final_answer>"
-    # Locate the last occurrence of the opening tag.
     last_open = model_output.rfind(open_tag)
     if last_open == -1:
         return None
     sliced = model_output[last_open:]
-    # Grab everything between <final_answer> and </final_answer>
     matches = re.findall(r"<final_answer>(.*?)</final_answer>", sliced, re.DOTALL)
-
-    # Return the final captured block (if any); otherwise None.
     return matches[-1] if matches else None
 
 
 def extract_solution(final_answer: str) -> Optional[str]:
     """
-    Parse ``final_answer`` (which should look like ``{'Solution': 'True, False, ...'}``)
-    and return the value of the ``"Solution"`` key.
+    Parses a final answer string (for example, "{'Solution': 'True, False, ...'}") and
+    returns the value of the 'Solution' key.
 
-    Returns
-    -------
-    Optional[str]
-        The assignment string if present and well-formed, otherwise ``None``.
+    Args:
+        final_answer (str): A string representation of a dictionary containing a 'Solution' key.
+
+    Returns:
+        Optional[str]: The assignment string if present and well-formed, otherwise None.
     """
-    # Try to turn the raw string into a Python object.
     try:
         parsed = ast.literal_eval(final_answer)
     except ValueError as err:
         logging.info(f"extract_solution: literal_eval failed: {err}")
         return None
 
-    # Ensure we really got something dict-like.
     try:
         return parsed.get("Solution")
     except AttributeError:
-        logging.info("extract_solution: expected a dict-like object but got " f"{type(parsed).__name__}")
+        logging.info(f"extract_solution: expected a dict-like object but got {type(parsed).__name__}")
     except KeyError:
         logging.info("extract_solution: 'Solution' key not found in parsed result")
 
@@ -67,64 +93,52 @@ def extract_solution(final_answer: str) -> Optional[str]:
 
 def convert_to_binary_string(solution):
     """
-    Convert a comma-separated list of “True”/“False” flags into a
-    comma-separated list of “1”/“0”.
+    Converts a comma-separated list of "True"/"False" flags to a comma-separated
+    list of "1"/"0".
+
+    Any token other than exactly "True" or "False" is converted to "-1".
+
+    Example:
+        >>> convert_to_binary_string("True, False, True, True")
+        '1,0,1,1'
 
-    Special cases
-    -------------
-    * Any token other than exactly "True" or "False" →  "-1"
+    Args:
+        solution (str): A comma-separated string containing "True" or "False" values.
 
-    Example
-    -------
-    >>> convert_to_binary_string("True, False, True, True")
-    '1,0,1,1'
+    Returns:
+        str: A comma-separated string of "1" and "0", or "-1" if invalid input is encountered.
     """
-    # If the solution is not a string, there's nothing to process
     if not isinstance(solution, str):
         return "-1"
 
-    # Split on commas and strip whitespace
     parts = [p.strip() for p in solution.split(",")]
-
-    # If parts are not strictly "True" or "False", return -1.
     if not all(p in ["True", "False"] for p in parts):
         return "-1"
 
-    # Convert "True" -> "1" and "False" -> "0"
     converted_parts = ["1" if p == "True" else "0" for p in parts]
-
-    # Join the converted parts with commas
     return ",".join(converted_parts)
 
 
 def parse_assignment_from_model_output(model_output: str) -> str:
     """
-    Extract a SAT assignment from a model's raw output and convert it to a
-    binary string.
-
-    Returns
-    -------
-    str
-        A binary string representing the assignment, or "-1" if no valid
-        solution is found.
-    """
-    # Pull out the “final answer” block the model produced.
+    Extracts a SAT assignment from a model's raw output and converts it to a binary string.
 
-    final_answer: Optional[str] = extract_final_answer(model_output)
+    Args:
+        model_output (str): The raw output from the model.
 
+    Returns:
+        str: A binary string representing the assignment, or "-1" if no valid solution is found.
+    """
+    final_answer: Optional[str] = extract_final_answer(model_output)
     if not final_answer:
-        logging.info(f"No final answer section detected in model output.")
+        logging.info("No final answer section detected in model output.")
         return "-1"
 
-    # Try to parse the SAT solution from the answer block.
     sat_solution = extract_solution(final_answer)
-
     if not sat_solution:
         return "-1"
 
-    # If the solution is "Unsatisfiable"
     if sat_solution == "Unsatisfiable":
         return ""
 
-    # Convert parsed solution to a binary string representation.
     return convert_to_binary_string(sat_solution)
diff --git a/eureka_ml_insights/data_utils/nphard_tsp_utils.py b/eureka_ml_insights/data_utils/nphard_tsp_utils.py
index c8c5c46c..6d9bdcce 100644
--- a/eureka_ml_insights/data_utils/nphard_tsp_utils.py
+++ b/eureka_ml_insights/data_utils/nphard_tsp_utils.py
@@ -1,3 +1,10 @@
+"""This module provides functionality for extracting TSP paths from model output.
+
+It includes:
+1) A data transform class (NPHARDTSPExtractAnswer) that organizes extraction in a DataFrame.
+2) Utility functions (extract_final_answer, extract_path, parse_path_from_model_output) to parse TSP paths from raw strings.
+"""
+
 import json
 import logging
 import re
@@ -10,27 +17,64 @@
 
 @dataclass
 class NPHARDTSPExtractAnswer(DFTransformBase):
-    """Class to extract and transform the TSP path from model output."""
+    """Extracts TSP paths from model output columns in a DataFrame.
+
+    This class transforms the TSP data in a DataFrame by applying the
+    parse_path_from_model_output function to each row.
+
+    Attributes:
+        model_output_column (str): The name of the column containing model outputs.
+        model_answer_column (str): The name of the column to store extracted TSP paths.
+    """
 
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        """Extracts the tsp path from the model output and stores it in the model_answer_column."""
+        """Transforms the DataFrame by extracting the TSP path.
+
+        This method applies parse_path_from_model_output to the specified
+        model output column and stores the result in the specified
+        model answer column.
+
+        Args:
+            df (pd.DataFrame): The DataFrame containing the model output.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted TSP path.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(parse_path_from_model_output)
         return df
 
 
 def extract_final_answer(model_output):
-    # Find all non-overlapping occurrences between <final_answer> and </final_answer>
-    matches = re.findall(r"<final_answer>(.*?)</final_answer>", model_output, flags=re.DOTALL)
+    """Extracts the final answer from the model output.
+
+    Finds all non-overlapping occurrences between <final_answer> and </final_answer>
+    and returns the last occurrence if any are found, otherwise returns None.
 
-    # Return the last occurrence if any are found, otherwise return None
+    Args:
+        model_output (str): The raw output string from the model.
+
+    Returns:
+        str or None: The last occurrence of the final answer if found, otherwise None.
+    """
+    matches = re.findall(r"<final_answer>(.*?)</final_answer>", model_output, flags=re.DOTALL)
     return matches[-1] if matches else None
 
 
 def extract_path(final_answer):
-    """Extracts the path string from the final answer, handling both JSON formats."""
+    """Extracts the path string from the final answer.
+
+    This function tries to parse the final answer as JSON or uses a regex fallback
+    to extract the "Path" value.
+
+    Args:
+        final_answer (str): The final answer string extracted from model output.
+
+    Returns:
+        str or None: The path string if extracted, otherwise None.
+    """
     try:
         # Convert single quotes to double quotes for valid JSON parsing
         final_answer_json = json.loads(final_answer.replace("'", '"'))
@@ -42,7 +86,18 @@ def extract_path(final_answer):
 
 
 def parse_path_from_model_output(model_output_string):
-    """Parses the model output to extract a tsp path."""
+    """Parses the model output to extract a TSP path.
+
+    This function extracts the final answer, then attempts to retrieve
+    the path and parse it into a list of integers. If no valid path is
+    found, a default path "0,0,0,0" is returned.
+
+    Args:
+        model_output_string (str): The raw output string from the model.
+
+    Returns:
+        str: A comma-separated string of integers indicating the TSP path.
+    """
     try:
         final_answer = extract_final_answer(model_output_string)
         tour_string = extract_path(final_answer) if final_answer else None
diff --git a/eureka_ml_insights/data_utils/omni_math_utils.py b/eureka_ml_insights/data_utils/omni_math_utils.py
index 6a3156be..ab614d49 100644
--- a/eureka_ml_insights/data_utils/omni_math_utils.py
+++ b/eureka_ml_insights/data_utils/omni_math_utils.py
@@ -1,76 +1,115 @@
+"""This module provides functions and classes for parsing output from a math model 
+and applying transformations to data frames by extracting labels and solutions."""
+
 import math
-import re
 from dataclasses import dataclass
 
 import pandas as pd
 
 from .transform import DFTransformBase
 
-# @staticmethod
+
 def parse_output_answer(response):
-    """
-    Parse the input string to extract the model judgement.
-    Parameters:
+    """Parses the input string to extract the model judgement.
+
+    Args:
         response (str): Input string containing model judgement as '## Equivalence Judgement: X '.
-    Returns: 
-        dict: A dict of extracted final answer and model based judgement.
+
+    Returns:
+        dict: A dictionary of extracted final answer and model-based judgement.
     """
-    if response is None or response == '':
+    if response is None or response == "":
         return {}
 
     parts = response.split("## ")
     data = {}
-    
+
     for part in parts[1:]:
         lines = part.strip().split("\n")
-        title = lines[0].strip().replace('#', '').replace('*', '').lower()
+        title = lines[0].strip().replace("#", "").replace("*", "").lower()
         content = "\n".join(lines[1:]).strip()
-        
-        if title == "Justification":
+
+        if title == "Justification".lower():
             data[title] = content
         else:
-            data[title] = lines[1].strip() if len(lines) > 1 else ''
-    
+            data[title] = lines[1].strip() if len(lines) > 1 else ""
+
     return data
 
+
 @dataclass
 class Omni_Math_ParseLabel(DFTransformBase):
+    """A DFTransformBase subclass that parses a math model's label output."""
+
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the DataFrame by extracting labels from the model output.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted labels appended.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(self.extract_label)
         return df
-    
+
     @staticmethod
     def extract_label(response):
+        """Extracts the numeric label from the model's response.
+
+        Args:
+            response (str): The model's output string.
+
+        Returns:
+            float: The numeric label representing the model's equivalence judgement
+                (1 for True, 0 for False, NaN otherwise).
+        """
         data = parse_output_answer(response)
-        label = 'Equivalence Judgement'.lower()
-        model_label = data[label] if label in data else ''
+        label = "equivalence judgement"
+        model_label = data[label] if label in data else ""
         numeric_label = math.nan
-        if model_label.strip().replace('#', '').replace('*', '').lower() == 'true':
+        if model_label.strip().replace("#", "").replace("*", "").lower() == "true":
             numeric_label = 1
-        elif model_label.strip().replace('#', '').replace('*', '').lower() == 'false':
+        elif model_label.strip().replace("#", "").replace("*", "").lower() == "false":
             numeric_label = 0
         if numeric_label == math.nan:
             print(data[label], model_label, numeric_label)
         return numeric_label
-    
+
 
 @dataclass
 class Omni_Math_ParseSolution(DFTransformBase):
+    """A DFTransformBase subclass that parses a math model's solution output."""
+
     model_output_column: str
     model_answer_column: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the DataFrame by extracting the solution from the model output.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with the extracted solutions appended.
+        """
         df[self.model_answer_column] = df[self.model_output_column].apply(self.extract_solution)
         return df
-    
+
     @staticmethod
     def extract_solution(response):
+        """Extracts the solution from the model's response.
+
+        Args:
+            response (str): The model's output string.
+
+        Returns:
+            str: The solution extracted from the model's response, or an empty string if not present.
+        """
         data = parse_output_answer(response)
-        label = 'Student Final Answer'.lower()
-        model_label = data[label] if label in data else ''
+        label = "student final answer"
+        model_label = data[label] if label in data else ""
         return model_label
-
-    
\ No newline at end of file
diff --git a/eureka_ml_insights/data_utils/prompt_processing.py b/eureka_ml_insights/data_utils/prompt_processing.py
index aa1e8c8b..bb9e3fd0 100644
--- a/eureka_ml_insights/data_utils/prompt_processing.py
+++ b/eureka_ml_insights/data_utils/prompt_processing.py
@@ -1,3 +1,5 @@
+"""This module provides a JinjaPromptTemplate class to load and render Jinja templates."""
+
 import logging
 from typing import Any, Dict, Set
 
@@ -6,6 +8,12 @@
 
 
 class JinjaPromptTemplate:
+    """Provides functionality to load and render Jinja templates.
+
+    Attributes:
+        env (jinja2.Environment): The Jinja environment used for rendering.
+        template (jinja2.Template): The Jinja template object parsed from the template file.
+    """
 
     env: jinja2.Environment
     template: jinja2.Template
@@ -14,6 +22,11 @@ def __init__(
         self,
         template_path: str,
     ):
+        """Initializes JinjaPromptTemplate with a template file.
+
+        Args:
+            template_path (str): The path to the Jinja template file.
+        """
         with open(template_path, "r", encoding="utf-8") as tf:
             template = tf.read().strip()
         logging.info(f"Template is:\n {template}.")
@@ -24,13 +37,32 @@ def __init__(
         logging.info(f"Placeholders are: {self.placeholders}.")
 
     def _find_placeholders(self, env: jinja2.Environment, template: str) -> Set[str]:
+        """Finds the undeclared variables in the given Jinja template.
+
+        Args:
+            env (jinja2.Environment): The Jinja environment to parse the template.
+            template (str): The Jinja template content.
+
+        Returns:
+            Set[str]: A set of placeholder names.
+        """
         ast = self.env.parse(template)
         return jinja2.meta.find_undeclared_variables(ast)
 
     def create(self, values: Dict[str, Any]) -> str:
-        """
-        Instantiates the template by filling all placeholders with provided values.  Returns the filled template. Raises
-        exception when some placeholder value was not provided.
+        """Instantiates the template by filling all placeholders with the provided values.
+
+        If any placeholder does not have a corresponding value in the dictionary,
+        a ValueError is raised.
+
+        Args:
+            values (Dict[str, Any]): A dictionary of placeholder values.
+
+        Returns:
+            str: The rendered template string.
+
+        Raises:
+            ValueError: If a placeholder is not found in the provided values.
         """
         for placeholder in self.placeholders:
             if placeholder not in values:
diff --git a/eureka_ml_insights/data_utils/spatial_utils.py b/eureka_ml_insights/data_utils/spatial_utils.py
index b3159e1c..4896b59f 100644
--- a/eureka_ml_insights/data_utils/spatial_utils.py
+++ b/eureka_ml_insights/data_utils/spatial_utils.py
@@ -10,9 +10,23 @@
 
 @dataclass
 class LowerCaseNoPunctuationConvertNumbers(MultiColumnTransform):
+    """A multi-column transform that converts text to lower case, removes punctuation, and replaces textual numbers with numeric digits.
+
+    Attributes:
+        columns (List[str]): The list of column names to transform.
+    """
+
     columns: List[str]
 
     def punctuation_case_number_replace(self, text):
+        """Replaces punctuation, lowers case, and converts textual numbers to digits.
+
+        Args:
+            text (str): The text to transform.
+
+        Returns:
+            str: The transformed text, or None if the input is None.
+        """
         if text:
             text = re.sub(r"[^\w\s\']", "", text).replace('"', "").replace("'", "").lower().strip()
             # replace words for numbers with numbers, a common occurance in LLM outputs
@@ -36,6 +50,14 @@ def punctuation_case_number_replace(self, text):
         return text
 
     def _transform(self, text):
+        """Applies the punctuation_case_number_replace method to the given text or list of texts.
+
+        Args:
+            text (str or list): The input text or list of texts to transform.
+
+        Returns:
+            str or list: The transformed text or list of texts.
+        """
         if isinstance(text, list):
             return list(map(self.punctuation_case_number_replace, text))
         else:
@@ -43,9 +65,16 @@ def _transform(self, text):
 
 
 def remove_redundancy(text):
-    """
+    """Removes redundancy from the text.
+
     The code is from: https://github.com/alvinmingwisc/spatial_reason_vlm/tree/main/eval,
     and included with minimal modifications.
+
+    Args:
+        text (str): The text to remove redundancy from.
+
+    Returns:
+        str: The text with redundancy removed.
     """
     text = text.replace("**", "")
     text = text.replace(".", "")
@@ -53,35 +82,34 @@ def remove_redundancy(text):
 
 
 def extract_before_is(input_string):
-    """
-    This function extracts the part of the string before the first occurrence of 'is'.
+    """Extracts the part of the string before the first occurrence of 'is'.
+
     The code is from: https://github.com/alvinmingwisc/spatial_reason_vlm/tree/main/eval,
     and included with minimal modifications.
 
-    :param input_string: The string to process.
-    :return: A new string containing the part before 'is'.
+    Args:
+        input_string (str): The string to process.
+
+    Returns:
+        str: A new string containing the part before 'is'.
     """
-    # Split the string at the first occurrence of 'is'
     parts = input_string.split(" is", 1)
-    # Return the first part
     return parts[0].strip()
 
 
 def extract_answer_from_text_grid(text, question_type):
-    """
-    Extracts the answer from the text based on specific patterns,
-    and as a fallback, extracts the first number if no patterns match.
+    """Extracts the answer from the text based on specific patterns, and as a fallback, extracts the first number if no patterns match.
+
     The code is from: https://github.com/alvinmingwisc/spatial_reason_vlm/tree/main/eval,
     and included with minimal modifications.
 
     Args:
-    - text (str): The text containing the model's answer.
-    - question_type (str): The text containing the question type.
+        text (str): The text containing the model's answer.
+        question_type (str): The text containing the question type.
 
     Returns:
-    - str or None: The extracted answer, or None if no answer could be extracted.
+        str or None: The extracted answer, or None if no answer could be extracted.
     """
-    # Mapping of textual numbers to their numeric equivalents
     number_mapping = {
         "zero": 0,
         "no": 0,
@@ -99,7 +127,6 @@ def extract_answer_from_text_grid(text, question_type):
     animals = ["giraffe", "cat", "dog", "elephant", "rabbit"]
     animal_pattern = rf"\b({'|'.join(animals)})\b"
 
-    # First rule: check for the pattern "A. x", "B. x", "C. x", or "D. x" where x is the answer
     if not text:
         return None
     match = re.search(r"\b[A-D]\.\s*(\w+)", text)
@@ -109,16 +136,11 @@ def extract_answer_from_text_grid(text, question_type):
     question_id = int(re.search("[0-9]", re.search("Q[0-9]", question_type).group()).group())
 
     if question_id >= 2:  # Assuming Q4 and Q5 require binary answers
-        # Check for animals
         match = re.search(animal_pattern, text, re.IGNORECASE)
         if match:
             return match.group(1)
         return None
     else:
-        # check answer with numbers
-        # Create a list to store all found numbers along with their positions
-
-        # specific for claude
         specific_phrases = [
             (r"\bthere are\s*(\d+)\s*blocks\b", 1),
             (r"\bthere appear to be\s*(\d+)\s*blocks\b", 1),
@@ -130,47 +152,39 @@ def extract_answer_from_text_grid(text, question_type):
         if match:
             return match.group(group_index)
 
-        # If no specific phrases found, proceed with other checks
         found_numbers = []
 
-        # Check for textual numbers and their positions
         for text_num, num in number_mapping.items():
             for match in re.finditer(rf"\b{text_num}\b", text, re.IGNORECASE):
                 found_numbers.append((match.start(), num))
 
-        # Check for digit sequences and their positions, specifically ignoring list markers at the start
-        # Exclude numbers following "\n\n" and directly followed by ". "
-        text = re.sub(r"^\n\n\d+\.\s", "", text)  # Remove the leading list marker if it exists
+        text = re.sub(r"^\n\n\d+\.\s", "", text)
 
         for match in re.finditer(r"\d+", text):
             found_numbers.append((match.start(), int(match.group(0))))
 
-        # Sort found numbers by their positions (smallest position first)
         if found_numbers:
             found_numbers.sort(key=lambda x: x[0])
-            # Return the number associated with the earliest position
             return str(found_numbers[0][1])
 
-    return None  # Return None if no numbers are found
+    return None
 
 
 def extract_answer_from_text_map_and_maze(model_output_raw, options, is_valid, match_first=False):
-    """
-    Extracts the answer from the text based on known model output patterns.
-    Searches for both a letter and whole word answer and returns both as they are not
-    always consistent.
+    """Extracts the answer from the text based on known model output patterns.
+
+    The code is from: https://github.com/alvinmingwisc/spatial_reason_vlm/tree/main/eval,
+    and included with minimal modifications.
 
     Args:
-    - model_output_raw (str): The text containing the model's answer.
-    - options (str): The list of options.
-    - match_first (bool): If True, the first match is used for answer extraction when multiple matches are found.
+        model_output_raw (str): The text containing the model's answer.
+        options (str): The list of options.
+        is_valid (bool): Whether the input is valid.
+        match_first (bool, optional): If True, the first match is used for answer extraction when multiple matches are found. Defaults to False.
 
     Returns:
-    - str or None: The extracted answers, or empty strings if no answer could be extracted.
+        str or None: The extracted answers, or an empty string if no answer could be extracted.
     """
-
-    # replace common subsitutions in model outputs
-
     model_output_parsed_letter = ""
     model_output_parsed = ""
 
@@ -182,12 +196,12 @@ def extract_answer_from_text_map_and_maze(model_output_raw, options, is_valid, m
     else:
         match_index = -1
 
-    model_output_raw =  re.sub(r"\bno objects\b", "0 objects", model_output_raw, re.IGNORECASE)
-    model_output_raw =  re.sub(r"\bnot\b", "no", model_output_raw, re.IGNORECASE)
-    model_output_raw =  re.sub(r"\bshould be\b", "is", model_output_raw, re.IGNORECASE)
+    model_output_raw = re.sub(r"\bno objects\b", "0 objects", model_output_raw, re.IGNORECASE)
+    model_output_raw = re.sub(r"\bnot\b", "no", model_output_raw, re.IGNORECASE)
+    model_output_raw = re.sub(r"\bshould be\b", "is", model_output_raw, re.IGNORECASE)
 
     number_mapping = {
-        "zero": 0,           
+        "zero": 0,
         "one": 1,
         "two": 2,
         "three": 3,
@@ -200,15 +214,13 @@ def extract_answer_from_text_map_and_maze(model_output_raw, options, is_valid, m
     }
 
     for k, v in number_mapping.items():
-        model_output_raw =  re.sub(rf"\b{k}\b", str(v), model_output_raw, re.IGNORECASE)
-
-    # get dict of options from options string
-    options_dict = {x.split(".")[0].strip().lower():x.split(".")[1].strip().lower() for x in options}
+        model_output_raw = re.sub(rf"\b{k}\b", str(v), model_output_raw, re.IGNORECASE)
 
+    options_dict = {x.split(".")[0].strip().lower(): x.split(".")[1].strip().lower() for x in options}
 
     model_output_parsed_letter = ""
     model_output_parsed = ""
-            
+
     answers = [v for k, v in options_dict.items()]
     answers_pattern = rf"\b({'|'.join(answers)})\b"
 
@@ -226,24 +238,22 @@ def extract_answer_from_text_map_and_maze(model_output_raw, options, is_valid, m
         matches = re.findall(pattern_phrase, model_output_raw, re.IGNORECASE)
         if matches:
             model_output_answer_line = matches[match_index]
-        
+
             answers_match = re.findall(answers_pattern, model_output_answer_line, re.IGNORECASE)
-    
+
             if answers_match:
-                model_output_parsed =  answers_match[match_index]
+                model_output_parsed = answers_match[match_index]
             else:
                 letters = [k for k, v in options_dict.items()]
                 letters_pattern = rf"\b({'|'.join(letters)})\b"
                 letters_pattern_match = re.findall(letters_pattern, model_output_answer_line, re.IGNORECASE)
 
                 if letters_pattern_match:
-                    match_option =  letters_pattern_match[match_index].lower()
+                    match_option = letters_pattern_match[match_index].lower()
                     model_output_parsed_letter = options_dict[match_option]
 
     elif "answer is".lower() in model_output_raw.lower():
-        pattern_letter = r'answer is:*\s*\**([\w\d]+)[\s:.]*\**'
-    
-        # first look for a single letter answer
+        pattern_letter = r"answer is:*\s*\**([\w\d]+)[\s:.]*\**"
         matches = re.findall(pattern_letter, model_output_raw, re.IGNORECASE)
         if matches:
             match_option = matches[match_index].lower()
@@ -252,23 +262,27 @@ def extract_answer_from_text_map_and_maze(model_output_raw, options, is_valid, m
             else:
                 model_output_parsed_letter = match_option
 
-    # next look if any of the options names are present in the first line
-
-    model_output_answer_line = model_output_raw.splitlines()[0]        
+    model_output_answer_line = model_output_raw.splitlines()[0]
 
     answers = [v for k, v in options_dict.items()]
     answers_pattern = rf"\b({'|'.join(answers)})\b"
     answers_match = re.findall(answers_pattern, model_output_answer_line, re.IGNORECASE)
-    
+
     if answers_match:
-        model_output_parsed =  answers_match[match_index]
+        model_output_parsed = answers_match[match_index]
 
     return " or ".join(filter(None, [model_output_parsed, model_output_parsed_letter]))
 
 
 @dataclass
 class ExtractAnswer(DFTransformBase):
-    """This class is a base class for an answer extractor that is conditioned on the question type."""
+    """Base class for an answer extractor that is conditioned on the question type.
+
+    Attributes:
+        answer_column_name (str): The column name containing the answer text.
+        extracted_answer_column_name (str): The column name for the extracted answer.
+        question_type_column_name (str): The column name for the question type.
+    """
 
     answer_column_name: str
     extracted_answer_column_name: str
@@ -276,50 +290,85 @@ class ExtractAnswer(DFTransformBase):
 
     @abstractmethod
     def _parse_answer_function(self, answer_text, question_type):
-        pass
+        """Parses an answer from textual input based on the question type.
+
+        Args:
+            answer_text (str): The answer text to parse.
+            question_type (str): The question type.
+
+        Returns:
+            str or None: The parsed answer, or None if it cannot be parsed.
+        """
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the dataframe by parsing answers with the provided parse function.
+
+        Args:
+            df (pd.DataFrame): The input dataframe containing the columns for answers.
+
+        Returns:
+            pd.DataFrame: A new dataframe with the extracted answers.
+        """
         df[self.extracted_answer_column_name] = df.apply(
             lambda x: self._parse_answer_function(x[self.answer_column_name], x[self.question_type_column_name]), axis=1
         )
         return df
 
+
 @dataclass
 class ExtractQuestionOptions(DFTransformBase):
-    """This class is for extracting the option list from a prompt."""
+    """Extracts the list of options from a prompt.
+
+    Attributes:
+        prompt_column_name (str): The column name containing the prompt text.
+        extracted_options_column_name (str): The column name for the extracted options.
+    """
 
     prompt_column_name: str
     extracted_options_column_name: str
 
     def _extract_options_from_text_map(self, prompt):
-        """
-        Extracts the multiple-choice options list from the text.
+        """Extracts the multiple-choice options list from the text.
 
         Args:
-        - text (str): The text containing the prompt.
+            prompt (str): The text containing the prompt.
 
         Returns:
-        - str or None: The extracted list of options.
+            list or None: The extracted list of options, or None if not found.
         """
-
-        # get list of options from prompt
         prompt_lines = prompt.splitlines()
         matches = [i for i, x in enumerate(prompt_lines) if "Available options:" in x]
 
-        if "Yes" in prompt_lines[matches[0]+1]:
-            options = prompt_lines[matches[0]+1:matches[0]+3]
+        if "Yes" in prompt_lines[matches[0] + 1]:
+            options = prompt_lines[matches[0] + 1 : matches[0] + 3]
         else:
-            options = prompt_lines[matches[0]+1:matches[0]+5]
+            options = prompt_lines[matches[0] + 1 : matches[0] + 5]
 
         return options
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the dataframe by extracting options from the prompt column.
+
+        Args:
+            df (pd.DataFrame): The input dataframe with the prompt column.
+
+        Returns:
+            pd.DataFrame: A new dataframe with the extracted options.
+        """
         df[self.extracted_options_column_name] = df[self.prompt_column_name].apply(self._extract_options_from_text_map)
         return df
 
+
 @dataclass
 class ExtractAnswerGrid(ExtractAnswer):
-    """This class is an answer extractor for the GRID benchmark."""
+    """Answer extractor for the GRID benchmark.
+
+    Attributes:
+        answer_column_name (str): The column name containing the answer text.
+        extracted_answer_column_name (str): The column name for the extracted answer.
+        question_type_column_name (str): The column name for the question type.
+        mode (str): The mode of the extractor.
+    """
 
     answer_column_name: str
     extracted_answer_column_name: str
@@ -328,20 +377,48 @@ class ExtractAnswerGrid(ExtractAnswer):
 
     @abstractmethod
     def _parse_answer_function(self, answer_text, question_type):
+        """Parses an answer from textual input for the GRID benchmark.
+
+        Args:
+            answer_text (str): The answer text to parse.
+            question_type (str): The question type.
+
+        Returns:
+            str or None: The parsed answer for the GRID benchmark.
+        """
         return extract_answer_from_text_grid(answer_text, question_type)
 
 
 @dataclass
 class ExtractAnswerSpatialMapAndMaze(DFTransformBase):
-    """This class is an answer extractor for the SPATIAL_MAP and MAZE benchmark."""
+    """Answer extractor for the SPATIAL_MAP and MAZE benchmark.
+
+    Attributes:
+        answer_column_name (str): The column name containing the answer text.
+        extracted_answer_column_name (str): The column name for the extracted answer.
+        extracted_options_column_name (str): The column name containing the extracted options.
+        match_first (bool): Whether to match the first occurrence (True) or the last occurrence (False).
+    """
 
     answer_column_name: str
     extracted_answer_column_name: str
     extracted_options_column_name: str
-    match_first: bool = False    
+    match_first: bool = False
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the dataframe by extracting answers from text map and maze.
+
+        Args:
+            df (pd.DataFrame): The input dataframe that includes an answer column,
+                extracted options column, and an "is_valid" column.
+
+        Returns:
+            pd.DataFrame: A new dataframe with the extracted answers.
+        """
         df[self.extracted_answer_column_name] = df.apply(
-            lambda x: extract_answer_from_text_map_and_maze(x[self.answer_column_name], x[self.extracted_options_column_name], x["is_valid"], self.match_first), axis=1
+            lambda x: extract_answer_from_text_map_and_maze(
+                x[self.answer_column_name], x[self.extracted_options_column_name], x["is_valid"], self.match_first
+            ),
+            axis=1,
         )
         return df
diff --git a/eureka_ml_insights/data_utils/toxigen_utils.py b/eureka_ml_insights/data_utils/toxigen_utils.py
index f2dd4ace..7564c898 100644
--- a/eureka_ml_insights/data_utils/toxigen_utils.py
+++ b/eureka_ml_insights/data_utils/toxigen_utils.py
@@ -1,3 +1,7 @@
+"""This module provides classes and functions to parse and label toxicity scores from model outputs.
+It includes data transformations for both discriminative and generative evaluation settings.
+"""
+
 import re
 from dataclasses import dataclass
 
@@ -31,9 +35,16 @@
 
 @dataclass
 class GPT4ToxiGenRegex(DFTransformBase):
-    """
-    This class is for regex and label mapping for discriminative evaluation setting where given a statement the model
-    is asked to perform as a discriminator.
+    """Class GPT4ToxiGenRegex.
+
+    This class is for regex and label mapping for discriminative evaluation settings, where given
+    a statement, the model is asked to perform as a discriminator.
+
+    Attributes:
+        model_output_column (str): The column name in the DataFrame containing model outputs.
+        gt_column (str): The ground-truth label column name in the DataFrame.
+        category (str): The category column name in the DataFrame.
+        merged_group (str): The name of the new column where merged group label is stored.
     """
 
     model_output_column: str
@@ -42,7 +53,14 @@ class GPT4ToxiGenRegex(DFTransformBase):
     merged_group: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the DataFrame by applying regex parsing and labeling.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
 
+        Returns:
+            pd.DataFrame: The transformed DataFrame with parsed scores and updated labels.
+        """
         df[[self.model_output_column, "is_valid"]] = df[self.model_output_column].apply(
             lambda x: pd.Series([parse_output(x, delimiters)[0], parse_output(x, delimiters)[1]])
         )
@@ -53,9 +71,16 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class GPT4ToxiGenRegexGenerative(DFTransformBase):
-    """
-    This class is for regex and label mapping for generative evaluation setting where the output is the actual score
-    given by the model.
+    """Class GPT4ToxiGenRegexGenerative.
+
+    This class is for regex and label mapping for generative evaluation settings, where the output
+    is the actual score given by the model.
+
+    Attributes:
+        model_output_column (str): The column name in the DataFrame containing model outputs.
+        gt_column (str): The ground-truth label column name in the DataFrame.
+        category (str): The category column name in the DataFrame.
+        merged_group (str): The name of the new column where merged group label is stored.
     """
 
     model_output_column: str
@@ -64,7 +89,14 @@ class GPT4ToxiGenRegexGenerative(DFTransformBase):
     merged_group: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """Transforms the DataFrame by applying regex parsing and labeling in a generative context.
 
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame with parsed scores and updated labels.
+        """
         df[[self.model_output_column, "is_valid"]] = df[self.model_output_column].apply(
             lambda x: pd.Series([parse_output(x, delimiters, True)[0], parse_output(x, delimiters, True)[1]])
         )
@@ -76,15 +108,18 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 
 def parse_output(s, delimiters, generative=False):
-    """
-    Parse the input string to extract toxicity scores.
-    Parameters:
-    s (str): Input string containing model's output.
-    generative (bool): If True returns the raw score instead of "neutral" or "toxic" label.
+    """Parse the input string to extract toxicity scores.
+
+    Args:
+        s (str): Input string containing the model's output.
+        delimiters (list): List of delimiters used to search for scores in the string.
+        generative (bool): If True, returns the raw score instead of the "neutral" or "toxic" label.
+
     Returns:
-    str: A string containing extracted score
+        tuple: A tuple (str, bool). The first element is either the extracted score/label
+        or a placeholder string indicating no score was found, and the second element
+        is a boolean indicating whether a valid score was found.
     """
-
     if s is None:
         return "<API Returns Null>", False
 
@@ -107,12 +142,13 @@ def parse_output(s, delimiters, generative=False):
 
 
 def label_map(s):
-    """
-    Parse the ground truth label to neutral if annotator's score is less than 2.5 or toxic otherwise.
-    Parameters:
-    s (float): Input float containing annotators' score avergaed over annotators.
+    """Convert a float score to a "neutral" or "toxic" label.
+
+    Args:
+        s (float): The input score averaged over annotators.
+
     Returns:
-    str: converted label
+        str: The converted label, either "neutral" or "toxic".
     """
     if s < 2.5:
         label = "neutral"
@@ -122,11 +158,15 @@ def label_map(s):
 
 
 def label_category_map(s):
-    """
-    Parse the prompt file name to get ground truth and category labels:
-    s (str): This refers to various splits in the dataset, e.g., 'neutral_black_1k', 'neutral_immigrant_1k', etc.
+    """Derive ground-truth and category labels from the dataset split string.
+
+    Args:
+        s (str): Refers to various splits in the dataset (e.g., 'neutral_black_1k').
+
     Returns:
-    str: gt_label and category_label
+        tuple: A tuple (str, str). The first element is the ground-truth label
+        ("toxic" if the split starts with 'hate', otherwise "neutral"). The second
+        element is the category label derived from the split string.
     """
     gt = s.split("_")[0]
     category = s.split("_")[1:-1]
diff --git a/eureka_ml_insights/data_utils/transform.py b/eureka_ml_insights/data_utils/transform.py
index 8016de0d..e918a47a 100644
--- a/eureka_ml_insights/data_utils/transform.py
+++ b/eureka_ml_insights/data_utils/transform.py
@@ -1,3 +1,10 @@
+"""A set of classes and transformations for DataFrame modifications.
+
+This module includes a base class (DFTransformBase), along with a variety of transformations
+for renaming columns, running Python code, sampling data, repeating rows, adding columns, 
+and more.
+"""
+
 import ast
 import logging
 import re
@@ -27,15 +34,48 @@
 
 @dataclass
 class DFTransformBase:
+    """
+    Base class for DataFrame transformations.
+
+    This dataclass does not define any attributes. Subclasses must implement
+    the transform() method.
+    """
+
     @abstractmethod
-    def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...
+    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Transforms a pandas DataFrame.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame.
+        """
+        ...
 
 
 @dataclass
 class SequenceTransform(DFTransformBase):
+    """
+    Chains multiple DFTransformBase transformations in sequence.
+
+    Attributes:
+        transforms (List[DFTransformBase]): The list of transformations to be applied in sequence.
+    """
+
     transforms: List[DFTransformBase]
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Applies each transformation in sequence to the given DataFrame.
+
+        Args:
+            df (pd.DataFrame): The initial DataFrame to transform.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame after all sequential transforms.
+        """
         for transform in self.transforms:
             df = transform.transform(df)
         return df
@@ -43,40 +83,61 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class ColumnRename(DFTransformBase):
+    """
+    Renames columns in a DataFrame based on a provided name mapping.
+
+    Attributes:
+        name_mapping (Dict[str, str]): The dictionary mapping old column names to new column names.
+    """
+
     name_mapping: Dict[str, str]
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Renames columns according to name_mapping.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame whose columns will be renamed.
+
+        Returns:
+            pd.DataFrame: DataFrame with renamed columns.
+        """
         return df.rename(columns=self.name_mapping)
 
 
 @dataclass
 class RunPythonTransform(DFTransformBase):
-    """Runs arbitrary python code on the data frame.
-    args:
-        python_code: str: The python code to run on the data frame.
-        global_imports: list: A list of modules to import in global scope, if needed. Default is an empty list.
-                       Local (to the python_code scope) imports can be included in the python_code. Such global
-                       scope imports are needed when the python_code uses a lambda function, for example, since
-                       imports in the python_code scope are not available to the lambda function.
-    returns:
-        df: pd.DataFrame: The transformed data frame.
+    """
+    Runs arbitrary Python code on the DataFrame.
+
+    This transform executes Python code in a controlled environment, allowing only
+    certain statements and imports for security reasons.
+
+    Attributes:
+        python_code (str): The Python code to run on the DataFrame.
+        global_imports (list): A list of modules to import in the global scope, if needed.
     """
 
     python_code: str
     global_imports: list = field(default_factory=list)
 
     def __post_init__(self):
-        # To avoid disastrous consequences, we only allow operations on the data frame.
-        # Therefore, every statement in the python_code should be in the form of df['column_name'] = ... or df = ...
+        """
+        Validates that only permitted operations are allowed in python_code.
+        """
         self.allowed_statement_prefixes = ["df = ", "df[", "import "]
-        # Similarly, we only allow a limited set of imports. To add to this safe list, create a PR.
         self.allowed_imports = ["ast", "math", "numpy"]
         self.validate()
 
     def validate(self):
-        # First, splits the python code into python statements and strips whitespace.
+        """
+        Checks if the provided Python code has allowed statements and imports.
+
+        Raises:
+            ValueError: If any statement does not start with an allowed prefix or if
+                attempting to import a disallowed module.
+        """
         statements = [s.strip() for s in self.python_code.split(";")]
-        # Checks that each statement starts with an allowed prefix.
         for statement in statements:
             if not any(statement.startswith(prefix) for prefix in self.allowed_statement_prefixes):
                 raise ValueError("For security reasons, only imports and operations on the data frame are allowed.")
@@ -86,11 +147,18 @@ def validate(self):
                     raise ValueError(f"Importing {module_name} in RunPythonTransform is not allowed.")
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        # Adds 'df' to the global scope of exec so that it can be overwritten during exec if needed.
+        """
+        Executes the stored Python code on the input DataFrame.
+
+        Args:
+            df (pd.DataFrame): The DataFrame on which to run the Python code.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame.
+        """
         exec_globals = {"df": df}
         exec_globals.update(globals())
 
-        # Adds safe imports to exec_globals.
         for module_name in self.global_imports:
             exec_globals[module_name] = __import__(module_name)
 
@@ -101,11 +169,29 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class SamplerTransform(DFTransformBase):
+    """
+    Samples rows from the DataFrame, either randomly or by stratification.
+
+    Attributes:
+        random_seed (int): The random seed for reproducibility.
+        sample_count (int): The number of rows to sample.
+        stratify_by (List[str]): Optional columns to stratify upon when sampling.
+    """
+
     random_seed: int
     sample_count: int
     stratify_by: List[str] = None
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Samples rows from the given DataFrame.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame to sample from.
+
+        Returns:
+            pd.DataFrame: A sampled subset of the DataFrame.
+        """
         if self.stratify_by:
             return df.groupby(self.stratify_by, group_keys=False).apply(
                 lambda x: x.sample(n=self.sample_count, random_state=self.random_seed)
@@ -117,14 +203,24 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 @dataclass
 class MultiplyTransform(DFTransformBase):
     """
-    Repeats each row n times, and adds a column to the data frame indicating the repeat number.
-    Also adds a column to the data frame indicating the data point id that will be the same for
-    all repeats of the same data point.
+    Repeats each row n times and adds a column to indicate the repeat number and data point ID.
+
+    Attributes:
+        n_repeats (int): The number of times to repeat each row.
     """
 
     n_repeats: int
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Repeats each row in the DataFrame.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: A new DataFrame with repeated rows, including columns for repeat ID and data point ID.
+        """
         dfs = []
         for i in range(self.n_repeats):
             df_copy = df.copy()
@@ -136,52 +232,107 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class AddColumn(DFTransformBase):
+    """
+    Adds a new column to the DataFrame with empty strings.
+
+    Attributes:
+        column_name (str): The name of the new column to add.
+    """
+
     column_name: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Adds the specified column to the input DataFrame, initializing it with empty strings.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: DataFrame with the new column added.
+        """
         df[self.column_name] = ""
         return df
 
 
 @dataclass
 class AddColumnAndData(DFTransformBase):
+    """
+    Adds a new column to the DataFrame with a specified value.
+
+    Attributes:
+        column_name (str): The name of the new column to add.
+        data (str): The value used to populate the new column.
+    """
+
     column_name: str
     data: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        df[self.column_name] = str(self.data)
+        """
+        Adds the specified column to the input DataFrame, initializing it with the provided value.
 
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: DataFrame with the new column and data added.
+        """
+        df[self.column_name] = str(self.data)
         return df
 
 
 @dataclass
 class CopyColumn(DFTransformBase):
-    """Copy a src column's data to a new dst names column."""
+    """
+    Copies data from a source column to a destination column.
+
+    Attributes:
+        column_name_src (str): The name of the source column.
+        column_name_dst (str): The name of the destination column.
+    """
 
     column_name_src: str
     column_name_dst: str
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        df[self.column_name_dst] = df[self.column_name_src]
+        """
+        Copies the data from the source column to the destination column.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
 
+        Returns:
+            pd.DataFrame: DataFrame with data copied to the new or existing destination column.
+        """
+        df[self.column_name_dst] = df[self.column_name_src]
         return df
 
 
 @dataclass
 class MultiColumnTransform(DFTransformBase):
     """
-    Transform class to apply a function to multiple columns.
-    This class' validate method checks that the columns to be transformed are present in the data frame,
-    and its transform method applies the _transform method to each column.
+    Base class for applying a transformation function to specified columns.
+
+    This class is meant to be subclassed, and the subclass should implement
+    the _transform method.
 
-    This class is meant to be subclassed, and the subclass should implement the _transform method.
+    Attributes:
+        columns (List[str] | str): The columns to which transformations will be applied.
     """
 
     columns: List[str] | str
 
-    def validate(self, df: pd.DataFrame) -> pd.DataFrame:
-        """Check that all columns to be transformed are present actually in the data frame."""
-        # if columns is not a list, make it a list
+    def validate(self, df: pd.DataFrame):
+        """
+        Checks that all columns to be transformed are present in the DataFrame.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Raises:
+            ValueError: If any specified columns are not in the DataFrame.
+        """
         if not isinstance(self.columns, list):
             self.columns = [self.columns]
         extra_columns = set(self.columns) - set(df.columns)
@@ -190,7 +341,15 @@ def validate(self, df: pd.DataFrame) -> pd.DataFrame:
             raise ValueError(f"The following columns are not present in the data frame: {msg}")
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        """Apply the transform to the columns."""
+        """
+        Applies the _transform method to each specified column.
+
+        Args:
+            df (pd.DataFrame): The DataFrame to transform.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame.
+        """
         if df.empty:
             logging.warn("The input dataframe is empty, no transformation was applied.")
             return df
@@ -199,25 +358,44 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
             df[column] = df[column].apply(self._transform)
         return df
 
+    def _transform(self, value):
+        """
+        Defines the transformation for a single cell value.
+
+        Args:
+            value: The cell value to transform.
+
+        Returns:
+            Any: The transformed cell value.
+        """
+        raise NotImplementedError
+
 
 @dataclass
 class ShuffleColumnsTransform(MultiColumnTransform):
     """
-    For a set of columns, shuffles the values across each row of these columns.
-    Values will be shuffled differently for each row.
+    Shuffles values across specified columns on a per-row basis.
 
-    This class is meant to be used in MCQ benchmarks to shuffle answer choices
-    across different letter options (e.g. shuffle what choice maps to 'A' vs 'B' vs 'C').
-    args:
-        columns: List[str]: the list of columns from the pandas frame to be reshuffled.
-        rng: np.random.Generator: the dedicated numpy generator for the shuffling.
+    This transform is often used in MCQ-like tasks where answer choices
+    need to be shuffled.
+
+    Attributes:
+        columns (List[str]): The list of columns to shuffle.
+        rng (np.random.Generator): The random number generator used for the shuffling.
     """
 
-    columns: List[str]
     rng: np.random.Generator = np.random.default_rng(0)
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        """For each row in df, shuffle values across these columns."""
+        """
+        Shuffles values across the specified columns on a per-row basis.
+
+        Args:
+            df (pd.DataFrame): The DataFrame whose columns will be shuffled.
+
+        Returns:
+            pd.DataFrame: A DataFrame with shuffled columns for each row.
+        """
         self.validate(df)
 
         def shuffle_row(row):
@@ -231,31 +409,58 @@ def shuffle_row(row):
 @dataclass
 class ColumnMatchMapTransform(DFTransformBase):
     """
-    Creates a new column indicating the name of the column that matches the value in the key column for each row.
-    E.g. for a row, if value of key_col matches value of 'A' column, new_col will contain the value 'A'.
-    Used to store the letter of the correct answer choice in MCQ benchmarks.
+    Creates a new column that indicates which column matches a key column's value in each row.
+
+    Attributes:
+        key_col (str): The column whose value we look to match across other columns.
+        new_col (str): The name of the resulting column that holds the matching column name.
+        columns (List[str]): The list of columns to search for a match.
     """
 
     key_col: str
     new_col: str
     columns: List[str]
 
-    # Function to find matching column
     def _find_matching_column(self, row):
+        """
+        Finds the name of the column matching the key column's value in a given row.
+
+        Args:
+            row (pd.Series): A row from the DataFrame.
+
+        Returns:
+            str or None: The name of the matching column, or None if none match.
+        """
         for col in self.columns:
             if row[col] == row[self.key_col]:
                 return col
-        return None  # If no match is found (optional)
+        return None
 
     def validate(self, df: pd.DataFrame):
-        """Check that all columns to be transformed are present actually in the data frame."""
+        """
+        Validates that the key column and specified columns exist in the DataFrame.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Raises:
+            ValueError: If any required columns are not in the DataFrame.
+        """
         extra_columns = set(self.columns + [self.key_col]) - set(df.columns)
         if extra_columns:
             msg = ", ".join(sorted(extra_columns))
             raise ValueError(f"The following columns are not present in the data frame: {msg}")
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
-        """For each row in df, shuffle values across these columns."""
+        """
+        Creates a new column indicating which column matches the key column's value for each row.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: A DataFrame with the new column holding the matching column name.
+        """
         self.validate(df)
         df[self.new_col] = df.apply(self._find_matching_column, axis=1)
         return df
@@ -263,12 +468,26 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
 
 @dataclass
 class ImputeNA(MultiColumnTransform):
-    """Impute missing values in selected columns with a specified value."""
+    """
+    Imputes missing values in selected columns with a specified value.
+
+    Attributes:
+        columns (List[str] | str): The column(s) to impute.
+        value (str): The value to use for imputation.
+    """
 
-    columns: List[str] | str
     value: str
 
     def _transform(self, value):
+        """
+        Replaces missing values with the specified value.
+
+        Args:
+            value: The cell value to impute if missing.
+
+        Returns:
+            Any: The original value or the imputed replacement if it was missing.
+        """
         isna = pd.isna(value)
         if isinstance(isna, bool) and isna:
             return self.value
@@ -278,52 +497,86 @@ def _transform(self, value):
 @dataclass
 class ReplaceStringsTransform(MultiColumnTransform):
     """
-    Replaces strings in selected columns.  Useful for adhoc fixes, i.e., \\n to \n.
+    Replaces strings in selected columns according to a specified mapping.
+
+    Useful for ad hoc fixes such as replacing '\\n' with '\n'.
+
+    Attributes:
+        columns (List[str] | str): The column(s) to which replacements are applied.
+        mapping (Dict[str, str]): A dictionary of old-to-new string replacements.
+        case (bool): Whether the replacements should be case-sensitive.
     """
 
-    columns: List[str] | str
     mapping: Dict[str, str]
     case: bool
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Replaces occurrences of keys in the specified columns with their mapped values.
+
+        Args:
+            df (pd.DataFrame): The DataFrame to transform.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame.
+        """
         self.validate(df)
         for column in self.columns:
             for source, target in self.mapping.items():
                 df[column] = df[column].str.replace(source, target, case=self.case, regex=False)
-
         return df
 
 
 @dataclass
 class MapStringsTransform(MultiColumnTransform):
     """
-    Map values in certain columns of a pandas dataframe according to a mapping dictionary.
+    Maps values in selected columns to new values according to a specified dictionary.
+
+    Attributes:
+        columns (List[str] | str): The column(s) to apply the map.
+        mapping (Dict[str, str]): A dictionary mapping old strings to new strings.
     """
 
-    columns: List[str] | str
     mapping: Dict[str, str]
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
+        """
+        Maps each value in the specified columns using the provided dictionary.
+
+        Args:
+            df (pd.DataFrame): The DataFrame to transform.
+
+        Returns:
+            pd.DataFrame: The transformed DataFrame.
+        """
         self.validate(df)
         for column in self.columns:
             df[column] = df[column].map(self.mapping)
-
         return df
 
 
 @dataclass
 class PrependStringTransform(MultiColumnTransform):
     """
-    Prepends a string for selected columns.
-    args:
-        columns: List of str or str, Column(s) to apply transform to.
-        string: str, string to prepend
+    Prepends a specified string to the values in selected columns.
+
+    Attributes:
+        columns (List[str] | str): The columns to transform.
+        string (str): The string to prepend.
     """
 
-    columns: List[str] | str
     string: str
 
     def _transform(self, value):
+        """
+        Prepends the specified string to a given value.
+
+        Args:
+            value: The cell value to transform.
+
+        Returns:
+            Any: The value with the string prepended.
+        """
         if isinstance(value, list):
             value = [self.string + val for val in value]
         else:
@@ -334,20 +587,29 @@ def _transform(self, value):
 @dataclass
 class RegexTransform(MultiColumnTransform):
     """
-    Find the occurrence of the pattern in selected columns.
-    args:
-        columns: List of str or str, Column(s) to apply transform to.
-        prompt_pattern: str, pattern to search for
-        ignore_case: bool, True if the regex match should ignore the case
-        occurrence: str, "last" or "first" to indicate which of the occurrences to pick
+    Extracts a regex match from string values in selected columns.
+
+    Attributes:
+        columns (List[str] | str): The columns to transform.
+        prompt_pattern (str): The regex pattern to use for matching.
+        ignore_case (bool): Whether to ignore case in the regex search.
+        occurrence (str): Which occurrence to return, either "last" or "first".
     """
 
-    columns: List[str] | str
     prompt_pattern: str
     ignore_case: bool = False
     occurrence: str = "last"
 
     def _transform(self, sentence):
+        """
+        Finds and returns the specified occurrence of a regex match in a string.
+
+        Args:
+            sentence (str): The string to search.
+
+        Returns:
+            str or None: The matched pattern or None if not found.
+        """
         if self.ignore_case:
             results = re.findall(self.prompt_pattern, sentence, flags=re.IGNORECASE)
         else:
@@ -364,12 +626,22 @@ def _transform(self, sentence):
 @dataclass
 class ASTEvalTransform(MultiColumnTransform):
     """
-    Applies ast.literal_eval to parse strings in selected columns
-    """
+    Parses strings in selected columns using ast.literal_eval.
 
-    columns: List[str] | str
+    Attributes:
+        columns (List[str] | str): The columns to transform.
+    """
 
     def _transform(self, string):
+        """
+        Parses a string using ast.literal_eval.
+
+        Args:
+            string (str): The string to parse.
+
+        Returns:
+            Any: The parsed Python object (e.g., list, dict, etc.).
+        """
         list_strings = ast.literal_eval(string)
         return list_strings
 
@@ -377,20 +649,22 @@ def _transform(self, string):
 @dataclass
 class TokenCounterTransform(MultiColumnTransform):
     """
-    Counts the number of tokens in the selected columns.
-    """
+    Counts the number of tokens in the selected columns using tiktoken.
 
-    columns: List[str] | str
+    Attributes:
+        columns (List[str] | str): The columns whose token counts will be computed.
+    """
 
     def transform(self, df: pd.DataFrame, encoding="cl100k_base") -> pd.DataFrame:
         """
-        This method uses tiktoken tokenizer to count the number of tokens in the response.
-        See: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
-        args:
-            df (dataframe): the dataframe to add the token count column to.
-            encoding (str): the encoding to use with tiktoken. Default is "cl100k_base".
-        returns:
-            dataframe: the dataframe with the token count column added.
+        Counts tokens in each specified column using tiktoken.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame with textual data.
+            encoding (str, optional): The tiktoken encoding to use. Defaults to "cl100k_base".
+
+        Returns:
+            pd.DataFrame: DataFrame with additional columns named "<column>_token_count".
         """
         self.validate(df)
         encoding = tiktoken.get_encoding(encoding)
@@ -403,29 +677,36 @@ def transform(self, df: pd.DataFrame, encoding="cl100k_base") -> pd.DataFrame:
 
 @dataclass
 class MajorityVoteTransform:
-    """Applies the majority vote transformation to the specified model output column per id_col.
-    If model_label_column is provided, the corresponding label or score of the majority vote output will also be added.
+    """
+    Applies a majority vote transformation on the specified model output column grouped by ID.
+
+    Attributes:
+        model_output_col (str): The column name for model outputs.
+        model_label_column (str): The column name for corresponding labels or scores.
+        id_col (str): The column name for IDs.
+        majority_vote_col (str): The column name for storing the majority vote result.
+        majority_label_col (str): The column name for storing the label of the majority vote output.
     """
 
-    model_output_col: str = "model_output"  # Default column name for model outputs
-    model_label_column: str = None  # Column name for model labels or scores corresponding to model outputs
-    id_col: str = "data_point_id"  # Default column name for IDs
-    majority_vote_col: str = "majority_vote"  # Default column name for storing majority vote
-    majority_label_col: str = (
-        "majority_label"  # Default column name for storing label corresponding to majority vote output
-    )
+    model_output_col: str = "model_output"
+    model_label_column: str = None
+    id_col: str = "data_point_id"
+    majority_vote_col: str = "majority_vote"
+    majority_label_col: str = "majority_label"
 
     def transform(self, df: pd.DataFrame, random_state: int = 0) -> pd.DataFrame:
         """
-        Transforms the dataframe by calculating the majority vote of model_output_col per id_col.
-        If the 'model_output' is NaN, it will be droped before calculating the majority vote.
+        Calculates the majority vote of model outputs for each group identified by 'id_col'.
+
+        If the model output is NaN, it will be dropped before calculating the majority vote.
 
         Args:
-            df (pd.DataFrame): Input dataframe containing model_output_col and id_col.
-            random_state (int): Input random seed
+            df (pd.DataFrame): The DataFrame containing model outputs and IDs.
+            random_state (int, optional): The random seed for tie-breaking among the mode values.
+                Defaults to 0.
 
         Returns:
-            pd.DataFrame: Transformed dataframe with majority vote for each id_col.
+            pd.DataFrame: The transformed DataFrame, including majority vote columns.
         """
         result_df = df.groupby(self.id_col).apply(
             self.majority_vote,
@@ -442,15 +723,19 @@ def majority_vote(
         group, model_output_col, model_label_col, majority_vote_col, majority_label_col, random_state: int = 0
     ):
         """
-        Calculate majority vote for each group.
+        Calculates the majority vote for each group and optionally its corresponding label.
+
         Args:
-            group (pd.DataFrame): Input dataframe containing model_output_col, id_col and label.
-            model_output_col (str): Model output column name
-            model_label_col (str): Model label column name
-            majority_vote_col (str): Majority vote column name
-            majority_label_col (str): Majority label column name
+            group (pd.DataFrame): The DataFrame group containing model outputs.
+            model_output_col (str): The model output column name.
+            model_label_col (str): The model label column name corresponding to outputs.
+            majority_vote_col (str): The column name for storing the majority vote result.
+            majority_label_col (str): The column name for storing the label of the majority vote output.
+            random_state (int, optional): The random seed for tie-breaking among the mode values.
+                Defaults to 0.
+
         Returns:
-            pd.DataFrame: Transformed dataframe with majority vote for each id_col.
+            pd.DataFrame: The group DataFrame with columns for the majority vote and optional label.
         """
         x = group[model_output_col]
         majority_value = (
@@ -465,12 +750,13 @@ def majority_vote(
 @dataclass
 class ExtractUsageTransform:
     """
-    Extracts token usage completion numbers (except prompt input tokens) for all models.
-    args:
-        model_config: config used for the experiment.
-        usage_completion_output_col: str, default name of the column where completion numbers will be stored for model
-        usage_column: str, default name of the column where usage information is stored for model
-        n_tokens_column: str, default name of the column where number of tokens is stored for model
+    Extracts token usage (completion tokens) for models and stores it in a new column.
+
+    Attributes:
+        model_config (ModelConfig): The model configuration used for the experiment.
+        usage_completion_output_col (str): The column name where completion token usage is stored.
+        usage_column (str): The column name where usage information is stored.
+        n_tokens_column (str): The column name where the number of tokens is stored.
     """
 
     model_config: ModelConfig
@@ -480,13 +766,14 @@ class ExtractUsageTransform:
 
     def transform(self, df: pd.DataFrame) -> pd.DataFrame:
         """
-        Transforms the dataframe by extracting the .
+        Extracts token usage from the DataFrame based on the model configuration.
 
         Args:
-            df (pd.DataFrame): Input dataframe of inference results retrieved with the model_config.
+            df (pd.DataFrame): The DataFrame containing inference results.
 
         Returns:
-            pd.DataFrame: Transformed dataframe with completion token numbers in usage_completion_output_col.
+            pd.DataFrame: The transformed DataFrame with completion token usage in
+            'usage_completion_output_col'.
         """
         usage_completion_read_col = None
         if self.model_config.class_name is GeminiModel:
@@ -508,8 +795,6 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
             logging.warn(
                 f"Model {self.model_config.class_name} is not recognized for extracting completion token usage."
             )
-        # if the model is one for which the usage of completion tokens is known, use that corresponding column for the model
-        # otherwise, use the default "n_output_tokens" which is computed with a universal tokenizer as shown in TokenCounterTransform()
         self.validate(df, usage_completion_read_col)
         if usage_completion_read_col:
             df[self.usage_completion_output_col] = df.apply(
@@ -521,11 +806,16 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
             df[self.usage_completion_output_col] = np.nan
         return df
 
-    def validate(self, df: pd.DataFrame, usage_completion_read_col: str) -> pd.DataFrame:
-        """Check that usage_columns or n_tokens_columns are present actually in the data frame.
+    def validate(self, df: pd.DataFrame, usage_completion_read_col: str):
+        """
+        Validates that necessary usage columns are present in the DataFrame.
+
         Args:
-            df (pd.DataFrame): Input dataframe containing model_output_col and id_col.
-            usage_completion_read_col (str): The column name for token extraction.
+            df (pd.DataFrame): The input DataFrame to validate.
+            usage_completion_read_col (str): The column name used for completion token usage.
+
+        Raises:
+            ValueError: If necessary usage columns are missing.
         """
         if usage_completion_read_col and self.usage_column not in df.columns:
             raise ValueError(f"The {self.usage_column} column is not present in the data frame.")
@@ -534,12 +824,14 @@ def validate(self, df: pd.DataFrame, usage_completion_read_col: str) -> pd.DataF
 
     def _extract_usage(self, row, usage_completion_read_col):
         """
-        Extracts the token usage for a given row if usage column and corresponding completion column exists.
+        Extracts the completion token usage from a single row.
+
         Args:
-            row (pd.Series): A row of the dataframe.
-            usage_completion_read_col (str): The column name to extract the token usage from.
+            row (pd.Series): A row from the DataFrame.
+            usage_completion_read_col (str): The column name used to extract the token usage.
+
         Returns:
-            int: The token usage for the row.
+            int or float: The extracted token count if present, otherwise NaN.
         """
         if not pd.isna(row[self.usage_column]) and usage_completion_read_col in row[self.usage_column]:
             return row[self.usage_column][usage_completion_read_col]
diff --git a/eureka_ml_insights/metrics/aime_metrics.py b/eureka_ml_insights/metrics/aime_metrics.py
index 51d3dd4c..ec03ce1e 100644
--- a/eureka_ml_insights/metrics/aime_metrics.py
+++ b/eureka_ml_insights/metrics/aime_metrics.py
@@ -1,13 +1,40 @@
-import numpy as np
+"""
+This module provides a metric class for checking the numerical match between
+two string representations of numeric values.
+"""
+
 import logging
+
+import numpy as np
+
 from eureka_ml_insights.metrics.metrics_base import ClassicMetric
 
 
 class NumericMatch(ClassicMetric):
-    """This metric class checks numerical match between answer_text and target_text within a margin eps"""
+    """NumericMatch class.
+
+    This metric class checks the numerical match between answer_text and
+    target_text within a margin of eps.
+
+    Attributes:
+        eps (float): Tolerance for the difference between numerical values.
+    """
+
     eps = 1e-6
 
     def __evaluate__(self, answer_text, target_text, is_valid):
+        """Evaluate the match between answer_text and target_text.
+
+        Args:
+            answer_text (str): The predicted numeric value as a string.
+            target_text (str): The actual numeric value as a string.
+            is_valid (bool): Indicates whether the answer is valid or not.
+
+        Returns:
+            str: "correct" if the difference between the numeric values is within eps,
+            otherwise "incorrect". Returns "none" if the numeric values cannot be parsed
+            or if is_valid is False.
+        """
         if not is_valid:
             return "none"
         try:
diff --git a/eureka_ml_insights/metrics/ba_calendar_metrics.py b/eureka_ml_insights/metrics/ba_calendar_metrics.py
index 4d753da5..092b8ab6 100644
--- a/eureka_ml_insights/metrics/ba_calendar_metrics.py
+++ b/eureka_ml_insights/metrics/ba_calendar_metrics.py
@@ -1,359 +1,509 @@
-# This file was authored by BenchAgents authors and is being reused under the MIT license.
-# All code in this file is directly copied from the original source repository.
-# https://github.com/microsoft/benchagents
-
-import ast
-import json
-import re
-import math
-import numpy as np
-from datetime import datetime, timedelta
-
-import pandas as pd
-
-from eureka_ml_insights.metrics.metrics_base import CompositeMetric
-
-# Helper functions
-def check_time_slot_format(solution):
-    pattern = r"^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday) ([0-9]|[01]\d|2[0-3]):[0-5]\d-([0-9]|[01]\d|2[0-3]):[0-5]\d$"
-    return bool(re.match(pattern, solution))
-
-def generate_time_slots(start_time, end_time, granularity):
-    granularity=5
-    slots = []
-    current_time = start_time
-    while current_time + timedelta(minutes=granularity) <= end_time:
-        slots.append((current_time, current_time + timedelta(minutes=granularity)))
-        current_time += timedelta(minutes=granularity)
-    return slots
-
-def parse_time_block(time_block):
-    start_str, end_str = time_block.split('-')
-    start_time = datetime.strptime(start_str, "%H:%M")
-    end_time = datetime.strptime(end_str, "%H:%M")
-    return start_time, end_time
-
-def filter_slots_by_duration(time_slots, duration):
-    filtered_slots = []
-    for i in range(len(time_slots)):
-        accumulated_duration = timedelta()
-        for j in range(i, len(time_slots)):
-            accumulated_duration += time_slots[j][1] - time_slots[j][0]
-            if accumulated_duration >= timedelta(minutes=duration):
-                filtered_slots.append((time_slots[i][0], time_slots[j][1]))
-                break
-    return filtered_slots
-
-def filter_slots_by_constraints(time_slots, constraints, day):
-    filtered_slots = []
-    for slot in time_slots:
-        start_time, end_time = slot
-        if constraints['no_meetings_before']:
-            nb = int(constraints['no_meetings_before'])
-            no_meetings_before = datetime.strptime(f"{nb}:00", "%H:%M")
-            if start_time < no_meetings_before:
-                continue
-        if constraints['no_meetings_after']:
-            na = int(constraints['no_meetings_after'])
-            no_meetings_after = datetime.strptime(f"{na}:00", "%H:%M")
-            if end_time >= no_meetings_after:
-                continue
-        if constraints['no_meetings_on_weekends'] and day in ['Saturday', 'Sunday']:
-            continue
-        if constraints['no_meetings_during_specific_times']:
-            no_meetings_start, no_meetings_end = parse_time_block(constraints['no_meetings_during_specific_times'])
-            if (start_time < no_meetings_end and end_time > no_meetings_start):
-                continue
-        filtered_slots.append(slot)
-    return filtered_slots
-
-class BACalendarMetric(CompositeMetric):
-    """
-    Composite metric for evaluating if a response for each criteria.
-
-    This metric evaluates if a given response follows the provided constraints.
-    """
-
-    def __init__(self):
-        super().__init__()
-        self.no_solution_response = "No common time slot available"
-
-    def __evaluate__(self, row):
-        results = {}
-        results.update(self.run_programmatic_tests(row))
-        return results
-
-    def run_programmatic_tests(self, instance):
-        result = {}
-        solution = instance['model_output']
-        solution = solution.strip('"').strip('`').strip('\n')
-        if check_time_slot_format(solution):
-            result['format_programmatic'] = 1
-        else:
-            result['format_programmatic'] = 0
-        result.update(self.check_availability_programmatic(instance, solution))
-        result.update(self.check_meeting_duration_programmatic(instance, solution))
-        result.update(self.check_buffer_time_programmatic(instance, solution))
-        result.update(self.check_no_weekends_programmatic(instance, solution))
-        result.update(self.check_time_restrictions_programmatic(instance, solution))
-        result.update(self.check_specific_times_programmatic(instance, solution))
-        result.update(self.check_priority_programmatic(instance, solution))
-        all_correct = 1
-        passed_constraints = []
-        for key, value in result.items():
-            if value == 0:
-                all_correct = 0
-            if value is not None and value != 'NA' and pd.notna(value) and isinstance(value, int):
-                passed_constraints.append(value)
-        result['all_correct'] = all_correct
-        result['fraction_passed'] = np.mean(passed_constraints)
-        result.update(self.compute_constrainedness_programmatic(instance))
-        return result
-
-    def is_formatted(self, solution):
-        run_tests=True
-        if solution == self.no_solution_response:
-            run_tests=False
-        if not check_time_slot_format(solution):
-            run_tests=False
-        return run_tests
-
-    def check_availability_programmatic(self, instance, solution):
-        if not instance['constraints'].get('availability', True):
-            # result = {'availability_programmatic_check': 'NA'}
-            result = {'availability_programmatic_check': None}
-            return result
-        
-        if not self.is_formatted(solution):
-            result = {'availability_programmatic_check': 0}
-            return result
-
-        day, time_range = solution.split()
-        start_time, end_time = parse_time_block(time_range)
-        all_available = 1
-        availability = json.loads(instance['metadata']['availability'].replace("'", '"'))
-        for participant, schedule in availability.items():
-            if day not in schedule:
-                all_available = 0
-                break
-            available_blocks = schedule[day]
-            available = False
-            for block in available_blocks:
-                block_start, block_end = parse_time_block(block)
-                if block_start <= start_time and block_end >= end_time:
-                    available = True
-                    break
-            if not available:
-                all_available = 0
-                break
-
-        return {'availability_programmatic_check': all_available}
-
-    def check_meeting_duration_programmatic(self, instance, solution):
-        if not instance['constraints'].get('meeting_duration', True):
-            # result = {'meeting_duration_programmatic_check': 'NA'}
-            result = {'meeting_duration_programmatic_check': None}
-            return result
-        
-        if not self.is_formatted(solution):
-            result = {'meeting_duration_programmatic_check': 0}
-            return result
-
-        _, time_range = solution.split()
-        start_time, end_time = parse_time_block(time_range)
-        meeting_duration = (end_time - start_time).total_seconds() / 60
-        expected_duration = instance['constraints']['meeting_duration']
-
-        return {'meeting_duration_programmatic_check': int(meeting_duration == expected_duration)}
-
-
-    def check_buffer_time_programmatic(self, instance, solution):
-        buffer_time = instance['constraints'].get('buffer_time_before_and_after_meeting', True)
-        if buffer_time is None or not buffer_time:
-            # result = {'buffer_time_programmatic_check': 'NA'}
-            result = {'buffer_time_programmatic_check': None}
-            return result
-        
-        if not self.is_formatted(solution):
-            result = {'buffer_time_programmatic_check': 0}
-            return result
-
-        buffer_time = instance['constraints']['buffer_time_before_and_after_meeting']
-        day, time_range = solution.split()
-        start_time, end_time = parse_time_block(time_range)
-        buffer_start_time = start_time - timedelta(minutes=buffer_time)
-        buffer_end_time = end_time + timedelta(minutes=buffer_time)
-        all_buffer_respected = 1
-
-        availability = json.loads(instance['metadata']['availability'].replace("'", '"'))
-        for participant, schedule in availability.items():
-            if day not in schedule:
-                all_buffer_respected = 0
-                break
-            available_blocks = schedule[day]
-            buffer_respected = False
-            for block in available_blocks:
-                block_start, block_end = parse_time_block(block)
-                if block_start <= buffer_start_time and block_end >= buffer_end_time:
-                    buffer_respected = True
-                    break
-            if not buffer_respected:
-                all_buffer_respected = 0
-                break
-        return {'buffer_time_programmatic_check': all_buffer_respected}
-
-    def check_no_weekends_programmatic(self, instance, solution):
-        if not instance['constraints'].get('no_meetings_on_weekends', True):
-            # return {'no_weekends_programmatic_check': 'NA'}
-            return {'no_weekends_programmatic_check': None}
-        
-        if not self.is_formatted(solution):
-            return {'no_weekends_programmatic_check': 0}
-
-        day, _ = solution.split()
-        day_of_week = datetime.strptime(day, '%A').weekday()
-        no_weekends = day_of_week < 5
-        return {'no_weekends_programmatic_check': int(no_weekends)}
-
-    def check_time_restrictions_programmatic(self, instance, solution):
-        if not instance['constraints'].get('no_meetings_before', True) and not instance['constraints'].get('no_meetings_after', True):
-            # return {'time_restrictions_programmatic_check': 'NA'}
-            return {'time_restrictions_programmatic_check': None}
-        
-        if not self.is_formatted(solution):
-            return {'time_restrictions_programmatic_check': 0}
-
-        _, time_range = solution.split()
-        start_time, end_time = parse_time_block(time_range)
-
-        no_meetings_before = instance['constraints'].get('no_meetings_before')
-        no_meetings_after = instance['constraints'].get('no_meetings_after')
-
-        if no_meetings_before:
-            nb = int(no_meetings_before)
-            no_meetings_before = datetime.strptime(f"{nb}:00", "%H:%M")
-            if start_time < no_meetings_before:
-                return {'time_restrictions_programmatic_check': 0}
-
-        if no_meetings_after:
-            na = int(no_meetings_after)
-            no_meetings_after = datetime.strptime(f"{na}:00", '%H:%M')
-            if end_time > no_meetings_after:
-                return {'time_restrictions_programmatic_check': 0}
-        return {'time_restrictions_programmatic_check': 1}
-
-    def check_priority_programmatic(self, instance, solution):
-        if not instance['constraints'].get('high_priority_meeting', False):
-            # return {'priority_programmatic_check': 'NA'}
-            return {'priority_programmatic_check': None}
-        
-        if not self.is_formatted(solution):
-            return {'priority_programmatic_check': 0}
-        
-        metadata = instance['metadata']
-        result = False
-        params = instance['params']
-        constraints = instance['constraints']
-        if constraints['buffer_time_before_and_after_meeting']:
-            buffer_time = constraints['buffer_time_before_and_after_meeting']
-        else:
-            buffer_time = 0
-        for day in params['days_of_week']: # TODO: revisit this post data release to ensure consistency
-            common_time_slots = None
-            availability = json.loads(metadata['availability'].replace("'", '"'))
-            for participant, schedule in availability.items():
-                if day in schedule:
-                    participant_time_slots = []
-                    for time_slot in schedule[day]:
-                        start_time, end_time = parse_time_block(time_slot)
-                        time_slots = generate_time_slots(start_time, end_time, params['granularity'])
-                        time_slots = filter_slots_by_duration(time_slots, constraints['meeting_duration'] + 2 * buffer_time)
-                        time_slots = filter_slots_by_constraints(time_slots, constraints, day=day)
-                        participant_time_slots.extend(time_slots)
-                    if common_time_slots is None:
-                        common_time_slots = set(participant_time_slots)
-                    else:
-                        common_time_slots = common_time_slots.intersection(participant_time_slots)
-            if common_time_slots:
-                first_available_slot = sorted(list(common_time_slots))[0]
-                first_available_start = (first_available_slot[0]+timedelta(minutes=buffer_time)).strftime('%H:%M')
-                first_available_end = (first_available_slot[1]-timedelta(minutes=buffer_time)).strftime('%H:%M')
-                result = solution == f"{day} {first_available_start}-{first_available_end}"
-        return {'priority_programmatic_check': int(result)}
-
-    def check_specific_times_programmatic(self, instance, solution):
-        if not instance['constraints'].get('no_meetings_during_specific_times', True):
-            # return {'specific_times_programmatic_check': 'NA'}
-            return {'specific_times_programmatic_check': None}
-        
-        if not self.is_formatted(solution):
-            return {'specific_times_programmatic_check': 0}
-
-        restricted_times = instance['constraints']['no_meetings_during_specific_times']
-        restricted_start, restricted_end = parse_time_block(restricted_times)
-        day, time_range = solution.split()
-        start_time, end_time = parse_time_block(time_range)
-
-        if (start_time < restricted_end and end_time > restricted_start):
-            result = 0
-        else:
-            result = 1
-        return {'specific_times_programmatic_check': result}
-
-    def compute_constrainedness_programmatic(self, instance):
-        """
-        Compute the constrainedness of the problem based on the constraints and availability.
-        The constrainedness is defined as (1 - the ratio of feasible slots to total slots).
-        The higher the constrainedness, the more constrained the problem is.
-        """
-        params = instance['params']
-        constraints = instance['constraints']
-        metadata = instance['metadata']
-        if not instance['constraints']['buffer_time_before_and_after_meeting']:
-            buffer_time_before_and_after_meeting = 0
-        else:
-            buffer_time_before_and_after_meeting = instance['constraints']['buffer_time_before_and_after_meeting']
-        total_slots = 0
-        feasible_slots = 0       
-        for day in params['days_of_week']:
-            common_time_slots = None
-            union_time_slots = None
-            availability = json.loads(metadata['availability'].replace("'", '"'))
-            for participant, schedule in availability.items():
-                if day in schedule:
-                    participant_time_slots = []
-                    participant_time_slots_unconstrained = []
-                    for time_slot in schedule[day]:
-                        start_time, end_time = parse_time_block(time_slot)
-                        time_slots = generate_time_slots(start_time, end_time, params['granularity'])
-                        time_slots = filter_slots_by_duration(time_slots, constraints['meeting_duration'])
-                        participant_time_slots_unconstrained.extend(time_slots)
-                        time_slots = generate_time_slots(start_time, end_time, params['granularity'])
-                        time_slots = filter_slots_by_duration(time_slots, constraints['meeting_duration'] + buffer_time_before_and_after_meeting*2)
-                        time_slots = filter_slots_by_constraints(time_slots, constraints, day=day)
-                        participant_time_slots.extend(time_slots)
-                    if common_time_slots is None:
-                        common_time_slots = set(participant_time_slots)
-                    else:
-                        common_time_slots = common_time_slots.intersection(participant_time_slots)
-                    if union_time_slots is None:
-                        union_time_slots = set(participant_time_slots_unconstrained)
-                    else:
-                        union_time_slots = union_time_slots.union(participant_time_slots_unconstrained)
-            if common_time_slots:
-                feasible_slots +=len(common_time_slots)
-            if union_time_slots:
-                total_slots+=len(union_time_slots)
-
-        # Calculate constrainedness ratio
-        if total_slots > 0:
-            constrainedness_ratio = 1 - (feasible_slots / total_slots)
-        else:
-            constrainedness_ratio = 1
-
-        # Bucket the constrainedness ratio into intervals of 0.2
-        constrainedness_bucket = round(math.floor(constrainedness_ratio / 0.1) * 0.1, 4)
-
-        # Add test result
-        return {'constrainedness': constrainedness_ratio, 'constrainedness_bucket': constrainedness_bucket}
-        
+"""
+This module provides functionality for validating and scoring meeting time slot solutions
+against specified constraints. It includes helper functions for parsing and filtering
+time slots, and a composite metric class (BACalendarMetric) that checks the solution
+format, availability, meeting duration, buffer time, weekend usage, time restrictions,
+specific times restrictions, and meeting priority.
+"""
+
+# This file was authored by BenchAgents authors and is being reused under the MIT license.
+# All code in this file is directly copied from the original source repository.
+# https://github.com/microsoft/benchagents
+
+import json
+import math
+import re
+from datetime import datetime, timedelta
+
+import numpy as np
+import pandas as pd
+
+from eureka_ml_insights.metrics.metrics_base import CompositeMetric
+
+
+def check_time_slot_format(solution):
+    """Check if the provided solution string matches the required time slot format.
+
+    The format is:
+    (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday) HH:MM-HH:MM
+
+    Args:
+        solution (str): The proposed meeting time slot string.
+
+    Returns:
+        bool: True if it matches the required format, False otherwise.
+    """
+    pattern = r"^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday) ([0-9]|[01]\d|2[0-3]):[0-5]\d-([0-9]|[01]\d|2[0-3]):[0-5]\d$"
+    return bool(re.match(pattern, solution))
+
+
+def generate_time_slots(start_time, end_time, granularity):
+    """Generate time slots of a given granularity between start_time and end_time.
+
+    Args:
+        start_time (datetime): The start datetime.
+        end_time (datetime): The end datetime.
+        granularity (int): The size of each slot in minutes.
+
+    Returns:
+        list: A list of tuples representing start and end times for each slot.
+    """
+    granularity = 5
+    slots = []
+    current_time = start_time
+    while current_time + timedelta(minutes=granularity) <= end_time:
+        slots.append((current_time, current_time + timedelta(minutes=granularity)))
+        current_time += timedelta(minutes=granularity)
+    return slots
+
+
+def parse_time_block(time_block):
+    """Parse a string time block in the format 'HH:MM-HH:MM' into datetime objects.
+
+    Args:
+        time_block (str): The string representing a time block.
+
+    Returns:
+        tuple: A tuple of two datetime objects (start_time, end_time).
+    """
+    start_str, end_str = time_block.split("-")
+    start_time = datetime.strptime(start_str, "%H:%M")
+    end_time = datetime.strptime(end_str, "%H:%M")
+    return start_time, end_time
+
+
+def filter_slots_by_duration(time_slots, duration):
+    """Filter time slots by ensuring at least the specified duration from a starting slot.
+
+    Args:
+        time_slots (list): A list of tuples (start, end) for time slots.
+        duration (int): Required minimum duration in minutes.
+
+    Returns:
+        list: A list of tuples (start, end) that meet the minimum duration requirement.
+    """
+    filtered_slots = []
+    for i in range(len(time_slots)):
+        accumulated_duration = timedelta()
+        for j in range(i, len(time_slots)):
+            accumulated_duration += time_slots[j][1] - time_slots[j][0]
+            if accumulated_duration >= timedelta(minutes=duration):
+                filtered_slots.append((time_slots[i][0], time_slots[j][1]))
+                break
+    return filtered_slots
+
+
+def filter_slots_by_constraints(time_slots, constraints, day):
+    """Filter the given time slots by various constraints (no meetings before/after,
+    no meetings on weekends, no meetings during specific times).
+
+    Args:
+        time_slots (list): A list of tuples (start_time, end_time) representing slots.
+        constraints (dict): A dictionary containing various meeting constraints.
+        day (str): The day of the week for the given time slots.
+
+    Returns:
+        list: A list of tuples (start_time, end_time) that satisfy all constraints.
+    """
+    filtered_slots = []
+    for slot in time_slots:
+        start_time, end_time = slot
+        if constraints["no_meetings_before"]:
+            nb = int(constraints["no_meetings_before"])
+            no_meetings_before = datetime.strptime(f"{nb}:00", "%H:%M")
+            if start_time < no_meetings_before:
+                continue
+        if constraints["no_meetings_after"]:
+            na = int(constraints["no_meetings_after"])
+            no_meetings_after = datetime.strptime(f"{na}:00", "%H:%M")
+            if end_time >= no_meetings_after:
+                continue
+        if constraints["no_meetings_on_weekends"] and day in ["Saturday", "Sunday"]:
+            continue
+        if constraints["no_meetings_during_specific_times"]:
+            no_meetings_start, no_meetings_end = parse_time_block(constraints["no_meetings_during_specific_times"])
+            if start_time < no_meetings_end and end_time > no_meetings_start:
+                continue
+        filtered_slots.append(slot)
+    return filtered_slots
+
+
+class BACalendarMetric(CompositeMetric):
+    """A composite metric for evaluating if a response meets certain constraints.
+
+    This metric checks the response's format, availability, meeting duration,
+    buffer time, weekend usage, time restrictions, specific time restrictions,
+    and meeting priority.
+    """
+
+    def __init__(self):
+        """Initialize the BACalendarMetric with a default no-solution response."""
+        super().__init__()
+        self.no_solution_response = "No common time slot available"
+
+    def __evaluate__(self, row):
+        """Evaluate the row using programmatic tests.
+
+        Args:
+            row (dict): The row/instance data containing constraints and solutions.
+
+        Returns:
+            dict: A dictionary containing the results of the evaluation.
+        """
+        results = {}
+        results.update(self.run_programmatic_tests(row))
+        return results
+
+    def run_programmatic_tests(self, instance):
+        """Run programmatic tests for the provided instance.
+
+        Args:
+            instance (dict): The instance containing the solution and constraints.
+
+        Returns:
+            dict: A dictionary containing the outcomes of all programmatic checks.
+        """
+        result = {}
+        solution = instance["model_output"]
+        solution = solution.strip('"').strip("`").strip("\n")
+        if check_time_slot_format(solution):
+            result["format_programmatic"] = 1
+        else:
+            result["format_programmatic"] = 0
+        result.update(self.check_availability_programmatic(instance, solution))
+        result.update(self.check_meeting_duration_programmatic(instance, solution))
+        result.update(self.check_buffer_time_programmatic(instance, solution))
+        result.update(self.check_no_weekends_programmatic(instance, solution))
+        result.update(self.check_time_restrictions_programmatic(instance, solution))
+        result.update(self.check_specific_times_programmatic(instance, solution))
+        result.update(self.check_priority_programmatic(instance, solution))
+        all_correct = 1
+        passed_constraints = []
+        for key, value in result.items():
+            if value == 0:
+                all_correct = 0
+            if value is not None and value != "NA" and pd.notna(value) and isinstance(value, int):
+                passed_constraints.append(value)
+        result["all_correct"] = all_correct
+        result["fraction_passed"] = np.mean(passed_constraints)
+        result.update(self.compute_constrainedness_programmatic(instance))
+        return result
+
+    def is_formatted(self, solution):
+        """Check if the proposed solution is in the correct format.
+
+        Args:
+            solution (str): The proposed meeting time slot string.
+
+        Returns:
+            bool: True if the solution is correctly formatted, False otherwise.
+        """
+        run_tests = True
+        if solution == self.no_solution_response:
+            run_tests = False
+        if not check_time_slot_format(solution):
+            run_tests = False
+        return run_tests
+
+    def check_availability_programmatic(self, instance, solution):
+        """Check whether the solution meets the availability constraints for all participants.
+
+        Args:
+            instance (dict): The instance containing metadata (including availability) and constraints.
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary with the availability check result (1 or 0, or None if N/A).
+        """
+        if not instance["constraints"].get("availability", True):
+            return {"availability_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"availability_programmatic_check": 0}
+
+        day, time_range = solution.split()
+        start_time, end_time = parse_time_block(time_range)
+        all_available = 1
+        availability = json.loads(instance["metadata"]["availability"].replace("'", '"'))
+        for participant, schedule in availability.items():
+            if day not in schedule:
+                all_available = 0
+                break
+            available_blocks = schedule[day]
+            available = False
+            for block in available_blocks:
+                block_start, block_end = parse_time_block(block)
+                if block_start <= start_time and block_end >= end_time:
+                    available = True
+                    break
+            if not available:
+                all_available = 0
+                break
+
+        return {"availability_programmatic_check": all_available}
+
+    def check_meeting_duration_programmatic(self, instance, solution):
+        """Check if the proposed solution's meeting duration matches the expected duration.
+
+        Args:
+            instance (dict): The instance containing constraints (including meeting_duration).
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary with the meeting duration check result (1 or 0, or None if N/A).
+        """
+        if not instance["constraints"].get("meeting_duration", True):
+            return {"meeting_duration_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"meeting_duration_programmatic_check": 0}
+
+        _, time_range = solution.split()
+        start_time, end_time = parse_time_block(time_range)
+        meeting_duration = (end_time - start_time).total_seconds() / 60
+        expected_duration = instance["constraints"]["meeting_duration"]
+
+        return {"meeting_duration_programmatic_check": int(meeting_duration == expected_duration)}
+
+    def check_buffer_time_programmatic(self, instance, solution):
+        """Check if the solution respects the required buffer time before and after the meeting.
+
+        Args:
+            instance (dict): The instance containing constraints (including buffer_time_before_and_after_meeting).
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary with the buffer time check result (1 or 0, or None if N/A).
+        """
+        buffer_time = instance["constraints"].get("buffer_time_before_and_after_meeting", True)
+        if buffer_time is None or not buffer_time:
+            return {"buffer_time_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"buffer_time_programmatic_check": 0}
+
+        buffer_time = instance["constraints"]["buffer_time_before_and_after_meeting"]
+        day, time_range = solution.split()
+        start_time, end_time = parse_time_block(time_range)
+        buffer_start_time = start_time - timedelta(minutes=buffer_time)
+        buffer_end_time = end_time + timedelta(minutes=buffer_time)
+        all_buffer_respected = 1
+
+        availability = json.loads(instance["metadata"]["availability"].replace("'", '"'))
+        for participant, schedule in availability.items():
+            if day not in schedule:
+                all_buffer_respected = 0
+                break
+            available_blocks = schedule[day]
+            buffer_respected = False
+            for block in available_blocks:
+                block_start, block_end = parse_time_block(block)
+                if block_start <= buffer_start_time and block_end >= buffer_end_time:
+                    buffer_respected = True
+                    break
+            if not buffer_respected:
+                all_buffer_respected = 0
+                break
+        return {"buffer_time_programmatic_check": all_buffer_respected}
+
+    def check_no_weekends_programmatic(self, instance, solution):
+        """Check if the proposed solution does not fall on weekends, if that constraint is required.
+
+        Args:
+            instance (dict): The instance containing constraints (including no_meetings_on_weekends).
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary with the weekend check result (1 or 0, or None if N/A).
+        """
+        if not instance["constraints"].get("no_meetings_on_weekends", True):
+            return {"no_weekends_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"no_weekends_programmatic_check": 0}
+
+        day, _ = solution.split()
+        day_of_week = datetime.strptime(day, "%A").weekday()
+        no_weekends = day_of_week < 5
+        return {"no_weekends_programmatic_check": int(no_weekends)}
+
+    def check_time_restrictions_programmatic(self, instance, solution):
+        """Check if the proposed solution adheres to the no_meetings_before and no_meetings_after constraints.
+
+        Args:
+            instance (dict): The instance containing constraints for time restrictions.
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary indicating if the time restrictions are satisfied.
+        """
+        if not instance["constraints"].get("no_meetings_before", True) and not instance["constraints"].get(
+            "no_meetings_after", True
+        ):
+            return {"time_restrictions_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"time_restrictions_programmatic_check": 0}
+
+        _, time_range = solution.split()
+        start_time, end_time = parse_time_block(time_range)
+
+        no_meetings_before = instance["constraints"].get("no_meetings_before")
+        no_meetings_after = instance["constraints"].get("no_meetings_after")
+
+        if no_meetings_before:
+            nb = int(no_meetings_before)
+            no_meetings_before = datetime.strptime(f"{nb}:00", "%H:%M")
+            if start_time < no_meetings_before:
+                return {"time_restrictions_programmatic_check": 0}
+
+        if no_meetings_after:
+            na = int(no_meetings_after)
+            no_meetings_after = datetime.strptime(f"{na}:00", "%H:%M")
+            if end_time > no_meetings_after:
+                return {"time_restrictions_programmatic_check": 0}
+        return {"time_restrictions_programmatic_check": 1}
+
+    def check_priority_programmatic(self, instance, solution):
+        """Check if the proposed solution meets the high-priority meeting constraint, if enabled.
+
+        This involves ensuring the selected time slot is the earliest possible slot
+        that meets all constraints.
+
+        Args:
+            instance (dict): The instance containing constraints, metadata, and parameters.
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary indicating if the proposed solution meets the priority constraints.
+        """
+        if not instance["constraints"].get("high_priority_meeting", False):
+            return {"priority_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"priority_programmatic_check": 0}
+
+        metadata = instance["metadata"]
+        result = False
+        params = instance["params"]
+        constraints = instance["constraints"]
+        if constraints["buffer_time_before_and_after_meeting"]:
+            buffer_time = constraints["buffer_time_before_and_after_meeting"]
+        else:
+            buffer_time = 0
+        for day in params["days_of_week"]:
+            common_time_slots = None
+            availability = json.loads(metadata["availability"].replace("'", '"'))
+            for participant, schedule in availability.items():
+                if day in schedule:
+                    participant_time_slots = []
+                    for time_slot in schedule[day]:
+                        start_time, end_time = parse_time_block(time_slot)
+                        time_slots = generate_time_slots(start_time, end_time, params["granularity"])
+                        time_slots = filter_slots_by_duration(
+                            time_slots, constraints["meeting_duration"] + 2 * buffer_time
+                        )
+                        time_slots = filter_slots_by_constraints(time_slots, constraints, day=day)
+                        participant_time_slots.extend(time_slots)
+                    if common_time_slots is None:
+                        common_time_slots = set(participant_time_slots)
+                    else:
+                        common_time_slots = common_time_slots.intersection(participant_time_slots)
+            if common_time_slots:
+                first_available_slot = sorted(list(common_time_slots))[0]
+                first_available_start = (first_available_slot[0] + timedelta(minutes=buffer_time)).strftime("%H:%M")
+                first_available_end = (first_available_slot[1] - timedelta(minutes=buffer_time)).strftime("%H:%M")
+                result = solution == f"{day} {first_available_start}-{first_available_end}"
+        return {"priority_programmatic_check": int(result)}
+
+    def check_specific_times_programmatic(self, instance, solution):
+        """Check if the proposed solution avoids times when meetings are disallowed.
+
+        Args:
+            instance (dict): The instance containing constraints (including no_meetings_during_specific_times).
+            solution (str): The proposed meeting time slot.
+
+        Returns:
+            dict: A dictionary indicating if the specific times constraint is satisfied.
+        """
+        if not instance["constraints"].get("no_meetings_during_specific_times", True):
+            return {"specific_times_programmatic_check": None}
+
+        if not self.is_formatted(solution):
+            return {"specific_times_programmatic_check": 0}
+
+        restricted_times = instance["constraints"]["no_meetings_during_specific_times"]
+        restricted_start, restricted_end = parse_time_block(restricted_times)
+        day, time_range = solution.split()
+        start_time, end_time = parse_time_block(time_range)
+
+        if start_time < restricted_end and end_time > restricted_start:
+            result = 0
+        else:
+            result = 1
+        return {"specific_times_programmatic_check": result}
+
+    def compute_constrainedness_programmatic(self, instance):
+        """Compute the problem's constrainedness based on constraints and availability.
+
+        Constrainedness is defined as (1 - the ratio of feasible slots to total slots).
+        The higher the constrainedness, the more constrained the problem is. The result
+        is also bucketed into intervals of 0.1.
+
+        Args:
+            instance (dict): The instance containing constraints, availability, and parameters.
+
+        Returns:
+            dict: A dictionary containing 'constrainedness' and 'constrainedness_bucket'.
+        """
+        params = instance["params"]
+        constraints = instance["constraints"]
+        metadata = instance["metadata"]
+        if not instance["constraints"]["buffer_time_before_and_after_meeting"]:
+            buffer_time_before_and_after_meeting = 0
+        else:
+            buffer_time_before_and_after_meeting = instance["constraints"]["buffer_time_before_and_after_meeting"]
+        total_slots = 0
+        feasible_slots = 0
+        for day in params["days_of_week"]:
+            common_time_slots = None
+            union_time_slots = None
+            availability = json.loads(metadata["availability"].replace("'", '"'))
+            for participant, schedule in availability.items():
+                if day in schedule:
+                    participant_time_slots = []
+                    participant_time_slots_unconstrained = []
+                    for time_slot in schedule[day]:
+                        start_time, end_time = parse_time_block(time_slot)
+                        time_slots = generate_time_slots(start_time, end_time, params["granularity"])
+                        time_slots = filter_slots_by_duration(time_slots, constraints["meeting_duration"])
+                        participant_time_slots_unconstrained.extend(time_slots)
+                        time_slots = generate_time_slots(start_time, end_time, params["granularity"])
+                        time_slots = filter_slots_by_duration(
+                            time_slots, constraints["meeting_duration"] + buffer_time_before_and_after_meeting * 2
+                        )
+                        time_slots = filter_slots_by_constraints(time_slots, constraints, day=day)
+                        participant_time_slots.extend(time_slots)
+                    if common_time_slots is None:
+                        common_time_slots = set(participant_time_slots)
+                    else:
+                        common_time_slots = common_time_slots.intersection(participant_time_slots)
+                    if union_time_slots is None:
+                        union_time_slots = set(participant_time_slots_unconstrained)
+                    else:
+                        union_time_slots = union_time_slots.union(participant_time_slots_unconstrained)
+            if common_time_slots:
+                feasible_slots += len(common_time_slots)
+            if union_time_slots:
+                total_slots += len(union_time_slots)
+
+        # Calculate constrainedness ratio
+        if total_slots > 0:
+            constrainedness_ratio = 1 - (feasible_slots / total_slots)
+        else:
+            constrainedness_ratio = 1
+
+        # Bucket the constrainedness ratio into intervals of 0.2
+        constrainedness_bucket = round(math.floor(constrainedness_ratio / 0.1) * 0.1, 4)
+
+        return {"constrainedness": constrainedness_ratio, "constrainedness_bucket": constrainedness_bucket}
diff --git a/eureka_ml_insights/metrics/f1score_metrics.py b/eureka_ml_insights/metrics/f1score_metrics.py
index 31439ac5..fd9a4bd0 100644
--- a/eureka_ml_insights/metrics/f1score_metrics.py
+++ b/eureka_ml_insights/metrics/f1score_metrics.py
@@ -3,13 +3,37 @@
 
 from .metrics_base import ClassicMetric
 
+"""
+Module containing the MaxTokenF1ScoreMetric class, which calculates the maximum token-based
+F1 score from model output and ground truth answers.
+"""
+
 
 class MaxTokenF1ScoreMetric(ClassicMetric):
+    """A metric for calculating the maximum token-based F1 score across multiple ground truth answers."""
+
     def tokenize(self, sentence):
+        """Tokenize the input sentence into a list of word tokens.
+
+        Args:
+            sentence (str): The input sentence to tokenize.
+
+        Returns:
+            list: A list of tokens extracted from the sentence.
+        """
         return re.findall(r"\b\w+\b", sentence.lower())
 
-    # Function to compute F1 score between two responses
     def __evaluate__(self, model_output, ground_truth, is_valid):
+        """Compute the maximum F1 score for the model's output across multiple ground truth answers.
+
+        Args:
+            model_output (str): The model's output.
+            ground_truth (list of str): A list of ground truth answers.
+            is_valid (bool): A flag indicating whether the model's output is valid.
+
+        Returns:
+            float: The maximum F1 score among the ground truth answers, or 0 if invalid.
+        """
         if not is_valid:
             return 0
         model_answer = model_output
@@ -20,26 +44,29 @@ def __evaluate__(self, model_output, ground_truth, is_valid):
         return max_f1
 
     def _f1_score(self, model_output, answer):
-        # Tokenize both responses
+        """Compute the F1 score between a model output and a single ground truth answer.
+
+        Args:
+            model_output (str): The model's output.
+            answer (str): A ground truth answer.
+
+        Returns:
+            float: The calculated F1 score between the model output and the ground truth answer.
+        """
         target_tokens = self.tokenize(answer.lower())
         generated_tokens = self.tokenize(model_output.lower())
 
-        # Count tokens
         target_counter = Counter(target_tokens)
         generated_counter = Counter(generated_tokens)
 
-        # Calculate true positives
         true_positive = sum((target_counter & generated_counter).values())
 
-        # Total tokens in generated and target responses
         total_generated = len(generated_tokens)
         total_target = len(target_tokens)
 
-        # Precision and recall calculations
         precision = true_positive / total_generated if total_generated > 0 else 0
         recall = true_positive / total_target if total_target > 0 else 0
 
-        # Calculate F1 score
         if precision + recall == 0:
             f1 = 0
         else:
diff --git a/eureka_ml_insights/metrics/geomtric_reasoning_metrics.py b/eureka_ml_insights/metrics/geomtric_reasoning_metrics.py
index 844fb69d..de2f221e 100644
--- a/eureka_ml_insights/metrics/geomtric_reasoning_metrics.py
+++ b/eureka_ml_insights/metrics/geomtric_reasoning_metrics.py
@@ -1,15 +1,38 @@
+"""Module containing the GeoMCQMetric class for evaluating multiple-choice questions.
+
+GeoMCQMetric is derived from SpatialAndLayoutReasoningMetric and enforces that a 
+correct prediction must be one of the valid multiple-choice options.
+"""
+
 import ast
 
 from .spatial_and_layout_metrics import SpatialAndLayoutReasoningMetric
 
 
 class GeoMCQMetric(SpatialAndLayoutReasoningMetric):
-    """This class is a metric that requires a correct prediction to be only one of the valid multiple choice answers."""
+    """
+    A metric that requires a correct prediction to be only one of the valid multiple-choice answers.
+    """
 
     def __init__(self):
+        """
+        Initializes the GeoMCQMetric class and calls the parent class constructor.
+        """
         super().__init__()
 
     def __evaluate__(self, answer_text, target_text, target_options, is_valid):
+        """
+        Evaluates whether the predicted answer matches the target answer within valid multiple-choice options.
+
+        Args:
+            answer_text (str): The predicted answer text.
+            target_text (str): The ground truth answer text.
+            target_options (str): The multiple-choice options as a string representation (e.g., a list).
+            is_valid (bool): Whether the context for evaluation is valid.
+
+        Returns:
+            str: The evaluated result. Returns "none" if not valid; otherwise, calls the superclass evaluation.
+        """
         if not is_valid:
             return "none"
 
diff --git a/eureka_ml_insights/metrics/ifeval_metrics.py b/eureka_ml_insights/metrics/ifeval_metrics.py
index b5829a5a..e65b72b7 100644
--- a/eureka_ml_insights/metrics/ifeval_metrics.py
+++ b/eureka_ml_insights/metrics/ifeval_metrics.py
@@ -1,3 +1,9 @@
+"""This module provides the IFEvalMetric class for evaluating if a response follows instructions.
+
+The IFEvalMetric extends the CompositeMetric class to measure compliance with instructions
+using both strict and loose evaluations.
+"""
+
 from eureka_ml_insights.metrics.ifeval_instruction_utils import (
     test_instruction_following_loose,
     test_instruction_following_strict,
@@ -6,17 +12,27 @@
 
 
 class IFEvalMetric(CompositeMetric):
-    """
-    Composite metric for evaluating if a response follows instructions.
+    """Composite metric for evaluating if a response follows instructions.
 
-    This metric evaluates if a given response follows the provided instructions.
-    It calculates both strict and loose evaluation scores based on the response's adherence to the instructions.
+    This class calculates both strict and loose evaluation scores based on the response's
+    adherence to the instructions by leveraging the CompositeMetric base class.
     """
 
     def __init__(self):
+        """Initializes the IFEvalMetric instance."""
         super().__init__()
 
     def __evaluate__(self, row):
+        """Evaluates the row data and returns the instruction following metric results.
+
+        Args:
+            row (dict): A dictionary containing various keys such as 'instruction_id_list' and
+                'is_valid' that describe the prompt-response pair and other metadata.
+
+        Returns:
+            dict: A dictionary containing the evaluation results, which includes metrics on how
+            well the response follows instructions.
+        """
         tier0_instructions = []
         for index, instruction_id in enumerate(row["instruction_id_list"]):
             tier0_instructions.append(instruction_id.split(":")[0])
@@ -37,10 +53,22 @@ def __evaluate__(self, row):
         return self.evaluate_instruction_following(row)
 
     def evaluate_instruction_following(self, row, include_loose_evaluation=True):
-        """Evaluates if response follows instructions."""
+        """Evaluates instruction following for a given row.
+
+        Args:
+            row (dict): The dictionary containing 'prompt', 'response', 'instruction_id_list',
+                'kwargs', and other relevant data.
+            include_loose_evaluation (bool): Whether to include loose evaluation in addition to
+                strict evaluation. Defaults to True.
+
+        Returns:
+            dict: A dictionary combining the results from strict evaluation and, if enabled, loose
+            evaluation.
+        """
         results_strict = test_instruction_following_strict(
             row["prompt"], row["response"], row["instruction_id_list"], row["kwargs"]
         )
+        results_loose = {}
         if include_loose_evaluation:
             results_loose = test_instruction_following_loose(
                 row["prompt"], row["response"], row["instruction_id_list"], row["kwargs"]
diff --git a/eureka_ml_insights/metrics/kitab_metrics.py b/eureka_ml_insights/metrics/kitab_metrics.py
index 56f4b96f..0bcf7b5f 100644
--- a/eureka_ml_insights/metrics/kitab_metrics.py
+++ b/eureka_ml_insights/metrics/kitab_metrics.py
@@ -1,3 +1,12 @@
+"""
+This module provides the KitabMetric class for evaluating constraints on textual data,
+including checking for human names, city names, word counts, publishing years, etc.
+It leverages the Azure Language Service for entity recognition and fuzzy matching
+for approximate checks. It also relies on GPT-4 preprocessed human names to validate
+titles. Metrics such as completeness, correctness, and constrainedness are computed
+to aid in the assessment of model-generated outputs against ground truth data.
+"""
+
 # This file was authored by authors of the Kitab dataset (https://huggingface.co/datasets/microsoft/kitab)
 # All code in this file is copied from the original source repository and then adapted to fit this repository.
 # The original license for this code is Community Data License Agreement - Permissive - Version 2.0
@@ -20,18 +29,30 @@
 from azure.identity import DefaultAzureCredential
 from fuzzywuzzy import fuzz
 
+from eureka_ml_insights.data_utils import kitab_utils
 from eureka_ml_insights.metrics import CompositeMetric
 from eureka_ml_insights.secret_management import get_secret
-from eureka_ml_insights.data_utils import kitab_utils
+
 
 class KitabMetric(CompositeMetric):
+    """
+    A metric class that extends CompositeMetric to evaluate constraints on
+    text dataset rows. This includes checking for author correctness, constraints
+    such as word counts, human names, city names, and data completeness.
+    """
+
     stopwords = set(["a", "an", "the", "in", "is", "of", "on", "for", "with", "to", "and"])
 
     def __init__(self, temp_path_names, azure_lang_service_config):
         """
-        args:
-            temp_path_names: path where to store the pre extracted human names via gpt-4. The file is then used to complement human names extracted via gpt-4 with those extracted via the Azure Language service.
-            azure_lang_service_config: config dict for the Azure Language Service that will be used for evaluating human and city name constraints.
+        Initializes the KitabMetric class.
+
+        Args:
+            temp_path_names (str): Path where to store the pre-extracted human names via GPT-4.
+                The file is then used to complement human names extracted via GPT-4 with those
+                extracted via the Azure Language Service.
+            azure_lang_service_config (dict): Configuration dictionary for the Azure Language Service
+                that will be used for evaluating human and city name constraints.
         """
         super().__init__()
         self.gpt4_names = kitab_utils.get_gpt4_preprocessed_names(
@@ -49,55 +70,62 @@ def __init__(self, temp_path_names, azure_lang_service_config):
         self.text_analytics_credential = self.get_verified_credential()
 
     def get_verified_credential(self):
+        """
+        Attempts to create and validate a credential for the Azure Text Analytics Client.
+        Tries first AzureKeyCredential, then DefaultAzureCredential as a fallback,
+        returning either one if successful.
+
+        Returns:
+            AzureKeyCredential or DefaultAzureCredential: A verified credential for the Azure
+                Text Analytics if successfully created, otherwise None.
+        """
         model_version = "latest"
         try:
             text_analytics_client = TextAnalyticsClient(endpoint=self.endpoint, credential=AzureKeyCredential(self.key))
             text_analytics_client.recognize_entities(["New York City"], model_version=model_version)
             return AzureKeyCredential(self.key)
         except Exception as e:
-            logging.info(f"Failed to create the TextAnalyticsClient using AzureKeyCredential")
+            logging.info("Failed to create the TextAnalyticsClient using AzureKeyCredential")
             logging.info("The error is caused by: {}".format(e))
         try:
             text_analytics_client = TextAnalyticsClient(endpoint=self.endpoint, credential=DefaultAzureCredential())
             text_analytics_client.recognize_entities(["New York City"], model_version=model_version)
             return DefaultAzureCredential()
         except Exception as e:
-            logging.info(f"Failed to create the TextAnalyticsClient using DefaultAzureCredential")
+            logging.info("Failed to create the TextAnalyticsClient using DefaultAzureCredential")
             logging.info("The error is caused by: {}".format(e))
         return None
 
     def __evaluate__(self, row):
+        """
+        Evaluates a single row of data. If the row is invalid, returns 'none',
+        otherwise processes and returns the evaluation result.
+
+        Args:
+            row (dict): A dictionary containing row data. Must include the 'is_valid' key.
+
+        Returns:
+            str or dict: 'none' if the row is invalid, otherwise a dictionary of
+            evaluation metrics produced by process_row.
+        """
         if not row["is_valid"]:
             return "none"
         return self.process_row(row, self.gpt4_names)
 
     def process_row(self, row, gpt4_names):
         """
-        Process a row of data to identify correct, incorrect, and hallucinated book titles based on given constraints.
+        Processes a row of data to identify correct, incorrect, and hallucinated book titles
+        based on given constraints.
+
         Args:
-            row (dict): A dictionary containing the input row data with columns 'mapped_books', 'model_books',
-                        'all_books', 'raw_books', 'constraint_type', and 'constraints'.
+            row (dict): A dictionary containing the input row data with keys:
+                'mapped_books', 'model_books', 'all_books', 'raw_books', 'constraint_type',
+                and 'constraints'.
             gpt4_names (list): A list of human names used by the GPT-4 model for comparison.
+
         Returns:
-            tuple: A tuple containing three elements:
-                - A dictionary containing the processed results including correct, incorrect, and hallucinated book
-                     titles, counts, and mappings.
-                - An integer representing the number of unmapped raw books.
-                - An integer representing the original count of model books before processing.
-        Raises:
-            ValueError: If the input row or constraints are not in the expected format.
-        Note:
-            This function assumes the following format for input row:
-            - 'mapped_books', 'all_books', 'raw_books' are lists of book titles in string format.
-            - 'model_books' is either a list of book titles in string format or a dictionary containing 'titles' key
-                 with a list of book titles.
-            Constraints can be of the following types:
-            - 'starts-with': Check if the model books start with a specified prefix.
-            - 'ends-with': Check if the model books end with a specified suffix.
-            - 'word-count': Check if the model books have a specified word count.
-            - 'publishing-year': Check if the model books' publishing year falls within a specified range.
-            - 'human-name': Check if the model books contain a specified human name.
-            - 'city-name': Check if the model books contain a specified city name.
+            dict: A dictionary containing processed results, such as correct, incorrect,
+            and hallucinated book titles, counts, mappings, completeness, and correctness.
         """
         satisfied = []
         unsatisfied = []
@@ -118,7 +146,7 @@ def process_row(self, row, gpt4_names):
 
         len(model_books)
 
-        # check for not_from_author, map model books to data books
+        # Map model books to data books or identify them as not from author
         existing_titles_model_titles = {}
         for book in model_books.copy():
             if book == "":
@@ -128,7 +156,6 @@ def process_row(self, row, gpt4_names):
             if not any(book in item for item in all_books) and not any(item in book for item in all_books):
                 close_enough, existing_title = self.fuzzy_compare(book, all_books, threshold=80)
                 if not close_enough:
-                    # book not in raw_books:
                     if not any(book in item for item in raw_books) and not any(item in book for item in raw_books):
                         close_enough_raw, _ = self.fuzzy_compare(book, raw_books, threshold=80)
                         if not close_enough_raw:
@@ -137,7 +164,7 @@ def process_row(self, row, gpt4_names):
                     raw_unmapped.append(book)
                     model_books.remove(book)
                     continue
-            # book in all_books. So Check if in mapped_books and then:
+
             if existing_title == "":
                 existing_title = next((item for item in all_books if book in item or item in book), None)
 
@@ -146,7 +173,7 @@ def process_row(self, row, gpt4_names):
 
             existing_titles_model_titles[existing_title].append(book)
 
-        # check for satisfaction for non-hallucinated books
+        # Check constraints for non-hallucinated books
         constraint_list = row["constraints"].split(",")
         constraint_count = len(constraint_list)
         if constraint_count == 1:
@@ -163,12 +190,9 @@ def process_row(self, row, gpt4_names):
                 satisfied = satisfied_c
                 unsatisfied = unsatisfied_c
             else:
-                # the list of books that satisfy both constraints is the intersection of the respective lists
-                # that satisfy each constraint individually
                 satisfied = list(set(satisfied_c).intersection(satisfied))
-                # the list of books that do not satisfy at least one of constraints is the (distinct) union of the respective lists
-                # that do not satisfy each constraint individually
                 unsatisfied = list(set(unsatisfied_c).union(unsatisfied))
+
         not_from_author = list(set(not_from_author))
         satisfied = list(set(satisfied))
         unsatisfied = list(set(unsatisfied))
@@ -180,13 +204,11 @@ def process_row(self, row, gpt4_names):
         count_not_from_author = len(not_from_author)
         count_raw_unmapped = len(raw_unmapped)
         number_of_clusters = count_not_from_author + len(existing_titles_model_titles.keys())
-        # query constrainedness is defined as \kappa = 1 - \frac{S}{N}
-        # where S is the number of ground truth books that satisfy the constraint
-        # N is the total number of books for the author
-        # the higher the constrainedness of a query, the more difficult the query is
+
+        # Constrainedness computation
         constrainedness = 1 - count_mapped_books / count_all_books
 
-        # compute satisfaction, unsatisfaction, and not from author rates based on counts
+        # Compute satisfaction, unsatisfaction, and not from author rates
         if number_of_clusters > 0:
             satisfied_rate = count_satisfied / number_of_clusters
             unsatisfied_rate = count_unsatisfied / number_of_clusters
@@ -196,7 +218,7 @@ def process_row(self, row, gpt4_names):
             unsatisfied_rate = np.nan
             not_from_author_rate = np.nan
 
-        # compute completeness and correctness
+        # Compute completeness
         if ast.literal_eval(row["mapped_books"]):
             set_mapped_books = set(self.process_title(book) for book in ast.literal_eval(row["mapped_books"]))
             set_satisfied = set(self.process_title(book) for book in ast.literal_eval(str(satisfied)))
@@ -209,7 +231,6 @@ def process_row(self, row, gpt4_names):
 
         all_correct = int((completeness == 1) & (satisfied_rate == 1) & (not_from_author_rate == 0))
 
-        # handle corner cases
         if row["mapped_books"] == "[]" and row["model_books"] == "[]":
             completeness = 1
             satisfied_rate = 1
@@ -218,7 +239,6 @@ def process_row(self, row, gpt4_names):
             all_correct = 1
         elif row["mapped_books"] == "[]" and row["model_books"] != "[]":
             completeness = np.nan
-            # the rest is unchanged
         elif row["mapped_books"] != "[]" and row["model_books"] == "[]":
             completeness = 0
             satisfied_rate = np.nan
@@ -252,6 +272,21 @@ def process_row(self, row, gpt4_names):
     def compute_satisfaction_unsatisfaction(
         self, existing_titles_model_titles, single_constraint, constraint_type, all_books, str_all_books
     ):
+        """
+        Determines which titles satisfy or do not satisfy a single constraint.
+
+        Args:
+            existing_titles_model_titles (dict): A mapping from recognized titles to lists of model books.
+            single_constraint (str): The textual representation of the constraint (e.g., "starts with J.").
+            constraint_type (str): The type of constraint (e.g., "starts-with").
+            all_books (list): A list of all possible books (processed titles).
+            str_all_books (str): A string representation of all books, used for year parsing.
+
+        Returns:
+            dict: A dictionary with two keys:
+                "satisfied": A list of titles that satisfy the constraint.
+                "unsatisfied": A list of titles that do not satisfy the constraint.
+        """
         satisfied = []
         unsatisfied = []
         if single_constraint[-1] != ".":
@@ -299,7 +334,6 @@ def compute_satisfaction_unsatisfaction(
                         unsatisfied.append(existing_title)
                     else:
                         satisfied.append(existing_title)
-
             elif constraint_type == "city-name":
                 if "doesn't" not in single_constraint:
                     if self.check_city_name(model_book_list):
@@ -316,23 +350,21 @@ def compute_satisfaction_unsatisfaction(
 
     def process_title(self, title):
         """
-        Process a book title by converting it to lowercase, replacing '&' with 'and',
+        Processes a book title by converting it to lowercase, replacing '&' with 'and',
         removing punctuation, and excluding common starting words ('the', 'a', 'an').
-        Parameters:
-        title (str): Input book title.
+
+        Args:
+            title (str): Input book title.
+
         Returns:
-        str: Processed book title.
+            str: Processed book title.
         """
-        # Convert string to lowercase
         title = title.lower()
-        # Replace '&' with 'and'
         title = title.replace("&", "and")
 
-        # Remove punctuation
         translator = str.maketrans("", "", string.punctuation)
         title = title.translate(translator)
 
-        # Remove first word if it's in ['the', 'a', 'an']
         first_word = title.split()[0] if title.split() else ""
         if first_word in ["the", "a", "an"]:
             title = " ".join(title.split()[1:])
@@ -341,27 +373,33 @@ def process_title(self, title):
 
     def process_all_books(self, title):
         """
-        Process a book title by removing the (xxxx) format at the end of the title.
-        Parameters:
-        title (str): Input book title.
+        Processes a book title by removing the (xxxx) format at the end of the title.
+
+        Args:
+            title (str): Input book title.
+
         Returns:
-        str: Processed book title.
+            str: Processed book title without trailing year information.
         """
-        # Use a regex pattern to remove the (xxxx) format
-        pattern = r"\(\d{3,4}\)$"
+        pattern = r"\\(\\d{3,4}\\)$"
+        pattern = r"\(\d{3,4}\)$"  # Corrected double escaping from the original code
         processed_title = re.sub(pattern, "", title).strip()
         return processed_title
 
     def fuzzy_compare(self, title, list_of_title, threshold=90):
         """
-        Perform fuzzy string comparison between the input title and a list of titles.
-        Parameters:
-        title (str): Input book title.
-        list_of_titles (list): List of book titles for comparison.
-        threshold (int): Minimum similarity score required for a match (default is 90).
+        Performs fuzzy string comparison between the input title and a list of titles, checking
+        for approximate matches above a given threshold.
+
+        Args:
+            title (str): Input book title to compare.
+            list_of_title (list): A list of book titles against which to compare.
+            threshold (int, optional): Minimum similarity score required for a match
+                (default is 90).
+
         Returns:
-        tuple: A tuple containing a boolean indicating if a match was found and the matched title (if found).
-        Example: (True, 'Matching Title') or (False, '')
+            tuple: A tuple with a boolean indicating if a match was found and the matched title
+            (if any). For example, (True, 'Matching Title') or (False, '').
         """
         for compare_title in list_of_title:
             if fuzz.ratio(compare_title, title) >= threshold:
@@ -370,11 +408,15 @@ def fuzzy_compare(self, title, list_of_title, threshold=90):
 
     def extract_cities(self, text):
         """
-        Extract cities mentioned in the input text using Azure Text Analytics and external data source.
-        Parameters:
-        text (str): Input text containing city names.
+        Extracts city names mentioned in the input text using Azure Text Analytics and
+        an external data source that verifies city existence.
+
+        Args:
+            text (str): Input text containing city names.
+
         Returns:
-        list: A list of extracted city names.
+            list: A list of extracted city names. The function checks recognized location
+            entities against a public API for city data.
         """
         error_flag = True
         max_tries = 10
@@ -387,15 +429,12 @@ def extract_cities(self, text):
                     endpoint=self.endpoint, credential=self.text_analytics_credential
                 )
 
-                # Use the given text as the input
                 input_texts = [text]
-
                 result = text_analytics_client.recognize_entities(input_texts, model_version="latest")
 
                 error_flag = any([review.is_error for review in result])
                 result = [review for review in result if not review.is_error]
 
-                # Extract location entities
                 location_entities = []
                 for review in result:
                     for entity in review.entities:
@@ -424,11 +463,13 @@ def extract_cities(self, text):
 
     def extract_persons(self, text):
         """
-        Extract persons mentioned in the input text using Azure Text Analytics service.
-        Parameters:
-        text (str): Input text containing person names.
+        Extracts person names mentioned in the input text using Azure Text Analytics service.
+
+        Args:
+            text (str): Input text containing person names.
+
         Returns:
-        list: A list of extracted person names.
+            list: A list of extracted person names.
         """
         error_flag = True
         max_tries = 10
@@ -439,9 +480,7 @@ def extract_persons(self, text):
                 text_analytics_client = TextAnalyticsClient(
                     endpoint=self.endpoint, credential=self.text_analytics_credential, api_version="2023-04-01"
                 )
-                # Use the given text as the input
                 input_texts = [text]
-
                 result = text_analytics_client.recognize_entities(input_texts, model_version="2023-04-15-preview")
 
                 error_flag = any([review.is_error for review in result])
@@ -482,24 +521,33 @@ def extract_persons(self, text):
         return persons
 
     def handle_azure_language_service_exception(self, e):
+        """
+        Handles exceptions from Azure Language Service calls by logging a warning
+        and waiting for a moment before retrying.
+
+        Args:
+            e (Exception): The exception raised during the Azure Language Service call.
+        """
         logging.warning(f"Azure Language Service call failed: {e}")
         time.sleep(1)
 
     def check_starts_with(self, books, l):
         """
-        Check if any book title in the given list starts with the specified letter or word.
-        Parameters:
-        books (list): List of book titles.
-        l (str): Letter or word to check for at the beginning of the titles.
-        stopwords (list): List of stopwords to ignore (default is an empty list).
+        Checks if any book title in the given list starts with a specified letter or word.
+        Considers stopwords within the title.
+
+        Args:
+            books (list): List of book titles.
+            l (str): Letter or word to check for at the beginning of the titles.
+
         Returns:
-        bool: True if any title starts with the specified letter or word, False otherwise.
+            bool: True if any title starts with the specified letter or word, False otherwise.
         """
         for s in books:
             words = s.split()
-            if words[0].lower().startswith(l.lower()):
+            if words and words[0].lower().startswith(l.lower()):
                 return True
-            if words[0].lower() in self.stopwords:
+            if words and words[0].lower() in self.stopwords:
                 words.pop(0)
             if words and words[0].lower().startswith(l.lower()):
                 return True
@@ -507,12 +555,14 @@ def check_starts_with(self, books, l):
 
     def check_ends_with(self, books, l):
         """
-        Check if any book title in the given list ends with the specified letter or word.
-        Parameters:
-        books (list): List of book titles.
-        l (str): Letter or word to check for at the end of the titles.
+        Checks if any book title in the given list ends with the specified letter or word.
+
+        Args:
+            books (list): List of book titles.
+            l (str): Letter or word to check for at the end of the titles.
+
         Returns:
-        bool: True if any title ends with the specified letter or word, False otherwise.
+            bool: True if any title ends with the specified letter or word, False otherwise.
         """
         for s in books:
             words = s.split()
@@ -522,13 +572,15 @@ def check_ends_with(self, books, l):
 
     def check_word_count(self, books, c, delta=1):
         """
-        Check if any book title in the given list has a word count within a specified range.
-        Parameters:
-        books (list): List of book titles.
-        c (int): Target word count to check against.
-        delta (int): Allowable difference from the target word count (default is 1).
+        Checks if any book title in the given list has a word count within a specified range.
+
+        Args:
+            books (list): List of book titles.
+            c (int): Target word count to check against.
+            delta (int, optional): Allowable difference from the target word count (default is 1).
+
         Returns:
-        bool: True if any title has a word count within the specified range, False otherwise.
+            bool: True if any title has a word count within the specified range, False otherwise.
         """
         for s in books:
             word_count = len(s.split())
@@ -538,12 +590,15 @@ def check_word_count(self, books, c, delta=1):
 
     def check_publishing_year(self, pub_year, year_range):
         """
-        Check if the given publishing year falls within the specified year range.
-        Parameters:
-        pub_year (int): The publishing year to be checked.
-        year_range (tuple): A tuple containing two integers representing the start and end of the allowed year range.
+        Checks if the given publishing year falls within the specified year range.
+
+        Args:
+            pub_year (int): The publishing year to be checked.
+            year_range (list): A list of integers representing the start and end
+                of the allowed year range.
+
         Returns:
-        bool: True if the publishing year is within the specified range, False otherwise.
+            bool: True if the publishing year is within the specified range, False otherwise.
         """
         if pub_year >= year_range[0] and pub_year <= year_range[1]:
             return True
@@ -552,12 +607,15 @@ def check_publishing_year(self, pub_year, year_range):
 
     def check_human_name(self, books, gpt4_names):
         """
-        Check if any book title contains a human name, either by direct extraction or fuzzy comparison.
-        Parameters:
-        books (list): List of book titles to check.
-        gpt4_names (set): Set of human names generated by GPT-4 for fuzzy comparison.
+        Checks if any book title contains a human name, either by direct extraction
+        from the text or via fuzzy comparison with GPT-4 provided names.
+
+        Args:
+            books (list): List of book titles to check.
+            gpt4_names (list): List of human names generated by GPT-4 for fuzzy comparison.
+
         Returns:
-        bool: True if any title contains a human name, False otherwise.
+            bool: True if any title contains a human name, False otherwise.
         """
         for book in books:
             if len(self.extract_persons(book)) > 0 or self.fuzzy_compare(book, gpt4_names, 80)[0]:
@@ -566,11 +624,13 @@ def check_human_name(self, books, gpt4_names):
 
     def check_city_name(self, books):
         """
-        Check if any book title contains a city name.
-        Parameters:
-        books (list): List of book titles to check.
+        Checks if any book title contains a city name.
+
+        Args:
+            books (list): List of book titles to check.
+
         Returns:
-        bool: True if any title contains a city name, False otherwise.
+            bool: True if any title contains a city name, False otherwise.
         """
         for book in books:
             if len(self.extract_cities(book)) > 0:
diff --git a/eureka_ml_insights/metrics/metrics_base.py b/eureka_ml_insights/metrics/metrics_base.py
index 028166c4..ebbe6f5b 100644
--- a/eureka_ml_insights/metrics/metrics_base.py
+++ b/eureka_ml_insights/metrics/metrics_base.py
@@ -2,32 +2,86 @@
 
 
 class Metric:
+    """Base class for metrics.
+
+    This class defines the core structure for validating data and evaluating
+    a metric on a given dataset. Subclasses should implement the __evaluate__
+    method.
+    """
+
     def validate_data(self, data):
-        """This method checks if the data has the required fields."""
+        """Validates input data.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to validate.
+
+        Returns:
+            bool: True if the data is valid.
 
+        Raises:
+            AssertionError: If the 'is_valid' column is missing.
+        """
         assert "is_valid" in data.columns, "Data does not have 'is_valid' field."
         return True
 
     def __evaluate__(self, **kwargs):
+        """Evaluates the metric for a single row.
+
+        Kwargs:
+            **kwargs: Arbitrary keyword arguments for evaluation.
+
+        Returns:
+            Any: The result of the evaluation.
+
+        Raises:
+            NotImplementedError: This method must be overridden by subclasses.
+        """
         raise NotImplementedError
 
     def evaluate(self, data):
+        """Evaluates the metric on the entire dataset.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame on which to run the metric.
+
+        Returns:
+            pd.DataFrame: A copy of the DataFrame with metric results appended
+            as a new column.
+        """
         self.validate_data(data)
         data[self.__class__.__name__ + "_result"] = data.apply(lambda x: self.__evaluate__(x), axis=1)
         return data
 
 
 class CompositeMetric(Metric):
-    """This class is a base class for composite metrics that consist of several submetrics.
-    Returns a dictionary mapping "metric_name" to value for each row of data.
+    """Base class for composite metrics that consist of several submetrics.
+
+    This class returns a dictionary mapping "metric_name" to a value for each
+    row of data.
     """
 
     def evaluate(self, data):
+        """Evaluates the composite metric on the given data.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The DataFrame with decomposition applied.
+        """
         super().evaluate(data)
         data = self.decompose_metric(data)
         return data
 
     def decompose_metric(self, data):
+        """Decomposes the composite metric results into separate columns.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame with composite metric results.
+
+        Returns:
+            pd.DataFrame: The DataFrame with each submetric's result in a separate column.
+        """
         composite_metric_col = self.__class__.__name__ + "_result"
         # TODO this would break if the first row does not have all the metrics, e.g. due invalid inference results
         for metric_name in data[composite_metric_col][0]:
@@ -44,20 +98,44 @@ def decompose_metric(self, data):
 
 
 class ClassicMetric(Metric):
-    """This class is a base class for metrics that require ground truths and predictions."""
+    """Base class for metrics that require ground truths and predictions."""
 
     def __init__(self, model_output_col: str = "model_output"):
+        """Initializes the ClassicMetric.
+
+        Args:
+            model_output_col (str, optional): The name of the column containing
+                model outputs. Defaults to "model_output".
+        """
         super().__init__()
         self.model_output_col = model_output_col
 
     def validate_data(self, data):
-        """This method checks if the data has the required fields."""
+        """Validates that the required fields exist in the DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to validate.
+
+        Returns:
+            bool: True if the data is valid.
+
+        Raises:
+            AssertionError: If required columns are missing.
+        """
         super().validate_data(data)
         assert "ground_truth" in data.columns, "Data does not have 'ground_truth' field."
         assert self.model_output_col in data.columns, f"Data does not have '{self.model_output_col}' field."
         return True
 
     def evaluate(self, data):
+        """Evaluates the metric on the given DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame on which to run the metric.
+
+        Returns:
+            pd.DataFrame: The DataFrame with metric results appended.
+        """
         self.validate_data(data)
         data[self.__class__.__name__ + "_result"] = data.apply(
             lambda x: self.__evaluate__(x[self.model_output_col], x["ground_truth"], x["is_valid"]), axis=1
@@ -66,23 +144,48 @@ def evaluate(self, data):
 
 
 class DetectionMetric(Metric):
-    """
-    This class is for the detection metric, where access is needed to the image
-    and ground truth is stored in an external file.
+    """Class for detection metrics requiring image access and external ground truth.
+
+    This metric requires additional information such as an image ID and
+    references an external ground truth file.
     """
 
     def __init__(self, model_output_col: str = "model_output"):
+        """Initializes the DetectionMetric.
+
+        Args:
+            model_output_col (str, optional): The name of the column containing
+                model outputs. Defaults to "model_output".
+        """
         super().__init__()
         self.model_output_col = model_output_col
 
     def validate_data(self, data):
-        """This method checks if the data has the required fields."""
+        """Validates that the required fields exist in the DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to validate.
+
+        Returns:
+            bool: True if the data is valid.
+
+        Raises:
+            AssertionError: If required columns are missing.
+        """
         super().validate_data(data)
         assert "id" in data.columns, "Data does not have 'id' field."
         assert self.model_output_col in data.columns, f"Data does not have '{self.model_output_col}' field."
         return True
 
     def evaluate(self, data):
+        """Evaluates the detection metric on the given DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame on which to run the metric.
+
+        Returns:
+            pd.DataFrame: The DataFrame with detection metric results appended.
+        """
         self.validate_data(data)
 
         tqdm.pandas()
@@ -94,15 +197,33 @@ def evaluate(self, data):
 
 
 class MultipleChoiceMetric(ClassicMetric):
-    """This class is a base class for metrics that have a multiple choice answer."""
+    """Base class for metrics that have a multiple choice answer."""
 
     def validate_data(self, data):
-        """This method checks if the data has the required fields."""
+        """Validates that the required fields exist in the DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to validate.
+
+        Returns:
+            bool: True if the data is valid.
+
+        Raises:
+            AssertionError: If required columns are missing.
+        """
         super().validate_data(data)
         assert "target_options" in data.columns, "Data does not have 'target_options' field."
         return True
 
     def evaluate(self, data):
+        """Evaluates the multiple choice metric on the given DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame on which to run the metric.
+
+        Returns:
+            pd.DataFrame: The DataFrame with metric results appended.
+        """
         self.validate_data(data)
         data[self.__class__.__name__ + "_result"] = data.apply(
             lambda x: self.__evaluate__(
@@ -114,15 +235,33 @@ def evaluate(self, data):
 
 
 class MixedQuestionTypeMetric(MultipleChoiceMetric):
-    """This class is a base class for metrics that have a multiple choice or open answer."""
+    """Base class for metrics that can handle multiple choice or open-ended answers."""
 
     def validate_data(self, data):
-        """This method checks if the data has the required fields."""
+        """Validates that the required fields exist in the DataFrame.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to validate.
+
+        Returns:
+            bool: True if the data is valid.
+
+        Raises:
+            AssertionError: If required columns are missing.
+        """
         super().validate_data(data)
         assert "question_type" in data.columns, "Data does not have 'question_type' field."
         return True
 
     def evaluate(self, data):
+        """Evaluates the metric for both multiple choice and open-ended answers.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame on which to run the metric.
+
+        Returns:
+            pd.DataFrame: The DataFrame with metric results appended.
+        """
         self.validate_data(data)
         data[self.__class__.__name__ + "_result"] = data.apply(
             lambda x: self.__evaluate__(
@@ -134,13 +273,24 @@ def evaluate(self, data):
 
 
 class SubstringExistsMatch(ClassicMetric):
-    """This class checks for a case-insensitive substring match."""
+    """Checks for a case-insensitive substring match."""
 
     def __evaluate__(self, answer_text, target_text, is_valid):
+        """Evaluates the SubstringExistsMatch metric for a single row.
+
+        Args:
+            answer_text (str): The predicted answer text.
+            target_text (str): The ground truth text.
+            is_valid (bool): Whether the entry is valid.
+
+        Returns:
+            str: "correct" if the target text is a substring of the answer text
+            (ignoring case), otherwise "incorrect". If invalid, returns "none".
+        """
         if not is_valid:
             return "none"
-        
-        # Make sure everything is a strings.
+
+        # Make sure everything is strings.
         answer_text = str(answer_text)
         target_text = str(target_text)
 
@@ -148,9 +298,20 @@ def __evaluate__(self, answer_text, target_text, is_valid):
 
 
 class ExactMatch(ClassicMetric):
-    """This class checks for an exact match."""
+    """Checks for an exact match between predicted and ground truth text."""
 
     def __evaluate__(self, answer_text, target_text, is_valid):
+        """Evaluates the ExactMatch metric for a single row.
+
+        Args:
+            answer_text (str): The predicted answer text.
+            target_text (str): The ground truth text.
+            is_valid (bool): Whether the entry is valid.
+
+        Returns:
+            str: "correct" if the predicted text exactly matches the ground truth
+            text, otherwise "incorrect". If invalid, returns "none".
+        """
         if not is_valid:
             return "none"
         if target_text == answer_text:
@@ -160,32 +321,66 @@ def __evaluate__(self, answer_text, target_text, is_valid):
 
 
 class CaseInsensitiveMatch(ExactMatch):
-    """This class checks for a case-insensitive, but otherwise exact match."""
+    """Checks for a case-insensitive exact match between predicted and ground truth text."""
 
     def __evaluate__(self, answer_text, target_text, is_valid):
+        """Evaluates the CaseInsensitiveMatch metric for a single row by converting
+        both texts to lowercase.
+
+        Args:
+            answer_text (str): The predicted answer text.
+            target_text (str): The ground truth text.
+            is_valid (bool): Whether the entry is valid.
+
+        Returns:
+            str: "correct" if the predicted text matches the ground truth text
+            (ignoring case), otherwise "incorrect". If invalid, returns "none".
+        """
         return super().__evaluate__(str(answer_text).lower(), str(target_text).lower(), is_valid)
 
 
 class IdentityMetric(Metric):
+    """Returns the data unmodified."""
 
     def evaluate(self, df):
+        """Evaluates the DataFrame by returning it unchanged.
+
+        Args:
+            df (pd.DataFrame): The input DataFrame.
+
+        Returns:
+            pd.DataFrame: The same DataFrame without modification.
+        """
         return df
 
 
 class MetricBasedVerifier:
-    """ This class enables the use of a metric as a verifier, in other words, it wraps a given metric class as a
-    data transformation and stores the verification result in a column called "verification_result"."""
+    """Wraps a given metric class as a data transformation for verification.
+
+    This class stores the verification result in a column called "verification_result".
+    """
+
     def __init__(self, metric_class, model_output_col: str = "model_output"):
-        """
-        args:
-            metric_class: The class of the metric to be used for verification.
-            model_output_col: The name of the column containing the model output.
+        """Initializes MetricBasedVerifier.
+
+        Args:
+            metric_class (Type[Metric]): The class of the metric to be used for verification.
+            model_output_col (str, optional): The name of the column containing
+                the model output. Defaults to "model_output".
         """
         self.model_output_col = model_output_col
         self.metric_class = metric_class
 
     def transform(self, data):
+        """Applies the wrapped metric to transform the data and stores the result.
+
+        Args:
+            data (pd.DataFrame): The input DataFrame to transform.
+
+        Returns:
+            pd.DataFrame: The DataFrame with an added "verification_result" column.
+        """
         data = self.metric_class(model_output_col=self.model_output_col).evaluate(data)
-        # rename the result column to "verification_reuslt"
+        # rename the result column to "verification_result"
         data.rename(columns={self.metric_class.__name__ + "_result": "verification_result"}, inplace=True)
-        return data
\ No newline at end of file
+        return data
diff --git a/eureka_ml_insights/metrics/mmmu_metrics.py b/eureka_ml_insights/metrics/mmmu_metrics.py
index d46a9e2d..bb5e0cfe 100644
--- a/eureka_ml_insights/metrics/mmmu_metrics.py
+++ b/eureka_ml_insights/metrics/mmmu_metrics.py
@@ -1,3 +1,5 @@
+"""Module that provides MMMUMetric class for evaluating multiple-choice and open-ended questions from the MMMU dataset."""
+
 import random
 import re
 
@@ -7,21 +9,33 @@
 
 
 class MMMUMetric(MixedQuestionTypeMetric):
-    """This class implements the extact metric from the MMMU dataset for multiple-choice and open-ended answers.
-    The original code is located in https://github.com/MMMU-Benchmark/MMMU/tree/main/eval/utils and has
-    small modifications to work in this framework.
+    """Implements the exact metric from the MMMU dataset for multiple-choice and open-ended answers.
+
+    The original code is located at:
+    https://github.com/MMMU-Benchmark/MMMU/tree/main/eval/utils
+    and includes small modifications to work in this framework.
     """
 
     def __init__(self):
+        """Initializes MMMUMetric by seeding the random number generator with 42."""
         super().__init__()
 
         random.seed(42)
 
     # ----------- Process Multi-choice -------------
     def parse_multi_choice_response(self, response, all_choices, index2ans):
-        """
-        Parse the prediction from the generated response.
-        Return the predicted index e.g., A, B, C, D.
+        """Parses the prediction from the generated response and returns the predicted index.
+
+        This method validates the response content to identify the best matching multiple-choice index
+        (e.g., 'A', 'B', 'C', 'D'). If no matching index is found, one is randomly chosen.
+
+        Args:
+            response (str): The model-generated response.
+            all_choices (List[str]): A list of possible choice indices (e.g., ['A', 'B', 'C', 'D']).
+            index2ans (Dict[str, str]): A mapping from choice indices to full answer strings.
+
+        Returns:
+            str: The predicted choice index (e.g., 'A', 'B', 'C', or 'D').
         """
         for char in [",", ".", "!", "?", ";", ":", "'"]:
             response = response.strip(char)
@@ -56,7 +70,6 @@ def parse_multi_choice_response(self, response, all_choices, index2ans):
                     for can in candidates:
                         index = response.rfind(f"({can})")
                         start_indexes.append(index)  # -1 will be ignored anyway
-                    # start_indexes = [generated_response.index(f'({can})') for can in candidates]
                 else:
                     for can in candidates:
                         index = response.rfind(f" {can} ")
@@ -74,8 +87,13 @@ def parse_multi_choice_response(self, response, all_choices, index2ans):
 
     # ----------- Process Open -------------
     def check_is_number(self, string):
-        """
-        Check if the given string a number.
+        """Checks whether the given string can be interpreted as a float.
+
+        Args:
+            string (str): The string to check.
+
+        Returns:
+            bool: True if the string can be converted to a float, False otherwise.
         """
         try:
             float(string.replace(",", ""))
@@ -85,12 +103,18 @@ def check_is_number(self, string):
             return False
 
     def normalize_str(self, string):
-        """
-        Normalize the str to lower case and make them float numbers if possible.
-        """
-        # check if characters in the string
+        """Normalizes a string to lowercase or converts it to a float if it looks like a number.
+
+        If the string is a number, it is converted to float and rounded to two decimals.
+        If it is a single character, variants with leading and trailing spaces are returned
+        to avoid trivial matches.
+
+        Args:
+            string (str): The string to normalize.
 
-        # if number, numerize it.
+        Returns:
+            list: A list containing either the normalized string(s) or float value(s).
+        """
         string = string.strip()
 
         is_number = self.check_is_number(string)
@@ -102,41 +126,46 @@ def normalize_str(self, string):
             string = round(string, 2)
             return [string]
         else:  # it's likely to be a string
-            # lower it
             string = string.lower()
             if len(string) == 1:
                 return [" " + string, string + " "]  # avoid trivial matches
             return [string]
 
     def extract_numbers(self, string):
+        """Extracts various forms of numbers from a string using regular expressions.
+
+        This includes numbers with commas, scientific notation, and simple numeric formats.
+
+        Args:
+            string (str): The string from which to extract numbers.
+
+        Returns:
+            list: A list of all extracted numeric substrings.
         """
-        Exact all forms of numbers from a string with regex.
-        """
-        # Pattern for numbers with commas
         pattern_commas = r"-?\b\d{1,3}(?:,\d{3})+\b"
-        # Pattern for scientific notation
         pattern_scientific = r"-?\d+(?:\.\d+)?[eE][+-]?\d+"
-        # Pattern for simple numbers without commas
         pattern_simple = r"-?(?:\d+\.\d+|\.\d+|\d+\b)(?![eE][+-]?\d+)(?![,\d])"
 
-        # Extract numbers with commas
         numbers_with_commas = re.findall(pattern_commas, string)
-        # Extract numbers in scientific notation
         numbers_scientific = re.findall(pattern_scientific, string)
-        # Extract simple numbers without commas
         numbers_simple = re.findall(pattern_simple, string)
 
-        # Combine all extracted numbers
         all_numbers = numbers_with_commas + numbers_scientific + numbers_simple
         return all_numbers
 
     def parse_open_response(self, response):
-        """
-        Parse the prediction from the generated response.
-        Return a list of predicted strings or numbers.
+        """Parses the prediction from a generated response and returns a list of predicted strings or numbers.
+
+        This method extracts key sub-responses, numerical values, and normalizes them.
+        Duplicate values are removed.
+
+        Args:
+            response (str): The model-generated response.
+
+        Returns:
+            list: A list of predicted strings or numbers.
         """
 
-        # content = content.strip("\n").strip(".").strip(" ")
         def get_key_subresponses(response):
             key_responses = []
             response = response.strip().strip(".").lower()
@@ -144,12 +173,9 @@ def get_key_subresponses(response):
             indicators_of_keys = ["could be ", "so ", "is ", "thus ", "therefore ", "final ", "answer ", "result "]
             key_responses = []
             for index, resp in enumerate(sub_responses):
-                # if last one, accept it's an equation (the entire response can be just one sentence with equation)
                 if index == len(sub_responses) - 1:
                     indicators_of_keys.extend(["="])
-                shortest_key_response = (
-                    None  # the shortest response that may contain the answer (tail part of the response)
-                )
+                shortest_key_response = None
                 for indicator in indicators_of_keys:
                     if indicator in resp:
                         if not shortest_key_response:
@@ -157,19 +183,17 @@ def get_key_subresponses(response):
                         else:
                             if len(resp.split(indicator)[-1].strip()) < len(shortest_key_response):
                                 shortest_key_response = resp.split(indicator)[-1].strip()
-                        # key_responses.append(resp.split(indicator)[1].strip())
 
                 if shortest_key_response:
-                    # and it's not trivial
                     if shortest_key_response.strip() not in [":", ",", ".", "!", "?", ";", ":", "'"]:
                         key_responses.append(shortest_key_response)
-            if len(key_responses) == 0:  # did not found any
+            if len(key_responses) == 0:
                 return [response]
             return key_responses
 
         key_responses = get_key_subresponses(response)
 
-        pred_list = key_responses.copy()  # keep the original string response
+        pred_list = key_responses.copy()
         for resp in key_responses:
             pred_list.extend(self.extract_numbers(resp))
 
@@ -178,49 +202,59 @@ def get_key_subresponses(response):
             tmp_pred_list.extend(self.normalize_str(pred_list[i]))
         pred_list = tmp_pred_list
 
-        # remove duplicates
         pred_list = list(set(pred_list))
 
         return pred_list
 
     # ----------- Evaluation -------------
     def eval_multi_choice(self, gold_i, pred_i):
-        """
-        Evaluate a multiple choice instance.
+        """Evaluates the correctness of a multiple-choice prediction.
+
+        Args:
+            gold_i (str or list of str): The correct answer(s). May be a single string or a list of valid answers.
+            pred_i (str): The predicted choice index.
+
+        Returns:
+            bool: True if the prediction matches any of the valid answers, False otherwise.
         """
         correct = False
-        # only they are exactly the same, we consider it as correct
         if isinstance(gold_i, list):
             for answer in gold_i:
                 if answer == pred_i:
                     correct = True
                     break
-        else:  # gold_i is a string
+        else:
             if gold_i == pred_i:
                 correct = True
         return correct
 
     def eval_open(self, gold_i, pred_i):
-        """
-        Evaluate an open question instance
+        """Evaluates the correctness of an open-ended response.
+
+        The gold answers and predicted answers are normalized for comparison.
+
+        Args:
+            gold_i (str or list of str): The correct answer(s).
+            pred_i (list of (str or float)): The predicted answers, previously normalized.
+
+        Returns:
+            bool: True if any of the predictions matches the correct answer(s), False otherwise.
         """
         correct = False
         if isinstance(gold_i, list):
-            # use float to avoid trivial matches
             norm_answers = []
             for answer in gold_i:
                 norm_answers.extend(self.normalize_str(answer))
         else:
             norm_answers = self.normalize_str(gold_i)
-        for pred in pred_i:  # pred is already normalized in parse response phase
-            if isinstance(pred, str):  # if it's a string, then find if ans in the pred_i
+        for pred in pred_i:
+            if isinstance(pred, str):
                 for norm_ans in norm_answers:
-                    # only see if the string answer in the string pred
                     if isinstance(norm_ans, str) and norm_ans in pred:
                         if not correct:
                             correct = True
                         break
-            else:  # it's a float number
+            else:
                 if pred in norm_answers:
                     if not correct:
                         correct = True
@@ -228,11 +262,16 @@ def eval_open(self, gold_i, pred_i):
         return correct
 
     def get_multi_choice_info(self, options):
-        """
-        Given the list of options for multiple choice question
-        Return the index2ans and all_choices
-        """
+        """Generates a mapping and list of choice indices for multiple-choice questions.
 
+        Args:
+            options (list of str): A list of multiple-choice options.
+
+        Returns:
+            tuple: A tuple (index2ans, all_choices) where:
+                index2ans (dict): Maps a choice index (e.g., 'A') to the corresponding option text.
+                all_choices (list of str): A list of choice indices (e.g., ['A', 'B', 'C', ...]).
+        """
         start_chr = "A"
         all_choices = []
         index2ans = {}
@@ -243,6 +282,22 @@ def get_multi_choice_info(self, options):
         return index2ans, all_choices
 
     def __evaluate__(self, answer_text, target_text, question_type, target_options, is_valid):
+        """Evaluates a single question instance to determine correctness.
+
+        For a multiple-choice question, it parses the response to find a predicted index
+        and compares it against the gold answer. For an open-ended question, it uses the
+        parsed open response and compares it to the gold answer.
+
+        Args:
+            answer_text (str): The model-generated answer text.
+            target_text (str or list of str): The correct answer(s).
+            question_type (str): The type of question, either 'multiple-choice' or 'open'.
+            target_options (list of str): The multiple-choice options if applicable.
+            is_valid (bool): Whether this question should be evaluated.
+
+        Returns:
+            str: "none" if the question is invalid, "correct" if the prediction is correct, or "incorrect" otherwise.
+        """
         if not is_valid:
             return "none"
 
diff --git a/eureka_ml_insights/metrics/nphard_sat_metrics.py b/eureka_ml_insights/metrics/nphard_sat_metrics.py
index 23af942a..288fad49 100644
--- a/eureka_ml_insights/metrics/nphard_sat_metrics.py
+++ b/eureka_ml_insights/metrics/nphard_sat_metrics.py
@@ -1,52 +1,54 @@
+"""This module provides the NPHardSATMetric class for evaluating solutions to the SAT problem.
+
+The NPHardSATMetric class inherits from Metric and determines whether a model-predicted
+assignment string is one of the optimal solutions, or whether both the prediction and 
+ground truth indicate the problem is unsatisfiable.
+"""
+
 from .metrics_base import Metric
 
 
 class NPHardSATMetric(Metric):
-    """
-    A metric class for evaluating solutions to the SAT.
-    A prediction is considered correct if it is a valid SAT assignment and matches one of the optimal solutions.
+    """NPHardSATMetric class for evaluating solutions to SAT problems.
+
+    A prediction is considered correct if it is a valid SAT assignment that
+    matches one of the optimal solutions, or if both the model output and ground
+    truth indicate unsatisfiable.
     """
 
     def __init__(self):
+        """Initializes the NPHardSATMetric class."""
         super().__init__()
 
     def is_assignment_optimal(self, optimal_assignment_list, assignment_string):
-        """
-        Decide whether the model-predicted assignment string is accepted as a correct
-        SAT solution.
-
-        1. **Input formats**
-        • `optimal_assignment_list` – `List[str]`
-            A list of canonical optimal assignments, each expressed as a
-            comma-separated string of 0/1 literals, following the variable
-            order specified in the prompt instructions, e.g.
-            `"1,0,1,1"`.
-            – An **empty list (`[]`) represents “unsatisfiable.”**
-        • `assignment_string` – `str`
-            The model’s prediction in the *same* 0/1 comma-separated format.
-            – An empty string (`""`) means the model declares “unsatisfiable.”
-
-        2. **Normalisation performed**
-        The function removes spaces and surrounding parentheses from every entry
-        in `optimal_assignment_list`, then rejoins tokens with single commas,
-        e.g. `"(1, 0 ,1)" → "1,0,1"`.
-
-        3. **Acceptance criteria**
-        • Return **`True`** if
-            a. *Both* sides claim unsatisfiable
-                (`assignment_string == ""` **and** `optimal_assignment_list == []`), **or**
-            b. The canonical `assignment_string` exactly matches (string equality)
-                **one** element of the normalised `optimal_assignment_list`.
-        • Otherwise return **`False`**.
-
-        4. **Order sensitivity**
-        Because matching is string-exact, **variable order must match** between
-        prediction and ground truth.
-
-        Returns
-        -------
-        bool
-            `True` if the predicted assignment is accepted as correct, else `False`.
+        """Determines whether a model-predicted assignment string is accepted as a correct SAT solution.
+
+        Args:
+            optimal_assignment_list (List[str]): A list of canonical optimal assignments,
+                each expressed as a comma-separated string of 0/1 literals, following the
+                variable order specified in the prompt instructions (e.g. "1,0,1,1"). An
+                empty list ([]) represents “unsatisfiable.”
+            assignment_string (str): The model's prediction in the same 0/1 comma-separated
+                format. An empty string ("") means the model declares “unsatisfiable.”
+
+        Normalization:
+            This function removes spaces and surrounding parentheses from every entry in
+            optimal_assignment_list, then rejoins tokens with single commas.
+            For example, "(1, 0 ,1)" becomes "1,0,1".
+
+        Acceptance Criteria:
+            1. Returns True if both sides claim unsatisfiable
+               (assignment_string == "" and optimal_assignment_list == []).
+            2. Returns True if the canonical assignment_string exactly matches one element
+               of the normalized optimal_assignment_list.
+            Otherwise, returns False.
+
+        Order Sensitivity:
+            Because matching is string-exact, variable order must match between
+            prediction and ground truth.
+
+        Returns:
+            bool: True if the predicted assignment is accepted as correct, otherwise False.
         """
         # If both the model output and the ground truth say the solution is unsatisfiable,
         # (assignment_string: "", ground truth: empty list), accept immediately.
@@ -61,8 +63,20 @@ def is_assignment_optimal(self, optimal_assignment_list, assignment_string):
         return assignment_string in optimal_assignment_list
 
     def __evaluate__(self, x):
-        """
-        Evaluates whether the model's output is a correct SAT assignment.
+        """Evaluates whether the model's output is a correct SAT assignment.
+
+        Args:
+            x (dict): A dictionary containing the model's prediction and ground truth
+                information. It should include:
+                - "is_valid" (bool): Whether the model's output is valid.
+                - "solution" (List[str]): The list of optimal solutions.
+                - "extracted_answer" (str): The model's predicted assignment string.
+
+        Returns:
+            str:
+                - "none" if the prediction is invalid.
+                - "incorrect" if the prediction does not match an optimal solution.
+                - "correct" if the prediction matches an optimal solution.
         """
         is_valid_curr = x["is_valid"]
 
diff --git a/eureka_ml_insights/metrics/nphard_tsp_metrics.py b/eureka_ml_insights/metrics/nphard_tsp_metrics.py
index f294420a..b7e3ee8f 100644
--- a/eureka_ml_insights/metrics/nphard_tsp_metrics.py
+++ b/eureka_ml_insights/metrics/nphard_tsp_metrics.py
@@ -4,25 +4,27 @@
 
 
 class NPHardTSPMetric(Metric):
-    """
-    A metric class for evaluating solutions to the Traveling Salesman Problem (TSP).
-    A prediction is considered correct if it is a valid TSP tour and matches one of the optimal solutions.
+    """A metric class for evaluating solutions to the Traveling Salesman Problem (TSP).
+
+    A prediction is considered correct if it is a valid TSP tour and matches one of the
+    optimal solutions.
     """
 
     def __init__(self):
         super().__init__()
 
     def is_valid_tsp_path(self, path, cities, distance_matrix=None):
-        """
-        Validates a TSP path and if valid, evaluates its length.
+        """Validates a TSP path and, if valid, evaluates its length.
 
-        Parameters:
+        Args:
             path (list of int): The TSP path, a list of city indices.
             cities (list of int): The list of all cities.
-            distance_matrix (list of lists, optional): Matrix representing distances between cities.
+            distance_matrix (list of lists, optional): Matrix representing distances
+                between cities.
 
         Returns:
-            tuple: (bool, float): Whether the path is valid and its length (if valid). Length is None if invalid.
+            tuple(bool, float): A tuple containing a boolean indicating whether the
+            path is valid and a float for its total length if valid. Length is None if invalid.
         """
         # Ensure the path is not empty and has the correct number of cities
         if not path or len(path) != len(cities) + 1:
@@ -57,7 +59,15 @@ def is_valid_tsp_path(self, path, cities, distance_matrix=None):
         return True, path_length
 
     def is_tour_present(self, optimal_tour_curr, tour_string):
+        """Checks if a given TSP tour is present in a list of optimal tours.
 
+        Args:
+            optimal_tour_curr (str): A string representation of one or more optimal tours.
+            tour_string (str): The predicted TSP tour, represented as a comma-separated string of city indices.
+
+        Returns:
+            bool: True if the predicted tour is present among the optimal tours, False otherwise.
+        """
         optimal_tour_list = eval(optimal_tour_curr.strip("."))
         optimal_tour_strings = [",".join(map(str, tour + (tour[0],))) for tour in optimal_tour_list]
 
@@ -65,8 +75,14 @@ def is_tour_present(self, optimal_tour_curr, tour_string):
         return tour_string in optimal_tour_strings
 
     def __evaluate__(self, x):
-        """
-        Evaluates whether the model's output is a correct TSP tour.
+        """Evaluates whether the model's output is a correct TSP tour.
+
+        Args:
+            x (dict): A dictionary containing keys "is_valid", "optimal_tour", "weight_matrix",
+                "ground_truth", and "model_output" corresponding to the TSP instance and model's output.
+
+        Returns:
+            str: One of "none", "incorrect", or "correct" based on the validity and correctness of the TSP tour.
         """
         is_valid_curr = x["is_valid"]
 
diff --git a/eureka_ml_insights/metrics/reports.py b/eureka_ml_insights/metrics/reports.py
index f95cdab1..bb132396 100644
--- a/eureka_ml_insights/metrics/reports.py
+++ b/eureka_ml_insights/metrics/reports.py
@@ -11,18 +11,22 @@
 
 
 class Aggregator:
-    """This class aggregates data and writes the results."""
+    """Aggregates data and writes the results."""
 
     def __init__(self, column_names, output_dir, group_by=None, ignore_non_numeric=False, filename_base=None, **kwargs):
         """
-        args:
-            column_names: list of column names to aggregate
-            output_dir: str. directory to save the report
-            group_by: str. or list of str. column(s) to group by before aggregating
-            ignore_non_numeric: bool. if True ignore non-numeric values for average aggregator
-            filename_base: str. optional base string to be used in the file name for the report. If not None, the report filename will concatenate the class name, datetime, and filename_base.
+        Initializes an Aggregator.
+
+        Args:
+            column_names (List[str]): A list of column names to aggregate.
+            output_dir (str): The directory to save the report.
+            group_by (Union[str, List[str]], optional): The column(s) to group by before aggregating. Defaults to None.
+            ignore_non_numeric (bool, optional): If True, ignores non-numeric values for the average aggregator.
+                Defaults to False.
+            filename_base (str, optional): An optional base string to be used in the file name for the report. If not
+                None, the report filename will concatenate the class name, datetime, and filename_base. Defaults to None.
+            **kwargs: Additional keyword arguments.
         """
-
         self.column_names = column_names
         self.group_by = group_by
         self.output_dir = output_dir
@@ -31,12 +35,16 @@ def __init__(self, column_names, output_dir, group_by=None, ignore_non_numeric=F
         self.filename_base = filename_base
 
     def aggregate(self, data):
+        """
+        Aggregates the provided data.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         if self.ignore_non_numeric:
             data = data[data["is_valid"]].copy()
-        # determine if a groupby is needed, and call the appropriate aggregation function
         self._validate_data(data)
         if self.group_by:
-            # if group_by is a list, create a new column that is concatenation of the str values
             if isinstance(self.group_by, list):
                 group_name = "_".join(self.group_by)
                 data[group_name] = data[self.group_by].astype(str).agg("_".join, axis=1)
@@ -46,12 +54,14 @@ def aggregate(self, data):
             self._aggregate(data)
 
     def __str__(self):
+        """
+        Returns a string representation of the aggregator, used for naming output files.
+
+        Returns:
+            str: The string representation, including date/time and optionally column/group info.
+        """
         str_rep = self.__class__.__name__.lower()
-        # get time of day string down to miliseconds for unique name in case several of
-        # the same aggregator are used in the same report
         str_rep += "_" + datetime.datetime.now().strftime("%H%M%S%f")
-        # if filename_base is not provided during report creation, the report filename will concatenate the names of metric colums, as well as the "_grouped_by_" indicator if applicable.
-        # if filename_base is provided during report creation, the report filename will concatenate the class name, datetime, and filename_base.
         if self.filename_base is None:
             if self.column_names:
                 str_rep = str_rep + "_on_" + "_".join(self.column_names)
@@ -62,6 +72,12 @@ def __str__(self):
         return str_rep
 
     def write_results(self):
+        """
+        Writes the aggregated results to a JSON file.
+
+        Raises:
+            ValueError: If the data has not been aggregated yet.
+        """
         if self.aggregated_result is None:
             raise ValueError("The data has not been aggregated yet.")
 
@@ -70,7 +86,16 @@ def write_results(self):
             json.dump(self.aggregated_result, f)
 
     def _validate_data(self, data, **kwargs):
-        """Ensure that the input arguments are in the correct format."""
+        """
+        Ensures that the input arguments are in the correct format.
+
+        Args:
+            data (pd.DataFrame): The input data to validate.
+            **kwargs: Additional keyword arguments.
+
+        Raises:
+            ValueError: If column_names is not a list of strings, or if group_by is not a string or list of strings.
+        """
         if not isinstance(self.column_names, list) or not all(isinstance(col, str) for col in self.column_names):
             raise ValueError("column_names must be a list of strings.")
         if self.group_by:
@@ -79,55 +104,108 @@ def _validate_data(self, data, **kwargs):
                     raise ValueError("group_by must be a string or a list of strings")
 
     def _aggregate(self, data):
-        """Aggregate the data without grouping."""
+        """
+        Aggregates the data without grouping.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
     def _aggregate_grouped(self, data):
-        """Aggregate the data with grouping."""
+        """
+        Aggregates the data with grouping.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
 
 class NumericalAggregator(Aggregator):
-    """This class is a base class for aggregators that require the data to be numeric."""
+    """Base class for aggregators that require the data to be numeric."""
 
     def _validate_data(self, data):
+        """
+        Ensures that the data is numeric, in addition to the validations in the superclass.
+
+        Args:
+            data (pd.DataFrame): The input data to validate.
+
+        Raises:
+            ValueError: If the data fails any numeric check or if column_names/group_by are not valid.
+        """
         super()._validate_data(data)
-        """ Ensure that the data is numeric."""
         for col in self.column_names:
             data[col] = pd.to_numeric(data[col], errors="raise")
 
 
 class SumAggregator(NumericalAggregator):
-    """
-    This class aggregates data by summing the values."""
+    """Aggregates data by summing the values."""
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by computing sums for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         sums = {col: data[col].sum() for col in self.column_names}
         self.aggregated_result = sums
 
     def _aggregate_grouped(self, data):
+        """
+        Aggregates data with grouping by computing sums for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.group_by)
         sums = {col: gb[col].sum().to_dict() for col in self.column_names}
         self.aggregated_result = sums
 
 
 class MaxAggregator(NumericalAggregator):
-    """
-    This class aggregates data by taking the max of the values."""
+    """Aggregates data by taking the maximum of the values."""
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by computing the maximum value for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         maxes = {col: data[col].max() for col in self.column_names}
         self.aggregated_result = maxes
 
     def _aggregate_grouped(self, data):
+        """
+        Aggregates data with grouping by computing the maximum value for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.group_by)
         maxes = {col: gb[col].max().to_dict() for col in self.column_names}
         self.aggregated_result = maxes
 
 
 class AverageAggregator(NumericalAggregator):
+    """Aggregates data by computing the average of numeric columns."""
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by computing the mean (rounded to 3 decimals) for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         if len(data) == 0:
             averages = {col: 0 for col in self.column_names}
         else:
@@ -135,6 +213,12 @@ def _aggregate(self, data):
         self.aggregated_result = averages
 
     def _aggregate_grouped(self, data):
+        """
+        Aggregates data with grouping by computing the mean (rounded to 3 decimals) for each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         if len(data) == 0:
             averages = {col: 0 for col in self.column_names}
         else:
@@ -144,8 +228,15 @@ def _aggregate_grouped(self, data):
 
 
 class AverageSTDDevAggregator(NumericalAggregator):
+    """Aggregates data by computing both the average and standard deviation for numeric columns."""
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by computing the mean and standard deviation of specified columns.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         averages = {col: round(data[col].mean(), 3) for col in self.column_names}
         std_devs = {col: round(data[col].std(), 3) for col in self.column_names}
         self.aggregated_result = {
@@ -153,6 +244,12 @@ def _aggregate(self, data):
         }
 
     def _aggregate_grouped(self, data):
+        """
+        Aggregates data with grouping by computing the mean and standard deviation of specified columns.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.group_by)
         averages = {col: gb[col].mean().round(3).to_dict() for col in self.column_names}
         std_devs = {col: gb[col].std().round(3).to_dict() for col in self.column_names}
@@ -163,31 +260,52 @@ def _aggregate_grouped(self, data):
 
 
 class CountAggregator(Aggregator):
-    """Counts the number of occurences of values in the columns and optionally normalize the counts."""
+    """Counts the occurrences of values in columns and can optionally normalize the counts."""
 
     def __init__(self, column_names, output_dir, group_by=None, normalize=False, **kwargs):
         """
-        args:
-            column_names: list of column names to aggregate
-            output_dir: str. directory to save the report
-            group_by: str. or list of str. column(s) to group by before aggregating
-            normalize: bool. If True, normalize the counts to be between 0 and 1.
+        Initializes a CountAggregator.
+
+        Args:
+            column_names (List[str]): A list of column names to count.
+            output_dir (str): The directory to save the report.
+            group_by (Union[str, List[str]], optional): The column(s) to group by before aggregating. Defaults to None.
+            normalize (bool, optional): If True, normalizes the counts to be between 0 and 1. Defaults to False.
+            **kwargs: Additional keyword arguments.
         """
         super().__init__(column_names, output_dir, group_by, **kwargs)
         self.normalize = normalize
 
     def __str__(self):
+        """
+        Returns a string representation of the aggregator, used for naming output files.
+        If normalization is used, '_normalized' is appended to the name.
+
+        Returns:
+            str: The string representation, including date/time and optionally normalization info.
+        """
         str_rep = super().__str__()
         if self.normalize:
             str_rep += "_normalized"
         return str_rep
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by counting occurrences of values in each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         counts = {col: data[col].value_counts(normalize=self.normalize).round(3).to_dict() for col in self.column_names}
         self.aggregated_result = counts
 
     def _aggregate_grouped(self, data):
-        # for each column, create a dictionary that contains the counts for each group
+        """
+        Aggregates data with grouping by counting occurrences of values in each specified column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.group_by)
         col_counts = {
             col: gb[col].value_counts(normalize=self.normalize).unstack(level=0).round(3).to_dict()
@@ -198,44 +316,57 @@ def _aggregate_grouped(self, data):
 
 class BiLevelAggregator(AverageAggregator):
     """
-    This class aggregates the data in two levels. It first groups the data by the first_groupby column and
-    aggregates the data by applying the agg_fn on the column_names. It It then groups the result by the
-    second_groupby column and aggregates again by taking the mean and standard deviation of
-    the column_names.
+    Aggregates the data in two levels. First, it groups the data by the first_groupby column(s) and applies an
+    aggregation function (agg_fn) on the specified columns. Then, if a second_groupby is specified, it groups
+    the result again and computes the mean and standard deviation.
     """
 
     def __init__(self, column_names, first_groupby, output_dir, second_groupby=None, agg_fn="mean", **kwargs):
+        """
+        Initializes a BiLevelAggregator.
+
+        Args:
+            column_names (List[str]): A list of column names to aggregate.
+            first_groupby (Union[str, List[str]]): The column(s) for the first level of grouping.
+            output_dir (str): The directory to save the report.
+            second_groupby (Union[str, List[str]], optional): The column(s) for the second level of grouping.
+                Defaults to None.
+            agg_fn (str, optional): The aggregation function to apply for the first grouping. Defaults to "mean".
+            **kwargs: Additional keyword arguments.
+        """
         super().__init__(column_names, output_dir, group_by=None, **kwargs)
         self.first_groupby = first_groupby
         self.second_groupby = second_groupby
         self.agg_fn = agg_fn
 
     def _aggregate(self, data):
-        # take the self.agg_fn aggregation of the column for each group in the first groupby,
-        # aggregate the rest of the columns by 'first'
+        """
+        Performs a two-level aggregation. First, it groups by the first_groupby column(s) and applies agg_fn to
+        the specified columns. Then, if second_groupby is provided, groups the intermediate result again and
+        computes mean and standard deviation. If second_groupby is not provided, only mean and standard deviation
+        of the first aggregation are computed.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.first_groupby)
-        agg_map = {col: self.agg_fn for col in self.column_names}  # aggregate the column_names by self.agg_fn
+        agg_map = {col: self.agg_fn for col in self.column_names}
         agg_map.update(
             {
                 col: "first"
                 for col in data.columns
-                if col not in self.column_names  # aggregate the un-interesting columns by 'first'
-                and col != self.first_groupby  # in case first_groupby is a single column
-                and col not in self.first_groupby
-            }  # in case there are multiple columns in the first_groupby
+                if col not in self.column_names and col != self.first_groupby and col not in self.first_groupby
+            }
         )
 
         first_result = gb.aggregate(agg_map).reset_index()
         if self.second_groupby:
-            # take the average and std of the first level aggregation for each group in the second groupby
             gb = first_result.groupby(self.second_groupby)
             agg_map = {col: ["mean", "std"] for col in self.column_names}
-            # flatten the multi-level column index
             second_result = gb.agg(agg_map).reset_index()
             second_result.columns = [f"{col}_{agg}" if agg else col for col, agg in second_result.columns]
             self.aggregated_result = second_result.to_dict(orient="records")
         else:
-            # take the average and std of the first level aggregation
             self.aggregated_result = []
             for col in self.column_names:
                 col_mean = first_result[col].mean()
@@ -245,13 +376,25 @@ def _aggregate(self, data):
 
 class BiLevelCountAggregator(Aggregator):
     """
-    This class aggregates the data in two levels. It first groups the data by the first_groupby column and
-    aggregates the data by applying value_counts to each of the column_names. It then groups the result by the
-    second_groupby column and aggregates it again by taking the mean and standard deviation of the counts over
-    the groups.
+    Aggregates the data in two levels. First, it groups the data by the first_groupby column(s) and
+    applies a value_counts aggregation to each of the specified columns. Then, if a second_groupby is given,
+    it groups the intermediate result again by the second_groupby column(s) and computes the mean and standard
+    deviation of the counts.
     """
 
     def __init__(self, column_names, first_groupby, output_dir, normalize=False, second_groupby=None, **kwargs):
+        """
+        Initializes a BiLevelCountAggregator.
+
+        Args:
+            column_names (List[str]): A list of column names to aggregate.
+            first_groupby (Union[str, List[str]]): The column(s) for the first level of grouping.
+            output_dir (str): The directory to save the report.
+            normalize (bool, optional): If True, normalizes the counts to be between 0 and 1. Defaults to False.
+            second_groupby (Union[str, List[str]], optional): The column(s) for the second level of grouping.
+                Defaults to None.
+            **kwargs: Additional keyword arguments.
+        """
         super().__init__(column_names, output_dir, group_by=first_groupby, **kwargs)
         self.first_groupby = first_groupby
         self.second_groupby = second_groupby
@@ -259,23 +402,25 @@ def __init__(self, column_names, first_groupby, output_dir, normalize=False, sec
         self.agg_fn = lambda x: x.value_counts(normalize=self.normalize).to_dict()
 
     def _aggregate_grouped(self, data):
-        # Count values in the columns for each group in the first groupby,
-        # aggregate the rest of the columns by 'first'
+        """
+        Performs the two-level aggregation: first by the first_groupby, then optionally by the second_groupby.
+        Aggregates using value_counts for each column.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.first_groupby)
-        agg_map = {col: self.agg_fn for col in self.column_names}  # aggregate the column_names by self.agg_fn
+        agg_map = {col: self.agg_fn for col in self.column_names}
         agg_map.update(
             {
                 col: "first"
                 for col in data.columns
-                if col not in self.column_names  # aggregate the un-interesting columns by 'first'
-                and col != self.first_groupby  # in case first_groupby is a single column
-                and col not in self.first_groupby
-            }  # in case there are multiple columns in the first_groupby
+                if col not in self.column_names and col != self.first_groupby and col not in self.first_groupby
+            }
         )
 
         first_result = gb.aggregate(agg_map).reset_index()
 
-        # take the average and std of the first level aggregation for each group in the second groupby
         def agg_counts_by_avg_std(x):
             counts = {}
             for row in x:
@@ -296,34 +441,57 @@ def agg_counts_by_avg_std(x):
 
 
 class TwoColumnSumAverageAggregator(NumericalAggregator):
-    """This class that aggregates data from two columns by summing the values in each column and
-    then dividing the sum of the numerator column by the sum of the denominator column.
+    """
+    Aggregates data from two columns by summing the values in each column and then
+    dividing the sum of the numerator column by the sum of the denominator column.
     """
 
     def __init__(self, numerator_column_name, denominator_column_name, output_dir, group_by=None, **kwargs):
         """
-        args:
-            - numerator_column_name (str): The name of the column containing the numerator values.
-            - denominator_column_name (str): The name of the column containing the denominator values.
-            - output_dir (str): The directory where the aggregated result will be stored.
-            - groupby (str, optional): The column name to group the data by. Defaults to None.
+        Initializes a TwoColumnSumAverageAggregator.
+
+        Args:
+            numerator_column_name (str): The name of the column containing the numerator values.
+            denominator_column_name (str): The name of the column containing the denominator values.
+            output_dir (str): The directory where the aggregated result will be stored.
+            group_by (Union[str, List[str]], optional): The column(s) to group the data by. Defaults to None.
+            **kwargs: Additional keyword arguments.
         """
         super().__init__([numerator_column_name, denominator_column_name], output_dir, group_by=group_by, **kwargs)
         self.numerator_column_name = numerator_column_name
         self.denominator_column_name = denominator_column_name
 
     def _aggregate(self, data):
+        """
+        Aggregates data without grouping by computing the sum of both columns and the
+        ratio of sum(numerator) to sum(denominator).
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         sums = {col: data[col].sum() for col in self.column_names}
         divided_result = sums[self.numerator_column_name] / sums[self.denominator_column_name]
         self.aggregated_result = {"ratio": divided_result}
 
     def _aggregate_grouped(self, data):
+        """
+        Aggregates data with grouping by computing the sum of both columns within each group
+        and the ratio of sum(numerator) to sum(denominator).
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         gb = data.groupby(self.group_by)
         divided_result = (gb[self.numerator_column_name].sum() / gb[self.denominator_column_name].sum()).to_dict()
         self.aggregated_result = {"ratio": divided_result}
 
 
 class ValueFilteredAggregator(Aggregator):
+    """
+    Aggregator that filters out a particular value from the specified columns before aggregating the data
+    using a provided aggregator class.
+    """
+
     def __init__(
         self,
         agg_class,
@@ -336,17 +504,20 @@ def __init__(
         **kwargs,
     ):
         """
-        Aggregator that filters out a particular value before aggregating the data.
-        args:
-            agg_class: Aggregator class to use for aggregation
-            value: value to filter out
-            column_names: column names to filter and aggregate
-            output_dir: str. directory to save the report
-            group_by: str. or list of str. column(s) to group by before aggregating
-            ignore_non_numeric: bool. if True ignore non-numeric values for average aggregator
-            filename_base: str. optional base string to be used in the file name for the report. If not None, the report filename will concatenate the class name, datetime, and filename_base.
+        Initializes a ValueFilteredAggregator.
+
+        Args:
+            agg_class (class): The aggregator class to use for aggregation.
+            value (Any): The value to filter out.
+            column_names (List[str]): The column names to filter and aggregate.
+            output_dir (str): The directory to save the report.
+            group_by (Union[str, List[str]], optional): The column(s) to group by before aggregating. Defaults to None.
+            ignore_non_numeric (bool, optional): If True, ignores non-numeric values for the average aggregator.
+                Defaults to False.
+            filename_base (str, optional): An optional base string to be used in the file name for the report. If not
+                None, the report filename will concatenate the class name, datetime, and filename_base. Defaults to None.
+            **kwargs: Additional keyword arguments.
         """
-
         self.base_aggregator = agg_class(
             column_names, output_dir, group_by, ignore_non_numeric, filename_base, **kwargs
         )
@@ -359,9 +530,14 @@ def __init__(
         self.filename_base = filename_base
 
     def aggregate(self, data):
+        """
+        Filters out the specified value in each column before calling the base aggregator.
+
+        Args:
+            data (pd.DataFrame): The input data to aggregate.
+        """
         agg_results = {}
         for col in self.column_names:
-            # workaround to process one column at a time
             filtered_data = data[data[col] != self.value].copy()
             self.base_aggregator.column_names = [col]
             self.base_aggregator.aggregate(filtered_data)
@@ -370,14 +546,16 @@ def aggregate(self, data):
 
 
 class CocoDetectionAggregator(Aggregator):
-    """This class uses the coco tools to calculated AP50 for the provided detections."""
+    """Uses the COCO tools to calculate AP50 for provided detections."""
 
     def __init__(self, column_names, output_dir, target_coco_json_reader: JsonReader):
         """
-        args:
-            column_names: Single column (pass as a list to conform with superclass), that indicates which column has the detection results
-            output_dir: str, directory to save the reports
-            target_coco_json_reader: JsonReader, reader to load the ground truth json for the detections (in coco json format)
+        Initializes a CocoDetectionAggregator.
+
+        Args:
+            column_names (List[str]): A single column name (in a list) containing detection results.
+            output_dir (str): The directory to save the reports.
+            target_coco_json_reader (JsonReader): A reader to load the ground-truth JSON for detections (in COCO format).
         """
         super().__init__(column_names, output_dir)
 
@@ -386,17 +564,19 @@ def __init__(self, column_names, output_dir, target_coco_json_reader: JsonReader
         self.coco.createIndex()
 
     def _aggregate(self, data):
+        """
+        Aggregates detections by computing the AP50 metric using the COCOeval library.
 
-        # pull the annotasions out of the data and form for the COCOeval library
+        Args:
+            data (pd.DataFrame): The input data containing detection annotations in the specified column.
+        """
         annotations = []
         for ann in data[self.column_names[0]].tolist():
             if ann != "none":
                 annotations += ast.literal_eval(ann)
 
-        # run the COCOeval library to get stats
         if annotations:
             cocovalPrediction = self.coco.loadRes(annotations)
-
             cocoEval = COCOeval(self.coco, cocovalPrediction, "bbox")
             cocoEval.evaluate()
             cocoEval.accumulate()
@@ -406,24 +586,36 @@ def _aggregate(self, data):
 
 
 class Reporter:
-    """This class applies various aggregations and visualizations to the data."""
+    """Applies various aggregations and visualizations to the data."""
 
     def __init__(self, output_dir, aggregator_configs=None, visualizer_configs=None):
         """
-        args:
-            output_dir: str. directory to save the reports
-            aggregator_configs: list of AggregatorConfig objects
-            visualizer_configs: list of VisualizerConfig objects
-        """
-        self.aggregators = [
-            config.class_name(**dict(output_dir=output_dir, **config.init_args)) for config in aggregator_configs
-        ]
-        self.visualizers = [
-            config.class_name(**dict(output_dir=output_dir, **config.init_args)) for config in visualizer_configs
-        ]
+        Initializes a Reporter.
+
+        Args:
+            output_dir (str): The directory to save the reports.
+            aggregator_configs (List[object], optional): A list of AggregatorConfig objects. Defaults to None.
+            visualizer_configs (List[object], optional): A list of VisualizerConfig objects. Defaults to None.
+        """
+        self.aggregators = (
+            [config.class_name(**dict(output_dir=output_dir, **config.init_args)) for config in aggregator_configs]
+            if aggregator_configs
+            else []
+        )
+        self.visualizers = (
+            [config.class_name(**dict(output_dir=output_dir, **config.init_args)) for config in visualizer_configs]
+            if visualizer_configs
+            else []
+        )
         self.output_dir = output_dir
 
     def generate_report(self, data):
+        """
+        Generates a report by applying configured aggregators and visualizers to the data.
+
+        Args:
+            data (pd.DataFrame): The input data to generate a report for.
+        """
         for aggregator in self.aggregators:
             aggregator.aggregate(data)
             aggregator.write_results()
diff --git a/eureka_ml_insights/metrics/spatial_and_layout_metrics.py b/eureka_ml_insights/metrics/spatial_and_layout_metrics.py
index eeec2cb1..d2ce48dc 100644
--- a/eureka_ml_insights/metrics/spatial_and_layout_metrics.py
+++ b/eureka_ml_insights/metrics/spatial_and_layout_metrics.py
@@ -1,3 +1,8 @@
+"""This module provides a collection of metrics related to object detection, multiple-choice evaluation, 
+and text analysis. It also offers utility functions for downloading NLTK resources and comparing words 
+using WordNet path similarity.
+"""
+
 import ast
 import logging
 import re
@@ -10,7 +15,11 @@
 
 
 def download_nltk_resources():
-    """Download 'wordnet' if not already installed"""
+    """Downloads the 'wordnet' corpus if it is not already installed.
+
+    This function attempts to locate the 'wordnet' corpus. If it is not found,
+    it downloads the corpus using nltk.download.
+    """
     try:
         nltk.data.find("corpora/wordnet.zip")
     except LookupError:
@@ -22,30 +31,45 @@ def download_nltk_resources():
 
 
 class SpatialAndLayoutReasoningMetric(MultipleChoiceMetric):
-    """This class is a metric that requires a correct prediction to be only one of the valid multiple choice answers."""
+    """SpatialAndLayoutReasoningMetric requires a correct prediction to be only one valid multiple choice answer.
+
+    This class inherits from MultipleChoiceMetric and checks whether only the correct option, among
+    all the target options, is present exactly once in the answer text.
+    """
 
     def __init__(self):
+        """Initializes the SpatialAndLayoutReasoningMetric."""
         super().__init__()
 
     def __evaluate__(self, answer_text, target_text, target_options, is_valid):
+        """Evaluates the answer against a list of valid target options.
+
+        Args:
+            answer_text (str): The text provided as the answer.
+            target_text (str): The correct target option.
+            target_options (list[str]): All possible valid choices.
+            is_valid (bool): Indicates if the sample is valid for evaluation.
+
+        Returns:
+            str: Returns "correct" if exactly one option appears and it is the target_text,
+            "incorrect" if exactly one option appears and it is not the target_text,
+            otherwise "none".
+        """
         if not is_valid:
             return "none"
 
-        # count which options appear
         options_count = {}
         for option in target_options:
-
-            # make sure only matches whole words by includeing a word boundary "\b" in the pattern
+            # Ensure matching whole words by including a word boundary "\b" in the pattern
             pattern = "\\b{phrase}\\b".format(phrase=option)
-
-            matches = re.findall(pattern, answer_text, flags=re.IGNORECASE)  # search
-            options_count[option] = len(matches)  # count
+            matches = re.findall(pattern, answer_text, flags=re.IGNORECASE)
+            options_count[option] = len(matches)
 
         total_count = sum(options_count.values())
 
-        # correct if only the right answer appears once,
-        # incorrect if only a wrong answer appears,
-        # none if there are mutiple answers or nothing matches
+        # "correct" if only the right answer appears once,
+        # "incorrect" if only a wrong answer appears,
+        # "none" if there are multiple answers or no matches
         return (
             "correct"
             if (total_count == 1 and options_count[target_text] == 1)
@@ -54,7 +78,18 @@ def __evaluate__(self, answer_text, target_text, target_options, is_valid):
 
 
 def wordnet_compare(word, compare_word, threshold=1):
+    """Compares two words for similarity using WordNet path similarity.
 
+    Args:
+        word (str): The first word to compare.
+        compare_word (str): The second word to compare with the first.
+        threshold (float, optional): The similarity threshold required to consider the words matching.
+            Defaults to 1.
+
+    Returns:
+        bool: True if the maximum path similarity between any synsets of the two words
+        is greater than or equal to the threshold, otherwise False.
+    """
     syns1 = wordnet.synsets(word.replace(" ", "_"))
     syns2 = wordnet.synsets(compare_word.replace(" ", "_"))
 
@@ -63,10 +98,8 @@ def wordnet_compare(word, compare_word, threshold=1):
     for syn1 in syns1:
         for syn2 in syns2:
             sim = syn1.path_similarity(syn2)
-
-            if sim > max_sim:
+            if sim and sim > max_sim:
                 max_sim = sim
-
             if max_sim == 1:
                 break
 
@@ -74,18 +107,28 @@ def wordnet_compare(word, compare_word, threshold=1):
 
 
 class ObjectRecognitionMetric(ClassicMetric):
-    """
-    This class implements a simple metric for object detection.
-    If any part of the target_text (including one of the mutiple words) appears in the answer_text,
-    the answer is considered to be correct.
+    """Implements a simple metric for object detection.
+
+    If any part of the target text (including one of the multiple words) appears in the answer text,
+    the answer is considered correct.
     """
 
     def __evaluate__(self, answer_text, target_text, is_valid):
+        """Evaluates the answer text against the target text for object recognition.
+
+        Args:
+            answer_text (str): The text provided as the answer.
+            target_text (str or list): The target text or list of target strings.
+            is_valid (bool): Indicates if the sample is valid for evaluation.
+
+        Returns:
+            str: "correct" if recognized, "incorrect" otherwise, or "none" if invalid.
+        """
         if not is_valid:
             return "none"
 
         # Some common synonyms
-        # TODO change this to use wordnet subsitutions as in CocoObjectDetectionMetric
+        # TODO change this to use wordnet substitutions as in CocoObjectDetectionMetric
         answer_text = answer_text.lower()
         answer_text = answer_text.replace("sofa", "couch")
         answer_text = answer_text.replace("sneakers", "shoes")
@@ -110,22 +153,18 @@ def __evaluate__(self, answer_text, target_text, is_valid):
         # search over parts of a
         a_parts = a.split(" ")
         ps = []
-
         for p in a_parts:
             correct = True if p in answer_text else False
             ps.append(correct)
-
         pred_a_bool = any(ps)
 
         if b:
             # search over parts of b
             b_parts = b.split(" ")
             ps = []
-
             for p in b_parts:
                 correct = True if p in answer_text.lower() else False
                 ps.append(correct)
-
             pred_b_bool = any(ps)
 
             # correct if both correct
@@ -137,44 +176,55 @@ def __evaluate__(self, answer_text, target_text, is_valid):
 
 
 class CocoObjectDetectionMetric(DetectionMetric):
-    """
-    This class implements parsing to prep for a COCO detection metrics.
-    The model output is parsed and formed into a COCO annotation.
-    This is then returned and stored as the metric output.
-    The stats are computed in the CocoDetectionAggregator.
+    """Implements parsing to prepare for COCO detection metrics.
+
+    The model output is parsed and formed into a COCO annotation, which is returned and stored as the
+    metric output. The final statistics are computed in the CocoDetectionAggregator.
     """
 
     def __init__(self, target_coco_json_reader: JsonReader):
-        """
-        args:
-            target_coco_json_reader: JsonReader, reader to load the ground truth json for the detections (in coco json format)
+        """Initializes the metric with the ground-truth COCO JSON data.
+
+        Args:
+            target_coco_json_reader (JsonReader): Reader to load the ground truth JSON for
+                the detections (in COCO JSON format).
         """
         super().__init__()
 
-        # create COCO class and populate with grountruth json data
+        # create COCO class and populate with groundtruth json data
         self.coco = COCO()
         self.coco.dataset = target_coco_json_reader.read()
         self.coco.createIndex()
 
-        # get a lst of all cats
+        # get a list of all categories
         coco_cat_ids = self.coco.getCatIds()
         coco_cats = self.coco.loadCats(coco_cat_ids)
         self.coco_cat_name_to_id = {}
 
-        # create a dict to look up cat id by
+        # create a dict to look up category ID by name
         for c in coco_cats:
             self.coco_cat_name_to_id[c["name"]] = c["id"]
 
     def __evaluate__(self, image_id, answer_text, is_valid):
+        """Evaluates the detections extracted from the answer text for a given image.
+
+        Args:
+            image_id (int): The identifier of the image being evaluated.
+            answer_text (str): The text output from the model, containing detections.
+            is_valid (bool): Indicates if the sample is valid for evaluation.
+
+        Returns:
+            str: A JSON-formatted string representation of the COCO-style annotations.
+        """
         if not is_valid:
             return "none"
 
-        # load image info, need w and h
+        # load image info, need width and height
         img = self.coco.loadImgs(image_id)
         w = img[0]["width"]
         h = img[0]["height"]
 
-        # split each line into a seperate detection
+        # split each line into a separate detection
         answer_text = answer_text.strip()
         dets = answer_text.split("\n")
 
@@ -182,7 +232,7 @@ def __evaluate__(self, image_id, answer_text, is_valid):
 
         for det in dets:
             try:
-                # parse the detection format (as specified int he prompt)
+                # parse the detection format (as specified in the prompt)
                 parts = det.split("-")
                 assert len(parts) == 3, f"Error parsing detection: {det}"
                 box_string, label, confidence = parts
@@ -199,7 +249,6 @@ def __evaluate__(self, image_id, answer_text, is_valid):
                 # use wordnet distance to handle synonyms
                 for cat in self.coco_cat_name_to_id.keys():
                     if wordnet_compare(label, cat):
-
                         annotation = {
                             "image_id": image_id,
                             "category_id": self.coco_cat_name_to_id[cat],
diff --git a/eureka_ml_insights/models/models.py b/eureka_ml_insights/models/models.py
index 82a57caa..8acbc0e8 100644
--- a/eureka_ml_insights/models/models.py
+++ b/eureka_ml_insights/models/models.py
@@ -1,4 +1,10 @@
-"""This module contains classes for interacting with various models, including API-based models and HuggingFace models."""
+"""
+Interact with various models, including API-based and HuggingFace models.
+
+This module provides multiple classes that define and manage model interactions, including
+chat-based and endpoint-based workflows. It handles text-based requests, token counting,
+and authentication for a variety of model endpoints.
+"""
 
 import json
 import pandas as pd
@@ -13,16 +19,21 @@
 import anthropic
 import requests
 import tiktoken
-from azure.identity import DefaultAzureCredential, get_bearer_token_provider
+from azure.identity import AzureCliCredential, get_bearer_token_provider
 
 from eureka_ml_insights.secret_management import get_secret
 
 
 @dataclass
 class Model(ABC):
-    """This class is used to define the structure of a model class.
-    Any model class should inherit from this class and implement the generate method that returns a dict
-    containing the model_output, is_valid, and other relevant information.
+    """Base class for models.
+
+    Any model class should inherit from this class and implement the 'generate' method, which returns
+    a dictionary containing 'model_output', 'is_valid', and any other relevant information.
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode.
+        system_message (str): A system message that can be used by the model.
     """
 
     chat_mode: bool = False
@@ -30,17 +41,34 @@ class Model(ABC):
 
     @abstractmethod
     def generate(self, text_prompt, *args, **kwargs):
+        """Generate a response from the model.
+
+        Args:
+            text_prompt (str): The text prompt to generate a response for.
+            *args: Additional positional arguments.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            dict: A dictionary containing 'model_output', 'is_valid', and any other fields.
+
+        Raises:
+            NotImplementedError: If not implemented by a subclass.
+        """
         raise NotImplementedError
 
     def count_tokens(self, model_output: str = None, is_valid: bool = False):
-        """
-        This method uses tiktoken tokenizer to count the number of tokens in the response.
-        See: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
-        args:
-            model_output (str): the text response from the model.
-            is_valid (bool): whether the response is valid or not.
-        returns:
-            n_output_tokens (int): the number of tokens in the text response.
+        """Count the number of tokens in the response.
+
+        Uses the tiktoken tokenizer to count tokens in the model's response. For more information,
+        see:
+        https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
+
+        Args:
+            model_output (str, optional): The text response from the model. Defaults to None.
+            is_valid (bool, optional): Whether the response is valid. Defaults to False.
+
+        Returns:
+            int or None: The number of tokens, or None if the response is invalid or None.
         """
         encoding = tiktoken.get_encoding("cl100k_base")
         if model_output is None or not is_valid:
@@ -50,13 +78,20 @@ def count_tokens(self, model_output: str = None, is_valid: bool = False):
             return n_output_tokens
 
     def base64encode(self, query_images):
+        """Encode a list of images as base64 strings.
+
+        Args:
+            query_images (list): A list of PIL Images to encode as base64 strings.
+
+        Returns:
+            list: A list of base64-encoded string representations of the images.
+        """
         import base64
         from io import BytesIO
 
         encoded_images = []
 
         for query_image in query_images:
-
             buffered = BytesIO()
             query_image.save(buffered, format="JPEG")
             base64_bytes = base64.b64encode(buffered.getvalue())
@@ -68,21 +103,38 @@ def base64encode(self, query_images):
 
 @dataclass
 class KeyBasedAuthMixIn:
-    """This class is used to handle key-based authentication for models."""
+    """Mixin class for key-based authentication.
+
+    This class handles key-based authentication for models.
+
+    Attributes:
+        api_key (str): The API key used for authentication.
+        secret_key_params (dict): The parameters used to retrieve the API key from a secret manager, if necessary.
+    """
 
     api_key: str = None
     secret_key_params: dict = None
 
     def __post_init__(self):
+        """Initialize the mixin.
+
+        Ensures that either an API key or secret key parameters are provided.
+
+        Raises:
+            ValueError: If neither an API key nor secret key parameters are provided.
+        """
         if self.api_key is None and self.secret_key_params is None:
             raise ValueError("Either api_key or secret_key_params must be provided.")
         self.api_key = self.get_api_key()
 
     def get_api_key(self):
-        """
-        This method is used to get the api_key for the models that require key-based authentication.
-        Either api_key (str) or secret_key_params (dict) must be provided.
-        if api_key is not directly provided, secret_key_params must be provided to get the api_key using get_secret method.
+        """Retrieve the API key for key-based authentication.
+
+        If 'api_key' is not directly provided, 'secret_key_params' is used to retrieve
+        the API key using 'get_secret'.
+
+        Returns:
+            str: The API key.
         """
         if self.api_key is None:
             self.api_key = get_secret(**self.secret_key_params)
@@ -90,27 +142,62 @@ def get_api_key(self):
 
 @dataclass
 class EndpointModel(Model):
-    """This class is used to interact with API-based models."""
+    """Abstract class for interacting with API-based models.
+
+    Inherits from:
+        Model (ABC)
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of times to retry the request if it fails.
+    """
 
     num_retries: int = 3
 
     @abstractmethod
     def create_request(self, text_prompt, *args, **kwargs):
+        """Create a request for the model.
+
+        Args:
+            text_prompt (str): The text prompt to generate the request.
+            *args: Additional positional arguments.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            Any: The request object to be sent to the model.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
     @abstractmethod
     def get_response(self, request):
-        # must return the model output and the response time
+        """Get the response from the model.
+
+        Args:
+            request (Any): The request object created by create_request.
+
+        Returns:
+            dict: A dictionary containing the model output and the response time.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
     def update_chat_history(self, query_text, model_output, *args, **kwargs):
-        """
-        This method is used to update the chat history with the model response.
-        args:
-            query_text (str): the text prompt to generate the response.
-            model_output (str): the text response from the model.
-        returns:
-            previous_messages (list): a list of messages in the chat history.
+        """Update the chat history with the model response.
+
+        Args:
+            query_text (str): The text prompt used to generate the model response.
+            model_output (str): The text response from the model.
+            *args: Additional positional arguments.
+            **kwargs: Additional keyword arguments (may contain 'previous_messages').
+
+        Returns:
+            list: A list of messages representing the updated chat history.
         """
         previous_messages = kwargs.get("previous_messages", [])
         previous_messages.append({"role": "user", "content": query_text})
@@ -118,15 +205,17 @@ def update_chat_history(self, query_text, model_output, *args, **kwargs):
         return previous_messages
 
     def generate(self, query_text, *args, **kwargs):
-        """
-        Calls the endpoint to generate the model response.
-        args:
-            query_text (str): the text prompt to generate the response.
-            query_images (list): list of images in base64 bytes format to be included in the request.
-            system_message (str): the system message to be included in the request.
-        returns:
-            response_dict (dict): a dictionary containing the model_output, is_valid, response_time, and n_output_tokens,
-                                  and any other relevant information returned by the model.
+        """Call the endpoint to generate the model response.
+
+        Args:
+            query_text (str): The text prompt to generate the response for.
+            *args: Additional positional arguments.
+            **kwargs: Additional keyword arguments (may include 'query_images' and 'system_message').
+
+        Returns:
+            dict: A dictionary containing 'model_output', 'is_valid', 'response_time', 'n_output_tokens',
+                and any other relevant information returned by the model. If 'chat_mode' is True, also
+                includes 'previous_messages'.
         """
         response_dict = {}
         if hasattr(self, "system_message") and self.system_message:
@@ -178,6 +267,17 @@ def generate(self, query_text, *args, **kwargs):
 
     @abstractmethod
     def handle_request_error(self, e):
+        """Handle errors that occur during the request.
+
+        Args:
+            e (Exception): The exception encountered.
+
+        Returns:
+            bool: True if the error should cause an immediate return, False otherwise.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
 @dataclass
@@ -286,6 +386,29 @@ def get_response(self, target_prompt, target_repeat_id):
 
 @dataclass
 class RestEndpointModel(EndpointModel, KeyBasedAuthMixIn):
+    """Class for interacting with REST-based endpoint models.
+
+    Inherits from:
+        EndpointModel (Model)
+        KeyBasedAuthMixIn
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+        api_key (str): The API key used for authentication (from KeyBasedAuthMixIn).
+        secret_key_params (dict): Parameters to retrieve the API key from secret manager (from KeyBasedAuthMixIn).
+        url (str): The endpoint URL.
+        model_name (str): The name of the model.
+        temperature (float): The temperature for text generation.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling value.
+        frequency_penalty (float): The frequency penalty parameter.
+        presence_penalty (float): The presence penalty parameter.
+        do_sample (bool): Whether to use sampling.
+        timeout (int): The timeout in seconds for the request.
+    """
+
     url: str = None
     model_name: str = None
     temperature: float = 0
@@ -297,7 +420,21 @@ class RestEndpointModel(EndpointModel, KeyBasedAuthMixIn):
     timeout: int = None
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
-        """Creates a request for the model."""
+        """Create a request for the REST endpoint model.
+
+        Args:
+            text_prompt (str): The text prompt to generate the model response.
+            query_images (list, optional): List of images in base64 bytes format to be included in the request.
+                Defaults to None, as images are not supported.
+            system_message (str, optional): System message to be included in the request. Defaults to None.
+            previous_messages (list, optional): List of previous messages for maintaining chat state. Defaults to None.
+
+        Returns:
+            urllib.request.Request: The request object to be sent to the model endpoint.
+
+        Raises:
+            NotImplementedError: If images are provided, as they're not supported yet.
+        """
         messages = []
         if system_message:
             messages.append({"role": "system", "content": system_message})
@@ -319,8 +456,6 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
             raise NotImplementedError("Images are not supported for RestEndpointModel endpoints yet.")
 
         body = str.encode(json.dumps(data))
-        # The azureml-model-deployment header will force the request to go to a specific deployment.
-        # Remove this header to have the request observe the endpoint traffic rules
         headers = {
             "Content-Type": "application/json",
             "Authorization": ("Bearer " + self.api_key),
@@ -329,20 +464,37 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
         return urllib.request.Request(self.url, body, headers)
 
     def get_response(self, request):
-        # Get the model response and measure the time taken.
+        """Obtain the model response from the REST endpoint.
+
+        Measures the response time, parses the model output, and returns both.
+
+        Args:
+            request (urllib.request.Request): The request object to be sent to the model endpoint.
+
+        Returns:
+            dict: Contains 'model_output' and 'response_time'.
+        """
         start_time = time.time()
         response = urllib.request.urlopen(request, timeout=self.timeout)
         end_time = time.time()
-        # Parse the response and return the model output.
         res = json.loads(response.read())
         model_output = res["output"]
         response_time = end_time - start_time
         return {"model_output": model_output, "response_time": response_time}
 
     def handle_request_error(self, e):
+        """Handle request errors for the REST endpoint.
+
+        Logs detailed information about the error.
+
+        Args:
+            e (Exception): The exception encountered.
+
+        Returns:
+            bool: False to indicate that the operation should not proceed further.
+        """
         if isinstance(e, urllib.error.HTTPError):
             logging.info("The request failed with status code: " + str(e.code))
-            # Print the headers - they include the request ID and the timestamp, which are useful for debugging.
             logging.info(e.info())
             logging.info(e.read().decode("utf8", "ignore"))
         else:
@@ -352,9 +504,28 @@ def handle_request_error(self, e):
 
 @dataclass
 class ServerlessAzureRestEndpointModel(EndpointModel, KeyBasedAuthMixIn):
-    """This class can be used for serverless Azure model deployments."""
+    """Class for serverless Azure model deployments.
+
+    Inherits from:
+        EndpointModel (Model)
+        KeyBasedAuthMixIn
+
+    Additional information:
+        https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+        api_key (str): The API key used for authentication (from KeyBasedAuthMixIn).
+        secret_key_params (dict): Parameters to retrieve the API key from secret manager (from KeyBasedAuthMixIn).
+        url (str): The endpoint URL.
+        model_name (str): The name of the model.
+        stream (bool): Whether or not to stream the response.
+        auth_scope (str): The authentication scope for Azure.
+        timeout (int): The timeout in seconds for the request.
+    """
 
-    """https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio"""
     url: str = None
     model_name: str = None
     stream: bool = False
@@ -362,36 +533,55 @@ class ServerlessAzureRestEndpointModel(EndpointModel, KeyBasedAuthMixIn):
     timeout: int = None
 
     def __post_init__(self):
+        """Initialize the serverless Azure REST endpoint model.
+
+        Tries to initialize with either an API key or a bearer token provider. If neither is provided,
+        obtains a bearer token from Azure CLI.
+        """
         try:
             super().__post_init__()
             self.headers = {
                 "Content-Type": "application/json",
                 "Authorization": ("Bearer " + self.api_key),
-                # The behavior of the API when extra parameters are indicated in the payload.
-                # Using pass-through makes the API to pass the parameter to the underlying model.
-                # Use this value when you want to pass parameters that you know the underlying model can support.
-                # https://learn.microsoft.com/en-us/azure/machine-learning/reference-model-inference-chat-completions?view=azureml-api-2
                 "extra-parameters": "pass-through",
             }
         except ValueError:
-            self.bearer_token_provider = get_bearer_token_provider(DefaultAzureCredential(), self.auth_scope)
+            self.bearer_token_provider = get_bearer_token_provider(AzureCliCredential(), self.auth_scope)
             self.headers = {
                 "Content-Type": "application/json",
                 "Authorization": ("Bearer " + self.bearer_token_provider()),
-                # The behavior of the API when extra parameters are indicated in the payload.
-                # Using pass-through makes the API to pass the parameter to the underlying model.
-                # Use this value when you want to pass parameters that you know the underlying model can support.
-                # https://learn.microsoft.com/en-us/azure/machine-learning/reference-model-inference-chat-completions?view=azureml-api-2
                 "extra-parameters": "pass-through",
             }
 
     @abstractmethod
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
-        # Exact model parameters are model-specific.
-        # The method cannot be implemented unless the model being deployed is known.
+        """Create a request for the serverless Azure REST endpoint model.
+
+        Args:
+            text_prompt (str): The text prompt for the model.
+            query_images (list, optional): List of images to be included in the request. Defaults to None.
+            system_message (str, optional): The system message for the request. Defaults to None.
+            previous_messages (list, optional): Previous messages in the chat. Defaults to None.
+
+        Returns:
+            Any: The request object to be sent to the model.
+
+        Raises:
+            NotImplementedError: Must be implemented by subclasses.
+        """
         raise NotImplementedError
 
     def get_response(self, request):
+        """Get the response from the Azure REST endpoint.
+
+        Measures the response time, parses the model output, and includes usage details if present.
+
+        Args:
+            request (urllib.request.Request): The request object.
+
+        Returns:
+            dict: Contains 'model_output', 'response_time', and optionally 'usage'.
+        """
         start_time = time.time()
         response = urllib.request.urlopen(request, timeout=self.timeout)
         end_time = time.time()
@@ -407,9 +597,18 @@ def get_response(self, request):
         return response_dict
 
     def handle_request_error(self, e):
+        """Handle errors during a serverless Azure REST endpoint request.
+
+        Logs the error details for debugging.
+
+        Args:
+            e (Exception): The exception encountered.
+
+        Returns:
+            bool: False to indicate that the operation should not proceed further.
+        """
         if isinstance(e, urllib.error.HTTPError):
             logging.info("The request failed with status code: " + str(e.code))
-            # Print the headers - they include the request ID and the timestamp, which are useful for debugging.
             logging.info(e.info())
             logging.info(e.read().decode("utf8", "ignore"))
         else:
@@ -419,9 +618,36 @@ def handle_request_error(self, e):
 
 @dataclass
 class LlamaServerlessAzureRestEndpointModel(ServerlessAzureRestEndpointModel):
-    """Tested for Llama 3.1 405B Instruct deployments and Llama 3.2 90B Vision Instruct."""
-
-    """See https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-llama?tabs=llama-three for the api reference."""
+    """Serverless Azure REST endpoint model for Llama-based deployments.
+
+    Inherits from:
+        ServerlessAzureRestEndpointModel (EndpointModel, KeyBasedAuthMixIn)
+
+    Tested for Llama 3.1 405B Instruct deployments and Llama 3.2 90B Vision Instruct.
+    See:
+    https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-llama?tabs=llama-three
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry (from EndpointModel).
+        api_key (str): The API key (from KeyBasedAuthMixIn).
+        secret_key_params (dict): Secret key params (from KeyBasedAuthMixIn).
+        url (str): The endpoint URL (from ServerlessAzureRestEndpointModel).
+        model_name (str): The name of the model.
+        stream (bool): Whether to stream the response (from ServerlessAzureRestEndpointModel).
+        auth_scope (str): The authentication scope (from ServerlessAzureRestEndpointModel).
+        timeout (int): The timeout in seconds (from ServerlessAzureRestEndpointModel).
+        temperature (float): The sampling temperature.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling value.
+        frequency_penalty (float): The frequency penalty parameter.
+        presence_penalty (float): The presence penalty parameter.
+        use_beam_search (bool): Whether or not to use beam search.
+        best_of (int): The best-of parameter for multiple sampling runs.
+        skip_special_tokens (bool): Whether or not to skip special tokens in the output.
+        ignore_eos (bool): Whether or not to ignore the EOS token.
+    """
 
     temperature: float = 0
     max_tokens: int = 2000
@@ -434,6 +660,23 @@ class LlamaServerlessAzureRestEndpointModel(ServerlessAzureRestEndpointModel):
     ignore_eos: bool = False
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request for the Llama serverless Azure REST endpoint.
+
+        Supports an optional single image for the Llama vision model. The image is base64-encoded
+        and included in the request if provided.
+
+        Args:
+            text_prompt (str): The user prompt text.
+            query_images (list, optional): List containing a single image (for Llama vision model). Defaults to None.
+            system_message (str, optional): The system message. Defaults to None.
+            previous_messages (list, optional): Previous messages for maintaining chat history. Defaults to None.
+
+        Returns:
+            urllib.request.Request: The request object to be sent to the Llama endpoint.
+
+        Raises:
+            ValueError: If more than one image is provided for the Llama vision model.
+        """
         messages = []
         if system_message:
             messages.append({"role": "system", "content": system_message})
@@ -473,22 +716,66 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
 
 @dataclass
 class MistralServerlessAzureRestEndpointModel(ServerlessAzureRestEndpointModel):
-    """Tested for Mistral Large 2 2407 deployments."""
+    """Serverless Azure REST endpoint model for Mistral-based deployments.
+
+    Inherits from:
+        ServerlessAzureRestEndpointModel (EndpointModel, KeyBasedAuthMixIn)
+
+    Tested for Mistral Large 2 2407 deployments. Refer to:
+    https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-mistral?tabs=mistral-large#mistral-chat-api
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry (from EndpointModel).
+        api_key (str): The API key (from KeyBasedAuthMixIn).
+        secret_key_params (dict): Secret key params (from KeyBasedAuthMixIn).
+        url (str): The endpoint URL (from ServerlessAzureRestEndpointModel).
+        model_name (str): The name of the model.
+        stream (bool): Whether to stream the response (from ServerlessAzureRestEndpointModel).
+        auth_scope (str): The authentication scope (from ServerlessAzureRestEndpointModel).
+        timeout (int): The timeout in seconds (from ServerlessAzureRestEndpointModel).
+        temperature (float): The sampling temperature.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling value.
+        safe_prompt (bool): Whether or not to use a safe prompt.
+    """
 
-    """See https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-mistral?tabs=mistral-large#mistral-chat-api for the api reference."""
     temperature: float = 0
     max_tokens: int = 2000
     top_p: float = 1
     safe_prompt: bool = False
 
     def __post_init__(self):
+        """Initialize the Mistral serverless Azure REST endpoint model.
+
+        Enforces constraints on top_p if temperature is zero, following the Mistral chat API guidelines.
+        """
         if self.temperature == 0 and self.top_p != 1:
-            warning_message = "Top_p must be 1 when using greedy sampling. Temperature zero means greedy sampling. Top_p will be reset to 1. See https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-mistral?tabs=mistral-large#mistral-chat-api for more information."
+            warning_message = (
+                "Top_p must be 1 when using greedy sampling. Temperature zero means greedy sampling. "
+                "Top_p will be reset to 1. See https://learn.microsoft.com/en-us/azure/ai-studio/"
+                "how-to/deploy-models-mistral?tabs=mistral-large#mistral-chat-api for more information."
+            )
             logging.warning(warning_message)
             self.top_p = 1
         super().__post_init__()
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request for the Mistral serverless Azure REST endpoint.
+
+        Args:
+            text_prompt (str): The text prompt for which the model generates a response.
+            query_images (list, optional): Not supported for this model. Defaults to None.
+            system_message (str, optional): The system message. Defaults to None.
+            previous_messages (list, optional): The previous messages in the chat. Defaults to None.
+
+        Returns:
+            urllib.request.Request: The request object.
+
+        Raises:
+            NotImplementedError: If images are provided, as they're not supported.
+        """
         messages = []
         if system_message:
             messages.append({"role": "system", "content": system_message})
@@ -502,8 +789,6 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
             "max_tokens": self.max_tokens,
             "temperature": self.temperature,
             "top_p": self.top_p,
-            # Safe_prompt activates an optional system prompt to enforce guardrails.
-            # See https://docs.mistral.ai/capabilities/guardrailing/
             "safe_prompt": self.safe_prompt,
             "stream": self.stream,
         }
@@ -513,13 +798,50 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
 
 @dataclass
 class DeepseekR1ServerlessAzureRestEndpointModel(ServerlessAzureRestEndpointModel):
-    # setting temperature to 0.6 as suggested in https://huggingface.co/deepseek-ai/DeepSeek-R1
+    """Serverless Azure REST endpoint model for DeepSeek-R1.
+
+    Inherits from:
+        ServerlessAzureRestEndpointModel (EndpointModel, KeyBasedAuthMixIn)
+
+    See https://huggingface.co/deepseek-ai/DeepSeek-R1 for more details.
+
+    Attributes:
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry (from EndpointModel).
+        api_key (str): The API key (from KeyBasedAuthMixIn).
+        secret_key_params (dict): Secret key params (from KeyBasedAuthMixIn).
+        url (str): The endpoint URL (from ServerlessAzureRestEndpointModel).
+        model_name (str): The name of the model.
+        stream (bool): Whether to stream the response (from ServerlessAzureRestEndpointModel).
+        auth_scope (str): The authentication scope (from ServerlessAzureRestEndpointModel).
+        timeout (int): The timeout in seconds (from ServerlessAzureRestEndpointModel).
+        temperature (float): The sampling temperature. Defaults to 0.6.
+        max_tokens (int): The maximum number of tokens to generate. Defaults to 4096.
+        top_p (float): The top-p sampling value. Defaults to 0.95.
+        presence_penalty (float): The presence penalty parameter. Defaults to 0.
+    """
+
     temperature: float = 0.6
     max_tokens: int = 4096
     top_p: float = 0.95
     presence_penalty: float = 0
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request for the DeepSeek-R1 serverless Azure REST endpoint.
+
+        Args:
+            text_prompt (str): The text prompt to generate the model response.
+            query_images (list, optional): Not supported for this model. Defaults to None.
+            system_message (str, optional): The system message. Defaults to None.
+            previous_messages (list, optional): The previous messages in the chat. Defaults to None.
+
+        Returns:
+            urllib.request.Request: The request object.
+
+        Raises:
+            NotImplementedError: If images are provided, as they're not supported.
+        """
         messages = []
         if system_message:
             messages.append({"role": "system", "content": system_message})
@@ -543,11 +865,24 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
 
 @dataclass
 class OpenAICommonRequestResponseMixIn:
-    """
-    This mixin class defines the request and response handling for most OpenAI models.
+    """Define request and response handling for most OpenAI models.
+
+    This mixin provides methods to create a chat request body and parse the
+    response from the OpenAI API.
     """
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request dictionary for use with the OpenAI chat API.
+
+        Args:
+            text_prompt (str): The user-provided text prompt.
+            query_images (Optional[List[str]], optional): A list of images to encode and include in the request, if any.
+            system_message (Optional[str], optional): The system message to include in the conversation, if any.
+            previous_messages (Optional[List[Dict]], optional): A list of previous messages in the conversation.
+
+        Returns:
+            dict: A request body dictionary that can be passed to the OpenAI API.
+        """
         messages = []
         if system_message:
             messages.append({"role": "system", "content": system_message})
@@ -573,6 +908,14 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
         return request_body
 
     def get_response(self, request):
+        """Send a chat completion request to the OpenAI API and return the parsed response.
+
+        Args:
+            request (dict): The request body to send to the OpenAI API.
+
+        Returns:
+            dict: A dictionary containing the model output, response time, and optional usage information.
+        """
         start_time = time.time()
         completion = self.client.chat.completions.create(
             model=self.model_name,
@@ -599,14 +942,36 @@ def get_response(self, request):
                 response_dict.update({"n_output_tokens": usage["completion_tokens"]})
         return response_dict
 
+    def base64encode(self, images):
+        """Encode images in Base64 format.
+
+        Args:
+            images (List[str]): A list of image file paths.
+
+        Returns:
+            List[str]: A list of Base64-encoded strings corresponding to the images.
+        """
+        import base64
+
+        encoded_list = []
+        for img_path in images:
+            with open(img_path, "rb") as image_file:
+                encoded_list.append(base64.b64encode(image_file.read()).decode("utf-8"))
+        return encoded_list
+
 
 class AzureOpenAIClientMixIn:
-    """This mixin provides some methods to interact with Azure OpenAI models."""
+    """Provide Azure OpenAI-specific client methods and error handling."""
 
     def get_client(self):
+        """Retrieve the Azure OpenAI client.
+
+        Returns:
+            AzureOpenAI: The Azure OpenAI client instance.
+        """
         from openai import AzureOpenAI
 
-        token_provider = get_bearer_token_provider(DefaultAzureCredential(), self.auth_scope)
+        token_provider = get_bearer_token_provider(AzureCliCredential(), self.auth_scope)
         return AzureOpenAI(
             azure_endpoint=self.url,
             api_version=self.api_version,
@@ -614,6 +979,18 @@ def get_client(self):
         )
 
     def handle_request_error(self, e):
+        """Handle an error that occurs while making a request to the Azure OpenAI service.
+
+        If the error is due to content filtering, this method logs a warning and
+        returns (None, False, True). Otherwise, it logs the exception and returns False.
+
+        Args:
+            e (Exception): The exception raised during the request.
+
+        Returns:
+            Union[Tuple[None, bool, bool], bool]: Either a tuple indicating a content
+            filter block or False indicating the request can be retried.
+        """
         # if the error is due to a content filter, there is no need to retry
         if hasattr(e, "code") and e.code == "content_filter":
             logging.warning("Content filtered.")
@@ -625,9 +1002,14 @@ def handle_request_error(self, e):
 
 
 class DirectOpenAIClientMixIn(KeyBasedAuthMixIn):
-    """This mixin class provides some methods for using OpenAI models dirctly (not through Azure)"""
+    """Provide client retrieval and error handling for direct OpenAI usage."""
 
     def get_client(self):
+        """Retrieve the direct OpenAI client.
+
+        Returns:
+            OpenAI: The direct OpenAI client instance.
+        """
         from openai import OpenAI
 
         return OpenAI(
@@ -636,13 +1018,37 @@ def get_client(self):
         )
 
     def handle_request_error(self, e):
+        """Handle an error that occurs while making a request to the direct OpenAI service.
+
+        Args:
+            e (Exception): The exception that was raised.
+
+        Returns:
+            bool: Always returns False, indicating the request can be retried.
+        """
         logging.warning(e)
         return False
 
 
 @dataclass
 class AzureOpenAIModel(OpenAICommonRequestResponseMixIn, AzureOpenAIClientMixIn, EndpointModel):
-    """This class is used to interact with Azure OpenAI models."""
+    """Interact with Azure OpenAI models using the provided configuration.
+
+    Attributes:
+        url (str): The Azure OpenAI endpoint URL.
+        model_name (str): The name of the model.
+        temperature (float): The temperature setting for the model.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling parameter.
+        frequency_penalty (float): The frequency penalty.
+        presence_penalty (float): The presence penalty.
+        seed (int): An optional seed for reproducibility.
+        api_version (str): The API version used for Azure OpenAI.
+        auth_scope (str): The authentication scope for Azure OpenAI.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
 
     url: str = None
     model_name: str = None
@@ -656,12 +1062,29 @@ class AzureOpenAIModel(OpenAICommonRequestResponseMixIn, AzureOpenAIClientMixIn,
     auth_scope: str = "https://cognitiveservices.azure.com/.default"
 
     def __post_init__(self):
+        """Initialize the AzureOpenAIModel instance by obtaining the Azure OpenAI client."""
         self.client = self.get_client()
 
 
 @dataclass
 class DirectOpenAIModel(OpenAICommonRequestResponseMixIn, DirectOpenAIClientMixIn, EndpointModel):
-    """This class is used to interact with OpenAI models dirctly (not through Azure)"""
+    """Interact directly with OpenAI models using the provided configuration.
+
+    Attributes:
+        model_name (str): The name of the model.
+        temperature (float): The temperature setting for the model.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling parameter.
+        frequency_penalty (float): The frequency penalty.
+        presence_penalty (float): The presence penalty.
+        seed (int): An optional seed for reproducibility.
+        api_version (str): The API version used by OpenAI.
+        base_url (str): The base URL for the OpenAI API.
+        extra_body (dict): Additional data to include in the request body.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
 
     model_name: str = None
     temperature: float = 0
@@ -675,19 +1098,31 @@ class DirectOpenAIModel(OpenAICommonRequestResponseMixIn, DirectOpenAIClientMixI
     extra_body: dict = None
 
     def __post_init__(self):
+        """Initialize the DirectOpenAIModel instance by obtaining the API key and direct OpenAI client."""
         self.api_key = self.get_api_key()
         self.client = self.get_client()
 
 
 class OpenAIOModelsRequestResponseMixIn:
+    """Define request creation and response handling for OpenAI O1 models."""
+
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request dictionary for use with OpenAI O1 chat models.
+
+        Args:
+            text_prompt (str): The text prompt to send to the model.
+            query_images (Optional[List[str]], optional): A list of images to encode and include, if supported.
+            system_message (Optional[str], optional): A system or developer message to pass to the model.
+            previous_messages (Optional[List[Dict]], optional): A list of previous conversation messages.
+
+        Returns:
+            dict: The request body dictionary to pass to the OpenAI API.
+        """
         messages = []
         if system_message and "o1-preview" in self.model_name:
             logging.warning("System and developer messages are not supported by OpenAI O1 preview model.")
         elif system_message:
-            # Developer messages are the new system messages:
-            # Starting with o1-2024-12-17, o1 models support developer messages rather than system messages,
-            # to align with the chain of command behavior described in the model spec.
+            # Developer messages are the new system messages
             messages.append({"role": "developer", "content": system_message})
         if previous_messages:
             messages.extend(previous_messages)
@@ -715,6 +1150,17 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
         return request_body
 
     def get_response(self, request):
+        """Send the request to the OpenAI O1 model and return the parsed response.
+
+        Depending on whether the model is an O1 preview or not, it may or may not support
+        certain parameters such as developer/system messages or reasoning effort.
+
+        Args:
+            request (dict): The request body for the chat completion.
+
+        Returns:
+            dict: A dictionary containing the model output, response time, and optional usage details.
+        """
         start_time = time.time()
         if "o1-preview" in self.model_name:
             if self.reasoning_effort == "high":
@@ -751,14 +1197,46 @@ def get_response(self, request):
             response_dict.update({"usage": openai_response["usage"]})
         return response_dict
 
+    def base64encode(self, images):
+        """Encode images in Base64 format.
+
+        Args:
+            images (List[str]): A list of image file paths.
+
+        Returns:
+            List[str]: A list of Base64-encoded strings corresponding to the images.
+        """
+        import base64
+
+        encoded_list = []
+        for img_path in images:
+            with open(img_path, "rb") as image_file:
+                encoded_list.append(base64.b64encode(image_file.read()).decode("utf-8"))
+        return encoded_list
+
 
 @dataclass
 class DirectOpenAIOModel(OpenAIOModelsRequestResponseMixIn, DirectOpenAIClientMixIn, EndpointModel):
+    """Interact directly with OpenAI O1 models using the provided configuration.
+
+    Attributes:
+        model_name (str): The name of the model.
+        temperature (float): The temperature setting for generation.
+        max_completion_tokens (int): Not currently used due to API constraints.
+        top_p (float): The top-p sampling parameter.
+        seed (int): An optional seed for reproducibility.
+        frequency_penalty (float): The frequency penalty.
+        presence_penalty (float): The presence penalty.
+        reasoning_effort (str): The level of reasoning effort requested.
+        base_url (str): The base URL for the OpenAI API.
+        extra_body (dict): Additional data to include in the request.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
+
     model_name: str = None
     temperature: float = 1
-    # Not used currently, because the API throws:
-    # "Completions.create() got an unexpected keyword argument 'max_completion_tokens'"
-    # although this argument is documented in the OpenAI API documentation.
     max_completion_tokens: int = 2000
     top_p: float = 1
     seed: int = 0
@@ -769,19 +1247,36 @@ class DirectOpenAIOModel(OpenAIOModelsRequestResponseMixIn, DirectOpenAIClientMi
     extra_body: dict = None
 
     def __post_init__(self):
+        """Initialize the DirectOpenAIOModel instance by obtaining the API key and direct OpenAI client."""
         self.api_key = self.get_api_key()
         self.client = self.get_client()
 
 
 @dataclass
 class AzureOpenAIOModel(OpenAIOModelsRequestResponseMixIn, AzureOpenAIClientMixIn, EndpointModel):
+    """Interact with Azure OpenAI O1 models using the provided configuration.
+
+    Attributes:
+        url (str): The Azure endpoint for O1 models.
+        model_name (str): The name of the O1 model.
+        temperature (float): The temperature setting for generation.
+        max_completion_tokens (int): Not currently used due to API constraints.
+        top_p (float): The top-p sampling parameter.
+        seed (int): An optional seed for reproducibility.
+        frequency_penalty (float): The frequency penalty.
+        presence_penalty (float): The presence penalty.
+        reasoning_effort (str): The level of reasoning effort requested.
+        api_version (str): The API version for Azure capabilities.
+        auth_scope (str): The scope for Azure authentication.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
+
     url: str = None
     model_name: str = None
     temperature: float = 1
-    # Not used currently, because the API throws:
-    # "Completions.create() got an unexpected keyword argument 'max_completion_tokens'"
-    # although this argument is documented in the OpenAI API documentation.
-    max_completion_tokens: int = 2000
+    max_completion_tokens: int = 64000
     top_p: float = 1
     seed: int = 0
     frequency_penalty: float = 0
@@ -791,12 +1286,24 @@ class AzureOpenAIOModel(OpenAIOModelsRequestResponseMixIn, AzureOpenAIClientMixI
     auth_scope: str = "https://cognitiveservices.azure.com/.default"
 
     def __post_init__(self):
+        """Initialize the AzureOpenAIOModel instance by obtaining the Azure OpenAI client."""
         self.client = self.get_client()
 
 
 @dataclass
 class GeminiModel(EndpointModel, KeyBasedAuthMixIn):
-    """This class is used to interact with Gemini models through the python api."""
+    """Interact with Gemini models through the Python API.
+
+    Attributes:
+        timeout (int): The API request timeout in seconds.
+        model_name (str): The name of the Gemini model.
+        temperature (float): The temperature setting for generation.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling parameter.
+        chat_mode (bool): Chat mode is not implemented yet for Gemini models. (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
 
     timeout: int = 600
     model_name: str = None
@@ -805,7 +1312,15 @@ class GeminiModel(EndpointModel, KeyBasedAuthMixIn):
     top_p: float = 0.95
 
     def __post_init__(self):
+        """Initialize the GeminiModel by configuring the generative AI client with the provided API key
+        and safety settings. Also set up the generation config.
+        """
         super().__post_init__()
+        if self.chat_mode:
+            logging.error(
+                "Chat mode is not implemented yet for Gemini models. Set chat_mode=False in your model config."
+            )
+
         import google.generativeai as genai
         from google.generativeai.types import HarmBlockThreshold, HarmCategory
 
@@ -822,6 +1337,17 @@ def __post_init__(self):
         )
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request for generating content with Gemini models.
+
+        Args:
+            text_prompt (str): The text prompt to send to the model.
+            query_images (Optional[List[str]], optional): Image data to pass to the model.
+            system_message (Optional[str], optional): An optional system instruction to pass to the model.
+            previous_messages (Optional[List[Dict]], optional): A list of previous conversation messages (unused).
+
+        Returns:
+            Union[str, List[str]]: The prompt alone, or a list combining the prompt and images.
+        """
         import google.generativeai as genai
 
         if self.model_name == "gemini-1.0-pro":
@@ -837,6 +1363,14 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
             return text_prompt
 
     def get_response(self, request):
+        """Send the request to the Gemini model and return the parsed response.
+
+        Args:
+            request (Union[str, List[str]]): The text prompt or a combined prompt and images.
+
+        Returns:
+            dict: A dictionary containing the model output, response time, and usage metadata if available.
+        """
         start_time = time.time()
         gemini_response = None
         try:
@@ -872,28 +1406,27 @@ def get_response(self, request):
         return response_dict
 
     def handle_gemini_error(self, e, gemini_response):
-        """Handles exceptions originating from making requests to Gemini through the python api.
+        """Handle exceptions originating from making requests to the Gemini API.
 
-        args:
-            e: Exception that occurred during getting a response.
-            gemini_response: The response object from the gemini model.
-        returns:
-            _type_: do_return (True if the call should not be attempted again).
+        If the model explicitly or implicitly blocks the prompt, logs a warning indicating
+        the block reason and re-raises the exception.
+
+        Args:
+            e (Exception): The exception that occurred.
+            gemini_response (Any): The response object from the Gemini model.
+
+        Raises:
+            Exception: Re-raises the provided exception after handling.
         """
         # Handling cases where the model explicitly blocks prompts and provides a reason for it.
-        # In these cases, there is no need to make a new attempt as the model will continue to explicitly block the request, do_return = True.
         if e.__class__.__name__ == "ValueError" and gemini_response.prompt_feedback.block_reason > 0:
             logging.warning(
                 f"Attempt failed due to explicitly blocked input prompt: {e} Block Reason {gemini_response.prompt_feedback.block_reason}"
             )
 
-        # Handling cases where the model implicitly blocks prompts and does not provide an explicit block reason for it but rather an empty content.
-        # In these cases, there is no need to make a new attempt as the model will continue to implicitly block the request, do_return = True.
-        # Note that, in some cases, the model may still provide a finish reason as shown here https://ai.google.dev/api/generate-content?authuser=2#FinishReason
+        # Handling cases where the model implicitly blocks prompts and does not provide an explicit block reason.
         elif e.__class__.__name__ == "IndexError" and len(gemini_response.parts) == 0:
             logging.warning(f"Attempt failed due to implicitly blocked input prompt and empty model output: {e}")
-            # For cases where there are some response candidates do_return is still True because in most cases these candidates are incomplete.
-            # Trying again may not necessarily help, unless in high temperature regimes.
             if len(gemini_response.candidates) > 0:
                 logging.warning(f"The response is not empty and has : {len(gemini_response.candidates)} candidates")
                 logging.warning(
@@ -906,13 +1439,36 @@ def handle_gemini_error(self, e, gemini_response):
         raise e
 
     def handle_request_error(self, e):
-        # Any error case not handled in handle_gemini_error will be attempted again, do_return = False.
+        """Handle an error that occurs while making a request to the Gemini model.
+
+        Any error case not handled in handle_gemini_error will be attempted again.
+
+        Args:
+            e (Exception): The exception that was raised.
+
+        Returns:
+            bool: Always returns False, indicating the request can be retried.
+        """
         return False
 
 
 @dataclass
 class TogetherModel(OpenAICommonRequestResponseMixIn, KeyBasedAuthMixIn, EndpointModel):
-    """This class is used to interact with Together models through the together python api."""
+    """Interact with Together models through the together Python API.
+
+    Attributes:
+        timeout (int): The API request timeout in seconds.
+        model_name (str): The name of the Together model.
+        temperature (float): The temperature setting for generation.
+        max_tokens (int): The maximum number of tokens to generate.
+        top_p (float): The top-p sampling parameter.
+        presence_penalty (float): The presence penalty.
+        stop (List[str]): A list of stop tokens for generation.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+
+    """
 
     timeout: int = 600
     model_name: str = None
@@ -923,12 +1479,21 @@ class TogetherModel(OpenAICommonRequestResponseMixIn, KeyBasedAuthMixIn, Endpoin
     stop = ["<｜end▁of▁sentence｜>"]
 
     def __post_init__(self):
+        """Initialize the TogetherModel by setting up the Together client with the provided API key."""
         from together import Together
 
         self.api_key = self.get_api_key()
         self.client = Together(api_key=self.api_key)
 
     def get_response(self, request):
+        """Send the request to the Together model and return the parsed response.
+
+        Args:
+            request (dict): The request body for the Together chat completion.
+
+        Returns:
+            dict: A dictionary containing the model output, response time, and optional usage details.
+        """
         start_time = time.time()
         completion = self.client.chat.completions.create(
             model=self.model_name,
@@ -953,12 +1518,40 @@ def get_response(self, request):
         return response_dict
 
     def handle_request_error(self, e):
+        """Handle an error that occurs while making a request to the Together service.
+
+        Args:
+            e (Exception): The exception that was raised.
+
+        Returns:
+            bool: Always returns False, indicating the request can be retried.
+        """
         return False
 
 
 @dataclass
 class HuggingFaceModel(Model):
-    """This class is used to run a self-hosted language model via HuggingFace apis."""
+    """
+    Runs a self-hosted language model via HuggingFace APIs.
+
+    This class handles loading and running a HuggingFace language model locally
+    with optional quantization and flash attention usage.
+
+    Attributes:
+        model_name (str): The name of the HuggingFace model to use.
+        device (str): The device to use for inference (e.g., 'cpu' or 'cuda').
+        max_tokens (int): The maximum number of tokens to generate.
+        temperature (float): The temperature for sampling-based generation.
+        top_p (float): The top-p (nucleus) sampling parameter.
+        do_sample (bool): Whether to sample or not. Setting to False uses greedy
+            decoding.
+        apply_model_template (bool): If True, applies a template to the prompt
+            before generating.
+        quantize (bool): Whether to quantize the model for memory savings.
+        use_flash_attn (bool): Whether to use flash attention 2 if supported.
+        chat_mode (bool): Not used. (from Model).
+        system_message (str): Not used. (from Model).
+    """
 
     model_name: str = None
     device: str = "cpu"
@@ -972,11 +1565,24 @@ class HuggingFaceModel(Model):
     use_flash_attn: bool = False
 
     def __post_init__(self):
-        # The device need to be set before get_model() is called
+        """
+        Initializes model-related attributes.
+
+        Sets the device by picking an available GPU or falling back to CPU
+        before loading the model.
+        """
+        # The device needs to be set before get_model() is called
         self.device = self.pick_available_device()
         self.get_model()
 
     def get_model(self):
+        """
+        Loads the HuggingFace model and tokenizer.
+
+        If quantization is enabled, applies 4-bit quantization. Otherwise, loads
+        the model with standard precision. The tokenizer is also loaded using
+        the same model name.
+        """
         import torch
         from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -985,7 +1591,6 @@ def get_model(self):
             from transformers import BitsAndBytesConfig
 
             logging.info("Quantizing model")
-            # specify how to quantize the model
             quantization_config = BitsAndBytesConfig(
                 load_in_4bit=True,
                 bnb_4bit_compute_dtype=torch.float16,
@@ -1003,8 +1608,10 @@ def get_model(self):
 
     def pick_available_device(self):
         """
-        This method will enumerate all GPU devices and return the one with the lowest utilization.
-        This is useful in running locally hosted HuggingFace models on multi-gpu machines.
+        Selects the device with the lowest GPU utilization, or defaults to CPU.
+
+        Returns:
+            str: The name of the chosen device (e.g., 'cuda:0' or 'cpu').
         """
         import numpy as np
         import torch
@@ -1026,6 +1633,19 @@ def pick_available_device(self):
         return device
 
     def _generate(self, text_prompt, query_images=None):
+        """
+        Generates text given a prompt and optional images.
+
+        Args:
+            text_prompt (str): The text prompt for the model to process.
+            query_images (list, optional): A list of images (if supported by the model).
+
+        Returns:
+            dict: A dictionary with the following keys:
+                model_output (str): The generated text response.
+                response_time (float): The time taken for the generation.
+        """
+        import time
 
         inputs = self.tokenizer(text_prompt, return_tensors="pt").to(self.device)
         start_time = time.time()
@@ -1050,6 +1670,22 @@ def _generate(self, text_prompt, query_images=None):
         }
 
     def generate(self, text_prompt, query_images=None, system_message=None):
+        """
+        Generates a text response, optionally applying a model-specific template.
+
+        Args:
+            text_prompt (str): The text prompt for generation.
+            query_images (list, optional): A list of images (if supported by the model).
+            system_message (str, optional): A system message to be prepended or otherwise
+                integrated into the template.
+
+        Returns:
+            dict: A dictionary containing the following keys:
+                model_output (str): The generated text response.
+                response_time (float): The time taken for generation.
+                is_valid (bool): Whether the generation was successful.
+                n_output_tokens (int): The number of tokens in the generated output.
+        """
         response_dict = {}
 
         if text_prompt:
@@ -1075,39 +1711,89 @@ def generate(self, text_prompt, query_images=None, system_message=None):
         return response_dict
 
     def model_template_fn(self, text_prompt, system_message=None):
+        """
+        Applies a basic template to the prompt.
+
+        If a system message is provided, it is prepended to the text prompt.
+
+        Args:
+            text_prompt (str): The original text prompt.
+            system_message (str, optional): A system message to prepend.
+
+        Returns:
+            str: The prompt with the optional system message attached.
+        """
         return system_message + " " + text_prompt if system_message else text_prompt
 
 
 @dataclass
 class Phi3HFModel(HuggingFaceModel):
-    """This class is used to run a self-hosted PHI3 model via HuggingFace apis."""
+    """
+    Runs a self-hosted PHI3 model via HuggingFace APIs.
+
+    This class extends HuggingFaceModel, applying a PHI3-specific prompt template.
+    """
 
     def __post_init__(self):
+        """
+        Initializes and checks if the model name is a PHI3 model.
+
+        Warns if the provided model name is not a PHI3 model.
+        """
         super().__post_init__()
         if "microsoft/Phi-3" not in self.model_name:
             logging.warning(
                 "This model class applies a template to the prompt that is specific to Phi-3 models"
-                "but your model is not a Phi-3 model."
+                " but your model is not a Phi-3 model."
             )
 
     def model_template_fn(self, text_prompt, system_message=None):
+        """
+        Applies the PHI3-specific template to the prompt.
+
+        Args:
+            text_prompt (str): The text prompt for generation.
+            system_message (str, optional): A system message to prepend.
+
+        Returns:
+            str: The prompt with the PHI3-specific template.
+        """
         text_prompt = super().model_template_fn(text_prompt, system_message)
         return f"<|user|>\n{text_prompt}<|end|>\n<|assistant|>"
 
 
 @dataclass
 class Phi4HFModel(HuggingFaceModel):
-    """This class is used to run a self-hosted PHI3 model via HuggingFace apis."""
+    """
+    Runs a self-hosted PHI4 model via HuggingFace APIs.
+
+    This class extends HuggingFaceModel, applying a PHI4-specific prompt template.
+    """
 
     def __post_init__(self):
+        """
+        Initializes and checks if the model name is a PHI4 model.
+
+        Warns if the provided model name is not a PHI4 model.
+        """
         super().__post_init__()
         if "microsoft/phi-4" not in self.model_name:
             logging.warning(
                 "This model class applies a template to the prompt that is specific to Phi-4 models"
-                "but your model is not a Phi-4 model."
+                " but your model is not a Phi-4 model."
             )
 
     def model_template_fn(self, text_prompt, system_message=None):
+        """
+        Applies the PHI4-specific template to the prompt.
+
+        Args:
+            text_prompt (str): The text prompt for generation.
+            system_message (str, optional): A system message to prepend.
+
+        Returns:
+            str: The prompt with the PHI4-specific template.
+        """
         if system_message:
             return f"<|im_start|>system<|im_sep|>\n{system_message}<|im_start|>user<|im_sep|>\n{text_prompt}<|im_end|>\n<|im_start|>assistant<|im_sep|>"
         else:
@@ -1116,17 +1802,34 @@ def model_template_fn(self, text_prompt, system_message=None):
 
 @dataclass
 class LLaVAHuggingFaceModel(HuggingFaceModel):
-    """This class is used to run a self-hosted LLaVA model via HuggingFace apis."""
+    """
+    Runs a self-hosted LLaVA model via HuggingFace APIs.
+
+    This class extends HuggingFaceModel, applying an image-based prompt template
+    for LLaVA.
+    """
 
     def __post_init__(self):
+        """
+        Initializes and checks if the model name is a LLaVA model.
+
+        Warns if the provided model name is not a LLaVA model.
+        """
         super().__post_init__()
         if "llava" not in self.model_name:
             logging.warning(
                 "This model class applies a template to the prompt that is specific to LLAVA models"
-                "but your model is not a LLAVA model."
+                " but your model is not a LLAVA model."
             )
 
     def get_model(self):
+        """
+        Loads the LLaVA model and processor.
+
+        If quantization is enabled, applies 4-bit quantization. Otherwise, loads
+        the model with standard precision. The appropriate LLaVA variant is
+        chosen depending on the model name (v1.6, etc.).
+        """
         import torch
         from transformers import (
             AutoProcessor,
@@ -1140,7 +1843,6 @@ def get_model(self):
             from transformers import BitsAndBytesConfig
 
             logging.info("Quantizing model")
-            # specify how to quantize the model
             quantization_config = BitsAndBytesConfig(
                 load_in_4bit=True,
                 bnb_4bit_compute_dtype=torch.float16,
@@ -1163,10 +1865,24 @@ def get_model(self):
                 device_map=self.device,
                 use_flash_attention_2=self.use_flash_attn,
             )
-
             self.processor = AutoProcessor.from_pretrained(self.model_name)
 
     def _generate(self, text_prompt, query_images=None):
+        """
+        Generates a response given a text prompt and optional images.
+
+        Args:
+            text_prompt (str): The text prompt for generation.
+            query_images (list, optional): A list containing a single image to
+                use as context.
+
+        Returns:
+            dict: A dictionary containing the following keys:
+                model_output (str): The generated text response.
+                response_time (float): The time taken for the generation.
+        """
+        import time
+
         inputs = self.processor(text=text_prompt, images=query_images, return_tensors="pt").to(self.device)
         start_time = time.time()
         output_ids = self.model.generate(
@@ -1190,7 +1906,20 @@ def _generate(self, text_prompt, query_images=None):
         }
 
     def generate(self, text_prompt, query_images=None, system_message=None):
-
+        """
+        Generates a response using a LLaVA model, with optional images.
+
+        Args:
+            text_prompt (str): The text prompt for generation.
+            query_images (list, optional): A list containing a single image to
+                use.
+            system_message (str, optional): An additional system message or
+                instruction.
+
+        Returns:
+            dict: A dictionary containing the generation results. If multiple
+                images are provided, returns an error message in the dictionary.
+        """
         if query_images and len(query_images) > 1:
             logging.error(f"Not implemented for more than 1 image. {len(query_images)} images are in the prompt")
             return {"model_output": None, "is_valid": False, "response_time": None, "n_output_tokens": None}
@@ -1198,6 +1927,19 @@ def generate(self, text_prompt, query_images=None, system_message=None):
         return super().generate(text_prompt, query_images=query_images, system_message=system_message)
 
     def model_template_fn(self, text_prompt, system_message=None):
+        """
+        Applies an image-based template to the text prompt for LLaVA models.
+
+        The exact template depends on the LLaVA model variant (v1.6, v1.6-mistral,
+        etc.).
+
+        Args:
+            text_prompt (str): The text prompt to transform.
+            system_message (str, optional): An additional system message.
+
+        Returns:
+            str: The transformed text prompt with an image token included.
+        """
         text_prompt = f"<image>\n{text_prompt}"
 
         if "v1.6-mistral" in self.model_name:
@@ -1206,7 +1948,11 @@ def model_template_fn(self, text_prompt, system_message=None):
             if system_message:
                 text_prompt = f"{system_message} USER: {text_prompt} ASSISTANT:"
             else:
-                text_prompt = f"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: {text_prompt} ASSISTANT:"
+                text_prompt = (
+                    f"A chat between a curious human and an artificial intelligence assistant. "
+                    f"The assistant gives helpful, detailed, and polite answers to the human's questions. "
+                    f"USER: {text_prompt} ASSISTANT:"
+                )
         elif "v1.6-34b" in self.model_name:
             if system_message:
                 text_prompt = f"<|im_start|>system\n{system_message}<|im_end|><|im_start|>user\n{text_prompt}<|im_end|><|im_start|>assistant\n"
@@ -1223,15 +1969,30 @@ def model_template_fn(self, text_prompt, system_message=None):
 
 @dataclass
 class LLaVAModel(LLaVAHuggingFaceModel):
-    """This class is used to run a self-hosted LLaVA model via the LLaVA package."""
+    """
+    Runs a self-hosted LLaVA model using the LLaVA package.
+
+    This class extends LLaVAHuggingFaceModel to handle model loading and
+    inference through the dedicated LLaVA package utilities.
+
+    """
 
     model_base: str = None
     num_beams: int = 1
 
     def __post_init__(self):
+        """
+        Initializes the LLaVA model after the dataclass has been populated.
+        """
         super().__post_init__()
 
     def get_model(self):
+        """
+        Loads the LLaVA model and tokenizer using the LLaVA package.
+
+        This method overrides the base HuggingFace loading routine and uses
+        dedicated LLaVA utilities for model setup.
+        """
         from llava.mm_utils import get_model_name_from_path
         from llava.model.builder import load_pretrained_model
         from llava.utils import disable_torch_init
@@ -1257,6 +2018,20 @@ def get_model(self):
         self.tokenizer = tokenizer
 
     def _generate(self, text_prompt, query_images=None, system_message=None):
+        """
+        Generates a response using the LLaVA package.
+
+        Args:
+            text_prompt (str): The text prompt to process.
+            query_images (list, optional): A list of images for multimodal generation.
+            system_message (str, optional): An additional system message for context.
+
+        Returns:
+            dict: A dictionary containing the following keys:
+                model_output (str): The generated text response.
+                response_time (float): The time taken for generation.
+        """
+        import time
 
         import torch
         from llava.constants import IMAGE_TOKEN_INDEX
@@ -1298,9 +2073,27 @@ def _generate(self, text_prompt, query_images=None, system_message=None):
 
 @dataclass
 class VLLMModel(Model):
-    """This class is used to run a self-hosted language model via vLLM apis.
-    This class uses the chat() functionality of vLLM which applies a template included in the HF model files.
-    If the model files do not include a template, no template will be applied.
+    """Runs a self-hosted language model via vLLM APIs.
+
+    Uses the chat() functionality of vLLM, which applies a template included
+    in the HF model files. If the model files do not include a template, no
+    template will be applied.
+
+    Attributes:
+        model_name (str): Name or path of the model.
+        trust_remote_code (bool): Whether to trust custom code from the remote model.
+        tensor_parallel_size (int): Number of tensor parallel instances.
+        dtype (str): Data type used by the model.
+        quantization (str): Quantization method used by the model.
+        seed (int): Random seed.
+        gpu_memory_utilization (float): Fraction of GPU memory to use.
+        cpu_offload_gb (float): Amount of CPU offloading in GB.
+        temperature (float): Sampling temperature.
+        top_p (float): Nucleus sampling probability threshold.
+        top_k (int): Top-k sampling cutoff.
+        max_tokens (int): Maximum number of tokens in the generated response.
+        chat_mode (bool): Not used. (from Model).
+        system_message (str): Not used. (from Model).
     """
 
     model_name: str = None
@@ -1318,10 +2111,20 @@ class VLLMModel(Model):
     max_tokens: int = 2000
 
     def __post_init__(self):
+        """Initialize the vLLM model after dataclass fields are set.
+
+        Returns:
+            None
+        """
         # vLLM automatically picks an available devices when get_model() is called
         self.get_model()
 
     def get_model(self):
+        """Initialize and store the LLM instance from vLLM.
+
+        Returns:
+            None
+        """
         from vllm import LLM
 
         self.model = LLM(
@@ -1336,6 +2139,15 @@ def get_model(self):
         )
 
     def _generate(self, text_prompt, query_images=None):
+        """Generate a response from the model using the vLLM chat interface.
+
+        Args:
+            text_prompt (str): The prompt for the model.
+            query_images (Any, optional): Additional images to pass to the model. Defaults to None.
+
+        Returns:
+            dict: A dictionary containing "model_output" and "response_time".
+        """
         from vllm import SamplingParams
 
         sampling_params = SamplingParams(
@@ -1357,6 +2169,17 @@ def _generate(self, text_prompt, query_images=None):
         }
 
     def generate(self, text_prompt, query_images=None, system_message=None):
+        """Generate a response from the model.
+
+        Args:
+            text_prompt (str): The prompt for the model.
+            query_images (Any, optional): Additional images to pass to the model. Defaults to None.
+            system_message (str, optional): A system message to prepend. Defaults to None.
+
+        Returns:
+            dict: A dictionary with the generated response. The dictionary includes
+                "model_output", "is_valid", "response_time", and "n_output_tokens".
+        """
         response_dict = {}
         model_output = None
         response_time = None
@@ -1387,6 +2210,15 @@ def generate(self, text_prompt, query_images=None, system_message=None):
         return response_dict
 
     def create_request(self, text_prompt, system_message=None):
+        """Create a list of messages suitable for the vLLM chat interface.
+
+        Args:
+            text_prompt (str): The user's prompt.
+            system_message (str, optional): A system message to prepend.
+
+        Returns:
+            list: A list of dictionaries representing messages.
+        """
         if system_message:
             return [
                 {"role": "system", "content": system_message},
@@ -1397,7 +2229,11 @@ def create_request(self, text_prompt, system_message=None):
 
 
 class _LocalVLLMDeploymentHandler:
-    """This class is used to handle the deployment of vLLM servers."""
+    """Handle the deployment of vLLM servers.
+
+    Attributes:
+        logs (list): References to logs of the servers, which contain PIDs for shutdown.
+    """
 
     # Chose against dataclass here so we have the option to accept kwargs
     # and pass them to the vLLM deployment script.
@@ -1419,6 +2255,25 @@ def __init__(
         cpu_offload_gb: float = 0,
         ports: list = None,
     ):
+        """Initialize the local vLLM deployment handler.
+
+        Args:
+            model_name (str): Name or path of the model to deploy.
+            num_servers (int): Number of servers to deploy.
+            trust_remote_code (bool): Whether to trust remote code.
+            tensor_parallel_size (int): Number of tensor parallel instances.
+            pipeline_parallel_size (int): Number of pipeline parallel instances.
+            dtype (str): Data type used by the model.
+            quantization (str): Quantization method used.
+            seed (int): Random seed.
+            gpu_memory_utilization (float): Fraction of GPU memory to use.
+            cpu_offload_gb (float): Amount of CPU offloading in GB.
+            ports (list): List of ports at which servers will be hosted.
+
+        Raises:
+            ValueError: If model_name is not specified.
+            RuntimeError: If not all servers can be started.
+        """
         if not model_name:
             raise ValueError("LocalVLLM model_name must be specified.")
         self.model_name = model_name
@@ -1439,7 +2294,13 @@ def __init__(
             raise RuntimeError(f"Failed to start all servers.")
 
     def _get_clients(self):
-        """Get clients to access vllm servers, by checking for running servers and deploying if necessary."""
+        """Get clients to access vLLM servers.
+
+        Checks for running servers and deploys them if necessary.
+
+        Returns:
+            list: A list of OpenAIClient objects for each healthy server.
+        """
         from openai import OpenAI as OpenAIClient
 
         # If the user passes ports, check if the servers are running and populate clients accordingly.
@@ -1473,8 +2334,11 @@ def _get_clients(self):
         raise RuntimeError(f"Failed to start all servers.")
 
     def get_healthy_ports(self) -> list[str]:
-        """Check if servers are running."""
+        """Check if vLLM servers are running.
 
+        Returns:
+            list[str]: A list of ports that are healthy and running.
+        """
         healthy_ports = []
         for port in self.ports:
             try:
@@ -1485,8 +2349,11 @@ def get_healthy_ports(self) -> list[str]:
         return healthy_ports
 
     def deploy_servers(self):
-        """Deploy vLLM servers in background threads using the specified parameters."""
+        """Deploy vLLM servers in background threads using the specified parameters.
 
+        Returns:
+            None
+        """
         logging.info(f"No vLLM servers are running. Starting {self.num_servers} new servers at {self.ports}.")
         import datetime
         import os
@@ -1505,8 +2372,18 @@ def deploy_servers(self):
             background_thread.start()
 
     def deploy_server(self, index: int, gpus_per_port: int, log_file: str):
-        """Deploy a single vLLM server using gpus_per_port many gpus starting at index*gpus_per_port."""
+        """Deploy a single vLLM server.
+
+        Uses gpus_per_port GPUs starting at index * gpus_per_port.
 
+        Args:
+            index (int): Index of the server to deploy.
+            gpus_per_port (int): Number of GPUs to allocate per server.
+            log_file (str): File path to store the server logs.
+
+        Returns:
+            None
+        """
         import subprocess
 
         port = 8000 + index
@@ -1545,8 +2422,11 @@ def deploy_server(self, index: int, gpus_per_port: int, log_file: str):
 
     @classmethod
     def shutdown_servers(cls):
-        """Shutdown all vLLM servers deployed during this run."""
+        """Shut down all vLLM servers deployed during this run.
 
+        Returns:
+            None
+        """
         import os
         import re
         import signal
@@ -1569,12 +2449,33 @@ def shutdown_servers(cls):
 
 @dataclass
 class LocalVLLMModel(OpenAICommonRequestResponseMixIn, EndpointModel):
-    """This class is used for vLLM servers running locally.
-
-    In case the servers are already deployed, specify the
-    model_name and the ports at which the servers are hosted.
-    Otherwise instantiating will initiate a deployment with
-    any deployment parameters specified."""
+    """Represent a local vLLM server deployment.
+
+    In case the servers are already deployed, specify the model_name and the
+    ports at which the servers are hosted. Otherwise, instantiating this class
+    will initiate a server deployment using any specified deployment parameters.
+
+    Attributes:
+        model_name (str): Name or path of the model.
+        num_servers (int): Number of servers to deploy.
+        trust_remote_code (bool): Whether to trust remote code.
+        tensor_parallel_size (int): Number of tensor parallel instances.
+        pipeline_parallel_size (int): Number of pipeline parallel instances.
+        dtype (str): Data type used by the model.
+        quantization (str): Quantization method used by the model.
+        seed (int): Random seed.
+        gpu_memory_utilization (float): Fraction of GPU memory to use.
+        cpu_offload_gb (float): Amount of CPU offloading in GB.
+        ports (list): Ports at which servers are hosted or will be hosted.
+        handler (_LocalVLLMDeploymentHandler): Deployment handler instance.
+        temperature (float): Sampling temperature.
+        top_p (float): Nucleus sampling probability threshold.
+        top_k (int): Top-k sampling cutoff.
+        max_tokens (int): Maximum number of tokens in the generated response.
+        frequency_penalty (float): Frequency penalty.
+        presence_penalty (float): Presence penalty.
+        num_retries (int): Number of retries for failed requests (from EndpointModel).
+    """
 
     model_name: str = None
 
@@ -1602,15 +2503,30 @@ class LocalVLLMModel(OpenAICommonRequestResponseMixIn, EndpointModel):
     presence_penalty: float = 0
 
     def __post_init__(self):
+        """Initialize the local vLLM model deployment.
+
+        Raises:
+            ValueError: If model_name is not specified.
+        """
         if not self.model_name:
             raise ValueError("LocalVLLM model_name must be specified.")
         self.handler = self._get_local_vllm_deployment_handler()
 
     @property
     def client(self):
+        """Get a randomly selected client from the list of deployed servers.
+
+        Returns:
+            OpenAIClient: A client for sending requests to the vLLM server.
+        """
         return random.choice(self.handler.clients)
 
     def _get_local_vllm_deployment_handler(self):
+        """Get or create a local vLLM deployment handler for this model.
+
+        Returns:
+            _LocalVLLMDeploymentHandler: The handler responsible for server deployment.
+        """
         if self.model_name not in local_vllm_deployment_handlers:
             with local_vllm_model_lock:
                 if self.model_name not in local_vllm_deployment_handlers:
@@ -1631,12 +2547,32 @@ def _get_local_vllm_deployment_handler(self):
         return local_vllm_deployment_handlers[self.model_name]
 
     def handle_request_error(self, e):
+        """Handle any request error that occurs during inference.
+
+        Args:
+            e (Exception): The exception raised during inference.
+
+        Returns:
+            bool: Always returns False indicating the request was not successful.
+        """
         return False
 
 
 @dataclass
 class ClaudeModel(EndpointModel, KeyBasedAuthMixIn):
-    """This class is used to interact with Claude models through the python api."""
+    """Interact with Claude models through the Python API.
+
+    Attributes:
+        model_name (str): Name of the Claude model.
+        temperature (float): Sampling temperature.
+        max_tokens (int): Maximum number of tokens in the generated response.
+        top_p (float): Nucleus sampling probability threshold.
+        timeout (int): Request timeout in seconds.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+
+    """
 
     model_name: str = None
     temperature: float = 0
@@ -1645,6 +2581,11 @@ class ClaudeModel(EndpointModel, KeyBasedAuthMixIn):
     timeout: int = 60
 
     def __post_init__(self):
+        """Initialize the Claude client after dataclass fields are set.
+
+        Returns:
+            None
+        """
         super().__post_init__()
         self.client = anthropic.Anthropic(
             api_key=self.api_key,
@@ -1652,6 +2593,17 @@ def __post_init__(self):
         )
 
     def create_request(self, text_prompt, query_images=None, system_message=None, previous_messages=None):
+        """Create a request payload for Claude.
+
+        Args:
+            text_prompt (str): The user's prompt.
+            query_images (Any, optional): Additional images to encode and send. Defaults to None.
+            system_message (str, optional): A system message to prepend. Defaults to None.
+            previous_messages (list, optional): A list of previous conversation messages. Defaults to None.
+
+        Returns:
+            dict: A dictionary containing 'messages' and optionally 'system'.
+        """
         messages = []
         user_content = text_prompt
         if previous_messages:
@@ -1677,6 +2629,14 @@ def create_request(self, text_prompt, query_images=None, system_message=None, pr
             return {"messages": messages}
 
     def get_response(self, request):
+        """Send a request to the Claude model and get a response.
+
+        Args:
+            request (dict): The request payload to be sent.
+
+        Returns:
+            dict: A dictionary containing "model_output", "response_time", and optionally "usage".
+        """
         start_time = time.time()
         completion = self.client.messages.create(
             model=self.model_name,
@@ -1697,12 +2657,35 @@ def get_response(self, request):
         return response_dict
 
     def handle_request_error(self, e):
+        """Handle any request error that occurs during Claude inference.
+
+        Args:
+            e (Exception): The exception raised during inference.
+
+        Returns:
+            bool: Always returns False indicating the request was not successful.
+        """
         return False
 
 
 @dataclass
 class ClaudeReasoningModel(ClaudeModel):
-    """This class is used to interact with Claude reasoning models through the python api."""
+    """Interact with Claude reasoning models through the Python API.
+
+    Allows usage of a 'thinking' parameter for advanced reasoning.
+
+    Attributes:
+        model_name (str): Name of the Claude reasoning model.
+        temperature (float): Sampling temperature.
+        max_tokens (int): Maximum number of tokens in the generated response.
+        timeout (int): Request timeout in seconds.
+        thinking_enabled (bool): Whether thinking mode is enabled.
+        thinking_budget (int): Token budget for thinking mode.
+        top_p (float): This parameter is not supported and will be ignored.
+        chat_mode (bool): Whether the model operates in chat mode (from Model).
+        system_message (str): A system message that can be used by the model (from Model).
+        num_retries (int): The number of attempts to retry the request (from EndpointModel).
+    """
 
     model_name: str = None
     temperature: float = 1.0
@@ -1713,6 +2696,15 @@ class ClaudeReasoningModel(ClaudeModel):
     top_p: float = None
 
     def get_response(self, request):
+        """Send a request to the Claude reasoning model and get a response.
+
+        Args:
+            request (dict): The request payload to be sent.
+
+        Returns:
+            dict: A dictionary containing "model_output", "response_time",
+                "thinking_output", "redacted_thinking_output", and optionally "usage".
+        """
         model_output = None
         response_time = None
         thinking_output = None
@@ -1755,9 +2747,21 @@ def get_response(self, request):
 
 @dataclass
 class TestModel(Model):
-    # This class is used for testing purposes only. It only waits for a specified time and returns a response.
+    """A model used for testing purposes only.
+
+    This model waits for a specified time and returns a predetermined response.
+    """
 
     def generate(self, text_prompt, **kwargs):
+        """Generate a test response.
+
+        Args:
+            text_prompt (str): The input prompt (unused).
+
+        Returns:
+            dict: A dictionary containing "model_output", "is_valid", "response_time",
+                and "n_output_tokens".
+        """
         output = "This is a test response."
         is_valid = True
         return {
diff --git a/eureka_ml_insights/prompt_templates/doc_str.jinja b/eureka_ml_insights/prompt_templates/doc_str.jinja
new file mode 100644
index 00000000..5dc763f8
--- /dev/null
+++ b/eureka_ml_insights/prompt_templates/doc_str.jinja
@@ -0,0 +1,9 @@
+You are given the contents of a python file between the <module_content> </module_content> tags. This module contains classes and functions. Your task is write docstrings for the module, classes and functions in Google style.
+If there already exist docstrings, you should rewrite them to be in Google style. 
+If there are no docstrings, you should write them in Google style. 
+For dataclasses, since there is no explicit init method, make sure to add attribute docstrings to the class docstring. Note that if a dataclass inherits from other dataclasses, make sure to add all attributes of the base classes as well as new attributes to the docstring. 
+Rewrite the whole file with no changes to the code. Only the docstrings should be changed.
+
+<module_content>
+{{ file_content }}
+</module_content>
diff --git a/eureka_ml_insights/secret_management/secret_key_utils.py b/eureka_ml_insights/secret_management/secret_key_utils.py
index ab4c61ce..99a64eca 100644
--- a/eureka_ml_insights/secret_management/secret_key_utils.py
+++ b/eureka_ml_insights/secret_management/secret_key_utils.py
@@ -1,3 +1,9 @@
+"""
+This module provides functions for retrieving secrets from Azure Key Vault or a local JSON file cache.
+It includes functionality to look up secrets locally, fall back to Azure if necessary,
+and optionally cache new secrets in a local JSON file.
+"""
+
 import json
 import logging
 import os
@@ -9,16 +15,22 @@
 logging.basicConfig(level=logging.INFO, format="%(filename)s - %(funcName)s - %(message)s")
 
 
-def get_secret(key_name: str, local_keys_path:Optional[str]=None, key_vault_url:Optional[str]=None) -> Optional[str]:
-    """This function retrieves a key from key vault or if it is locally cached in a file.
-    args:
-        key_name: str, the name of the key to retrieve.
-        local_keys_path: str, the path to the keys file where the key is cached or should be cached by this method.
-        key_vault_url: str, the url of the key vault to retrieve the key from if not found in the local keys file.
-    Returns:
-        key_value: str, the value of the key if found, otherwise None.
+def get_secret(
+    key_name: str, local_keys_path: Optional[str] = None, key_vault_url: Optional[str] = None
+) -> Optional[str]:
     """
+    Retrieves a key from Azure Key Vault or a local file cache.
 
+    Args:
+        key_name (str): The name of the key to retrieve.
+        local_keys_path (Optional[str], optional): The path to the keys file where the key is
+            cached or should be cached by this method.
+        key_vault_url (Optional[str], optional): The URL of the key vault to retrieve the key
+            from if not found in the local keys file.
+
+    Returns:
+        Optional[str]: The value of the key if found, otherwise None.
+    """
     keys_dict = {}
 
     # make sure one of local_keys_path or key_vault_url is provided
@@ -38,7 +50,8 @@ def get_secret(key_name: str, local_keys_path:Optional[str]=None, key_vault_url:
     if key_value is None:
         if key_vault_url is None:
             raise ValueError(
-                f"Key [{key_name}] not found in local keys file [{local_keys_path}] and key_vault_url is not provided."
+                f"Key [{key_name}] not found in local keys file [{local_keys_path}] and "
+                f"key_vault_url is not provided."
             )
         else:
             key_value = get_key_from_azure(key_name, key_vault_url)
@@ -60,12 +73,15 @@ def get_secret(key_name: str, local_keys_path:Optional[str]=None, key_vault_url:
 
 
 def get_key_from_azure(key_name: str, key_vault_url: str) -> Optional[str]:
-    """This function retrieves a key from azure key vault.
-    args:
-        key_name: str, the name of the key to retrieve.
-        key_vault_url: str, the url of the key vault to retrieve the key from.
+    """
+    Retrieves a key from Azure Key Vault.
+
+    Args:
+        key_name (str): The name of the key to retrieve.
+        key_vault_url (str): The URL of the Azure Key Vault to retrieve the key from.
+
     Returns:
-        key_value: str, the value of the key if found, otherwise None.
+        Optional[str]: The value of the key if found, otherwise None.
     """
     logging.getLogger("azure").setLevel(logging.ERROR)
     try:
@@ -90,13 +106,17 @@ def get_key_from_azure(key_name: str, key_vault_url: str) -> Optional[str]:
 
 
 def get_key_from_local_file(key_name: str, local_keys_path: str) -> tuple[Optional[str], Dict[str, str]]:
-    """This function retrieves a key from a local file.
-    args:
-        key_name: str, the name of the key to retrieve.
-        local_keys_path: str, the path to the local keys file.
+    """
+    Retrieves a key from a local file.
+
+    Args:
+        key_name (str): The name of the key to retrieve.
+        local_keys_path (str): The path to the local keys file.
+
     Returns:
-        key_value: str, the value of the key if found in the local file, otherwise None.
-        keys_dict: dict, a dictionary containing the keys cached in the local file.
+        tuple[Optional[str], Dict[str, str]]: A tuple containing:
+            - Optional[str]: The value of the key if found in the local file, otherwise None.
+            - Dict[str, str]: A dictionary containing the keys cached in the local file.
     """
     keys_dict = {}
     key_value = None
@@ -110,12 +130,15 @@ def get_key_from_local_file(key_name: str, local_keys_path: str) -> tuple[Option
 
 
 def get_cached_keys_dict(local_keys_path: str) -> Dict[str, str]:
-    """This function retrieves the keys cached in local json file.
-    args:
-        local_keys_path: str, the path to the local keys file.
+    """
+    Retrieves the cached keys from a local JSON file.
+
+    Args:
+        local_keys_path (str): The path to the local keys file.
+
     Returns:
-        keys_dict: dict, a dictionary containing the keys cached in the local file
-                   or an empty dictionary if the file does not exist or can't be decoded.
+        Dict[str, str]: A dictionary containing the keys cached in the local file,
+        or an empty dictionary if the file does not exist or cannot be decoded.
     """
     try:
         with open(local_keys_path, "r") as file:
diff --git a/eureka_ml_insights/user_configs/doc_str.py b/eureka_ml_insights/user_configs/doc_str.py
new file mode 100644
index 00000000..d0b655b2
--- /dev/null
+++ b/eureka_ml_insights/user_configs/doc_str.py
@@ -0,0 +1,139 @@
+"""
+This module defines transformations for reading and writing file content
+and an experiment pipeline configuration for generating docstrings.
+"""
+
+import os
+from dataclasses import dataclass
+from typing import Any
+
+from eureka_ml_insights.configs import (
+    DataSetConfig,
+    ExperimentConfig,
+    InferenceConfig,
+    ModelConfig,
+    PipelineConfig,
+    PromptProcessingConfig,
+)
+from eureka_ml_insights.core import DataProcessing, Inference, PromptProcessing
+from eureka_ml_insights.data_utils import (
+    DataReader,
+    DFTransformBase,
+    MMDataLoader,
+)
+
+
+@dataclass
+class FileReaderTransform(DFTransformBase):
+    """
+    A transformation that reads file content from a specified file path column.
+    """
+
+    file_path_column: str = "file_path"
+
+    def transform(self, df):
+        """
+        Transform the DataFrame by reading content based on file paths.
+
+        Args:
+            df (pandas.DataFrame): The input DataFrame with the column specified
+                by file_path_column.
+
+        Returns:
+            pandas.DataFrame: The DataFrame with a new column 'file_content'.
+        """
+        # Implement the logic to read files from the specified column
+        df["file_content"] = df[self.file_path_column].apply(lambda x: open(x).read())
+        return df
+
+
+class FileWriterTransform(DFTransformBase):
+    """
+    A transformation that writes file content to a specified file path.
+    """
+
+    file_path_column: str = "file_path"
+    file_content_column: str = "model_output"
+
+    def transform(self, df):
+        """
+        Transforms the DataFrame by writing file content to disk.
+
+        This method replaces certain path elements, creates output directories if needed,
+        and writes the content from file_content_column to files specified by file_path_column.
+
+        Args:
+            df (pandas.DataFrame): The input DataFrame containing file path and content columns.
+
+        Returns:
+            pandas.DataFrame: The original DataFrame after writing the content to disk.
+        """
+        for index, row in df.iterrows():
+            with open(row[self.file_path_column], "w") as f:
+                f.write(row[self.file_content_column])
+        return df
+
+
+class DOCSTR_PIPELINE(ExperimentConfig):
+    """
+    An experiment configuration for a docstring generation pipeline.
+    """
+
+    def configure_pipeline(
+        self, model_config: ModelConfig, resume_from: str = None, **kwargs: dict[str, Any]
+    ) -> PipelineConfig:
+        """
+        Configure the pipeline components for docstring generation.
+
+        Args:
+            model_config (ModelConfig): Configuration for the model.
+            resume_from (str, optional): Path to a checkpoint to resume from. Defaults to None.
+            **kwargs (dict[str, Any]): Additional keyword arguments.
+
+        Returns:
+            PipelineConfig: The configured pipeline consisting of data processing,
+            inference, and post-processing components.
+        """
+
+        # input file should be a csv file with a column 'file_path' containing paths to Python files you want to add docstrings to.
+        input_file_path = kwargs.get("input_file_path", os.path.join(os.path.dirname(__file__), "../python_file_list.csv"))
+
+        # Configure the data processing component.
+        data_processing_comp = PromptProcessingConfig(
+            component_type=PromptProcessing,
+            prompt_template_path=os.path.join(os.path.dirname(__file__), "../prompt_templates/doc_str.jinja"),
+            data_reader_config=DataSetConfig(
+                DataReader,
+                {
+                    "path": input_file_path,
+                    "format": ".csv",
+                    "header": 0,
+                    "index_col": None,
+                    "transform": FileReaderTransform(file_path_column="file_path"),
+                },
+            ),
+            output_dir=os.path.join(self.log_dir, "data_processing_output"),
+        )
+        inference_comp = InferenceConfig(
+            component_type=Inference,
+            model_config=model_config,
+            data_loader_config=DataSetConfig(
+                MMDataLoader,
+                {"path": os.path.join(data_processing_comp.output_dir, "transformed_data.jsonl")},
+            ),
+            output_dir=os.path.join(self.log_dir, "inference_output"),
+            resume_from=resume_from,
+            max_concurrent=5,
+        )
+        post_process_comp = PromptProcessingConfig(
+            component_type=DataProcessing,
+            data_reader_config=DataSetConfig(
+                DataReader,
+                {
+                    "path": os.path.join(inference_comp.output_dir, "inference_result.jsonl"),
+                    "transform": FileWriterTransform(),
+                },
+            ),
+            output_dir=os.path.join(self.log_dir, "data_post_processing_output"),
+        )
+        return PipelineConfig([data_processing_comp, inference_comp, post_process_comp])
diff --git a/readme_docs/figures/arxiv_logo.svg b/readme_docs/figures/arxiv_logo.svg
new file mode 100644
index 00000000..0e60f635
--- /dev/null
+++ b/readme_docs/figures/arxiv_logo.svg
@@ -0,0 +1 @@
+<svg id="logomark" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 17.732 24.269"><g id="tiny"><path d="M573.549,280.916l2.266,2.738,6.674-7.84c.353-.47.52-.717.353-1.117a1.218,1.218,0,0,0-1.061-.748h0a.953.953,0,0,0-.712.262Z" transform="translate(-566.984 -271.548)" fill="#bdb9b4"/><path d="M579.525,282.225l-10.606-10.174a1.413,1.413,0,0,0-.834-.5,1.09,1.09,0,0,0-1.027.66c-.167.4-.047.681.319,1.206l8.44,10.242h0l-6.282,7.716a1.336,1.336,0,0,0-.323,1.3,1.114,1.114,0,0,0,1.04.69A.992.992,0,0,0,571,293l8.519-7.92A1.924,1.924,0,0,0,579.525,282.225Z" transform="translate(-566.984 -271.548)" fill="#b31b1b"/><path d="M584.32,293.912l-8.525-10.275,0,0L573.53,280.9l-1.389,1.254a2.063,2.063,0,0,0,0,2.965l10.812,10.419a.925.925,0,0,0,.742.282,1.039,1.039,0,0,0,.953-.667A1.261,1.261,0,0,0,584.32,293.912Z" transform="translate(-566.984 -271.548)" fill="#bdb9b4"/></g></svg>
\ No newline at end of file
diff --git a/docs/figures/eureka_logo.png b/readme_docs/figures/eureka_logo.png
similarity index 100%
rename from docs/figures/eureka_logo.png
rename to readme_docs/figures/eureka_logo.png
diff --git a/docs/figures/github.png b/readme_docs/figures/github.png
similarity index 100%
rename from docs/figures/github.png
rename to readme_docs/figures/github.png
diff --git a/docs/figures/msr_blog.png b/readme_docs/figures/msr_blog.png
similarity index 100%
rename from docs/figures/msr_blog.png
rename to readme_docs/figures/msr_blog.png
diff --git a/docs/figures/transparent_uml.png b/readme_docs/figures/transparent_uml.png
similarity index 100%
rename from docs/figures/transparent_uml.png
rename to readme_docs/figures/transparent_uml.png