Commit 0b33fa1: add docs on how to use context model
rwood-97 committed Nov 27, 2023 (1 parent: d8f31d3)
Showing 1 changed file with 47 additions and 11 deletions: docs/source/User-guide/Classify/Train.rst
.. admonition:: Advanced usage
:class: dropdown

Other arguments you may want to specify when loading your annotations include:

- ``delimiter`` - By default, this is set to "\t" and so your ``csv`` file is assumed to be tab-delimited. You will need to specify the ``delimiter`` argument if your file is saved in another format.
- ``id_col``, ``patch_paths_col``, ``label_col`` - These are used to indicate the column headings for the columns which contain image IDs, patch file paths and labels respectively. By default, these are set to "image_id", "image_path" and "label".
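To see why the delimiter matters, here is a small sketch using Python's standard ``csv`` module (the file contents are invented for illustration):

```python
import csv
import io

# A tab-delimited annotations file, matching the default delimiter ("\t")
raw = "image_id\timage_path\tlabel\npatch_1.png\t./patches/patch_1.png\trailspace\n"

# Parsing with the default tab delimiter yields the expected columns
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
print(rows[0]["label"])  # → railspace

# Parsing the same file with a comma delimiter lumps everything into one column
bad = list(csv.DictReader(io.StringIO(raw), delimiter=","))
print(len(bad[0]))  # → 1
```

If the parsed columns do not match your expected headings (``image_id``, ``image_path``, ``label``), a delimiter mismatch is the first thing to check.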
To split your annotated images and create your dataloaders, use:
By default, this will split your annotated images using the :ref:`default train:val:test ratios<ratios>` and apply the :ref:`default image transforms<transforms>` to each by calling the ``.create_datasets()`` method.
It will then create a dataloader for each dataset, using a batch size of 16 and the :ref:`default sampler<sampler>`.

To change the batch size used when creating your dataloaders, use the ``batch_size`` argument:

.. code-block:: python

    #EXAMPLE
    dataloaders = annotated_images.create_dataloaders(batch_size=24)

.. admonition:: Advanced usage
:class: dropdown

Other arguments you may want to specify when creating your dataloaders include:

- ``sampler`` - By default, this is set to ``default`` and so the :ref:`default sampler<sampler>` will be used when creating your dataloaders and batches. You can choose not to use a sampler by specifying ``sampler=None``, or you can define a custom sampler using `PyTorch's sampler class <https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler>`__.
- ``shuffle`` - If your datasets are ordered (e.g. ``"a","a","a","a","b","c"``), you can use ``shuffle=True`` to create dataloaders which contain shuffled batches of data. This cannot be used in conjunction with a sampler, so by default ``shuffle=False``.
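The shuffle/sampler distinction can be sketched with plain PyTorch (a toy dataset and invented weights, not MapReader's defaults):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Six samples with imbalanced labels: four "0"s, one "1", one "2"
dataset = TensorDataset(torch.arange(6), torch.tensor([0, 0, 0, 0, 1, 2]))

# shuffle=True: random order, each item drawn exactly once per epoch
shuffled = DataLoader(dataset, batch_size=2, shuffle=True)

# A sampler replaces shuffle: here, rarer labels are drawn more often
weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 4.0, 4.0])
sampler = WeightedRandomSampler(weights, num_samples=len(dataset))
sampled = DataLoader(dataset, batch_size=2, sampler=sampler)

# Passing both shuffle=True and a sampler raises a ValueError
try:
    DataLoader(dataset, batch_size=2, shuffle=True, sampler=sampler)
    both_allowed = True
except ValueError:
    both_allowed = False
```

Both dataloaders yield three batches of two, but the weighted sampler may repeat rare samples within an epoch, which is why the two options are mutually exclusive.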


If you would like to use custom settings when creating your datasets, you should call the ``create_datasets()`` method directly instead of via the ``create_dataloaders()`` method.
You should then run the ``create_dataloaders()`` method afterwards to create your dataloaders as before.

For example, to change the ratios used to split your annotations, you can specify ``frac_train``, ``frac_val`` and ``frac_test``:

.. code-block:: python

    #EXAMPLE
    annotated_images.create_datasets(frac_train=0.6, frac_val=0.3, frac_test=0.1)
    dataloaders = annotated_images.create_dataloaders()

This will result in a split of 60% (train), 30% (val) and 10% (test).
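The ratio logic itself can be sketched in plain Python (this mimics only the splitting arithmetic; MapReader builds the actual datasets):

```python
import random

patches = list(range(100))  # stand-ins for 100 annotated patches
random.seed(42)
random.shuffle(patches)  # shuffle before splitting so each set is a random sample

frac_train, frac_val = 0.6, 0.3  # frac_test is the remainder (0.1)
n_train = int(len(patches) * frac_train)
n_val = int(len(patches) * frac_val)

train = patches[:n_train]
val = patches[n_train:n_train + n_val]
test = patches[n_train + n_val:]

print(len(train), len(val), len(test))  # → 60 30 10
```

Each patch lands in exactly one of the three sets, so the fractions must sum to 1.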

.. admonition:: Advanced usage
:class: dropdown

Other arguments you may want to specify when creating your datasets include:

- ``train_transform``, ``val_transform`` and ``test_transform`` - By default, these are set to "train", "val" and "test" respectively and so the :ref:`default image transforms<transforms>` for each of these sets are applied to the images. You can define your own transforms, using `torchvision's transforms module <https://pytorch.org/vision/stable/transforms.html>`__, and apply these to your datasets by specifying the ``train_transform``, ``val_transform`` and ``test_transform`` arguments.

- ``context_dataset`` - By default, this is set to ``False`` and so only the patches themselves are used as inputs to the model. Setting ``context_dataset=True`` will result in datasets which return both the patches and their context as inputs for the model.
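As an illustration of the idea behind ``context_dataset=True`` (this is a toy stand-in, not MapReader's dataset class), a context dataset returns two inputs per sample:

```python
class ContextDataset:
    """Toy dataset returning (patch, context) pairs instead of patches alone."""

    def __init__(self, patches, contexts, labels):
        self.patches = patches    # e.g. cropped patch images
        self.contexts = contexts  # e.g. larger crops centred on each patch
        self.labels = labels

    def __len__(self):
        return len(self.patches)

    def __getitem__(self, idx):
        # Two inputs per sample: the model receives both patch and context
        return (self.patches[idx], self.contexts[idx]), self.labels[idx]

ds = ContextDataset(["patch_0"], ["context_0"], [1])
inputs, label = ds[0]
print(inputs)  # → ('patch_0', 'context_0')
```

A model trained on such a dataset must accept both inputs, which is why context datasets require the two-branch model described below under "Context models".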

Train
------
There are a number of options for the ``model`` argument:
.. note:: You will need to install the `timm <https://huggingface.co/docs/timm/index>`__ library to do this (``pip install timm``).


.. admonition:: Context models
:class: dropdown

If you have created context datasets, you will need to load two models (one for processing patches and one for processing patches plus context) using the methods above.
You should then pass these models to MapReader's ``twoParrallelModels`` class, which combines their outputs through one fully connected layer:

.. code:: python

    # define fc layer inputs and output
    import torch
    fc_layer = torch.nn.Linear(1004, len(annotated_images.labels_map))

The number of inputs to your fully connected layer should be the sum of the number of outputs from your two models, and the number of outputs should be the number of classes (labels) you are using.

Your models and ``fc_layer`` should then be used to set up your custom model:

.. code:: python

    from mapreader.classify.custom_models import twoParrallelModels

    my_model = twoParrallelModels(patch_model, context_model, fc_layer)

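For intuition, here is a minimal sketch of how such a two-branch model combines its outputs (toy layer sizes and class names; not MapReader's actual implementation):

```python
import torch
import torch.nn as nn

class TwoBranchSketch(nn.Module):
    """Illustrative two-branch model: concatenate branch outputs, then one fc layer."""

    def __init__(self, patch_model, context_model, fc_layer):
        super().__init__()
        self.patch_model = patch_model
        self.context_model = context_model
        self.fc_layer = fc_layer

    def forward(self, patch, context):
        # Concatenate the two branches' outputs along the feature dimension
        features = torch.cat(
            [self.patch_model(patch), self.context_model(context)], dim=1
        )
        return self.fc_layer(features)

# Toy branches with 4 outputs each, so the fc layer takes 4 + 4 = 8 inputs
patch_branch = nn.Linear(10, 4)
context_branch = nn.Linear(10, 4)
model = TwoBranchSketch(patch_branch, context_branch, nn.Linear(8, 3))

out = model(torch.randn(2, 10), torch.randn(2, 10))
print(out.shape)  # → torch.Size([2, 3])
```

This also shows why the fully connected layer's input size must equal the sum of the two branches' output sizes.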
Define criterion, optimizer and scheduler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to train/fine-tune your model, you will need to define:

.. code-block:: python

    #EXAMPLE
    params2optimize = my_classifier.generate_layerwise_lrs(min_lr=1e-4, max_lr=1e-3, spacing="geomspace")

.. note:: If you are using a context model, you should also set ``parameter_groups=True`` when running the ``generate_layerwise_lrs()`` method. This will ensure the two branches of your model are optimized properly.
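As a sketch of the geometric spacing this produces (plain Python; the number of layer groups here is invented for illustration):

```python
import math

min_lr, max_lr, n_groups = 1e-4, 1e-3, 5

# Geometric spacing: a constant ratio between consecutive groups' learning rates
ratio = (max_lr / min_lr) ** (1 / (n_groups - 1))
lrs = [min_lr * ratio**i for i in range(n_groups)]

print([round(lr, 6) for lr in lrs])
# → [0.0001, 0.000178, 0.000316, 0.000562, 0.001]
```

Earlier layer groups get the smallest learning rates and later groups the largest, which is the usual pattern when fine-tuning a pretrained model.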

You should then pass your ``params2optimize`` list to the ``.initialize_optimizer()`` method:

.. code-block:: python

    #EXAMPLE
    my_classifier.initialize_optimizer(params2optimize=params2optimize)