Merge pull request #338 from Living-with-machines/dev

Mapreader 1.1.1
maps-as-data · Jan 8, 2024 · 566e602
2 parents 7e90e2b + 8997b4d
commit 566e602

Showing 13 changed files with 987 additions and 375 deletions.
diff --git a/docs/source/User-guide/Annotate.rst b/docs/source/User-guide/Annotate.rst
@@ -65,6 +65,7 @@ Other arguments that you may want to be aware of when initializing the ``Annotat
 - ``show_context``: Whether to show a context image in the annotation interface (default: ``False``).
 - ``surrounding``: How many surrounding patches to show in the context image (default: ``1``).
 - ``sortby``: The name of the column to use to sort the patch Dataframe (e.g. "mean_pixel_R" to sort by red pixel intensities).
+- ``ascending``: A boolean indicating whether to sort in ascending or descending order (default: ``True``).
 - ``delimiter``: The delimiter to use when reading your data files (default: ``","`` for csv).
 
 After setting up the ``Annotator`` instance, you can interactively annotate a sample of your images using:
@@ -78,7 +79,7 @@ Patch size
 
 By default, your patches will be shown to you as their original size in pixels.
 This can make annotating difficult if your patches are very small.
-To resize your patches when viewing them in the annotation interface, you can pass the ``resize_to`` keyword argument when initializing the ``Annotator`` instance or when calling the ``annotate()`` method.
+To resize your patches when viewing them in the annotation interface, you can pass the ``resize_to`` argument when initializing the ``Annotator`` or when calling the ``annotate()`` method.
 
 e.g. to resize your patches so that their largest edge is 300 pixels:
 
@@ -101,14 +102,14 @@ Or, equivalently, :
 
     annotator.annotate(resize_to=300)
 
-.. note:: Passing the ``resize_to`` argument when calling the ``annotate()`` method overrides the ``resize_to`` argument passed when initializing the ``Annotator`` instance.
+.. note:: Passing the ``resize_to`` argument when calling the ``annotate()`` method overrides the ``resize_to`` argument passed when initializing the ``Annotator``.
 
 Context
 ~~~~~~~
 
 As well as resizing your patches, you can also set the annotation interface to show a context image using ``show_context=True``.
 This creates a panel of patches in the annotation interface, highlighting your patch in the middle of its surrounding immediate images.
-As above, you can either pass the ``show_context`` argument when initializing the ``Annotator`` instance or when calling the ``annotate`` method.
+As above, you can either pass the ``show_context`` argument when initializing the ``Annotator`` or when calling the ``annotate`` method.
 
 e.g. :
 
@@ -192,7 +193,7 @@ e.g. To sort your patches by the mean red pixel intensity in each patch but only
 Save your annotations
 ----------------------
 
-Your annotations are automatically saved as you're making progress through the annotation task as a ``csv`` file (unless you've set the ``auto_save`` keyword argument to ``False`` when you set up the ``Annotator`` instance).
+Your annotations are automatically saved as you're making progress through the annotation task as a ``csv`` file (unless you've set ``auto_save=False`` when you set up the ``Annotator`` instance).
 
 If you need to know the name of the annotations file, you may refer to a property on your ``Annotator`` instance:
 

diff --git a/docs/source/User-guide/Load.rst b/docs/source/User-guide/Load.rst
@@ -80,11 +80,13 @@ For example, if you have downloaded your maps using the default settings of our
     Other arguments you may want to specify when adding metadata to your images include:
 
     - ``index_col`` - By default, this is set to ``0`` so the first column of your csv/excel spreadsheet will be used as the index column when creating a pandas dataframe. If you would like to use a different column you can specify ``index_col``.
-    - ``columns`` - By default, the ``.add_metadata()`` method will add all the columns in your metadata to your ``MapImages`` object. If you would like to add only specific columns, you can pass a list of these as the ``columns``\s argument (e.g. ``columns=[`name`, `coordinates`, `region`]``) to add only these columns to your ``MapImages`` object.
+    - ``columns`` - By default, the ``add_metadata()`` method will add all the columns in your metadata to your ``MapImages`` object. If you would like to add only specific columns, you can pass a list of these as the ``columns``\s argument (e.g. ``columns=[`name`, `coordinates`, `region`]``) to add only these columns to your ``MapImages`` object.
     - ``ignore_mismatch``- By default, this is set to ``False`` so that an error is given if the images in your ``MapImages`` object are mismatched to your metadata. Setting ``ignore_mismatch`` to ``True`` (by specifying ``ignore_mismatch=True``) will allow you to bypass this error and add mismatched metadata. Only metadata corresponding to images in your ``MapImages`` object will be added.
     - ``delimiter`` - By default, this is set to ``|``. If your csv file is delimited using a different delimiter you should specify the delimiter argument.
 
 
+.. note:: In MapReader versions < 1.0.7, coordinates were miscalculated. To correct this, use the ``add_coords_from_grid_bb()`` method to calculate new, correct coordinates.
+
 Patchify
 ----------
 
@@ -184,7 +186,7 @@ As above, you can use the ``path_save`` argument to change where these patches a
     Other arguments you may want to specify when patchifying your images include:
 
     - ``square_cuts`` - By default, this is set to ``False``. Thus, if your ``patch_size`` is not a factor of your image size (e.g. if you are trying to slice a 100x100 pixel image into 8x8 pixel patches), you will end up with some rectangular patches at the edges of your image. If you set ``square_cuts=True``, then all your patches will be square, however there will be some overlap between edge patches. Using ``square_cuts=True`` is useful if you need square images for model training, and don't want to warp your rectangular images by resizing them at a later stage.
-    - ``add_to_parent`` - By default, this is set to ``True`` so that each time you run ``.patchify_all()`` your patches are added to your ``MapImages`` object. Setting it to ``False`` (by specifying ``add_to_parent=False``) will mean your patches are created, but not added to your ``MapImages`` object. This can be useful for testing out different patch sizes.
+    - ``add_to_parent`` - By default, this is set to ``True`` so that each time you run ``patchify_all()`` your patches are added to your ``MapImages`` object. Setting it to ``False`` (by specifying ``add_to_parent=False``) will mean your patches are created, but not added to your ``MapImages`` object. This can be useful for testing out different patch sizes.
     - ``rewrite`` - By default, this is set to ``False`` so that if your patches already exist they are not overwritten. Setting it to ``True`` (by specifying ``rewrite=True``) will mean already existing patches are recreated and overwritten.
 
 If you would like to save your patches as geo-referenced tiffs (i.e. geotiffs), use:
@@ -193,10 +195,12 @@ If you would like to save your patches as geo-referenced tiffs (i.e. geotiffs),
 
     my_files.save_patches_as_geotiffs()
 
-This will save each patch in your ``MapImages`` object as a ``.geotiff`` file in your patches directory.
+This will save each patch in your ``MapImages`` object as a georeferenced ``.tif`` file in your patches directory.
+
+.. note:: MapReader also has a ``save_parents_as_geotiff()`` method for saving parent images as geotiffs.
 
-After running the ``.patchify_all()`` method, you'll see that ``print(my_files)`` shows you have both 'parents' and 'patches'.
-To view an iterable list of these, you can use the ``.list_parents()`` and ``.list_patches()`` methods:
+After running the ``patchify_all()`` method, you'll see that ``print(my_files)`` shows you have both 'parents' and 'patches'.
+To view an iterable list of these, you can use the ``list_parents()`` and ``list_patches()`` methods:
 
 .. code-block:: python
 
@@ -229,7 +233,7 @@ or
 
 .. note:: These parent and patch dataframes **will not** automatically update so you will want to run this command again if you add new information into your ``MapImages`` object.
 
-At any point, you can also save these dataframes by passing the ``save`` argument to the ``.convert_images()`` method:
+At any point, you can also save these dataframes by passing the ``save`` argument to the ``convert_images()`` method:
 
 .. code-block:: python
 
@@ -280,7 +284,7 @@ If, however, you want to see a random sample of your patches use the ``tree_leve
 
 
 It can also be helpful to see your patches in the context of their parent image.
-To do this use the ``.show()`` method.
+To do this use the ``show()`` method.
 
 e.g. :
 
@@ -312,7 +316,7 @@ This will show you your chosen patches, by default highlighted with red borders,
 .. admonition:: Advanced usage
     :class: dropdown
 
-    Further usage of the ``.show()`` method is detailed in :ref:`Further_analysis`.
+    Further usage of the ``show()`` method is detailed in :ref:`Further_analysis`.
     Please head there for guidance on advanced usage.
 
 You may also want to see all the patches created from one of your parent images.
@@ -330,7 +334,7 @@ This can be done using:
 .. admonition:: Advanced usage
     :class: dropdown
 
-    Further usage of the ``.show_parent()`` method is detailed in :ref:`Further_analysis`.
+    Further usage of the ``show_parent()`` method is detailed in :ref:`Further_analysis`.
     Please head there for guidance on advanced usage.
 
 .. todo:: Move 'Further analysis/visualization' to a different page (e.g. as an appendix)
@@ -341,13 +345,13 @@ Further analysis/visualization (optional)
 -------------------------------------------
 
 If you have loaded geographic coordinates into your ``MapImages`` object, you may want to calculate the central coordinates of your patches.
-The ``.add_center_coord()`` method can used to do this:
+The ``add_center_coord()`` method can be used to do this:
 
 .. code-block:: python
 
     my_files.add_center_coord()
 
-You can then rerun the ``.convert_images()`` method to see your results.
+You can then rerun the ``convert_images()`` method to see your results.
 
 i.e.:
 
@@ -358,15 +362,15 @@ i.e.:
 
 You will see that center coordinates of each patch have been added to your patch dataframe.
 
-The ``.calc_pixel_stats()`` method can be used to calculate means and standard deviations of pixel intensities of each of your patches:
+The ``calc_pixel_stats()`` method can be used to calculate means and standard deviations of pixel intensities of each of your patches:
 
 .. code-block:: python
 
     my_files.calc_pixel_stats()
 
-After rerunning the ``.convert_images()`` method (as above), you will see that mean and standard pixel intensities have been added to your patch dataframe.
+After rerunning the ``convert_images()`` method (as above), you will see that the means and standard deviations of pixel intensities have been added to your patch dataframe.
 
-The ``.show()`` and ``.show_parent()`` methods can be used to plot these values ontop of your patches.
+The ``show()`` and ``show_parent()`` methods can be used to plot these values on top of your patches.
 This is done by specifying the ``column_to_plot`` argument.
 
 e.g. to view "mean_pixel_R" on your patches:
@@ -394,12 +398,12 @@ e.g. to view "mean_pixel_R" on your patches:
 .. image:: ../figures/show_par_RGB_0.5.png
     :width: 400px
 
-.. note:: The ``column_to_plot`` argument can also be used with the ``.show()`` method.
+.. note:: The ``column_to_plot`` argument can also be used with the ``show()`` method.
 
 .. admonition:: Advanced usage
     :class: dropdown
 
-    Other arguments you may want to specify when showing your images (for both the ``.show()`` and ``.show_parent()`` methods):
+    Other arguments you may want to specify when showing your images (for both the ``show()`` and ``show_parent()`` methods):
 
     - ``plot_parent`` - By default, this is set to ``True`` so that the parent image is shown. If you would like to remove the parent image, e.g. if you are plotting column values, you can set ``plot_parent=False``. This should speed up the code for plotting.
     - ``patch_border`` - By default, this is set to ``True`` so that borders are plotted around each patch. Setting ``patch_border`` to ``False`` (by specifying ``patch_border=False``) will stop patch borders being shown.
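
A note on the ``square_cuts`` option described in the Load.rst hunks above: the edge behaviour is easier to see along a single image axis. The sketch below is not MapReader's implementation. ``patch_bounds`` is a hypothetical helper that only reproduces the behaviour the documentation describes: by default the final patch is clipped at the image edge, while ``square_cuts=True`` keeps it full-size and lets it overlap its neighbour.

```python
def patch_bounds(length, patch_size, square_cuts=False):
    """Return (start, end) pixel bounds of patches along one image axis.

    Hypothetical helper for illustration only, not part of MapReader.
    """
    bounds = []
    for start in range(0, length, patch_size):
        end = min(start + patch_size, length)
        if square_cuts and end - start < patch_size:
            # Shift the last patch back so it keeps the full patch size,
            # overlapping its neighbour instead of being clipped.
            start = max(end - patch_size, 0)
        bounds.append((start, end))
    return bounds


# The documentation's example: a 100 pixel edge cut into 8 pixel patches.
default_cuts = patch_bounds(100, 8)
square = patch_bounds(100, 8, square_cuts=True)
print(default_cuts[-1])  # (96, 100): a 4 pixel wide edge patch
print(square[-1])  # (92, 100): full size, overlapping the previous patch by 4 pixels
```

Under these assumptions both calls yield 13 patches per axis; only the position of the last one differs.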

diff --git a/mapreader/annotate/annotator.py b/mapreader/annotate/annotator.py
@@ -22,8 +22,6 @@
 
 warnings.filterwarnings("ignore", category=UserWarning)
 
-MAX_SIZE = 1000
-
 _CENTER_LAYOUT = widgets.Layout(
     display="flex", flex_flow="column", align_items="center"
 )
@@ -64,8 +62,23 @@ class Annotator(pd.DataFrame):
     sortby : str or None, optional
         Name of the column to use to sort the patch DataFrame, by default None.
         Default sort order is ``ascending=True``. Pass ``ascending=False`` keyword argument to sort in descending order.
-    **kwargs
-        Additional keyword arguments
+    ascending : bool, optional
+        Whether to sort the DataFrame in ascending order when using the ``sortby`` argument, by default True.
+    username : str or None, optional
+        Username to use when saving annotations file, by default None.
+        If not provided, a random string is generated.
+    task_name : str or None, optional
+        Name of the annotation task, by default None.
+    min_values : dict, optional
+        A dictionary consisting of column names (keys) and minimum values as floating point values (values), by default None.
+    max_values : dict, optional
+        A dictionary consisting of column names (keys) and maximum values as floating point values (values), by default None.
+    surrounding : int, optional
+        The number of surrounding images to show for context, by default 1.
+    max_size : int, optional
+        The size in pixels to which the longest side of each patch image is constrained, by default 1000.
+    resize_to : int or None, optional
+        The size in pixels to which the longest side of each patch image is resized, by default None.
 
     Raises
     ------
@@ -79,21 +92,6 @@ class Annotator(pd.DataFrame):
         If labels provided are not in the form of a list
     SyntaxError
         If labels provided are not in the form of a list
-
-    Notes
-    -----
-
-    Additional kwargs:
-
-    - ``username``: Username to use when saving annotations file. Default: Randomly generated string.
-    - ``task_name``: Name of the annotation task. Default: "task".
-    - ``min_values``: A dictionary consisting of column names (keys) and minimum values as floating point values (values). Default: {}.
-    - ``max_values``: A dictionary consisting of column names (keys) and maximum values as floating point values (values). Default: {}.
-    - ``buttons_per_row``: Number of buttons to display per row. Default: None.
-    - ``ascending``: Whether to sort the DataFrame in ascending order. Default: True.
-    - ``surrounding``: The number of surrounding images to show for context. Default: 1.
-    - ``max_size``: The size in pixels for the longest side to which constrain each patch image. Default: 1000.
-    - ``resize_to``: The size in pixels for the longest side to which resize each patch image. Default: None.
     """
 
     def __init__(
@@ -111,7 +109,14 @@ def __init__(
         auto_save: bool = True,
         delimiter: str = ",",
         sortby: str | None = None,
-        **kwargs,
+        ascending: bool = True,
+        username: str | None = None,
+        task_name: str | None = None,
+        min_values: dict | None = None,
+        max_values: dict | None = None,
+        surrounding: int = 1,
+        max_size: int = 1000,
+        resize_to: int | None = None,
     ):
         if labels is None:
             labels = []
@@ -174,10 +179,6 @@ def __init__(
         # Check for url column and add to patch dataframe
         if "url" in parent_df.columns:
             patch_df = patch_df.join(parent_df["url"], on="parent_id")
-        else:
-            raise ValueError(
-                "[ERROR] Metadata (parent data) should contain a 'url' column."
-            )
 
         # Add label column if not present
         if label_col not in patch_df.columns:
@@ -195,13 +196,12 @@ def __init__(
         )
 
         # Set up annotations file
-        username = kwargs.get(
-            "username",
-            "".join(
+        if not username:
+            username = "".join(
                 [random.choice(string.ascii_letters + string.digits) for n in range(30)]
-            ),
-        )
-        task_name = kwargs.get("task_name", "task")
+            )
+        if not task_name:
+            task_name = "task"
         id = hashlib.md5(image_list.encode("utf-8")).hexdigest()
 
         annotations_file = task_name.replace(" ", "_") + f"_#{username}#-{id}.csv"
@@ -269,9 +269,7 @@ def __init__(
         # Sort by sortby column if provided
         if isinstance(sortby, str):
             if sortby in self.columns:
-                self.sort_values(
-                    sortby, ascending=kwargs.get("ascending", True), inplace=True
-                )
+                self.sort_values(sortby, ascending=ascending, inplace=True)
             else:
                 raise ValueError(f"[ERROR] {sortby} is not a column in the DataFrame.")
         elif sortby is not None:
@@ -287,35 +285,33 @@ def __init__(
         self.task_name = task_name
 
         # set up for the annotator
-        self.buttons_per_row = kwargs.get("buttons_per_row", None)
-        self._min_values = kwargs.get("min_values", {})
-        self._max_values = kwargs.get("max_values", {})  # pixel_bounds = x0, y0, x1, y1
+        self._min_values = min_values or {}
+        self._max_values = max_values or {}
 
         self.patch_width, self.patch_height = self.get_patch_size()
 
         # Create annotations_dir
         Path(annotations_dir).mkdir(parents=True, exist_ok=True)
 
         # Set up standards for context display
-        self.surrounding = kwargs.get("surrounding", 1)
-        self.max_size = kwargs.get("max_size", MAX_SIZE)
-        self.resize_to = kwargs.get("resize_to", None)
+        self.surrounding = surrounding
+        self.max_size = max_size
+        self.resize_to = resize_to
 
         # set up buttons
         self._buttons = []
 
         # Set max buttons
-        if not self.buttons_per_row:
-            if (len(self._labels) % 2) == 0:
-                if len(self._labels) > 4:
-                    self.buttons_per_row = 4
-                else:
-                    self.buttons_per_row = 2
+        if (len(self._labels) % 2) == 0:
+            if len(self._labels) > 4:
+                self.buttons_per_row = 4
             else:
-                if len(self._labels) == 3:
-                    self.buttons_per_row = 3
-                else:
-                    self.buttons_per_row = 5
+                self.buttons_per_row = 2
+        else:
+            if len(self._labels) == 3:
+                self.buttons_per_row = 3
+            else:
+                self.buttons_per_row = 5
 
         # Set indices
         self.current_index = -1
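
The annotations file name set up earlier in this file's diff now falls back to defaults whenever ``username`` or ``task_name`` is falsy (``None`` or an empty string), rather than only when the keyword is absent. A minimal standalone sketch of that logic follows. ``build_annotations_filename`` is a hypothetical function name, and ``image_list`` is assumed to be a string identifying the patches, since its definition lies outside the hunks shown.

```python
import hashlib
import random
import string


def build_annotations_filename(image_list, username=None, task_name=None):
    """Mirror the file-name logic from the diff (hypothetical helper)."""
    if not username:
        # Fall back to a random 30-character alphanumeric string.
        username = "".join(
            random.choice(string.ascii_letters + string.digits) for _ in range(30)
        )
    if not task_name:
        task_name = "task"
    # Hash the image list so the name is tied to a specific set of patches.
    digest = hashlib.md5(image_list.encode("utf-8")).hexdigest()
    return task_name.replace(" ", "_") + f"_#{username}#-{digest}.csv"


named = build_annotations_filename(
    "patch-0.png,patch-1.png", username="rosie", task_name="rail space"
)
anonymous = build_annotations_filename("patch-0.png,patch-1.png")
print(named)
print(anonymous)
```

Passing the same ``username``, ``task_name`` and image list always produces the same file name, whereas omitting ``username`` produces a new random name on every call.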

diff --git a/mapreader/classify/load_annotations.py b/mapreader/classify/load_annotations.py
@@ -180,6 +180,8 @@ def _load_annotations_csv(
         if os.path.isfile(annotations):
             print(f'[INFO] Reading "{annotations}"')
             annotations = pd.read_csv(annotations, sep=delimiter, index_col=0)
+            if annotations.index.name in ["name", "image_id"]:
+                annotations.reset_index(inplace=True, drop=False)
         else:
             raise ValueError(f'[ERROR] "{annotations}" cannot be found.')
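
The two added lines in this hunk can be tried in isolation with plain pandas. The sketch below uses a made-up two-row CSV (the file contents and the ``label`` column name are assumptions for illustration) to show that, when the first column is named ``image_id``, it is restored as a regular column after being read as the index.

```python
import io

import pandas as pd

# Hypothetical annotations file whose first column is named "image_id".
csv_text = "image_id,label\npatch-0.png,1\npatch-1.png,2\n"

annotations = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)
index_name_before = annotations.index.name  # "image_id"

# Same check as in the diff: move a named index back into the columns.
if annotations.index.name in ["name", "image_id"]:
    annotations.reset_index(inplace=True, drop=False)

print(list(annotations.columns))  # ['image_id', 'label']
```

With ``drop=False`` the old index values are kept as a column and the frame gets a default integer index.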