Merge pull request #495 from maps-as-data/436-geopandas

Better use of geopandas in MapReader
maps-as-data · Sep 12, 2024 · d3f2c91
2 parents 3db8e6f + c3d0f64

Showing 46 changed files with 2,338 additions and 1,489 deletions.
diff --git a/.gitignore b/.gitignore
@@ -19,3 +19,10 @@ worked_examples/**/workshops_*/**/*.png
 worked_examples/**/workshops_*/**/*.csv
 worked_examples/**/workshops_*/**/*.geojson
 worked_examples/**/workshops_*/**/*.xlsx
+
+
+# test outputs
+/broken_files.txt
+/tmp_checkpoints/*
+/tests/sample_files/cropped_74488689.tif
+/tests/sample_files/cropped_L.tif
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,7 +17,30 @@ The following table shows which versions of MapReader are compatible with which
 
 _ADD NEW CHANGES HERE_
 
-### [v1.3.9](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-08-21)
+### Added
+
+- `check_georeferencing` method and `georeferenced` attribute added to `MapImages` class ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- `MapImages.convert_images` method now supports saving to GeoJSON format (set `save_format="geojson"`) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- All file loading methods now support `pathlib.Path` and `gpd.GeoDataFrame` objects as input ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- Loading of dataframes from GeoJSON files now supported in many file loading methods (e.g. `add_metadata`, `Annotator.__init__`, `AnnotationsLoader.load`, etc.) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- `load_frames.py` added to `mapreader.utils`. This has functions for loading from various file formats (e.g. CSV, Excel, GeoJSON, etc.) and converting to GeoDataFrames ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+
+### Changed
+
+- Refactoring of `SheetDownloader` to make full use of geopandas functionality - "metadata.json" is now read in as a GeoDataFrame instead of json dictionary ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- `query_map_sheets_by_string` and `download_map_sheets_by_string` methods now search using columns in the GeoDataFrame instead of keys in the json dictionary ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- `columns` argument renamed to `usecols` in `MapImages.add_metadata` method (to align with pandas) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+- `polygon` column renamed to `geometry` (to align with geopandas) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+
+### Removed
+
+- `hist_published_dates` method removed from `SheetDownloader` as it is no longer needed. Use `sd.metadata["published_date"].hist()` instead ([#495](https://github.com/maps-as-data/MapReader/pull/495))
+
+## [v1.3.10](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-09-04)
+
+_No changes to code. This release marks the move to the `maps-as-data` GitHub organisation._
+
+## [v1.3.9](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-08-21)
 
 ### Added
 

diff --git a/docs/source/using-mapreader/input-guidance/file-map-options.rst b/docs/source/using-mapreader/input-guidance/file-map-options.rst
@@ -16,10 +16,10 @@ Instructions for accessing tile layers from one example collection is below:
 If you want to download maps from a TileServer using MapReader's ``Download`` subpackage, you will need to begin with the 'Download' task.
 For this, you will need:
 
-* A ``json`` file containing metadata for each map sheet you would like to query/download.
+* A `geojson <https://geojson.org/>`__ file containing metadata for each map sheet you would like to query/download.
 * The URL of the XYZ tile layer which you would like to access.
 
-At a minimum, for each map sheet, your ``json`` file should contain information on:
+At a minimum, for each map sheet, your geojson file should contain information on:
 
 - the name and URL of an individual sheet that is contained in the composite layer
 - the geometry of the sheet (i.e. its coordinates), so that, where applicable, individual sheets can be isolated from the whole layer
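
As a rough illustration of the minimum content listed above, the following sketch builds a GeoJSON ``FeatureCollection`` in which each feature carries a sheet name, a sheet URL and a polygon geometry. It uses only the Python standard library. The property names ``name`` and ``url``, the helper ``make_sheet_feature``, the example URL and the coordinates are all assumptions made for illustration; this diff does not show the exact fields MapReader expects, so check them against the MapReader documentation before relying on this layout.

```python
import json


def make_sheet_feature(name, url, bounds):
    """Build one GeoJSON Feature for a map sheet.

    `bounds` is (min_x, min_y, max_x, max_y). This helper is hypothetical
    and is not part of MapReader.
    """
    min_x, min_y, max_x, max_y = bounds
    # GeoJSON polygon rings are closed: the first and last positions match.
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],
    ]
    return {
        "type": "Feature",
        # Property names are assumed for illustration only.
        "properties": {"name": name, "url": url},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        make_sheet_feature(
            "Example sheet 1",
            "https://example.com/sheets/1",  # placeholder URL
            (-4.8, 55.8, -4.2, 56.4),  # illustrative bounds only
        ),
    ],
}

# Serialise to text; write this string to a .geojson file to use it.
geojson_text = json.dumps(collection, indent=2)
```

Building the file from a small helper like this keeps every sheet's geometry in the same shape, which makes it easier to check that each entry has a name, a URL and a closed polygon before attempting a download.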

diff --git a/docs/source/using-mapreader/input-guidance/preparing-metadata.rst b/docs/source/using-mapreader/input-guidance/preparing-metadata.rst
@@ -1,25 +1,15 @@
 Preparing your metadata
 =======================
 
-MapReader uses the file names of your map images as unique identifiers (``image_id`` s).
-Therefore, if you would like to associate metadata to your map images, then, **at minimum**, your metadata must contain a column/header named ``image_id`` or ``name`` whose content is the file name of each map image.
+MapReader uses the file names of your map images as unique identifiers.
+Therefore, to associate metadata (e.g. georeferencing information, publication dates or any other information about your images) with your map images, your metadata must contain a column/header named ``image_id`` or ``name`` whose values match the file names of your map images, plus a column for each piece of metadata you'd like to add.
 
-To load metadata (e.g. georeferencing information, publication dates or any other information about your images) into MapReader, your metadata must be in a `file format readable by Pandas <https://pandas.pydata.org/>`_.
+To load metadata from a file into MapReader, your metadata file should be a CSV (or TSV, etc.), Excel or GeoJSON file.
 
-.. note:: Many map collections do not have item-level metadata, however even the minimal requirements here (a filename, geospatial coordinates, and CRS) will suffice for using MapReader. It is always a good idea to talk to the curators of the map collections you wish to use with MapReader to see if there are metadata files that can be shared for research purposes.
-
-
-Option 1 - Using a ``csv``, ``xls`` or ``xlsx`` file
------------------------------------------------------
-
-The simplest option is to save your metadata as a ``csv``, ``xls`` or ``xlsx`` file and load it directly into MapReader.
-
-.. note:: If you are using a ``csv`` file but the contents of you metadata contains commas, you will need to use another delimiter. We recommend using a pipe (``|``).
-
-If you are loading metadata from a ``csv``, ``xls`` or ``xlsx`` file, your file should be structures as follows:
+For example, if you are loading metadata from a CSV/TSV or Excel file, your file could be structured as follows:
 
 +-----------+-----------------------------+------------------------+--------------+
-| image_id  | column1 (e.g. coords)       | column2 (e.g. region)  | column3      |
+| image_id  | coordinates                 | region                 | column3      |
 +===========+=============================+========================+==============+
 | map1.png  | (-4.8, 55.8, -4.2, 56.4)    | Glasgow                | ...          |
 +-----------+-----------------------------+------------------------+--------------+
@@ -30,14 +20,16 @@ If you are loading metadata from a ``csv``, ``xls`` or ``xlsx`` file, your file
 | ...       | ...                         | ...                    | ...          |
 +-----------+-----------------------------+------------------------+--------------+
 
-Your file can contain as many columns/rows as you like, so long as it contains at least one named ``image_id`` or ``name``.
+This file can contain as many columns as you like, but the ``image_id`` column is required to ensure the metadata is matched to the correct map image.
+
+.. note:: Many map collections do not have item-level metadata; however, even the minimal requirements here (a filename, geospatial coordinates, and CRS) will suffice for using MapReader. It is always a good idea to talk to the curators of the map collections you wish to use with MapReader to see if there are metadata files that can be shared for research purposes.
 
 .. Add comment about nature of coordinates as supplied by NLS vs what they might be for other collections
 
-Option 2 - Loading metadata from other file formats
----------------------------------------------------
+Using metadata in other formats
+--------------------------------
 
-As Pandas is able to read `a number of different file formats <https://pandas.pydata.org/docs/user_guide/io.html>`_, you may still be able to use your metadata even if it is saved in a different file format.
+So long as your file is in a format readable by `Pandas <https://pandas.pydata.org/docs/user_guide/io.html>`_ or `GeoPandas <https://geopandas.org/en/stable/docs/user_guide/io.html>`_, you may still be able to use your metadata even if it is saved in a file format not supported by MapReader.
 
 To do this, you will need to use Python to: