Merge pull request #495 from maps-as-data/436-geopandas
Better use of geopandas in MapReader
rwood-97 authored Sep 12, 2024
2 parents 3db8e6f + c3d0f64 commit d3f2c91
Showing 46 changed files with 2,338 additions and 1,489 deletions.
7 changes: 7 additions & 0 deletions .gitignore
Expand Up @@ -19,3 +19,10 @@ worked_examples/**/workshops_*/**/*.png
worked_examples/**/workshops_*/**/*.csv
worked_examples/**/workshops_*/**/*.geojson
worked_examples/**/workshops_*/**/*.xlsx


# test outputs
/broken_files.txt
/tmp_checkpoints/*
/tests/sample_files/cropped_74488689.tif
/tests/sample_files/cropped_L.tif
25 changes: 24 additions & 1 deletion CHANGELOG.md
Expand Up @@ -17,7 +17,30 @@ The following table shows which versions of MapReader are compatible with which

_ADD NEW CHANGES HERE_

### [v1.3.9](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-08-21)
### Added

- `check_georeferencing` method and `georeferenced` attribute added to `MapImages` class ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `MapImages.convert_images` method now supports saving to GeoJSON format (set `save_format="geojson"`) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- All file loading methods now support `pathlib.Path` and `gpd.GeoDataFrame` objects as input ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- Loading of dataframes from GeoJSON files now supported in many file loading methods (e.g. `add_metadata`, `Annotator.__init__`, `AnnotationsLoader.load`, etc.) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `load_frames.py` added to `mapreader.utils`. This has functions for loading from various file formats (e.g. CSV, Excel, GeoJSON, etc.) and converting to GeoDataFrames ([#495](https://github.com/maps-as-data/MapReader/pull/495))

### Changed

- Refactoring of `SheetDownloader` to make full use of geopandas functionality - "metadata.json" is now read in as a GeoDataFrame instead of json dictionary ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `query_map_sheets_by_string` and `download_map_sheets_by_string` methods now search using columns in the GeoDataFrame instead of keys in the json dictionary ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `columns` argument renamed to `usecols` in `MapImages.add_metadata` method (to align with pandas) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `polygon` column renamed to `geometry` (to align with geopandas) ([#495](https://github.com/maps-as-data/MapReader/pull/495))

### Removed

- `hist_published_dates` method removed from `SheetDownloader` as it is no longer needed. Use `sd.metadata["published_date"].hist()` instead ([#495](https://github.com/maps-as-data/MapReader/pull/495))

## [v.1.3.10](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-09-04)

_No changes to code. This release marks the move to the `maps-as-data` GitHub organisation._

## [v1.3.9](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.9) (2024-08-21)

### Added

Expand Down
Expand Up @@ -16,10 +16,10 @@ Instructions for accessing tile layers from one example collection is below:
If you want to download maps from a TileServer using MapReader's ``Download`` subpackage, you will need to begin with the 'Download' task.
For this, you will need:

* A ``json`` file containing metadata for each map sheet you would like to query/download.
* A `geojson <https://geojson.org/>`__ file containing metadata for each map sheet you would like to query/download.
* The URL of the XYZ tile layer which you would like to access.

At a minimum, for each map sheet, your ``json`` file should contain information on:
At a minimum, for each map sheet, your geojson file should contain information on:

- the name and URL of an individual sheet that is contained in the composite layer
- the geometry of the sheet (i.e. its coordinates), so that, where applicable, individual sheets can be isolated from the whole layer
Expand Down
30 changes: 11 additions & 19 deletions docs/source/using-mapreader/input-guidance/preparing-metadata.rst
Original file line number Diff line number Diff line change
@@ -1,25 +1,15 @@
Preparing your metadata
=======================

MapReader uses the file names of your map images as unique identifiers (``image_id`` s).
Therefore, if you would like to associate metadata to your map images, then, **at minimum**, your metadata must contain a column/header named ``image_id`` or ``name`` whose content is the file name of each map image.
MapReader uses the file names of your map images as unique identifiers.
Therefore, if you would like to associate metadata (e.g. georeferencing information, publication dates or any other information about your images) to your map images, then your metadata must contain a column/header named ``image_id`` or ``name`` that matches the file names of your map images and columns for the metadata you'd like to add.

To load metadata (e.g. georeferencing information, publication dates or any other information about your images) into MapReader, your metadata must be in a `file format readable by Pandas <https://pandas.pydata.org/>`_.
To load metadata from a file into MapReader, your metadata file should be in a CSV (or TSV/etc), Excel or GeoJSON file format.

.. note:: Many map collections do not have item-level metadata, however even the minimal requirements here (a filename, geospatial coordinates, and CRS) will suffice for using MapReader. It is always a good idea to talk to the curators of the map collections you wish to use with MapReader to see if there are metadata files that can be shared for research purposes.


Option 1 - Using a ``csv``, ``xls`` or ``xlsx`` file
-----------------------------------------------------

The simplest option is to save your metadata as a ``csv``, ``xls`` or ``xlsx`` file and load it directly into MapReader.

.. note:: If you are using a ``csv`` file but the contents of your metadata contain commas, you will need to use another delimiter. We recommend using a pipe (``|``).

If you are loading metadata from a ``csv``, ``xls`` or ``xlsx`` file, your file should be structured as follows:
e.g. If you are loading metadata from a CSV/TSV/etc or Excel file, your file could be structured as follows:

+-----------+-----------------------------+------------------------+--------------+
| image_id | column1 (e.g. coords) | column2 (e.g. region) | column3 |
| image_id | coordinates | region | column3 |
+===========+=============================+========================+==============+
| map1.png | (-4.8, 55.8, -4.2, 56.4) | Glasgow | ... |
+-----------+-----------------------------+------------------------+--------------+
Expand All @@ -30,14 +20,16 @@ If you are loading metadata from a ``csv``, ``xls`` or ``xlsx`` file, your file
| ... | ... | ... | ... |
+-----------+-----------------------------+------------------------+--------------+

Your file can contain as many columns/rows as you like, so long as it contains at least one named ``image_id`` or ``name``.
This file can contain as many columns as you like, but the ``image_id`` column is required to ensure the metadata is matched to the correct map image.
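
As a minimal sketch of the structure described above (not MapReader's own loading code), the following uses only the Python standard library to write a small pipe-delimited metadata file and check that it has an ``image_id`` or ``name`` column whose values match a set of image file names. The file name ``metadata.csv``, the helpers ``write_metadata`` and ``check_metadata``, and the second sample row are hypothetical, chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Sample rows: the first mirrors the table above, the second is made up.
ROWS = [
    {"image_id": "map1.png", "coordinates": "(-4.8, 55.8, -4.2, 56.4)", "region": "Glasgow"},
    {"image_id": "map2.png", "coordinates": "(-2.2, 53.2, -1.6, 53.8)", "region": "Manchester"},
]


def write_metadata(path, rows, delimiter="|"):
    """Write rows to a delimited text file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)


def check_metadata(path, image_names, delimiter="|"):
    """Return the image names that have no matching metadata row.

    Raises ValueError if neither an ``image_id`` nor a ``name`` column exists.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        fieldnames = reader.fieldnames or []
        if "image_id" in fieldnames:
            id_col = "image_id"
        elif "name" in fieldnames:
            id_col = "name"
        else:
            raise ValueError("metadata needs an 'image_id' or 'name' column")
        ids = {row[id_col] for row in reader}
    return sorted(set(image_names) - ids)


with tempfile.TemporaryDirectory() as tmp:
    metadata_path = Path(tmp) / "metadata.csv"
    write_metadata(metadata_path, ROWS)
    # map3.png is a hypothetical image with no metadata row.
    missing = check_metadata(metadata_path, ["map1.png", "map2.png", "map3.png"])
    print(missing)
```

A pipe delimiter is used in this sketch because the coordinate values themselves contain commas; any image listed in ``missing`` would need a row added before its metadata could be matched.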

.. note:: Many map collections do not have item-level metadata, however even the minimal requirements here (a filename, geospatial coordinates, and CRS) will suffice for using MapReader. It is always a good idea to talk to the curators of the map collections you wish to use with MapReader to see if there are metadata files that can be shared for research purposes.

.. Add comment about nature of coordinates as supplied by NLS vs what they might be for other collections
Option 2 - Loading metadata from other file formats
---------------------------------------------------
Using metadata in other formats
--------------------------------

As Pandas is able to read `a number of different file formats <https://pandas.pydata.org/docs/user_guide/io.html>`_, you may still be able to use your metadata even if it is saved in a different file format.
So long as your file is in a format readable by `Pandas <https://pandas.pydata.org/docs/user_guide/io.html>`_ or `GeoPandas <https://geopandas.org/en/stable/docs/user_guide/io.html>`_, you may still be able to use your metadata even if it is saved in a file format not supported by MapReader.

To do this, you will need to use Python to:

Expand Down