Merge pull request #500 from maps-as-data/test_text_spotting
Add tests for text spotting code
rwood-97 authored Sep 12, 2024
2 parents d3f2c91 + ea06538 commit a87b5bc
Showing 11 changed files with 813 additions and 2 deletions.
89 changes: 89 additions & 0 deletions .github/workflows/mr_ci_text_spotting.yml
@@ -0,0 +1,89 @@
---
name: Unit Tests - Text Spotting

on: [push]

# Run tests with GitHub Actions for quick feedback.
jobs:

  macos_tests:
    runs-on: macos-latest
    # test against each supported torch/torchvision combination
    strategy:
      fail-fast: false
      matrix:
        torch: ["1.13.1", "2.2.2"]
        include:
          - torch: "1.13.1"
            torchvision: "0.14.1"
          - torch: "2.2.2"
            torchvision: "0.17.2"

    env:
      # point datasets to ~/.torch so it's cached by CI
      DETECTRON2_DATASETS: ~/.torch/datasets
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Update pip
        run: |
          python -m ensurepip
          python -m pip install --upgrade pip
      - name: Install dependencies
        run: |
          python -m pip install -U pip
          python -m pip install wheel ninja opencv-python-headless onnx pytest-xdist
          python -m pip install numpy==1.26.4
          python -m pip install torch==${{matrix.torch}} torchvision==${{matrix.torchvision}} -f https://download.pytorch.org/whl/torch_stable.html
          # install from github to get latest; install iopath first since fvcore depends on it
          python -m pip install -U 'git+https://github.com/facebookresearch/iopath'
          python -m pip install -U 'git+https://github.com/facebookresearch/fvcore'
          wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
          python collect_env.py
      - name: Build and install
        run: |
          CC=clang CXX=clang++ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
          python -m detectron2.utils.collect_env
          python -m pip install ".[dev]"
      - name: Install DPText-DETR
        run: |
          git clone https://github.com/maps-as-data/DPText-DETR.git
          python -m pip install 'git+https://github.com/maps-as-data/DPText-DETR.git' # Install DPText-DETR
          python -m pip install numpy==1.26.4
          wget https://huggingface.co/rwood-97/DPText_DETR_ArT_R_50_poly/resolve/main/art_final.pth
      - name: Run DPText-DETR unittests
        run: |
          python -m pytest test_text_spotting/test_dptext_runner.py
      - name: Install DeepSolo
        run: |
          git clone https://github.com/maps-as-data/DeepSolo.git
          python -m pip install 'git+https://github.com/maps-as-data/DeepSolo.git' --force-reinstall --no-deps # Install DeepSolo
          python -m pip install numpy==1.26.4
          wget https://huggingface.co/rwood-97/DeepSolo_ic15_res50/resolve/main/ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth
      - name: Run DeepSolo unittests
        run: |
          python -m pytest test_text_spotting/test_deepsolo_runner.py
      - name: Install MapTextPipeline
        run: |
          git clone https://github.com/maps-as-data/MapTextPipeline.git
          python -m pip install 'git+https://github.com/maps-as-data/MapTextPipeline.git' --force-reinstall --no-deps # Install MapTextPipeline
          python -m pip install "numpy<2.0.0"
          wget https://huggingface.co/rwood-97/MapTextPipeline_rumsey/resolve/main/rumsey-finetune.pth
      - name: Run MapTextPipeline unittests
        run: |
          python -m pytest test_text_spotting/test_maptext_runner.py
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ _ADD NEW CHANGES HERE_
- All file loading methods now support `pathlib.Path` and `gpd.GeoDataFrame` objects as input ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- Loading of dataframes from GeoJSON files now supported in many file loading methods (e.g. `add_metadata`, `Annotator.__init__`, `AnnotationsLoader.load`, etc.) ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- `load_frames.py` added to `mapreader.utils`. This has functions for loading from various file formats (e.g. CSV, Excel, GeoJSON, etc.) and converting to GeoDataFrames ([#495](https://github.com/maps-as-data/MapReader/pull/495))
- Added tests for text spotting code ([#500](https://github.com/maps-as-data/MapReader/pull/500))

### Changed

@@ -1,9 +1,9 @@
Running tests
=============

To run the tests for MapReader, you will need to have installed the **dev dependencies** as described above.
To run the tests for MapReader, you will need to have installed the **dev dependencies** (as described :doc:`here </getting-started/installation-instructions/2-install-mapreader>`).

Also, if you have followed the "Install from PyPI" instructions, you will need to clone the MapReader repository to access the tests. i.e.:
.. note:: If you have followed the "Install from PyPI" instructions, you will also need to clone the MapReader repository to access the tests. i.e.:

.. code-block:: bash
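
    # (the clone commands themselves are collapsed in this diff; a typical
    # invocation, assuming the public repository URL, would be)
    git clone https://github.com/maps-as-data/MapReader.git
    cd MapReader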
@@ -18,3 +18,44 @@ You can then run the tests from the root of the MapReader directory using
    python -m pytest -v

If all tests pass, this means that MapReader has been installed and is working as expected.

Testing text spotting
---------------------

The tests for the text spotting code are separated from the main tests due to dependency conflicts.

You will only be able to run the tests for whichever text spotting framework (DPTextDETR, DeepSolo or MapTextPipeline) you have installed.
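
You can check which framework is available in your environment by importing the ``adet`` package, which the text spotting test modules rely on (a minimal sketch, assuming the installed framework exposes ``adet`` as the tests expect):

.. code-block:: python

    # Minimal check: the text spotting tests import adet, which is provided by
    # the installed framework (DPTextDETR, DeepSolo or MapTextPipeline).
    try:
        import adet
        print("Text spotting framework found, adet version:", adet.__version__)
    except ImportError:
        print("No text spotting framework installed.")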

For DPTextDETR, use the following commands:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/DPTextDETR # change this to the path where you have saved the DPTextDETR repository
    wget https://huggingface.co/rwood-97/DPText_DETR_ArT_R_50_poly/resolve/main/art_final.pth # download the model weights
    python -m pytest -v test_text_spotting/test_dptext_runner.py

For DeepSolo:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/DeepSolo # change this to the path where you have saved the DeepSolo repository
    wget https://huggingface.co/rwood-97/DeepSolo_ic15_res50/resolve/main/ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth # download the model weights
    python -m pytest -v test_text_spotting/test_deepsolo_runner.py

For MapTextPipeline:

.. code-block:: bash

    cd path/to/MapReader # change this to your path, e.g. cd ~/MapReader
    conda activate mapreader
    export ADET_PATH=path/to/MapTextPipeline # change this to the path where you have saved the MapTextPipeline repository
    wget https://huggingface.co/rwood-97/MapTextPipeline_rumsey/resolve/main/rumsey-finetune.pth # download the model weights
    python -m pytest -v test_text_spotting/test_maptext_runner.py

If all tests pass, this means that the text spotting framework has been installed and is working as expected.
228 changes: 228 additions & 0 deletions test_text_spotting/test_deepsolo_runner.py
@@ -0,0 +1,228 @@
from __future__ import annotations

import os
import pathlib
import pickle

import adet
import geopandas as gpd
import pandas as pd
import pytest
from detectron2.engine import DefaultPredictor
from detectron2.structures.instances import Instances

from mapreader import DeepSoloRunner
from mapreader.load import MapImages

print(adet.__version__)

# use cloned DeepSolo path if running in github actions
ADET_PATH = (
    pathlib.Path("./DeepSolo/").resolve()
    if os.getenv("GITHUB_ACTIONS") == "true"
    else pathlib.Path(os.getenv("ADET_PATH")).resolve()
)


@pytest.fixture
def sample_dir():
    return pathlib.Path(__file__).resolve().parent.parent / "tests" / "sample_files"


@pytest.fixture
def init_dataframes(sample_dir, tmp_path):
"""Initializes MapImages object (with metadata from csv and patches) and creates parent and patch dataframes.
Returns
-------
tuple
path to parent and patch dataframes
"""
maps = MapImages(f"{sample_dir}/mapreader_text.png")
maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
maps.patchify_all(patch_size=800, path_save=tmp_path)
maps.check_georeferencing()
parent_df, patch_df = maps.convert_images()
return parent_df, patch_df


@pytest.fixture(scope="function")
def mock_response(monkeypatch, sample_dir):
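    # Patch DefaultPredictor.__call__ to return pre-computed outputs loaded from a
    # pickle file, so the tests do not depend on running real model inference.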
    def mock_pred(self, *args, **kwargs):
        with open(f"{sample_dir}/patch-0-0-800-40-deepsolo-pred.pkl", "rb") as f:
            outputs = pickle.load(f)
        return outputs

    monkeypatch.setattr(DefaultPredictor, "__call__", mock_pred)


@pytest.fixture
def init_runner(init_dataframes):
    parent_df, patch_df = init_dataframes
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    return runner


@pytest.fixture
def runner_run_all(init_runner, mock_response):
    runner = init_runner
    _ = runner.run_all()
    return runner


def test_deepsolo_init(init_dataframes):
    parent_df, patch_df = init_dataframes
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_str(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.csv")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.csv")
    runner = DeepSoloRunner(
        f"{tmp_path}/patch_df.csv",
        parent_df=f"{tmp_path}/parent_df.csv",
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_pathlib(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.csv")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.csv")
    runner = DeepSoloRunner(
        pathlib.Path(f"{tmp_path}/patch_df.csv"),
        parent_df=pathlib.Path(f"{tmp_path}/parent_df.csv"),
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_init_tsv(init_dataframes, tmp_path):
    parent_df, patch_df = init_dataframes
    parent_df = parent_df.to_csv(f"{tmp_path}/parent_df.tsv", sep="\t")
    patch_df = patch_df.to_csv(f"{tmp_path}/patch_df.tsv", sep="\t")
    runner = DeepSoloRunner(
        f"{tmp_path}/patch_df.tsv",
        parent_df=f"{tmp_path}/parent_df.tsv",
        delimiter="\t",
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    assert isinstance(runner, DeepSoloRunner)
    assert isinstance(runner.predictor, DefaultPredictor)
    assert isinstance(runner.parent_df.iloc[0]["coordinates"], tuple)
    assert isinstance(runner.patch_df.iloc[0]["coordinates"], tuple)


def test_deepsolo_run_all(init_runner, mock_response):
    runner = init_runner
    # dict
    out = runner.run_all()
    assert isinstance(out, dict)
    assert "patch-0-0-800-40-#mapreader_text.png#.png" in out.keys()
    assert isinstance(out["patch-0-0-800-40-#mapreader_text.png#.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.patch_predictions, geo=False, parent=False)
    assert isinstance(out, pd.DataFrame)
    assert set(out.columns) == set(["image_id", "geometry", "text", "score"])
    assert "patch-0-0-800-40-#mapreader_text.png#.png" in out["image_id"].values


def test_deepsolo_convert_to_parent(runner_run_all, mock_response):
    runner = runner_run_all
    # dict
    out = runner.convert_to_parent_pixel_bounds()
    assert isinstance(out, dict)
    assert "mapreader_text.png" in out.keys()
    assert isinstance(out["mapreader_text.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.parent_predictions, geo=False, parent=True)
    assert isinstance(out, pd.DataFrame)
    assert set(out.columns) == set(
        ["image_id", "patch_id", "geometry", "text", "score"]
    )
    assert "mapreader_text.png" in out["image_id"].values


def test_deepsolo_convert_to_parent_coords(runner_run_all, mock_response):
    runner = runner_run_all
    # dict
    out = runner.convert_to_coords()
    assert isinstance(out, dict)
    assert "mapreader_text.png" in out.keys()
    assert isinstance(out["mapreader_text.png"], list)
    # dataframe
    out = runner._dict_to_dataframe(runner.geo_predictions, geo=True, parent=True)
    assert isinstance(out, gpd.GeoDataFrame)
    assert set(out.columns) == set(
        ["image_id", "patch_id", "geometry", "crs", "text", "score"]
    )
    assert "mapreader_text.png" in out["image_id"].values
    assert out.crs == runner.parent_df.crs


def test_deepsolo_deduplicate(sample_dir, tmp_path, mock_response):
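    # Use overlapping patches (overlap=0.5) so duplicate detections appear across
    # patches, then check that deduplication does not increase the prediction count
    # and that a lower min_ioa removes at least as many duplicates.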
    maps = MapImages(f"{sample_dir}/mapreader_text.png")
    maps.add_metadata(f"{sample_dir}/mapreader_text_metadata.csv")
    maps.patchify_all(patch_size=800, path_save=tmp_path, overlap=0.5)
    maps.check_georeferencing()
    parent_df, patch_df = maps.convert_images()
    runner = DeepSoloRunner(
        patch_df,
        parent_df=parent_df,
        cfg_file=f"{ADET_PATH}/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml",
    )
    _ = runner.run_all()
    out = runner.convert_to_parent_pixel_bounds(deduplicate=False)
    len_before = len(out["mapreader_text.png"])
    runner.parent_predictions = {}
    out_07 = runner.convert_to_parent_pixel_bounds(deduplicate=True)
    len_07 = len(out_07["mapreader_text.png"])
    print(len_before, len_07)
    assert len_before >= len_07
    runner.parent_predictions = {}
    out_05 = runner.convert_to_parent_pixel_bounds(deduplicate=True, min_ioa=0.5)
    len_05 = len(out_05["mapreader_text.png"])
    print(len_before, len_05)
    assert len_before >= len_05
    assert len_07 >= len_05


def test_deepsolo_run_on_image(init_runner, mock_response):
    runner = init_runner
    out = runner.run_on_image(
        runner.patch_df.iloc[0]["image_path"], return_outputs=True
    )
    assert isinstance(out, dict)
    assert "instances" in out.keys()
    assert isinstance(out["instances"], Instances)


def test_deepsolo_save_to_geojson(runner_run_all, tmp_path, mock_response):
    runner = runner_run_all
    _ = runner.convert_to_coords()
    runner.save_to_geojson(f"{tmp_path}/text.geojson")
    assert os.path.exists(f"{tmp_path}/text.geojson")
    gdf = gpd.read_file(f"{tmp_path}/text.geojson")
    assert isinstance(gdf, gpd.GeoDataFrame)
    assert set(gdf.columns) == set(
        ["image_id", "patch_id", "geometry", "crs", "text", "score"]
    )