Step function docs
mikejcorey committed Jul 25, 2024
1 parent fdbf080 commit 939c7ad
Showing 6 changed files with 105 additions and 12 deletions.
20 changes: 11 additions & 9 deletions docs/index.rst
@@ -52,15 +52,6 @@ Deed Machine full workflow
modules/downloading-new-results.rst
modules/mapping-covenants.rst
modules/manual-data-cleaning.rst

.. toctree::
:maxdepth: 2
:caption: Step function lambdas

modules/lambdas/mp-covenants-split-pages.rst
modules/lambdas/mp-covenants-ocr-page.rst
modules/lambdas/mp-covenants-term-search-basic.rst
modules/lambdas/mp-covenants-resize-image.rst

.. toctree::
:maxdepth: 2
@@ -71,6 +62,17 @@ Deed Machine full workflow
modules/apps-parcel-models.rst
modules/apps-plat-models.rst
modules/apps-zoon-models.rst

.. toctree::
:maxdepth: 2
:caption: Step function lambdas

modules/lambdas/mp-covenants-split-pages.rst
modules/lambdas/mp-covenants-ocr-page.rst
modules/lambdas/mp-covenants-term-search-basic.rst
modules/lambdas/mp-covenants-resize-image.rst
modules/lambdas/mp-covenants-fake-ocr.rst


Indices and tables
==================
44 changes: 41 additions & 3 deletions docs/modules/components.rst
@@ -1,19 +1,26 @@
Components
==========

Django core component
---------------------

The `Django component of the Deed Machine <https://github.com/UMNLibraries/racial_covenants_processor>`_ can generally be thought of as the conductor or hub that turns raw processing results into structured data, facilitates the import and export to the Zooniverse crowdsourcing platform, allows for manual GUI editing, and manages final data exports.

- See :ref:`django-models`

Standalone deed uploader
------------------------

Deed images are often stored on a local machine or network drive, and moving them is not feasible or efficient. This standalone uploader lets the user upload images without doing a full install on that computer.

- `mp-upload-deed-images-standalone <https://github.com/UMNLibraries/mp-upload-deed-images-standalone>`_

Lambda functions used for OCR step machine
DeedPageProcessor step function components
------------------------------------------

.. image:: ../_static/DeedMachineStepFunction20240723.png
:width: 800
:alt: A diagram showing the logic of the Deed Machine step function as of July 24. After Start, a lambda invoke splits pages if needed. The output enters a choice state. If no pages ready for processing are returned, the step function ends. If pages ready for processing are returned, then loop through each page. For each page, lambda invoke OCR's the page. Output from the OCR step is directed to a parallel state that invokes lambdas for a basic racial terms search and Web image optimization.
:alt: A diagram showing the logic of the Deed Machine step function as of July 24, 2024. After Start, a lambda invoke splits pages if needed. The output enters a choice state. If no pages ready for processing are returned, the step function ends. If pages ready for processing are returned, then loop through each page. For each page, lambda invoke OCR's the page. Output from the OCR step is directed to a parallel state that invokes lambdas for a basic racial terms search and Web image optimization.


The individual lambda functions that make up the OCR, term search and web image optimization processes are in separate repositories:
@@ -23,4 +30,35 @@ The individual lambda functions that make up the OCR, term search and web image
- :ref:`mp-covenants-term-search-basic`
- :ref:`mp-covenants-resize-image`

- `mp-covenants-fake-ocr <https://github.com/UMNLibraries/mp-covenants-fake-ocr>`_
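
The control flow described in the DeedPageProcessor diagram can be sketched in plain Python. This is only an illustration under stated assumptions: every function name and payload key below is a hypothetical stand-in for the real Lambda functions, and a thread pool stands in for the Step Functions parallel state.

```python
from concurrent.futures import ThreadPoolExecutor

# All names and payload keys here are hypothetical stand-ins for the real
# Lambda functions; they only illustrate the control flow in the diagram.


def split_pages(event):
    """Stand-in for the split-pages Lambda: return pages ready for processing."""
    return event.get("pages", [])


def ocr_page(page):
    """Stand-in for the OCR Lambda."""
    return {"page": page, "text": f"ocr text for {page}"}


def term_search(ocr_result):
    """Stand-in for the basic term search Lambda."""
    return {"page": ocr_result["page"], "hit": "covenant" in ocr_result["text"]}


def resize_image(ocr_result):
    """Stand-in for the web image optimization Lambda."""
    return {"page": ocr_result["page"], "web_image": ocr_result["page"] + ".web.jpg"}


def process_page(page):
    """OCR one page, then run term search and image resize side by side."""
    ocr_result = ocr_page(page)
    # Parallel state: both branches receive the OCR output.
    with ThreadPoolExecutor(max_workers=2) as pool:
        search = pool.submit(term_search, ocr_result)
        resize = pool.submit(resize_image, ocr_result)
        return {"term_search": search.result(), "resize": resize.result()}


def run_step_function(event):
    """Split pages, end early if none are ready, otherwise loop over each page."""
    pages = split_pages(event)
    if not pages:  # Choice state: nothing ready for processing
        return []
    return [process_page(page) for page in pages]  # loop through each page
```

Calling ``run_step_function({"pages": ["a.tif"]})`` returns one result per page, each holding the output of both parallel branches.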

.. image:: ../_static/TermSearchRefreshStepFunction20240725.png
:width: 200
:align: right
:alt: A diagram showing the logic of the Deed Machine TermSearchRefresh step function as of July 25, 2024. After Start, a lambda invoke performs a basic racial terms search.


TermSearchRefresh step function components
------------------------------------------



This step function is triggered by the Django management command ``trigger_term_search_refresh``. The Lambda function for term search is stored in a separate repository and is the same one used by DeedPageProcessor above:

- :ref:`mp-covenants-term-search-basic`
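
As a rough sketch of how a management command might start step function executions, the helper below builds the arguments for the Step Functions ``StartExecution`` call and sends one request per file. It is not the project's actual command: the function names, the ``s3_key`` payload field, and the execution name prefix are all assumptions made for illustration. The client is passed in, so in practice it could be ``boto3.client("stepfunctions")``.

```python
import json
import uuid


def build_execution_request(state_machine_arn, s3_key):
    """Build keyword arguments for the Step Functions StartExecution call."""
    return {
        "stateMachineArn": state_machine_arn,
        # Execution names must be unique per state machine; this prefix is hypothetical.
        "name": f"term-search-refresh-{uuid.uuid4().hex}",
        # The payload shape is an assumption, not the project's real event format.
        "input": json.dumps({"s3_key": s3_key}),
    }


def trigger_refresh(sfn_client, state_machine_arn, s3_keys):
    """Start one execution per key and return the responses in order."""
    return [
        sfn_client.start_execution(**build_execution_request(state_machine_arn, key))
        for key in s3_keys
    ]
```

Passing the client as a parameter keeps the helper testable with a stub object and leaves AWS credentials and region handling to the caller.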


DeedPageProcessorFAKEOCR step function components
-------------------------------------------------

.. image:: ../_static/DeedPageProcessorFAKEOCRStepFunction20240725.png
:width: 400
:align: right
:alt: A diagram showing the logic of the Deed Machine FAKE OCR step function as of July 25, 2024. After Start, a lambda invoke simulates a re-run of OCR. Output is directed to a parallel state that invokes lambdas for a basic racial terms search and Web image optimization.


This step function is triggered by the Django management command ``trigger_lambda_refresh``. The individual lambda functions that make up the OCR simulation, term search and web image optimization processes are in separate repositories:

- :ref:`mp-covenants-fake-ocr`
- :ref:`mp-covenants-term-search-basic`
- :ref:`mp-covenants-resize-image`

2 changes: 2 additions & 0 deletions docs/modules/django-models.rst
@@ -1,3 +1,5 @@
.. _django-models:

Django model structure
======================

51 changes: 51 additions & 0 deletions docs/modules/lambdas/mp-covenants-fake-ocr.rst
@@ -0,0 +1,51 @@
.. _mp-covenants-fake-ocr:

mp-covenants-fake-ocr
===============================

- Code: `mp-covenants-fake-ocr on GitHub <https://github.com/UMNLibraries/mp-covenants-fake-ocr>`_

This component mimics the actions of the OCR step but skips the actual OCR, which is relatively expensive, so that files that have already been OCRed do not need to be processed again. The function opens a previously created OCR JSON file saved to S3 and passes data about it on to the next step. This lambda is used only by DeedPageProcessorFAKEOCR, which is run to correct errors in the post-OCR stages of the main DeedPageProcessor step function. The step function is triggered by the Django management command ``trigger_lambda_refresh``.


Steps of the function
---------------------

1. Check the event for a valid raw image path.
2. Determine the matching OCR JSON and OCR TXT S3 paths, which should already exist.
3. Open the pre-existing OCR JSON file.
4. Generate a new UUID and stats, and save the stats object.
5. Pass the output on to the next stage.
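
The steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the ``raw/``, ``ocr/json/``, ``ocr/txt/`` and ``ocr/stats/`` key prefixes, the ``key`` event field, the ``Blocks`` field, and the ``read_object`` / ``write_object`` callables (stand-ins for S3 reads and writes) are all assumptions.

```python
import json
import uuid
from pathlib import PurePosixPath


def derive_ocr_keys(raw_key):
    """Map a raw image key to OCR JSON and TXT keys (hypothetical layout)."""
    path = PurePosixPath(raw_key)
    if not path.parts or path.parts[0] != "raw" or not path.suffix:
        raise ValueError(f"Not a valid raw image path: {raw_key!r}")
    rest = PurePosixPath(*path.parts[1:]).with_suffix("")
    return f"ocr/json/{rest}.json", f"ocr/txt/{rest}.txt"


def fake_ocr(event, read_object, write_object):
    """Reuse existing OCR output instead of running OCR again.

    read_object(key) -> str and write_object(key, body) stand in for S3 access.
    """
    # Step 1: check the event for a valid raw image path.
    raw_key = event.get("key")  # hypothetical event field
    if not raw_key:
        raise ValueError("Event has no raw image key")

    # Step 2: determine the matching OCR JSON and TXT paths.
    json_key, txt_key = derive_ocr_keys(raw_key)

    # Step 3: open the pre-existing OCR JSON.
    ocr_json = json.loads(read_object(json_key))

    # Step 4: generate a new UUID and stats, then save the stats object.
    page_uuid = uuid.uuid4().hex
    stats = {"uuid": page_uuid, "block_count": len(ocr_json.get("Blocks", []))}
    stats_key = f"ocr/stats/{page_uuid}.json"  # hypothetical location
    write_object(stats_key, json.dumps(stats))

    # Step 5: pass the output on to the next stage.
    return {"orig": raw_key, "json": json_key, "txt": txt_key, "uuid": page_uuid}
```

Injecting the read and write callables keeps the sketch runnable without AWS access; a real Lambda would typically use an S3 client here.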

Software development requirements
---------------------------------

The Lambda components of the Deed Machine are built using Amazon's Serverless Application Model (SAM) and the AWS SAM CLI tool.

- Pipenv (other virtual environment tools can work, but will require some adjustment on your part)
- AWS SAM CLI
- Docker
- Python 3

Quickstart commands
-------------------

To build the application:

.. code-block:: bash

    pipenv install
    pipenv shell
    sam build

To rebuild and deploy the application:

.. code-block:: bash

    sam build && sam deploy

To run tests:

.. code-block:: bash

    pytest
