added updated docs #503

Merged
merged 13 commits into from
Jan 14, 2025
172 changes: 138 additions & 34 deletions OCR/README.md
@@ -1,44 +1,63 @@
## OCR
# OCR Layer - ReportVision

The **OCR Layer** in the ReportVision project processes document images, performs segmentation and optical character recognition (OCR), and computes accuracy metrics by comparing OCR outputs to ground truth data.

---

## Table of Contents
1. [Introduction](#introduction)
2. [Installation](#installation)
3. [Running the Application](#running-the-application)
4. [Development Tools](#development-tools)
5. [Testing](#testing)
6. [End-to-End Benchmarking](#end-to-end-benchmarking)
7. [Dockerized Development](#dockerized-development)
8. [Benchmarking](#end-to-end-benchmarking)
9. [Project Architecture](#project-architecture)
10. [API Endpoints](#api-endpoints)


---

## Introduction

The OCR layer uses **Poetry** for dependency management and virtual environment setup. It provides:
- An API for performing OCR operations.
- Support for benchmarking OCR accuracy.
- Configuration for different OCR models and segmentation templates.

### Installation

### Prerequisites
- Python 3.9 or later
- [Poetry](https://python-poetry.org/) for dependency management
- Docker (optional for containerized development)

```shell
pipx install poetry
```

### Running The Application
Activate the virtual environment and install dependencies. All subsequent commands assume you are in the virtual environment.

```shell
poetry shell
poetry install
```

Run unit tests

```shell
poetry run pytest
```

Run benchmark tests

```shell
cd tests
poetry run pytest benchmark_test.py -v
fastapi dev ocr/api.py
```

poetry run pytest bench_test.py -v
### Testing

Run main, hoping to convert this to a cli at some point
Run unit tests

```shell
poetry run main
```shell
poetry run pytest
```

To build the OCR service into an executable artifact

```shell
poetry run build
```
### Development Tools

Adding new dependencies

@@ -82,12 +101,37 @@ To run the API in prod mode
poetry run api
```

### Test Data Sets

You can also run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.
To build the OCR service into an executable artifact

```shell
poetry run build
```

### Dockerized Development

It is also possible to run the project in a collection of docker containers. This is useful for development and testing purposes as it doesn't require any additional dependencies to be installed.

### Run end-to-end benchmarking
To start the containers, run the following command:

```shell
docker compose -f dev-env.yaml up
```

This will start the following containers:

- ocr: The OCR service container
- frontend: The frontend container

The frontend container will automatically reload when changes are made to the frontend. To access the frontend, navigate to http://localhost:5173 in your browser.

The OCR service container will restart automatically when changes are made to the OCR code. To access the API, navigate to http://localhost:8000/ in your browser.
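
When scripting against the containers, it can help to wait until the OCR API answers before sending requests. The following is a minimal sketch, assuming the service responds with HTTP 200 at `http://localhost:8000/` once it is up. The `http_ok` and `wait_for_service` helpers and their parameters are hypothetical and not part of the project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_service(url, attempts=30, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # Pause between probes, but not after the last one.
    return False


# Example call (assumed address of the OCR container):
# wait_for_service("http://localhost:8000/")
```

The `probe` and `sleep` parameters are injectable so the polling logic can be exercised without a running container.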


### End-to-End Benchmarking

#### Overview
End-to-end benchmarking evaluates OCR accuracy by:

End-to-end benchmarking scripts can:

@@ -117,21 +161,81 @@ Run notes:
* Benchmark takes one second per segment for OCR using the default `trocr` model. Please be patient or set a counter to limit the number of files processed.
* Only one segment can be input at a time

### Dockerized Development

It is also possible to run the entire project in a collection of docker containers. This is useful for development and testing purposes as it doesn't require any additional dependencies to be installed on your local machine.
### Test Data Sets

To start the containers, run the following command:
You can run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.

```shell
docker compose -f dev-env.yaml up
```

This will start the following containers:

- ocr: The OCR service container
- frontend: The frontend container
## Project Architecture

The frontend container will automatically reload when changes are made to the frontend code. To access the frontend, navigate to http://localhost:5173 in your browser.
The OCR Layer is organized as follows:

The OCR service container will restart automatically when changes are made to the OCR code. To access the API, navigate to http://localhost:8000/ in your browser.
- **`ocr/`**:
- **`api.py`**: Defines the API for the OCR service.
- **`main.py`**: Entry point script to run the OCR service.
- **`segmenter.py`**: Handles image segmentation based on templates and labels.
- **`ocr_engine.py`**: OCR logic using the specified OCR models.
- **`metrics.py`**: Computes metrics (e.g., confidence, Levenshtein distance) by comparing OCR results with ground truth.
- **`config.py`**: Contains configuration for paths, environment variables, and model settings.

- **`tests/`**: Contains unit tests, integration tests, and benchmarking scripts.
- **`benchmark_test.py`**: Tests benchmarking logic for OCR and metrics.
- **`unit_test.py`**: Includes unit tests for individual components of the OCR service.
- **`benchmark_main.py`**: Main script for running end-to-end benchmarking, including segmentation, OCR, and metrics computation.

- **`data/`**: Location of segmentation templates, labels, ground truth, and test datasets (not included in the repository by default).

- **`reportvision-dataset-1/`**: Example dataset folder for running benchmarks and tests.
- **`medical_report_import.py`**: Script to import and prepare medical reports for testing.

- **`Dockerfile`**: Defines the container for running the OCR service in a Dockerized environment.

- **`dev-env.yaml`**: Docker Compose file for setting up a development environment with containers for the OCR service and frontend.

- **`pyproject.toml`**: Poetry configuration file specifying project dependencies and settings.

- **`poetry.lock`**: Lock file generated by Poetry to ensure dependency consistency.
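
For reference, the Levenshtein distance mentioned for `metrics.py` counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a textbook dynamic-programming version for illustration only. It is not the project's implementation, and the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between `a` and `b` (insert, delete, substitute cost 1)."""
    # Before row i is computed, previous[j] is the distance
    # between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # Distance from a[:i] to the empty string.
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

A lower distance between an OCR result and its ground-truth string means a closer match; identical strings give 0.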

## API Endpoints

### Swagger Docs

Start the API and go to the `/docs` endpoint, for example http://localhost:8000/docs.

The OCR service exposes the following API endpoints:

#### Health Check
- **`GET /`**
- **Description**: Returns the status of the OCR service.
- **Response**: Status message indicating the service's health.
Collaborator

Are we also doing swagger type docs as well?

Collaborator Author

Good point I will add the link here.


#### Image Alignment
- **`POST /image_alignment/`**
- **Description**: Aligns a source image with a segmentation template.
- **Request Body**:
- `source_image` (Base64-encoded string): The source image to align.
- `segmentation_template` (Base64-encoded string): The segmentation template to align with.
- **Response**:
- Base64-encoded string of the aligned image.

#### Image File to Text
- **`POST /image_file_to_text/`**
- **Description**: Processes an image file and a segmentation template to extract text based on labeled regions.
- **Request Body**:
- `source_image` (file): The uploaded source image file.
- `segmentation_template` (file): The uploaded segmentation template file.
- `labels` (JSON string): Defines labeled regions in the segmentation template.
- **Response**:
- JSON object containing text extracted from labeled regions.

#### Image to Text
- **`POST /image_to_text`**
- **Description**: Processes Base64-encoded images and extracts text from labeled regions.
- **Request Body**:
- `source_image` (Base64-encoded string): The source image.
- `segmentation_template` (Base64-encoded string): The segmentation template.
- `labels` (JSON string): Defines labeled regions in the segmentation template.
- **Response**:
- JSON object containing text extracted from labeled regions.
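
The request fields for `/image_to_text` can be assembled as in the sketch below. It assumes the endpoint accepts the three fields listed above, with both images Base64-encoded and `labels` sent as a JSON-encoded string. The exact wire format and the shape of each label entry are assumptions, so check the Swagger docs at `/docs` before relying on it. The helper name is hypothetical.

```python
import base64
import json


def build_image_to_text_payload(source_image: bytes,
                                segmentation_template: bytes,
                                labels: list) -> dict:
    """Encode raw image bytes and labels into the fields the endpoint lists."""
    return {
        # Base64 output is ASCII, so it is safe to decode to str.
        "source_image": base64.b64encode(source_image).decode("ascii"),
        "segmentation_template": base64.b64encode(segmentation_template).decode("ascii"),
        # `labels` is documented as a JSON string, so serialize it here.
        "labels": json.dumps(labels),
    }
```

The resulting dictionary can then be sent with whichever HTTP client the caller prefers.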
4 changes: 2 additions & 2 deletions OCR/ocr/services/phdc_converter/builder.py
@@ -644,7 +644,7 @@ def _build_patient(self, patient: Patient) -> ET.Element:
)
patient_data.append(v)
else:
logging.warning(f"Race code {patient.race_code} not found in " "the OMB classification.")
logging.warning(f"Race code {patient.race_code} not found in the OMB classification.")

if patient.ethnic_group_code is not None:
if patient.ethnic_group_code in ethnicity_code_and_mapping:
@@ -658,7 +658,7 @@ def _build_patient(self, patient: Patient) -> ET.Element:
)
patient_data.append(v)
else:
logging.warning(f"Ethnic group code {patient.ethnic_group_code} not " "found in OMB classification.")
logging.warning(f"Ethnic group code {patient.ethnic_group_code} not found in OMB classification.")

return patient_data

5 changes: 3 additions & 2 deletions README.md
@@ -19,8 +19,9 @@

## Overview

Describe the purpose of your project. Add additional sections as necessary to help collaborators and potential collaborators understand and use your project.

Please see the [UserGuide](./user_guide.md) to get an overview of this project.


## Public Domain Standard Notice
This repository constitutes a work of the United States Government and is not
subject to domestic copyright protection under 17 USC § 105. This repository is in