allow processing of rgb images #8

Open
wants to merge 1 commit into
base: master
72 changes: 37 additions & 35 deletions ocrd_detectron2/segment.py
@@ -187,45 +187,47 @@ class id to a new PAGE region type (and subtype).
  else:
      zoomed = 1.0

- # for morphological post-processing, we will need the binarized image, too
- page_image_bin, _, _ = self.workspace.image_from_page(
-     page, page_id,
-     feature_selector='binarized')
- # workaround for OCR-D/core#687:
- if 0 < abs(page_image_raw.width - page_image_bin.width) <= 2:
-     diff = page_image_raw.width - page_image_bin.width
-     if diff > 0:
-         page_image_raw = crop_image(
-             page_image_raw,
-             (int(np.floor(diff / 2)), 0,
-              page_image_raw.width - int(np.ceil(diff / 2)),
-              page_image_raw.height))
-     else:
-         page_image_bin = crop_image(
-             page_image_bin,
-             (int(np.floor(-diff / 2)), 0,
-              page_image_bin.width - int(np.ceil(-diff / 2)),
-              page_image_bin.height))
- if 0 < abs(page_image_raw.height - page_image_bin.height) <= 2:
-     diff = page_image_raw.height - page_image_bin.height
-     if diff > 0:
-         page_image_raw = crop_image(
-             page_image_raw,
-             (0, int(np.floor(diff / 2)),
-              page_image_raw.width,
-              page_image_raw.height - int(np.ceil(diff / 2))))
-     else:
-         page_image_bin = crop_image(
-             page_image_bin,
-             (0, int(np.floor(-diff / 2)),
-              page_image_bin.width,
-              page_image_bin.height - int(np.ceil(-diff / 2))))
+ # check whether input image is binarized
+ if page_image_info.photometricInterpretation == "1":
Owner

That's a strange thing to query: it is only defined if the image was a TIFF (Pillow's TIFF plugin), and it does not even distinguish binarized images from others: binarized would be .mode == '1'. A value of 1 signifies BlackIsZero, which could be true for:

  • binarized
  • grayscale 8-bit
  • grayscale 8-bit plus alpha 8-bit
  • grayscale 16-bit int
  • grayscale 32-bit int
  • grayscale 32-bit float

I don't understand the purpose yet: what went wrong before?
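
To illustrate the distinction above, here is a minimal sketch using only Pillow's `Image.new` and the `.mode` attribute. The helper name `is_bilevel` is hypothetical and is not part of this PR or of OCR-D:

```python
from PIL import Image


def is_bilevel(image):
    """Return True only for Pillow's 1-bit mode (hypothetical helper)."""
    return image.mode == '1'


# Several of these modes can be stored as BlackIsZero in a TIFF,
# but only mode '1' is actually bilevel (binarized).
samples = {mode: Image.new(mode, (4, 4))
           for mode in ('1', 'L', 'LA', 'I', 'F', 'RGB')}
flags = {mode: is_bilevel(img) for mode, img in samples.items()}
```

Checking `.mode` also works regardless of the file format the image was loaded from, whereas TIFF tags are only available for TIFF input.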

+     # for morphological post-processing, we will need the binarized image, too
+     page_image_bin, _, _ = self.workspace.image_from_page(
+         page, page_id,
+         feature_selector='binarized')
+     # workaround for OCR-D/core#687:
+     if 0 < abs(page_image_raw.width - page_image_bin.width) <= 2:
+         diff = page_image_raw.width - page_image_bin.width
+         if diff > 0:
+             page_image_raw = crop_image(
+                 page_image_raw,
+                 (int(np.floor(diff / 2)), 0,
+                  page_image_raw.width - int(np.ceil(diff / 2)),
+                  page_image_raw.height))
+         else:
+             page_image_bin = crop_image(
+                 page_image_bin,
+                 (int(np.floor(-diff / 2)), 0,
+                  page_image_bin.width - int(np.ceil(-diff / 2)),
+                  page_image_bin.height))
+     if 0 < abs(page_image_raw.height - page_image_bin.height) <= 2:
+         diff = page_image_raw.height - page_image_bin.height
+         if diff > 0:
+             page_image_raw = crop_image(
+                 page_image_raw,
+                 (0, int(np.floor(diff / 2)),
+                  page_image_raw.width,
+                  page_image_raw.height - int(np.ceil(diff / 2))))
+         else:
+             page_image_bin = crop_image(
+                 page_image_bin,
+                 (0, int(np.floor(-diff / 2)),
+                  page_image_bin.width,
+                  page_image_bin.height - int(np.ceil(-diff / 2))))

  # ensure RGB (if raw was merely grayscale)
  if page_image_raw.mode == '1':
      page_image_raw = page_image_raw.convert('L')
  page_image_raw = page_image_raw.convert(mode='RGB')
- page_image_bin = page_image_bin.convert(mode='1')
+ page_image_bin = page_image_raw.convert(mode='1')
Comment on lines -228 to +230
Owner

That's not going to work well: binarization is usually more than a simple conversion – you have to find the best threshold, ideally localized across the image.
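
As a rough illustration of why a plain conversion falls short: by default Pillow's `convert('1')` applies Floyd-Steinberg dithering, and with dithering disabled it cuts at a fixed level of 127. The sketch below compares such a fixed cut with a data-driven global threshold. Otsu's method is used here only as a simple stand-in, written in plain NumPy; the localized thresholding recommended above is not shown. The function name `otsu_threshold` and the synthetic page are assumptions for illustration:

```python
import numpy as np


def otsu_threshold(gray):
    """Global Otsu threshold for a uint8 grayscale array (illustrative)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    weight_bg = np.cumsum(hist)              # pixels with value <= t
    weight_fg = hist.sum() - weight_bg       # pixels with value > t
    cum_sum = np.cumsum(hist * levels)
    # class means; empty classes yield NaN, which is zeroed below
    with np.errstate(divide='ignore', invalid='ignore'):
        mean_bg = cum_sum / weight_bg
        mean_fg = (cum_sum[-1] - cum_sum) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    return int(np.argmax(np.nan_to_num(between)))


# Synthetic dark page: background at 90, "ink" at 40 (assumed values).
gray = np.full((8, 8), 90, dtype=np.uint8)
gray[2:6, 3:5] = 40

fixed = gray > 127           # fixed cut: the whole page turns black
t = otsu_threshold(gray)
otsu = gray > t              # data-driven cut: ink and background separate
```

On a page this dark, the fixed cut loses all content, while the computed threshold falls between the ink and background levels. Real scans with uneven illumination would still call for a local method.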

  # reduce resolution to 300 DPI max
  if zoomed != 1.0:
      page_image_bin = page_image_bin.resize(
@@ -267,7 +269,7 @@ def _process_page(self, page, ignore, page_coords, page_id, page_array_raw, page
  #page.set_TextRegion([])
  page.set_custom('coords=%s' % page_coords['transform'])
  height, width, _ = page_array_raw.shape
- # get connected components to estimate scale
+ # get connected components to estimate ignorescale
Comment on lines -270 to +272
Owner

?

  _, components = cv2.connectedComponents(page_array_bin.astype(np.uint8))
  # estimate glyph scale (roughly)
  _, counts = np.unique(components, return_counts=True)