Tools=>Illustration Fixup #595

okrick · 2024-12-23T22:53:30Z

The current GG2 Illustration Fixup tool has limitations:

Limited Detection of Mid-Paragraph Illustrations: It only reliably identifies mid-paragraph Illustrations if they are explicitly marked with an asterisk before the Illustration tag (e.g., *[Illustration...).

Failure to Detect Page Break Interruptions: The tool fails to recognize instances where a paragraph is interrupted by a page break, followed by an Illustration, and potentially more page breaks before the paragraph resumes.

Dependence on Manual Asterisk Placement: Proofreaders often omit the necessary asterisk when the Illustration occupies an entire page, making these instances difficult for the tool to detect.

Addressing these limitations would require enhancements to the GG2 Illustration Fixup tool:

Improved Contextual Analysis: The tool could be enhanced to analyze paragraph flow across page breaks, considering the presence of Illustrations as potential interruptions.

Current Workaround:

To manually identify potential paragraph interruptions caused by Illustrations, I currently use the following search term: `^-+[^\n+]+-\n+\*?\[(Illustration|Music)`

This search term helps locate images that might be breaking paragraphs by targeting images at the top of pages.
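A minimal Python sketch of running that search term outside the editor, assuming the `-----File:` separator lines are still in the text. `find_top_of_page_illos` is a hypothetical helper for illustration, not part of GG2. Note that `[^\n+]` is a character class excluding newline and `+`, so on ordinary separator lines it behaves like `[^\n]`.

```python
import re

# Rick's search term; re.MULTILINE lets ^ match at the start of every line.
TOP_OF_PAGE_ILLO = re.compile(r"^-+[^\n+]+-\n+\*?\[(Illustration|Music)", re.MULTILINE)


def find_top_of_page_illos(text: str) -> list[int]:
    """Hypothetical helper: 1-based line numbers of [Illustration/[Music markup
    that directly follows a page separator line (blank lines allowed between)."""
    hits = []
    for match in TOP_OF_PAGE_ILLO.finditer(text):
        # Group 1 starts just after "[", so step back one character to the bracket.
        bracket = match.start(1) - 1
        hits.append(text.count("\n", 0, bracket) + 1)
    return hits


sample = (
    "this is the final line of a paragraph\n"
    "-----File: 024.png-----\n"
    "\n"
    "[Illustration]\n"
    "\n"
    "[Illustration: not at the top of a page]\n"
)
print(find_top_of_page_illos(sample))  # [4]
```

Only the first illustration is reported; the second is not preceded by a separator line. As the search only looks at page position, a hit still has to be checked by eye to see whether it really interrupts a paragraph.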

Note: I realize a complete solution might not be feasible. I only ask that the problem be given some consideration.

henry.txt
henry.txt.json


windymilla · 2024-12-24T17:46:37Z

Thanks for the suggestions, Rick.

Notes (partly for whoever looks at this)

Rick's regex relies on the "----- File:... -----" page break lines still being in the file at the time you use it.
I believe it just finds illos at the top of a page, but doesn't detect whether they are mid-paragraph.
There are some cases that it's not possible to detect, e.g.:

this is the final line of a paragraph (or is it mid-paragraph) - who can tell?
-----File: 024.png---------------------------------------------------------

[Illustration]

This is the next line of text, but is it a new paragraph, or is the blank line
above to set it off from the illo? 
-----File: 025.png---------------------------------------------------------

okrick · 2024-12-24T18:13:11Z

It may never be possible to catch them all, but catching more would be helpful. Relying on the asterisk alone misses far too many and may leave the PPer surprised later. I was astonished when only one was found in the entire file (already corrected before I zipped it).

Perhaps a better solution would be to provide a manual tool similar to the GG1 Tools=>Character Tools=>Search for Transliterations or the GG2 Tools=>Stealth Scannos tools, and to skip identifying the asterisks in the illustration check; there are several other checks for asterisks in the menus, e.g. Search=>Find Asterisk w/o Slash.

windymilla · 2024-12-26T21:03:42Z

I composed a reply, but obviously never clicked the final "Comment" button to post it.
My suggestion is that we improve the checking to catch the following case:

this is the final line of a paragraph (or is it mid-paragraph) - who can tell?
-----File: 024.png---------------------------------------------------------

[Illustration]
-----File: 025.png---------------------------------------------------------

-----File: 026.png---------------------------------------------------------
This is the next line of text, but is it a new paragraph, or is the blank line
above to set it off from the illo?

So the algorithm would be:

Look forward from the Illo markup, skipping blank lines and "-----File" lines until we find the next bit of "real" text.
If there is not a blank line immediately before the "real" text, the illo is mid-paragraph.
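The two steps above can be sketched in Python roughly as follows. `is_mid_paragraph` is a hypothetical helper working on a list of lines, with `illo_end` assumed to be the index of the last line of the illo markup; it is not the actual GG2 code. As written, it treats a second `[Illustration]` as "real" text, so consecutive illos are not skipped.

```python
def is_mid_paragraph(lines: list[str], illo_end: int) -> bool:
    """Hypothetical helper: True if the illo whose markup ends at lines[illo_end]
    interrupts a paragraph, using the two-step rule described above."""
    i = illo_end + 1
    # Step 1: skip blank lines and "-----File" separators to reach "real" text.
    while i < len(lines) and (lines[i].strip() == "" or lines[i].startswith("-----File")):
        i += 1
    if i >= len(lines):
        return False  # No following text, so there is no paragraph to interrupt.
    # Step 2: real text with no blank line immediately before it continues the paragraph.
    return lines[i - 1].strip() != ""


# The three-page example from this comment (separator lines shortened).
example = [
    "this is the final line of a paragraph (or is it mid-paragraph) - who can tell?",
    "-----File: 024.png-----",
    "",
    "[Illustration]",
    "-----File: 025.png-----",
    "",
    "-----File: 026.png-----",
    "This is the next line of text, but is it a new paragraph, or is the blank line",
]
print(is_mid_paragraph(example, 3))  # True
```

In the earlier two-page example, the text after the illo is preceded by a blank line, so this rule returns False there and leaves the ambiguous case alone rather than risking a false positive.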

I think this would catch a lot of the cases where the formatter hasn't marked it as a mid-para illo, and shouldn't produce false positives.

okrick · 2024-12-26T21:42:16Z

I concur. While it might miss a few instances, this approach should significantly improve the situation.

One potential limitation is the program's ability to accurately identify instances where multiple illustrations occur consecutively. I'll leave the decision regarding the feasibility of implementing a longer lookahead mechanism to your discretion.

this is the final line of a paragraph (or is it mid-paragraph) - who can tell?
-----File: 024.png---------------------------------------------------------

[Illustration]
-----File: 025.png---------------------------------------------------------

-----File: 026.png---------------------------------------------------------

[Illustration]
-----File: 027.png---------------------------------------------------------

-----File: 028.png---------------------------------------------------------
This is the next line of text, but is it a new paragraph, or is the blank line

windymilla · 2025-01-05T16:27:22Z

@okrick - I've changed the code to cope with the above situation of multiple illos & blank lines, and also with illos that span more than one line, and even ones that have blank lines within the caption, so in the next release you would get the following illos all reported as being MID-PARAGRAPH:


At the end of the Yser battle, after the 29^{th} of
October 1914, Oud-Stuyvekenskerke was only occupied for
-----File: 031.png---------------------------------------------------------

[Illustration: <sc>Dixmude.</sc>--Aerial photo (Mai 26^{th} 1917).]
-----File: 032.png---------------------------------------------------------

[Illustration: <sc>Dixmude.</sc>--Their Majesties King and Queen at the "Death trench".
(June 1^{st} 1917).]

[Illustration: <sc>Dixmude.</sc>--Their Majesties King and Queen at the Riderswork.
(June 1^{st} 1917).

/#
The Queen examining private J. Vermeire's helmet,
which had just been pierced by a German bullet.
#/
]
-----File: 033.png---------------------------------------------------------
a few days by weak German detachements, whilst our line
of defence had been brought back upon the Nieuport-Dixmude
railway line and rejoining the Yser at the

It now looks forward to find the first "normal" line, i.e. not an empty line, a `[Blank Page]`, another illo/SN, or a page separator line. When it finds a normal line, it checks whether the line above it is blank, meaning it's the start of a paragraph. If not, the illo/SN is mid-paragraph. Fixes DistributedProofreaders#595
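A rough Python sketch of the revised lookahead, under stated assumptions: illo/SN markup is recognized by an optional `*` followed by `[Illustration` or `[Sidenote`, and a multi-line illo/SN is assumed to end where its square brackets balance, which lets blank lines inside a caption be skipped. The helper names are hypothetical; this is not the code merged for this issue.

```python
import re

# Assumed markup forms: optional "*" (mid-paragraph flag), then "[Illustration" or "[Sidenote".
MARKUP_START = re.compile(r"\*?\[(Illustration|Sidenote)")


def skip_markup(lines: list[str], start: int) -> int:
    """Return the index just past the bracketed markup that begins at lines[start].

    Counting "[" against "]" lets a caption span several lines, including blank ones.
    """
    depth = 0
    i = start
    while i < len(lines):
        depth += lines[i].count("[") - lines[i].count("]")
        i += 1
        if depth <= 0:
            break
    return i


def is_ignorable(line: str) -> bool:
    """Blank lines, [Blank Page] and page separators are not "normal" lines."""
    stripped = line.strip()
    return stripped in ("", "[Blank Page]") or stripped.startswith("-----File")


def illo_is_mid_paragraph(lines: list[str], start: int) -> bool:
    """Hypothetical helper: True if the illo/SN starting at lines[start] is mid-paragraph."""
    i = skip_markup(lines, start)
    while i < len(lines):
        if is_ignorable(lines[i]):
            i += 1
        elif MARKUP_START.match(lines[i]):
            i = skip_markup(lines, i)  # Another illo/SN: skip the whole block.
        else:
            # First normal line: it starts a paragraph only if the line above is blank.
            return lines[i - 1].strip() != ""
    return False  # Reached the end of the file without finding a normal line.
```

Applied to the Yser example above, `illo_is_mid_paragraph` returns True for each of the three illustrations, because the first normal line ("a few days by weak German detachements...") directly follows a page separator. Bracket counting is a simplification; a caption containing an unbalanced literal `[` or `]` would throw it off.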

okrick · 2025-01-05T16:46:23Z

Wow, that's quite an accomplishment.

Thanks

windymilla added the core feature Required for basic PPing label Dec 27, 2024

windymilla mentioned this issue Jan 5, 2025

Improve detection of mid-para illos/SNs #644

Merged

windymilla closed this as completed in #644 Jan 5, 2025

windymilla closed this as completed in a78562f Jan 5, 2025
