Handle control characters in usfm #223

linearcombination · 2024-10-16T20:13:05Z

This handles the STET en + ln (Lingala) case, and should also handle other languages whose source USFM has this same class of errors.

Our test team noted that when building a doc for English->Lingala (ln), the build failed with an error. This happened because the USFM for Lingala had control characters in the text, in particular ^B (STX, "start of text", in ASCII). Since other source languages could also have control characters, I have augmented STET to strip them out prior to processing.
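For reviewers, the stripping step described above could look roughly like the following. This is only a minimal sketch, not the code in this PR: the names `CONTROL_CHARS` and `strip_control_characters` are hypothetical, and it assumes the USFM has already been decoded to a `str`. Tab, line feed, and carriage return are deliberately kept, since they are legitimate whitespace in USFM files.

```python
import re

# Hypothetical pattern: C0 control characters (including NUL, \x00, and STX,
# \x02) plus DEL (\x7f), excluding tab (\x09), line feed (\x0a), and
# carriage return (\x0d), which are kept as ordinary whitespace.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_characters(usfm_text: str) -> str:
    """Return usfm_text with NUL bytes and other control characters removed."""
    return CONTROL_CHARS.sub("", usfm_text)


# Placeholder verse text containing an STX (^B) and a NUL byte.
sample = "\\v 1 some\x02 verse\x00 text\n"
cleaned = strip_control_characters(sample)
```

Running the cleanup on the raw text before any USFM parsing keeps the parser itself unchanged, which is one reason to put a helper like this in shared code that both STET and DOC can call.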

Refactored a little to move the code to DOC, since STET will take advantage of it and the problem is likely to affect DOC too.

linearcombination added 2 commits October 16, 2024 10:58

Handle NULL bytes and control characters in USFM for STET and DOC

2708635

Refactored a little to move code to DOC since STET will take advantage of that and it is likely to impact DOC too.

linearcombination merged commit e4ae96f into doc-dev.walink.org Oct 16, 2024
13 checks passed

linearcombination deleted the handle-control-characters-in-usfm branch October 18, 2024 20:42
