Allow html tags that contain only spaces #155

tthkbw · 2024-10-27T17:42:59Z

markdownify converts this:

one two

into:

oneone

because the function chomp returns the text inside the tags as an empty string.

One can argue that empty tags like this should not be allowed, but the EPUB of "The Last Dangerous Visions" from Blackstone Publishing has this structure in dozens of places, and the space inside the tag is the space between two words of the text. As a result, my EPUB reader shows runs of words with no spaces between them when I use markdownify to convert the text of the EPUB files into Markdown.

I don't pretend to understand how the markdownify code works in detail, but modifying chomp to look like this fixes the issue for me:

def chomp(text):
    """
    If the text in an inline tag like b, a, or em contains a leading or trailing
    space, strip the string and return a space as suffix or prefix, if needed.
    This function is used to prevent conversions like
        <b> foo</b> => ** foo**
    """
    # Change: allow whitespace-only tags such as "<i> </i>" and keep the space
    if text.isspace():
        prefix = ''
        suffix = ''
        text = ' '
    else:
        prefix = ' ' if text and text[0] == ' ' else ''
        suffix = ' ' if text and text[-1] == ' ' else ''
        text = text.strip()
    return (prefix, suffix, text)

An unfortunate side effect of my fix is that this turns into something similar to &nbsp;.
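To make the behaviour easier to reproduce, here is a minimal, self-contained sketch that compares the two versions of chomp. The names `chomp_original`, `chomp_patched` and `wrap_inline` are hypothetical; `wrap_inline` is only a stand-in for whatever markdownify does with the result, and it assumes that an inline converter returns nothing when the stripped text is empty.

```python
def chomp_original(text):
    # Same logic as the "else" branch above: strip the text and report
    # whether a leading/trailing space has to be re-added outside the markup.
    prefix = ' ' if text and text[0] == ' ' else ''
    suffix = ' ' if text and text[-1] == ' ' else ''
    return (prefix, suffix, text.strip())


def chomp_patched(text):
    # The proposed change: whitespace-only text is kept as a single space.
    if text.isspace():
        return ('', '', ' ')
    return chomp_original(text)


def wrap_inline(chomp, text, markup='**'):
    # Hypothetical stand-in for an inline converter (not markdownify's code).
    # Assumption: empty text after chomping produces no output at all.
    prefix, suffix, stripped = chomp(text)
    if not stripped:
        return ''
    return prefix + markup + stripped + markup + suffix


print(repr(wrap_inline(chomp_original, ' ')))     # space is dropped
print(repr(wrap_inline(chomp_patched, ' ')))      # space survives, wrapped in markup
print(repr(wrap_inline(chomp_original, ' foo')))  # normal case is unaffected
```

Under these assumptions the unpatched version loses the space entirely, while the patched version keeps it but wraps it in emphasis markers, which is the side effect mentioned above.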


AlexVonB · 2024-11-24T20:05:59Z

Hey Terry, thanks for your issue! Unfortunately, your code converts a whitespace-only tag to ** **, which is not what should happen. Maybe you could preprocess your ebook by replacing all occurrences of such whitespace-only tags with a single space?
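A rough sketch of that preprocessing step, using only the standard library. The function name `collapse_whitespace_only_tags` and the list of inline tags are assumptions made for illustration, and a regular expression is a blunt tool for HTML; an HTML parser would be more robust for real ebooks.

```python
import re

# Assumption: these are the inline tags worth handling; extend as needed.
_INLINE_TAGS = "i|b|em|strong|span"

# Matches an inline element whose entire content is ordinary whitespace,
# e.g. "<i> </i>" or '<span class="x">\n</span>'. No-break spaces are
# deliberately not matched, so they are left untouched.
_WS_ONLY_TAG = re.compile(
    r"<(%s)\b[^>]*>[ \t\r\n]+</\1\s*>" % _INLINE_TAGS,
    re.IGNORECASE,
)


def collapse_whitespace_only_tags(html):
    """Replace each whitespace-only inline element with a single space."""
    return _WS_ONLY_TAG.sub(" ", html)


sample = "<i>one</i><i> </i><i>two</i>"
print(collapse_whitespace_only_tags(sample))
```

The cleaned string can then be passed to markdownify as usual, so the space sits between the two elements instead of inside an otherwise empty one.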

chrispy-snps · 2025-01-02T18:41:40Z

Perhaps there could be a combined fix for #175 and this issue. Let's see where that discussion leads first.

jsm28 mentioned this issue Jan 3, 2025

Inconsistent handling of No-Break Space and Space #175

Open
