Indent before HTML block elements causes indent in Markdown output #98
Comments
chrispy-snps changed the title from "Indent in <p> causes indent in Markdown output" to "Indent before HTML block elements causes indent in Markdown output" on Nov 26, 2023
This seems to be a duplicate of issue #96.
or rather #88 perhaps
jsm28 added a commit to jsm28/python-markdownify that referenced this issue on Apr 9, 2024
There are various cases in which inline text fails to be separated by (sufficiently many) newlines from adjacent block content. A paragraph needs a blank line (two newlines) separating it from prior text, as does an underlined header; an ATX header needs a single newline separating it from prior text. A list needs at least one newline separating it from prior text, but in general two newlines (for an ordered list starting other than at 1, which will only be recognized given a blank line before).

To avoid accumulation of more newlines than necessary, take care when concatenating the results of converting consecutive tags to remove redundant newlines (keeping the greater of the number ending the prior text and the number starting the subsequent text).

This is thus an alternative to matthewwithanm#108 that tries to avoid the excess newline accumulation that was a concern there, as well as fixing more cases than just paragraphs, and updating tests.

Fixes matthewwithanm#92
Fixes matthewwithanm#98
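The concatenation rule in the commit message (keep the greater of the newline count ending the prior text and the count starting the next text) can be sketched as follows. This is only an illustration of the rule as described, not the code from the commit; the function names `join_chunks`, `count_leading_newlines`, and `count_trailing_newlines` are hypothetical.

```python
def count_trailing_newlines(text: str) -> int:
    """Number of consecutive newlines at the end of text."""
    return len(text) - len(text.rstrip("\n"))


def count_leading_newlines(text: str) -> int:
    """Number of consecutive newlines at the start of text."""
    return len(text) - len(text.lstrip("\n"))


def join_chunks(chunks):
    """Concatenate converted chunks without accumulating newlines.

    At each boundary, keep max(trailing newlines of the text so far,
    leading newlines of the next chunk) instead of their sum.
    Hypothetical helper, not part of markdownify's API.
    """
    result = ""
    for chunk in chunks:
        if not chunk:
            continue
        keep = max(count_trailing_newlines(result),
                   count_leading_newlines(chunk))
        result = result.rstrip("\n") + "\n" * keep + chunk.lstrip("\n")
    return result


# Inline text, then a paragraph (blank line on both sides), then an ATX
# header (one newline before), then a list (blank line before).
chunks = [
    "inline text",
    "\n\nparagraph\n\n",
    "\n# Header\n\n",
    "\n\n1. item\n",
]
merged = join_chunks(chunks)
print(repr(merged))
```

With plain string concatenation the boundary between the paragraph and the header would carry three newlines and the boundary before the list four; with this rule each boundary carries two.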
In our HTML, block elements are indented:
When HTML with indented block elements is converted, the indent causes incorrect formatting in the output.
Converting this indented <p> element:
produces this:
It happens for non-<p> elements too. Converting these indented <h1> elements with the UNDERLINED and ATX heading formats:
produces this:
As a workaround, we iterate through all text object descendants in all text-containing block elements (<p>, <entry>, <li>, etc.) and convert newlines to spaces, but this is expensive on large document sets.
Possibly related to #31.
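A rough sketch of the kind of workaround described above, for readers who want to try it. The issue does not say which parser or tree API was used, so this version makes several assumptions: it uses only the standard library's `xml.etree.ElementTree`, it requires well-formed XML input, and the `BLOCK_TAGS` set and the function name `flatten_newlines` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed set of text-containing block elements; extend as needed.
BLOCK_TAGS = {"p", "entry", "li"}


def flatten_newlines(root):
    """Replace newlines with spaces in all text inside block elements.

    Hypothetical helper. ElementTree stores text in two places:
    `text` (before the first child) and `tail` (after the element's
    closing tag), so both are visited for every descendant.
    """
    for block in root.iter():
        if block.tag not in BLOCK_TAGS:
            continue
        for node in block.iter():
            if node.text:
                node.text = node.text.replace("\n", " ")
            # The block's own tail lies outside the block, so skip it.
            if node is not block and node.tail:
                node.tail = node.tail.replace("\n", " ")
    return root


source = (
    "<div>\n"
    "  <p>first line\n"
    "    second <b>bold\n"
    "    text</b> tail\n"
    "  </p>\n"
    "</div>"
)
root = flatten_newlines(ET.fromstring(source))
paragraph_text = "".join(root.find("p").itertext())
print(repr(paragraph_text))
```

Every block element is walked together with all of its descendants, which is consistent with the cost concern raised above for large document sets.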