Non isomorphic parsing/formatting for bold/italic with spaces #908

Closed
4 tasks done
SamyPesse opened this issue Nov 29, 2021 · 7 comments
Labels
💪 phase/solved Post is done

Comments

@SamyPesse

Initial checklist

Affected packages and versions

[email protected]

Link to runnable example

https://codesandbox.io/s/cocky-meitner-88li6

Steps to reproduce

To reproduce, parse the following markdown:

**Our **_**developer**_** guides** and APIs have a home of their own now.

Expected behavior

This markdown snippet works on GitHub:

Our developer guides and APIs have a home of their own now.

Actual behavior

The markdown snippet is reprocessed as:

**Our **\_**developer**\_\*\* guides\*\* and APIs have a home of their own now.

Runtime

Node v14

Package manager

yarn v2

OS

Linux, macOS

Build and bundle tools

esbuild

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Nov 29, 2021
@SamyPesse
Author

To provide a bit more context: in our application, users can select text with leading/trailing spaces and format it as bold/italic, basically something like:

hello<bold> world </bold>!

This led to issues when generating markdown with remark, because the following is not valid markdown:

hello** world **!

So we implemented custom logic to trim the inner content and move the spaces outside the bold/italic and other marks. But this can produce a more complex tree, and for such a tree remark generated the following markdown:

**Our **_**developer**_** guides** and APIs have a home of their own now.

which it then cannot parse back.

I'm seeing 2 issues:

  1. remark should probably trim the inner content of bold/italic/code to avoid generating invalid markup (e.g., it should generate **world** instead of ** world **).
  2. remark cannot parse this markdown, even though it works on GitHub.

@ChristianMurphy
Member

Likely related to syntax-tree/mdast-util-to-markdown#12

@wooorm
Member

wooorm commented Nov 29, 2021

remark should probably trim the inner content of bold/italic/code to avoid generating invalid markup (e.g., it should generate **world** instead of ** world **).

I dunno on the first point. Your code here is generating an object model that is impossible to make with markdown syntax. Take the DOM:

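// Build a tree that an HTML parser would never produce: an <h1> inside a <p>.
const p = document.createElement('p')
const h1 = document.createElement('h1')
h1.textContent = 'Hi!'
p.append(h1)

p.outerHTML // "<p><h1>Hi!</h1></p>"

// Serializing and reparsing yields a different tree.
const d = document.createElement('div')
d.innerHTML = p.outerHTML
d.outerHTML // "<div><p></p><h1>Hi!</h1><p></p></div>"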

Especially with a vague language like markdown, I think there will always be cases that can easily be represented by JSON but are impossible to serialize/parse.


If you’re generating **Our **_**developer**_** guides**, why not generate **Our _developer_ guides** instead?

remark cannot parse this markdown that works on GitHub

Sure! Minimal repro: *a *__*b*__* c*

@SamyPesse
Author

Especially with a vague language like markdown, I think there will always be cases that can easily be represented by JSON but are impossible to serialize/parse.

Yes, I was wondering whether trimming spaces in bold/italic is something remark should handle or not. Maybe it's something we can implement as a plugin, similar to rehype-minify-whitespace.

Because I can imagine the confusion when the following tree generates invalid markdown:

{
    type: 'paragraph',
    children: [
        {
            type: 'strong',
            children: [
                {
                    type: 'text',
                    value: 'Hello ',
                },
            ],
        }
    ]
}
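The plugin idea above could look roughly like the following. This is a minimal sketch in plain JavaScript, not an existing remark plugin: the name `hoistWhitespace` is hypothetical, it only handles `strong` and `emphasis` nodes, it mutates the tree in place, and it does not merge the adjacent text nodes it may create.

```javascript
// Node types whose leading/trailing whitespace is moved outside (an assumption;
// extend as needed for other marks).
const MARKS = new Set(['strong', 'emphasis'])

// Hypothetical helper: moves whitespace at the edges of strong/emphasis nodes
// into sibling text nodes of the parent, so `** world **` becomes ` **world** `.
function hoistWhitespace(parent) {
  const result = []
  for (const child of parent.children) {
    if (!MARKS.has(child.type)) {
      if (child.children) hoistWhitespace(child)
      result.push(child)
      continue
    }

    // Normalize nested marks first, so their whitespace bubbles up to this level.
    hoistWhitespace(child)

    let before = ''
    let after = ''
    const first = child.children[0]
    if (first && first.type === 'text') {
      const trimmed = first.value.trimStart()
      before = first.value.slice(0, first.value.length - trimmed.length)
      first.value = trimmed
    }
    const last = child.children[child.children.length - 1]
    if (last && last.type === 'text') {
      const trimmed = last.value.trimEnd()
      after = last.value.slice(trimmed.length)
      last.value = trimmed
    }

    // Drop text nodes that became empty, and marks that became empty.
    child.children = child.children.filter(
      (node) => node.type !== 'text' || node.value !== ''
    )
    if (before) result.push({type: 'text', value: before})
    if (child.children.length > 0) result.push(child)
    if (after) result.push({type: 'text', value: after})
  }
  parent.children = result
  return parent
}

// The tree from above: the trailing space ends up outside the `strong` node.
const paragraph = {
  type: 'paragraph',
  children: [{type: 'strong', children: [{type: 'text', value: 'Hello '}]}]
}
hoistWhitespace(paragraph)
```

As a unified plugin, this would presumably be wrapped as a function that returns a transformer calling `hoistWhitespace(tree)` on the root.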

If you’re generating **Our **_**developer**_** guides**, why not generate **Our _developer_ guides** instead?

Yes, I'm looking at improving this on our side, in the step that converts our AST into the remark AST.
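One way such a conversion step could produce the nested form is sketched below. It assumes a hypothetical editor model (not described in this thread) in which a paragraph is a flat list of text runs, each carrying a list of marks. The function name `runsToMdast` and the fixed nesting order are illustrative assumptions.

```javascript
// Assumed nesting order: the first mark ends up outermost.
const MARK_ORDER = ['strong', 'emphasis']

// Hypothetical converter: groups consecutive runs that share a mark into one
// node, then recurses for the remaining marks.
function runsToMdast(runs, marks = MARK_ORDER) {
  if (marks.length === 0) {
    // All runs here agree on every mark, so they can be one text node.
    const value = runs.map((run) => run.text).join('')
    return value ? [{type: 'text', value}] : []
  }

  const [mark, ...rest] = marks
  const nodes = []
  let index = 0
  while (index < runs.length) {
    // Find the maximal stretch of runs that all have (or all lack) this mark.
    const marked = runs[index].marks.includes(mark)
    let end = index
    while (end < runs.length && runs[end].marks.includes(mark) === marked) end++

    const children = runsToMdast(runs.slice(index, end), rest)
    if (marked) nodes.push({type: mark, children})
    else nodes.push(...children)
    index = end
  }
  return nodes
}

// Three bold runs, the middle one also italic, become a single `strong` node
// containing an `emphasis` node, i.e. **Our _developer_ guides**.
const children = runsToMdast([
  {text: 'Our ', marks: ['strong']},
  {text: 'developer', marks: ['strong', 'emphasis']},
  {text: ' guides', marks: ['strong']},
  {text: ' and APIs', marks: []}
])
```

This sketch does not deal with whitespace at the edges of a mark. That case still needs separate handling.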

@wooorm
Member

wooorm commented Nov 29, 2021

What do you care most about? That it’s readable markdown? Or that it works?
Because readable would always have such problems (also in Chinese and other languages).

There might be something to be done in CommonMark, e.g., <-** a **-> or so might be possible (although this looks horrible). A character to force them to open or close even when they currently can’t.

And a plugin as you mention might indeed be useful to a lot of folks.

Alternatively, inject HTML instead. <b>, <i> and such?
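The HTML alternative could be sketched as a tree transform that swaps a `strong` or `emphasis` node for raw `html` nodes only when its text starts or ends with whitespace. This is an illustration under assumptions, not an existing utility: `marksToHtml` and `textContent` are hypothetical names, and the tree is mutated in place.

```javascript
// Assumed mapping from mdast mark types to HTML tag names.
const HTML_TAGS = {strong: 'b', emphasis: 'i'}

// Concatenates the `value` of all descendants of a node.
function textContent(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children ?? []).map(textContent).join('')
}

// Hypothetical helper: replaces marks that cannot be written with `*`/`_`
// (because of edge whitespace) with `<b>`/`<i>` html nodes around their children.
function marksToHtml(parent) {
  const result = []
  for (const child of parent.children) {
    if (child.children) marksToHtml(child)

    const tag = HTML_TAGS[child.type]
    const content = tag ? textContent(child) : ''
    if (tag && content !== content.trim()) {
      result.push(
        {type: 'html', value: `<${tag}>`},
        ...child.children,
        {type: 'html', value: `</${tag}>`}
      )
    } else {
      result.push(child)
    }
  }
  parent.children = result
  return parent
}

// `hello<bold> world </bold>!` keeps its spaces inside the <b> element.
const example = marksToHtml({
  type: 'paragraph',
  children: [
    {type: 'text', value: 'hello'},
    {type: 'strong', children: [{type: 'text', value: ' world '}]},
    {type: 'text', value: '!'}
  ]
})
```

Marks without edge whitespace are left alone, so ordinary content would still serialize as regular markdown emphasis.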

@wooorm
Member

wooorm commented Feb 4, 2022

I came up with a way to do it, I think: syntax-tree/unist#60 (comment).

@wooorm
Member

wooorm commented Nov 1, 2024

This was fixed in https://github.com/syntax-tree/mdast-util-to-markdown/releases/tag/2.1.1:

import {remark} from 'remark'

const sourceMarkdown = `
**Our **_**developer**_** guides** and APIs have a home of their own now.
`;

const markdown1 = String(
  await remark().process(sourceMarkdown)
);

const markdown2 = String(
  await remark().process(markdown1)
);

console.log(markdown1 === markdown2) // `true`

Thanks :)

@wooorm wooorm closed this as completed Nov 1, 2024
@wooorm wooorm added the 💪 phase/solved Post is done label Nov 1, 2024
@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Nov 1, 2024

3 participants