[fuzz result] code span vanishes when link destination is ` #136
The code that looks like the culprit is here: commonmark-hs/commonmark/src/Commonmark/Inlines.hs Lines 840 to 843 in 1875e9a
In this case, the end paren is in the middle of a "parsed code" chunk, which gets split into the link destination and some plain text.
The three options that come to my mind are:

1. Make inline code bind tighter than link destinations. This is the simplest change, but it wouldn't match the reference implementation.
2. Re-parse with the inline syntax parsers.
3. Interleave bracket matching with inline code span parsing.

None of those are simple and obvious fixes, unfortunately.
Fixes jgm#136

This works by re-parsing the tokens that come after the link, but only when the end delimiter isn't on a chunk boundary (since that's the only way this problem can happen).

Re-parsing a specific chunk won't work, because the part that needs re-interpreted can span more than one chunk. For example, we can draw the bounds of the erroneous code chunk in this example:

    [x](`) <a href="`">
        ^-----------^

If we re-parse the underlined part in isolation, we'll fix the first link, but won't find the HTML (since the closing angle bracket is in the next chunk).

On the other hand, parsing links, code, and HTML in a single pass would make writing extensions more complicated. For example, LaTeX math is supposed to have the same binding strength as code spans:

    $first[$](about)
    ^------^ this is a math span, not a link

    [first]($)$5/8$
            ^-^ this is an analogue of the original bug
                it shouldn't be a math span, but looks like one
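To make the "end delimiter isn't on a chunk boundary" condition concrete, here is a minimal Python sketch of a toy model. It is not the commonmark-hs implementation: the names `chunk_code_spans` and `link_end_splits_chunk`, the simplified backtick-matching rule, and the `LINK` regex are all assumptions made for illustration. It pre-chunks code spans the way the commit message describes, then checks whether the closing paren of the first inline link lands strictly inside a code chunk.

```python
import re


def chunk_code_spans(text):
    """Split text into (kind, start, end) chunks; kind is 'code' or 'text'.

    Toy rule (an assumption, much simpler than CommonMark): a run of N
    backticks opens a code span closed by the next run of exactly N
    backticks; unmatched runs stay part of the plain text.
    """
    runs = [(m.start(), m.end()) for m in re.finditer(r"`+", text)]
    chunks = []
    pos = 0  # start of the pending plain-text chunk
    i = 0
    while i < len(runs):
        start, end = runs[i]
        width = end - start
        # Find the first later backtick run with the same width.
        close = next(
            (j for j in range(i + 1, len(runs))
             if runs[j][1] - runs[j][0] == width),
            None,
        )
        if close is None:
            i += 1
            continue
        if start > pos:
            chunks.append(("text", pos, start))
        chunks.append(("code", start, runs[close][1]))
        pos = runs[close][1]
        i = close + 1
    if pos < len(text):
        chunks.append(("text", pos, len(text)))
    return chunks


# Hypothetical, deliberately naive inline-link pattern: [text](destination)
LINK = re.compile(r"\[[^\]]*\]\([^()\s]*\)")


def link_end_splits_chunk(text):
    """True if the first inline link ends strictly inside a code chunk."""
    m = LINK.search(text)
    if m is None:
        return False
    end = m.end()  # offset just past the closing paren
    return any(
        kind == "code" and start < end < stop
        for kind, start, stop in chunk_code_spans(text)
    )
```

In this model, `link_end_splits_chunk('[x](`) <a href="`">')` is true because the pre-pass treats everything between the two backticks as one code chunk, while a link that ends before or exactly at the start of a code chunk, such as ``[x](y) `code` ``, does not trigger the check.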
This would require a spec change. As for options 2 and 3, I'm not sure. I agree that 2 is ugly. So 3 has some appeal, but I'd have to see what is actually involved in going this way.
I implemented option 2 in #137, but this implementation has potentially quadratic behavior. The trouble with option 3 is that the extensions also need redone, because
This markdown:
In most engines I've tried, including GitHub, it renders like this:
commonmark-hs generates this:
Events from pulldown-cmark:
Events from pandoc:
Events from commonmark.js: