Option to treat , as a word delimiter for info-strings #246

Open
Nemo157 opened this issue Oct 16, 2022 · 6 comments · May be fixed by #328

Nemo157 commented Oct 16, 2022

Currently, when parsing the info-string to determine the codeblock language, the word up to the first space character is used:

comrak/src/html.rs

Lines 490 to 492 in 03238b8

while first_tag < ncb.info.len() && !isspace(ncb.info[first_tag]) {
    first_tag += 1;
}

This causes issues with markdown such as the regex readme, which uses the info-string rust,ignore: the whole string rust,ignore is passed to the syntax highlighter and applied as the attribute class="language-rust,ignore" on the element. Both rustdoc and GitHub treat the , character as a delimiter between the language and additional attributes (rustdoc actually supports more, but that's for back-compat; afaik only [ ,] is intended to be used).
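To make the requested behaviour concrete, here is a minimal sketch. It is not comrak's actual code: `language_from_info` and its `comma_is_delimiter` flag are hypothetical names, and it works on a `&str` that is assumed to be already trimmed, rather than on the byte slice used in html.rs.

```rust
/// Hypothetical helper (not part of comrak): returns the language word of a
/// fenced code block's info-string. Assumes `info` has no leading whitespace.
fn language_from_info(info: &str, comma_is_delimiter: bool) -> &str {
    // The word ends at the first whitespace character, or at the first
    // comma when the opt-in flag is set.
    let end = info
        .find(|c: char| c.is_whitespace() || (comma_is_delimiter && c == ','))
        .unwrap_or(info.len());
    &info[..end]
}

fn main() {
    // Behaviour described in this issue: only whitespace ends the word.
    assert_eq!(language_from_info("rust,ignore", false), "rust,ignore");
    // With the proposed option, the comma ends the word as well.
    assert_eq!(language_from_info("rust,ignore", true), "rust");
    assert_eq!(language_from_info("rust ignore", true), "rust");
    println!("class=\"language-{}\"", language_from_info("rust,ignore", true));
}
```

Keeping the comma handling behind a flag would leave the default output unchanged for anyone relying on the current whitespace-only split.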

@kivikakk
Owner

Then we should definitely support it, imo. PRs happily accepted.

@CosmicHorrorDev
Contributor

Just noticed this too. I'd be happy to take on implementing this!

@CosmicHorrorDev
Contributor

Implementation-wise where exactly should this change go? Should it change the parsed AST / HTML to split the language for the info string on a comma (this would seem to diverge from cmark-gfm which includes the first word up to the space), or should it just change the syntect plugin to split off of a comma when trying to find a matching syntax?

CosmicHorrorDev linked a pull request on Jun 28, 2023 that will close this issue
@gjtorikian
Collaborator

Out of pure curiosity, how does GitHub use this comma delimiting feature? I tried to find an example on their help docs but couldn’t find an explanation.

@Nemo157
Author

Nemo157 commented Jun 28, 2023

> or should it just change the syntect plugin to split off of a comma when trying to find a matching syntax?

We did this for docs.rs' highlighting plugin (using syntect as well, but we need to customize it more). It's ok for the actual highlighting, but fixing up the attributes in write_code_tag is a pain: https://github.com/rust-lang/docs.rs/blob/eb803472b52aac49fb0c8a736d7d74f87533e12d/src/web/markdown.rs#L37-L50
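For readers who have not seen that kind of fix-up, a rough sketch of the idea follows. This is not the docs.rs code linked above: `trim_language_class` is a hypothetical helper, and it assumes the code tag's attributes arrive as a string map whose `class` value looks like `language-rust,ignore`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: if the `class` attribute has the form
/// `language-<lang>,<extra>`, drop the comma and everything after it.
fn trim_language_class(attributes: &mut HashMap<String, String>) {
    if let Some(class) = attributes.get_mut("class") {
        if class.starts_with("language-") {
            if let Some(comma) = class.find(',') {
                // ',' is ASCII, so this byte index is a valid char boundary.
                class.truncate(comma);
            }
        }
    }
}

fn main() {
    let mut attributes = HashMap::new();
    attributes.insert("class".to_string(), "language-rust,ignore".to_string());
    trim_language_class(&mut attributes);
    assert_eq!(
        attributes.get("class").map(String::as_str),
        Some("language-rust")
    );
}
```

Doing this inside every highlighter plugin is the duplication that an option in the parser or HTML renderer would avoid.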

> Out of pure curiosity, how does GitHub use this comma delimiting feature?

AFAIK it doesn't, it just strips everything after the comma at some point between markdown -> html.

@gjtorikian
Collaborator

Ah, got it—the stripping is simply to remove the info, not do anything special with it. 👍
