How to implement Markdown subset similar to StackOverflow's mini-Markdown #99

Open
Blinky32 opened this issue Sep 26, 2016 · 2 comments

Comments

@Blinky32

What is the easiest way to implement something similar to StackOverflow's comments section? They refer to it as 'mini-Markdown': only italic, bold and code are allowed, so effectively a whitelist of Markdown tags. Everything else, including HTML and other MD tags, should be displayed as is (passed through or HTML-encoded) in order to avoid XSS and to match specific business requirements.
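To make the target behaviour concrete, here is a minimal sketch of such a whitelist that does not use CommonMark.NET at all. It is a plain regex tokenizer, not a real Markdown parser, and the class name `MiniMarkdownSketch` and the choice of `*`, `**` and backtick delimiters are assumptions made for illustration only.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: recognizes only `code`, **bold** and *italic*;
// every other character of the input is HTML-encoded and shown as typed.
public static class MiniMarkdownSketch
{
    // Alternatives are tried in order, so code spans win over emphasis.
    private static readonly Regex Token = new Regex(
        @"`(?<code>[^`]+)`|\*\*(?<bold>.+?)\*\*|\*(?<italic>.+?)\*",
        RegexOptions.Singleline);

    public static string Render(string input)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in Token.Matches(input))
        {
            // Text between tokens is never interpreted, only encoded.
            sb.Append(WebUtility.HtmlEncode(input.Substring(pos, m.Index - pos)));

            if (m.Groups["code"].Success)
            {
                // Code span content is literal: encode it, do not recurse.
                sb.Append("<code>")
                  .Append(WebUtility.HtmlEncode(m.Groups["code"].Value))
                  .Append("</code>");
            }
            else if (m.Groups["bold"].Success)
            {
                sb.Append("<strong>").Append(Render(m.Groups["bold"].Value)).Append("</strong>");
            }
            else
            {
                sb.Append("<em>").Append(Render(m.Groups["italic"].Value)).Append("</em>");
            }
            pos = m.Index + m.Length;
        }
        sb.Append(WebUtility.HtmlEncode(input.Substring(pos)));
        return sb.ToString();
    }
}
```

With this sketch, `<b>x</b>` or `# heading` in the input would come out encoded and otherwise untouched, which is the "displayed as is" behaviour described above. It ignores the many edge cases a real parser handles, which is why a CommonMark.NET-based version is still the goal.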

Basically I need to let my users mark some of their text as bold or italic. I would also allow paragraphs and lists. But everything else (quotations, links, images, headings, any HTML) should be preserved and displayed AS IS (HTML-encoded, because it will be rendered within a bigger HTML page). Essentially I'm inventing my own super strict and limited subset of Markdown; let's call it MarkdownSlim. I want to implement it with CommonMark.NET because I may need to extend it easily in the future (allow more MD tags).

I cannot simply pass the input through CommonMarkConverter.Convert because it may find MD tags that I don't support and convert them into HTML, so they would be displayed differently from how they were entered.

Would this be the right approach? I tried it, but it needs more debugging and learning on my part because it does not seem to preserve all of the input.

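protected override void WriteBlock(
    Block block, 
    bool isOpening, 
    bool isClosing, 
    out bool ignoreChildNodes) {

    if (block.Tag == BlockTag.List 
              || block.Tag == _OTHER_TAGS_ALLOWED_BY_MARKDOWNSLIM_ ) {

        // Whitelisted block: let the stock HTML formatter render it.
        base.WriteBlock(block, isOpening, isClosing, out ignoreChildNodes);

    } else {

        // Anything else: emit the block's raw text, HTML-encoded.
        ignoreChildNodes = false;
        if (block.StringContent != null) {
            this.Write(AntiXss.HtmlEncode(block.StringContent.ToString()));
        }
    }
}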

protected override void WriteInline(
    Inline inline, 
    bool isOpening, 
    bool isClosing, 
    out bool ignoreChildNodes) {

    if (inline.Tag == InlineTag.Emphasis 
              || inline.Tag == _OTHER_TAGS_ALLOWED_BY_MARKDOWNSLIM_ ) {

        base.WriteInline(inline, isOpening, isClosing, out ignoreChildNodes);

    } else {

        ignoreChildNodes = false;
        this.Write(AntiXss.HtmlEncode(inline.LiteralContent));      
    }
}

I feel like this is a very common use case, but I could not find a good example and I'm not sure I'm even on the right track. There seems to be a LOT OF INTEREST in implementing 'safe Markdown', and I think it should boil down to being able to easily implement subsets of Markdown like the one I've described. Maybe a good example on the wiki?

@Knagis
Owner

Knagis commented Oct 4, 2016

Yes, I would probably do something like this: create a custom renderer that renders only the markup you allow. Unsupported constructs such as lists or headings would then come out as plain text, with the Markdown-specific syntax removed. Once you extend the list of supported constructs, you can also re-run the parser on old Markdown inputs.
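A rough sketch of how such a renderer could be wired up, assuming the `HtmlFormatter` base class and the `OutputDelegate` setting from the CommonMark.NET README. The names `MarkdownSlimFormatter` and `MarkdownSlim` are made up for illustration, and the whitelists are only an example. It is untested against every block type, so treat it as a starting point rather than a finished implementation.

```csharp
using System.IO;
using System.Net;
using CommonMark;
using CommonMark.Formatters;
using CommonMark.Syntax;

// Hypothetical formatter: whitelisted nodes go to the stock renderer,
// everything else is reduced to its HTML-encoded text.
public class MarkdownSlimFormatter : HtmlFormatter
{
    public MarkdownSlimFormatter(TextWriter target, CommonMarkSettings settings)
        : base(target, settings) { }

    protected override void WriteBlock(Block block, bool isOpening, bool isClosing, out bool ignoreChildNodes)
    {
        switch (block.Tag)
        {
            case BlockTag.Document:
            case BlockTag.Paragraph:
            case BlockTag.List:
            case BlockTag.ListItem:
                base.WriteBlock(block, isOpening, isClosing, out ignoreChildNodes);
                return;
        }

        // Not whitelisted: write no tags, but keep visiting child nodes
        // so that their text is still rendered.
        ignoreChildNodes = false;

        // Blocks without parsed inlines (code blocks, HTML blocks) carry
        // their raw text in StringContent; write it once, encoded.
        if (isOpening && block.InlineContent == null && block.StringContent != null)
            this.Write(WebUtility.HtmlEncode(block.StringContent.ToString()));
    }

    protected override void WriteInline(Inline inline, bool isOpening, bool isClosing, out bool ignoreChildNodes)
    {
        switch (inline.Tag)
        {
            case InlineTag.String:
            case InlineTag.SoftBreak:
            case InlineTag.LineBreak:
            case InlineTag.Code:
            case InlineTag.Emphasis:
            case InlineTag.Strong:
                base.WriteInline(inline, isOpening, isClosing, out ignoreChildNodes);
                return;
        }

        // Links, images and so on: skip the wrapper, render the children.
        ignoreChildNodes = false;

        // Only raw inline HTML is written from LiteralContent here; for
        // other inline types that property may hold something else.
        if (isOpening && inline.Tag == InlineTag.RawHtml && inline.LiteralContent != null)
            this.Write(WebUtility.HtmlEncode(inline.LiteralContent));
    }
}

public static class MarkdownSlim
{
    public static string ToHtml(string markdown)
    {
        // Clone the defaults so the global settings stay untouched.
        var settings = CommonMarkSettings.Default.Clone();
        settings.OutputDelegate = (doc, output, s) =>
            new MarkdownSlimFormatter(output, s).WriteDocument(doc);
        return CommonMarkConverter.Convert(markdown, settings);
    }
}
```

Note that this drops the Markdown syntax of unsupported constructs (a link is reduced to its text, a heading to its content) instead of echoing the original characters. Reproducing the input verbatim would need the source positions of each node, which this sketch does not attempt.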

As for creating "safe output", I think it should be possible to just run the whole generated HTML output through an XSS encoder so that it encodes everything. The problem is finding a good library for that. Some time ago there was nothing in the .NET world (only an older library from Microsoft that no longer supported the sanitization option), but a quick search now turned up this one, which seems promising: https://github.com/mganss/HtmlSanitizer/
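A minimal sketch of what that post-processing step might look like with HtmlSanitizer, assuming its `AllowedTags` and `AllowedAttributes` sets and its `Sanitize` method. The wrapper name `SlimSanitizer` and the tag list are illustrative assumptions, and the namespace differs between releases.

```csharp
using Ganss.Xss; // assumption: older HtmlSanitizer releases used Ganss.XSS instead

// Hypothetical wrapper that narrows the sanitizer to the tags a
// MarkdownSlim-style renderer is expected to produce.
public static class SlimSanitizer
{
    public static string Clean(string html)
    {
        var sanitizer = new HtmlSanitizer();

        // Replace the default whitelist with a much smaller one.
        sanitizer.AllowedTags.Clear();
        foreach (var tag in new[] { "p", "em", "strong", "code", "ul", "ol", "li", "br" })
            sanitizer.AllowedTags.Add(tag);

        // None of the allowed tags needs attributes, so drop them all.
        sanitizer.AllowedAttributes.Clear();

        return sanitizer.Sanitize(html);
    }
}
```

As far as I understand it, the sanitizer removes disallowed elements by default rather than encoding them, so it works as a safety net behind a custom renderer and not as a way to display unsupported markup as typed.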

@tiesont

tiesont commented Nov 1, 2016

I use HtmlSanitizer for cleaning inputs where I have to allow HTML, and I've never really had problems with it. It has some nice extensibility points that allow you to customize how unwanted/disallowed inputs are handled.

One caveat is that HtmlSanitizer recently switched from CsQuery to AngleSharp for parsing the entered markup, and AngleSharp seems to like introducing random breaking changes into its API (just something to be aware of). HtmlSanitizer currently handles that problem by pinning its dependency to a specific version of AngleSharp that it can test against.

Just FYI.
