<th> is stripped even though it's an allowed tag if not part of full table markup #110
Comments
In the first example, the <th> tags are dropped by the HTML parser, just as a browser drops them:

var e = document.createElement("div");
e.innerHTML = "<th>Header</th>";
// e.innerHTML === "Header"
Are you saying that HtmlSanitizer isn't intended to parse fragments? Context matters for a fragment: your example works fine with a tr instead of a div.
If fragment parsing is intended, then you would probably need to add a context parameter to the Sanitize method.
Or, a user would have to work around this by manually adding the full context to the string before calling the Sanitize method and removing it afterwards. I have no idea if this is feasible; I just thought I'd comment from an API perspective.
@panetta-net-au Well, the HTML parser is AngleSharp, which in theory is supposed to give you the same "corrected" markup as most browsers (meaning it will try to correct fragments if possible). A
I suppose it depends on how you define a fragment. If a fragment is a node or nodes that can exist as a direct descendant of body, then yeah, I agree; but if a fragment is a node that can exist anywhere in the tree, then I'd say fragments aren't currently supported. Regardless, the workaround is probably to wrap your orphaned fragments in the context they'll be used in before sending them through AngleSharp. Whether that's the responsibility of the user or of this library is, I think, the debate. IMHO, if the user has access to the AngleSharp document before and after sanitisation, that's probably enough to do what I suggested earlier.
@panetta-net-au You're right, currently the context of fragments is always body. As a workaround, you can wrap the fragment in a table and strip the wrapper off again:

var sanitizer = new HtmlSanitizer();
var html = @"<th>Header</th>";
// Parse the fragment inside a table so the parser keeps the <th>
var actual = sanitizer.Sanitize("<table>" + html + "</table>");
// The parser inserts <tbody> and <tr>, so strip the completed wrapper from both ends
actual = actual.Remove(0, "<table><tbody><tr>".Length);
actual = actual.Remove(actual.Length - "</tr></tbody></table>".Length);

I have no plans to add a context feature to the library, but PRs are always welcome 😉 It might just be a couple of lines of code. I know AngleSharp accepts a context parameter when parsing fragments.
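A slightly more defensive variant of the wrap-and-strip workaround, sketched under a few assumptions: it wraps the fragment in an explicit <table><tbody><tr> context, so the parser has no missing elements to insert, and it only strips the wrapper if the sanitized output still starts and ends with it, instead of blindly removing a fixed number of characters. The class and method names are hypothetical and not part of HtmlSanitizer. The helper takes the sanitizer as a Func<string, string> so it can be tried without the library, and it assumes the sanitizer serializes the wrapper back unchanged, as in the output reported in this issue.

```csharp
using System;

// Hypothetical helper, not part of HtmlSanitizer.
public static class FragmentSanitizer
{
    // Explicit row context: with <tbody> and <tr> already present, the parser
    // should not need to insert anything, so the output should keep this wrapper.
    private const string Open = "<table><tbody><tr>";
    private const string Close = "</tr></tbody></table>";

    // Sanitizes a table-cell fragment such as "<th>Header</th>" by parsing it
    // inside a row context and then stripping that context again.
    // `sanitize` is any string-to-string sanitizer (assumption: e.g. a lambda
    // wrapping HtmlSanitizer.Sanitize).
    public static string SanitizeInRowContext(Func<string, string> sanitize, string fragment)
    {
        string output = sanitize(Open + fragment + Close);

        // Refuse to strip anything if the wrapper did not come back unchanged,
        // rather than silently cutting off the wrong characters.
        if (!output.StartsWith(Open, StringComparison.Ordinal) ||
            !output.EndsWith(Close, StringComparison.Ordinal))
        {
            throw new InvalidOperationException("Unexpected sanitized output: " + output);
        }

        // Return only the part between the wrapper's opening and closing tags.
        return output.Substring(Open.Length, output.Length - Open.Length - Close.Length);
    }
}
```

With HtmlSanitizer you would pass a lambda such as html => sanitizer.Sanitize(html) rather than the method group, because Sanitize declares optional parameters and so does not convert directly to Func<string, string>. Throwing on an unexpected wrapper is a deliberate choice: a caller that prefers a fallback could catch the exception and sanitize the bare fragment instead.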
I mean, you could just use a regex. Regardless, I feel like this behavior is there to prevent tag poisoning.
sanitizer.Sanitize("<th>Header</th>")
Result is: Header
However:
sanitizer.Sanitize("<table><tr><th>Header</th></tr></table>")
Result is: <table><tbody><tr><th>Header</th></tr></tbody></table>
My expectation was that when parsing a fragment rather than a document, the full structure would not be required.