Try to parse any file with the HTML parser #209

Zegnat · 2018-11-19T13:15:12Z

This became clear when inspecting indieweb/indiewebify-me#78.

@sknebel noted that several parsers will actually run on resources like Atom files, even though those are not HTML, and are then able to extract some useful data such as link relationships. php-mf2 does this too, except in cases where it first has to fetch the file from a remote URL.

However, files fetched from remote URLs must be served as HTML before they are allowed through to parsing. Should we remove this limitation?

Note that it is probably not technically correct to run any of the parser code on non-HTML documents. While Atom happens to include link elements the same way as HTML does, that may not be true for all generic XML documents. I am unsure what the actual harm would be, though. Minimal, I expect.
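
To illustrate the point about Atom link elements, here is a minimal sketch in Python rather than PHP. It is not php-mf2 code: it only shows that a lenient HTML parser (the standard library's `html.parser`) will pick up `rel` values from an Atom document. The class name `LinkRelCollector`, the helper `collect_rels`, and the sample feed URLs are all hypothetical.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect rel -> [href, ...] from every <link> element seen."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here,
        # because the default handle_startendtag calls handle_starttag.
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated list of relationship names.
        for rel in (attr.get("rel") or "").split():
            self.rels.setdefault(rel, []).append(href)


# Hypothetical Atom feed, made up for this example.
ATOM = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/feed.atom"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""


def collect_rels(markup):
    parser = LinkRelCollector()
    parser.feed(markup)
    parser.close()
    return parser.rels


print(collect_rels(ATOM))
```

This works only because Atom spells its links as `link` elements with `rel` and `href` attributes. An arbitrary XML vocabulary gives no such guarantee, which is the caveat raised above.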

sknebel · 2018-11-22T14:00:07Z

fetch should probably at least also accept application/xhtml+xml.
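
A rough sketch of what such an allowlist check could look like, written in Python for illustration only. It does not reflect how php-mf2's fetch is actually implemented; `PARSEABLE_TYPES` and `is_parseable` are hypothetical names, and the header handling is deliberately simplified.

```python
# Hypothetical allowlist: the media types a fetcher would let through to parsing.
PARSEABLE_TYPES = {"text/html", "application/xhtml+xml"}


def is_parseable(content_type):
    """Return True if a Content-Type header value names an allowed media type.

    Parameters such as "; charset=utf-8" are ignored, and a missing
    header (None or empty string) is treated as not parseable.
    """
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type in PARSEABLE_TYPES


for header in ("text/html; charset=utf-8",
               "application/xhtml+xml",
               "application/atom+xml"):
    print(header, is_parseable(header))
```

Removing the limitation entirely, as the issue proposes, would amount to skipping this check and handing every response body to the parser.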

snarfed · 2024-07-30T18:30:08Z

Got a request for this for Bridgy Publish: snarfed/bridgy#1766
