Try to parse any file with the HTML parser #209

Open
Zegnat opened this issue Nov 19, 2018 · 2 comments
Comments

@Zegnat
Member

Zegnat commented Nov 19, 2018

This became clear when inspecting indieweb/indiewebify-me#78.

@sknebel noted that several parsers will actually run on resources like Atom files, even though those are not HTML, and are then able to extract some useful data such as link relationships. php-mf2 does this too, except in cases where it first has to fetch the file from a remote URL.

But: files fetched from remote URLs have to be HTML to be allowed through to parsing. Should we remove this limitation?

Note that it is probably not technically correct to run any of the parser code on non-HTML documents. While Atom happens to include link elements the same way HTML does, that may not be true for all generic XML documents. I am unsure what the actual harm would be; minimal, I expect.
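
To illustrate the behaviour described above, here is a minimal sketch of an HTML parser pulling link relationships out of an Atom document. It uses Python's standard-library `html.parser` rather than php-mf2, so it only shows the general idea, not what php-mf2 does internally. The class name `LinkRelCollector` and the feed contents are made up for the example.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collects href values of <link> elements, grouped by rel value."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> also end up here.
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated list; record the href under each value.
        for rel in (attributes.get("rel") or "").split():
            self.rels.setdefault(rel, []).append(href)


# Hypothetical Atom feed, not taken from any real site.
ATOM_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="https://example.com/feed.atom"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""

collector = LinkRelCollector()
collector.feed(ATOM_FEED)
collector.close()
print(collector.rels)
```

The HTML parser has no notion of XML namespaces, so this only works because Atom's `link` element happens to use the same `rel` and `href` attribute names as HTML.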

@sknebel
Member

sknebel commented Nov 22, 2018

fetch should probably at least also accept application/xhtml+xml.
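
A rough sketch of what such a check could look like, assuming the fetch step decides based on the `Content-Type` response header. This is written in Python for illustration and is not php-mf2's actual code; `ALLOWED_MEDIA_TYPES` and `is_parseable_content_type` are hypothetical names.

```python
# Media types let through to the parser. Adding application/xhtml+xml
# is the change suggested above; the set itself is an assumption.
ALLOWED_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def is_parseable_content_type(header_value):
    """Return True if a Content-Type header value names an allowed media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case,
    # since media types are case-insensitive.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_MEDIA_TYPES


print(is_parseable_content_type("text/html; charset=utf-8"))
print(is_parseable_content_type("application/xhtml+xml"))
print(is_parseable_content_type("application/atom+xml"))
```

Removing the limitation entirely, as the issue asks, would amount to dropping this check; widening the set is the more conservative option.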

@snarfed
Member

snarfed commented Jul 30, 2024

Got a request for this for Bridgy Publish: snarfed/bridgy#1766
