
URLs replaced in inappropriate contexts (e.g. inside <a href="">...) #4

Open
carljm opened this issue Jan 28, 2010 · 1 comment



carljm commented Jan 28, 2010

This is similar/related to #3, but it's a broader issue, not specific to Wikipedia.

There is no context-sensitivity in the replacement, so we've had cases where a link to a Flickr photo (that was intended to be just a link) got replaced with totally invalid HTML:

<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>

gets turned into:

<a href="<img src="http://farm3.static.flickr.com/2690/4309828383_6cc07082f6_m.jpg" alt="Jobs Listens to Mossberg\'s Ideas About What\'s Wrong With the iPad"></img>">something</a>
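The failure mode can be reproduced with a few lines of Python. This is a minimal sketch, not the library's actual code: the pattern `FLICKR_URL` and the stand-in markup `FAKE_EMBED` are hypothetical, and a real consumer would fetch the embed HTML from the provider's oEmbed endpoint. The point is only that a pattern with no notion of context matches the URL inside the `href` attribute just as readily as a bare URL in prose.

```python
import re

# Hypothetical stand-in for the HTML a provider would return;
# a real OEmbed consumer would fetch this over HTTP.
FAKE_EMBED = '<img src="http://farm3.static.flickr.com/example.jpg"></img>'

# Hypothetical, context-insensitive pattern: it matches a Flickr photo URL
# anywhere in the text, including inside tag attributes.
FLICKR_URL = re.compile(r'http://www\.flickr\.com/photos/[\w@-]+/\d+')


def naive_replace(text):
    """Replace every matching URL with embed markup, ignoring context."""
    return FLICKR_URL.sub(FAKE_EMBED, text)


html = '<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>'
print(naive_replace(html))
# The <img> tag lands inside the href attribute, producing invalid HTML.
```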

I realize that given the way OEmbed uses regexes, this is a tough nut to crack in the general case. Is the only real solution to never run OEmbed on chunks of text that might already contain HTML?

Apart from the heavyweight options that don't seem realistic (parsing the text into a DOM tree and only running OEmbed on the cdata nodes?), one simple "80%" fix would be to require at least one whitespace character (or the start/end of the text) on either side of the URL. Technically a link could have href=" http://...", but that's pretty unlikely, so I think this would improve the situation quite a bit.
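A rough sketch of the whitespace-boundary idea, again using a hypothetical Flickr pattern rather than the project's real regexes: the negative lookbehind `(?<!\S)` and negative lookahead `(?!\S)` only allow a match when the URL is preceded and followed by whitespace or by the start/end of the string. A URL wrapped in `href="..."` is bounded by quote characters, so it is left alone.

```python
import re

# Hypothetical stand-in for provider-supplied embed markup.
FAKE_EMBED = '<img src="http://farm3.static.flickr.com/example.jpg"></img>'

# Same hypothetical Flickr pattern, but anchored so that the URL must be
# bounded by whitespace or the start/end of the text on both sides.
# (?<!\S) = "not preceded by a non-whitespace character"
# (?!\S)  = "not followed by a non-whitespace character"
FLICKR_URL_BOUNDED = re.compile(
    r'(?<!\S)http://www\.flickr\.com/photos/[\w@-]+/\d+(?!\S)'
)


def whitespace_bounded_replace(text):
    """Replace only URLs that stand alone, separated by whitespace."""
    return FLICKR_URL_BOUNDED.sub(FAKE_EMBED, text)


in_href = '<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>'
bare = 'Look at http://www.flickr.com/photos/gruber/4309828383 today'

print(whitespace_bounded_replace(in_href))  # unchanged
print(whitespace_bounded_replace(bare))     # URL replaced with the embed
```

The trade-off is that a URL followed directly by punctuation (for example a trailing period at the end of a sentence) will no longer match either, which may or may not be acceptable depending on how users write their text.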

Would a working patch like that be considered, or is this just a case of "don't do that"?


carljm commented Jan 28, 2010
