Repeated unescaping of HTML content produces invalid HTML #5

Joyfolk · 2020-06-21T12:23:18Z

edsapi-php-sample/rest/EBSCOResponse.php

Line 1115 in b3e5f32

$data = html_entity_decode($data);

This line produces invalid HTML for some documents (for example, /edsapi/rest/Retrieve?an=T115986&dbid=dmp) because the HTML content is decoded twice: an escaped &lt; becomes a literal < inside the HTML body.

There seems to be no reason to decode the HTML content here, since it is already decoded inside the SimpleXML object. The only thing left to decode is the content of the <ephtml> tags, which is double-encoded.
So this line should probably be something like this:

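// The "s" modifier lets "." match newlines inside <ephtml>; "m" only affects ^ and $.
$data = preg_replace_callback('/<ephtml>(.*?)<\/ephtml>/s', function ($escaped) {
    // Decode only the matched <ephtml>...</ephtml> span; the rest of $data is left as-is.
    return html_entity_decode($escaped[0]);
}, $data);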
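
To make the difference concrete, here is a minimal, self-contained sketch comparing a blanket html_entity_decode() call with decoding restricted to <ephtml> content. The function name decodeEphtmlOnly() and the $sample string are hypothetical and made up for illustration; they are not taken from edsapi-php-sample or from a real API response. The sketch assumes <ephtml> content may span several lines, so it uses the "s" pattern modifier.

```php
<?php
// Sketch only: decodeEphtmlOnly() and $sample are hypothetical,
// not part of edsapi-php-sample or a real EDS API response.

function decodeEphtmlOnly(string $data): string
{
    $result = preg_replace_callback(
        // "s" makes "." match newlines, so multi-line <ephtml> bodies are covered.
        '/<ephtml>(.*?)<\/ephtml>/s',
        static function (array $match): string {
            // Decode only the captured body and put the tags back around it.
            return '<ephtml>' . html_entity_decode($match[1]) . '</ephtml>';
        },
        $data
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $data;
}

// Outside <ephtml> the text is escaped once; inside it stands in for the
// double-encoded content described above.
$sample = '<p>1 &lt; 2</p><ephtml>&lt;b&gt;bold&lt;/b&gt;</ephtml>';

// Blanket decoding also unescapes the "&lt;" in the paragraph body.
echo html_entity_decode($sample), PHP_EOL;

// Targeted decoding leaves the paragraph body escaped.
echo decodeEphtmlOnly($sample), PHP_EOL;
```

With this sample, the blanket call turns "1 &lt; 2" into "1 < 2", which is the kind of stray < that breaks the markup, while the targeted version only touches the text between the <ephtml> tags.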
