We are using the CSSBox DOM parser to parse HTML source. Here is the implementation:
try (DocumentSource docSource = new StreamDocumentSource(
        JAFIOUtils.toInputStream(htmlSource), null, "text/html;charset=UTF-8")) {
    LOGGER.error("Before parse " + htmlSource);
    // Parse the input document
    DOMSource parser = new DefaultDOMSource(docSource);
    Document doc = parser.parse();
    LOGGER.error("After parse " + doc.getFirstChild().getTextContent());
}
For example, consider the input htmlSource <style></style>Test User &lt;[email protected]&gt;
After parsing, the output is Test User <[email protected]>.
The email address in the text content is enclosed in the entities &lt; and &gt;, and the parser decodes them to < and >. Our requirement is that the parser should not decode &lt; and &gt; to < and >.
How can we retain the text exactly as it is, without any decoding or encoding? @radkovo, could you please suggest a solution for this issue?
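To make the behaviour concrete, here is a minimal sketch that does not use CSSBox at all. It relies only on the JDK's built-in DOM parser and assumes that any standard DOM text node stores decoded characters, so getTextContent() returns < and > rather than the entities. The class name EntityRoundTrip, the helper escapeText, and the address user@example.com are hypothetical placeholders for illustration. Re-escaping after parsing is one possible workaround, not a confirmed CSSBox feature.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityRoundTrip {

    // Hypothetical helper (not part of CSSBox): re-encodes the characters
    // that a DOM parser decodes from &amp;, &lt; and &gt;.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses a small well-formed snippet with the JDK DOM parser and
    // returns the text content of the root element.
    static String parseTextContent(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder input; the address is illustrative only.
        String source = "<p>Test User &lt;user@example.com&gt;</p>";

        String decoded = parseTextContent(source);
        System.out.println(decoded);             // entities arrive decoded
        System.out.println(escapeText(decoded)); // entities restored by hand
    }
}
```

The trade-off with this approach is that it cannot tell a literal < in the original source from one written as &lt;, since both look identical once parsed. It only suits cases where every special character in the text should be written back out as an entity.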