Upgrade CSSbox to use active NekoHTML? #72
Comments
With a bit of hacking I've found a way to configure a local installation (through Gradle) to use nekohtml 2.59. Configure your dependencies like this:

    implementation("net.sf.cssbox:cssbox:$cssboxVersion") {
        // keep the old NekoHTML that CSSBox pulls in off the classpath
        exclude group: "net.sourceforge.nekohtml", module: "nekohtml"
    }
    implementation "net.sourceforge.htmlunit:neko-htmlunit:$nekoHtmlUnitVersion"

And then it looks like the only change one needs to make is to not use the stock DOM source, and to subclass DOMSource instead:

    public class BetterDOMSource extends DOMSource {
        public BetterDOMSource(DocumentSource src) {
            super(src);
        }

        @Override
        public Document parse() throws SAXException, IOException {
            DOMParser parser = new DOMParser(new HTMLConfiguration());
            // normalize element names to lower case
            parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
            if (charset != null)
                parser.setProperty("http://cyberneko.org/html/properties/default-encoding", charset);
            parser.parse(new org.xml.sax.InputSource(getDocumentSource().getInputStream()));
            return parser.getDocument();
        }
    }

And use this source to load your document:

    ByteArrayInputStream is = new ByteArrayInputStream(html.getBytes(Charset.forName("UTF-8")));
    StreamDocumentSource source = new StreamDocumentSource(is, url, "text/html");
    DOMSource parser = new BetterDOMSource(source);
    Document document = parser.parse();
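To confirm that the exclude above actually took effect, a small classpath check along these lines might help. This is only a sketch: the class name `NekoClasspathCheck` is made up for illustration, and the two parser class names are my assumption about where the old NekoHTML and the 2.x HtmlUnit fork keep their `DOMParser`, so adjust them to whatever your jars really contain.

```java
public class NekoClasspathCheck {

    // Assumed location of DOMParser in the original NekoHTML (1.9.x).
    static final String OLD_NEKO = "org.cyberneko.html.parsers.DOMParser";

    // Assumed location of DOMParser in the HtmlUnit fork (2.x line).
    static final String HTMLUNIT_NEKO = "net.sourceforge.htmlunit.cyberneko.parsers.DOMParser";

    // Returns true if the named class can be found on the classpath.
    // The class is looked up without being initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, NekoClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("old NekoHTML on classpath:  " + isPresent(OLD_NEKO));
        System.out.println("HtmlUnit neko on classpath: " + isPresent(HTMLUNIT_NEKO));
    }
}
```

If the first line still reports `true` after the exclude, some other dependency is probably pulling the old artifact in transitively.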
I think we can use the newer NekoHTML. I've changed parse() as follows:

    public Document parse() throws SAXException, IOException
    {
        DOMParser parser = new DOMParser(HTMLDocumentImpl.class);
        parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
        if (charset != null)
            parser.setProperty("http://cyberneko.org/html/properties/default-encoding", charset);
        parser.parse(new org.xml.sax.InputSource(getDocumentSource().getInputStream()));
        return parser.getDocument();
    }
I proposed the change to [email protected] in Apr. 2023, and sent a further update today.
It seems that CSSBox is set up to use nekohtml 1.9.22, whose development seems to have ceased in 2015.
NekoHTML has since been forked into a new project, https://github.com/HtmlUnit/htmlunit-neko, which is under active development (2.59.0 was released 14 days ago).
Is there any appetite for upgrading CSSBox to use the newer nekohtml?
Alternatively, is there a way to configure a local installation to use this nekohtml instead?
(I ask because I'm hitting some weird bugs that I think are due to nekohtml's parsing.)