Using a local entity resolver for XML, if possible #16

Open: wants to merge 1 commit into master

cordawyn

I've been using RMLMapper on a bunch of XML/XHTML files and found that whenever an XML file has a DOCTYPE declaration, the DocumentBuilder (in XML.java) tries to fetch the referenced ".dtd" (and ".ent") files from their original locations on the web, e.g.:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

... which is the expected behaviour, but it makes parsing very slow (or even impossible) when the network connection is slow or unavailable.

As a quick solution, I've made it possible to resolve those XML entities from local DTD files when they are available on the CLASSPATH, falling back to the default behaviour (fetching them from the web) when no local DTD file is found.
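A minimal sketch of the classpath lookup described above, assuming the bundled copies live under a resource folder such as `dtd/` and are looked up by file name only. The class name `ClasspathEntityResolver`, the `prefix` parameter and the `newBuilder` helper are hypothetical, not the code in this PR. Returning `null` from `EntityResolver.resolveEntity` tells the parser to use its default resolution, i.e. fetch the system ID from the web:

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical EntityResolver that serves .dtd/.ent files from the classpath
 * when a local copy exists, and otherwise returns null so the parser falls
 * back to its default behaviour (downloading the system ID).
 */
public class ClasspathEntityResolver implements EntityResolver {
    private final ClassLoader loader;
    // Assumed resource folder holding the bundled copies, e.g. "dtd/".
    private final String prefix;

    public ClasspathEntityResolver(ClassLoader loader, String prefix) {
        this.loader = loader;
        this.prefix = prefix;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        // Look up by file name only:
        // "http://.../xhtml1-transitional.dtd" -> "dtd/xhtml1-transitional.dtd".
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        InputStream in = loader.getResourceAsStream(prefix + fileName);
        if (in == null) {
            return null; // no local copy: let the parser download it as before
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        // Keep the original system ID as the base, so relative references inside
        // the DTD (e.g. .ent files) resolve to URLs whose file names we can match.
        source.setSystemId(systemId);
        return source;
    }

    /** Builds a DocumentBuilder wired to this resolver. */
    public static DocumentBuilder newBuilder(ClassLoader loader, String prefix)
            throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new ClasspathEntityResolver(loader, prefix));
        return builder;
    }
}
```

In XML.java this would presumably amount to calling `builder.setEntityResolver(new ClasspathEntityResolver(getClass().getClassLoader(), "dtd/"))` before parsing. Matching by file name is a simplification; a map keyed by public ID would avoid collisions between DTDs that share a file name.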

cordawyn (Author)

Also, since the XML file is re-parsed for every triples map (which should be optimized too, btw), those DTDs are re-downloaded on every parsing pass. So it ends up downloading N .dtd files for each of M triples maps, i.e. N × M downloads. Uh-oh 😉
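On the re-parsing point, one possible optimization is to cache the parsed DOM per source file for the duration of a run, so each file (and its DTDs) is fetched once no matter how many triples maps read it. This is a hypothetical sketch (`DocumentCache` is not an RMLMapper class) and assumes single-threaded use, since neither `DocumentBuilder` nor a DOM tree is guaranteed to be thread-safe:

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/**
 * Hypothetical per-run cache: each XML source is parsed once (fetching its
 * DTDs once), and the same DOM is handed to every triples map that reads it.
 * Not thread-safe; intended for a single mapping run on one thread.
 */
public class DocumentCache {
    private final DocumentBuilder builder;
    private final Map<String, Document> cache = new HashMap<>();
    private int parseCount = 0;

    public DocumentCache(DocumentBuilder builder) {
        this.builder = builder;
    }

    public Document get(File file) throws IOException, SAXException {
        // Canonical path so "./a.xml" and "a.xml" share one cache entry.
        String key = file.getCanonicalPath();
        Document doc = cache.get(key);
        if (doc == null) {
            doc = builder.parse(file); // the only place DTDs get fetched
            parseCount++;
            cache.put(key, doc);
        }
        return doc;
    }

    /** Number of actual parses performed, useful for checking reuse. */
    public int parseCount() {
        return parseCount;
    }
}
```

With a cache like this, M triples maps over one file trigger the N DTD fetches once instead of N × M times.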

cordawyn (Author) commented Jan 9, 2019

I think a proper catalog resolver should be used instead of this ad-hoc solution, actually. I'll probably revisit this code later if I have some more free time on my hands.
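For reference, on Java 9+ the JDK ships an OASIS XML Catalogs implementation in `javax.xml.catalog`, and its `CatalogResolver` is itself an `org.xml.sax.EntityResolver`. A sketch, assuming the catalog files are supplied as absolute URIs (the `CatalogBuilders` class name is hypothetical). With the `RESOLVE` feature set to `continue`, IDs not listed in the catalog fall through to the parser's default network lookup, which mirrors the fallback behaviour of this PR:

```java
import java.net.URI;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

/**
 * Hypothetical helper: builds a DocumentBuilder that resolves DTDs through
 * OASIS XML catalog files. An example catalog entry (paths assumed):
 *
 *   <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
 *     <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
 *             uri="xhtml1-transitional.dtd"/>
 *   </catalog>
 */
public class CatalogBuilders {
    public static DocumentBuilder newBuilder(URI... catalogs)
            throws ParserConfigurationException {
        // "continue": unmatched IDs make the resolver return null,
        // so the parser falls back to downloading them.
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
        CatalogResolver resolver = CatalogManager.catalogResolver(features, catalogs);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver); // CatalogResolver extends EntityResolver
        return builder;
    }
}
```

Relative `uri` values in a catalog are resolved against the catalog file's own location, so the DTDs can simply sit next to the catalog.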
