Issues when processing a large set of documents #2

Open
anjackson opened this issue Mar 31, 2020 · 0 comments

While processing 7655 large documents, the run mostly worked, but then I got the following exception:

create data tables
java.lang.RuntimeException: cannot parse/read stream: 
	at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToDocument(XMLUtil.java:1276)
	at org.contentmine.ami.plugins.ResultsAnalysisImpl.addSnippetsFile(ResultsAnalysisImpl.java:82)
	at org.contentmine.ami.plugins.ResultsAnalysisImpl.addDefaultSnippets(ResultsAnalysisImpl.java:327)
	at org.contentmine.ami.plugins.CommandProcessor.createDataTables(CommandProcessor.java:233)
	at org.contentmine.ami.tools.AbstractAMISearchTool.runLegacyCommandProcessor(AbstractAMISearchTool.java:212)
	at org.contentmine.ami.tools.AMISearchTool.runLegacyCommandProcessor(AMISearchTool.java:249)
	at org.contentmine.ami.tools.AMISearchTool.runProjectSearch(AMISearchTool.java:244)
	at org.contentmine.ami.tools.AMISearchTool.processProject(AMISearchTool.java:230)
	at org.contentmine.ami.tools.AbstractAMISearchTool.runSpecifics(AbstractAMISearchTool.java:175)
	at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:347)
	at org.contentmine.ami.tools.AbstractAMITool.runCommands(AbstractAMITool.java:329)
	at org.contentmine.ami.tools.AMISearchTool.main(AMISearchTool.java:150)
Caused by: nu.xom.ParsingException: 2048
	at nu.xom.Builder.build(Unknown Source)
	at nu.xom.Builder.build(Unknown Source)
	at org.contentmine.eucl.xml.XMLUtil.parseQuietlyToDocument(XMLUtil.java:1274)
	... 11 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2048
	at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
	at org.apache.xerces.impl.XMLEntityScanner.scanLiteral(Unknown Source)
	at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	... 14 more
1564613 [main] ERROR org.contentmine.ami.plugins.ResultsAnalysisImpl  - bad snippets file:ethos-sample/word.frequencies.snippets.xml
1564613 [main] ERROR org.contentmine.ami.plugins.ResultsAnalysisImpl  - bad snippets file:ethos-sample/word.frequencies.snippets.xml

Because it threw this error, the analysis didn't complete, so I'm not sure I have all the outputs.
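One way to narrow this down is to check whether the snippets file named in the log is valid UTF-8 at all, since the innermost exception comes from the Xerces `UTF8Reader`. The trace does not establish that bad bytes are the cause, so this is only a diagnostic sketch under that assumption. The class name `Utf8Check` and the method `firstInvalidOffset` are hypothetical and not part of ami; only standard JDK classes are used.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of ami): strict UTF-8 validation of a byte array.
public class Utf8Check {

    /** Returns the byte offset of the first invalid UTF-8 sequence, or -1 if all bytes decode. */
    public static long firstInvalidOffset(byte[] bytes) {
        // REPORT makes the decoder stop at bad input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position is left at the start of the offending sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        // Reads the whole file into memory; fine for a spot check, not for huge files.
        long offset = firstInvalidOffset(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "first invalid UTF-8 sequence at byte offset " + offset);
    }
}
```

Running it as `java Utf8Check ethos-sample/word.frequencies.snippets.xml` would show whether the file decodes cleanly. If it does, the encoding of the file is unlikely to be the problem and the parser itself becomes the more likely suspect.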

Also, the process appeared to slow down linearly and to consume very large amounts of memory.

anjackson changed the title from "Exception when processing a large set of documents" to "Issues when processing a large set of documents" on Apr 1, 2020