different abstract extracted via ocr #44

wo · 2016-02-19T20:16:56Z

https://studies2.hec.fr/jahia/webdav/site/hec/shared/sites/mongin/acces_anonyme/page%20internet/O12.MonginExpectedHbk97.pdf

Here the publication info is treated as part of the abstract when the file is processed via ocr2xml, but not when it is processed via pdftohtml.

wo · 2016-04-04T14:46:14Z

The reason is probably that ocr2xml classifies the publication info as having the same font as the abstract, while pdftohtml does not. Not really a problem. There's a significant gap between the publication info and the abstract, though, and that gap should prevent treating both as abstract.
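The idea that a vertical gap should stop two same-font runs from being merged can be sketched as a simple line-grouping pass. This is a minimal illustration, not the project's actual parser: the `Line` record (with `font`, `top`, `height`), the `group_lines` helper and the `max_gap_ratio` threshold are all hypothetical, and neither ocr2xml nor pdftohtml output is assumed to look like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    # Hypothetical record for one extracted text line; not an ocr2xml/pdftohtml format.
    text: str
    font: str      # font identifier reported by the extractor
    top: float     # y-coordinate of the line's top edge (grows downwards)
    height: float  # line height, in the same units as top


def group_lines(lines: List[Line], max_gap_ratio: float = 1.0) -> List[List[Line]]:
    """Group consecutive lines into blocks (hypothetical heuristic).

    A new block starts when the font changes OR when the vertical gap
    to the previous line exceeds max_gap_ratio * previous line height,
    so same-font text separated by a large gap is not merged.
    """
    blocks: List[List[Line]] = []
    for line in lines:
        if blocks:
            prev = blocks[-1][-1]
            gap = line.top - (prev.top + prev.height)
            if line.font == prev.font and gap <= max_gap_ratio * prev.height:
                blocks[-1].append(line)  # same font, small gap: continue the block
                continue
        blocks.append([line])  # font change or large gap: start a new block
    return blocks
```

For example, two same-font lines of publication info followed, after a large gap, by two same-font abstract lines come out as two separate blocks, so only the second would be a candidate for the abstract:

```python
lines = [
    Line("publication info, line 1", "F1", 100, 10),
    Line("publication info, line 2", "F1", 112, 10),
    Line("abstract, line 1", "F1", 150, 10),  # gap of 28 > 10: new block
    Line("abstract, line 2", "F1", 162, 10),
]
blocks = group_lines(lines)  # two blocks of two lines each
```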

wo added the ocr2xml label Feb 19, 2016

wo added this to the new server start milestone Feb 19, 2016

wo modified the milestones: someday maybe, new server start Apr 4, 2016
