command line pdftotext works fine, not so pdf-text-extract for same pdf file #30

markbobick · 2017-12-12T05:44:52Z

OS is Fedora 20.
Using a file with "good provenance" (https://www.edge.org/documents/life/Life.pdf) as a test for this code:
(wget fetches the file without issue)
...
var pdfToTextCommand = '/usr/bin/pdftotext';
var extract = require('pdf-text-extract');
...
extract(filePath, { splitPages: false, eol: "unix" }, pdfToTextCommand, function (err, text) {
    if (err) {
        // Report the failure to the client and stop here
        var message = { message: err + " could not convert requested PDF file to text" };
        console.log(JSON.stringify(message));
        return res.json(message);
    }
    console.log(text);
});

Result:
"Error: pdf-text-extract command failed: Syntax Warning: May not be a PDF file (continuing anyway)\nSyntax Error: Couldn't find trailer dictionary\nSyntax Error: Couldn't read xref table\n

Every file I've tried, from any source, has the same issue. I've done the stare-and-compare and do not see the error. Running pdftotext from the command line on the same files works great. Where is my error? Thanks!
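
One thing that may be worth ruling out: as far as I know, "May not be a PDF file" is the warning pdftotext prints when it cannot find a %PDF- marker near the start of its input, so it could help to confirm that filePath inside the Node process points at the same bytes as the file used on the command line. Below is a minimal sketch of such a check. looksLikePdf is a hypothetical helper, not part of pdf-text-extract, and the 1024-byte search window is an assumption about how far into the file the header may appear.

```javascript
var fs = require('fs');

// Hypothetical helper (not part of pdf-text-extract): returns true if the
// "%PDF-" marker appears within the first 1024 bytes of the file.
// The 1024-byte window is an assumption, not a documented guarantee.
function looksLikePdf(filePath) {
    var fd = fs.openSync(filePath, 'r');
    try {
        var buf = Buffer.alloc(1024);
        // Read up to 1024 bytes starting at offset 0 of the file
        var bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
        // latin1 maps each byte to one character, so binary data is safe to search
        return buf.toString('latin1', 0, bytesRead).indexOf('%PDF-') !== -1;
    } finally {
        fs.closeSync(fd);
    }
}
```

Logging filePath and looksLikePdf(filePath) right before the extract(...) call would show whether the library is being handed the file I think it is. If the helper returns false, or throws because the path does not exist, the problem is probably in how filePath is built rather than in pdftotext itself.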
