nondeterministic behavior when fed lots of files quickly #8
Comments
Most likely related to the parallel extraction problems in #7. Even though you trigger one file after another, each extraction still runs asynchronously, so the extractions overlap in parallel. Try:
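A minimal sketch of one way to serialize the calls so that only one extraction is in flight at a time. It assumes a callback-style extractor of the shape `extract(file, cb)` with `cb(err, pages)`; `extractSequentially` and the stub `fakeExtract` are hypothetical names used for illustration and are not part of pdf-text-extract.

```javascript
// Run a callback-style extractor over a list of files strictly one at a time.
// ASSUMPTION: `extract` has the shape extract(file, cb), where cb(err, pages).
function extractSequentially(files, extract, done) {
  const results = [];
  let index = 0;

  function next() {
    if (index >= files.length) {
      return done(null, results);
    }
    const file = files[index++];
    extract(file, function (err, pages) {
      if (err) {
        // Stop at the first failure and hand back what was collected so far.
        return done(err, results);
      }
      results.push({ file: file, pages: pages });
      // Defer the next call so a synchronous extractor cannot grow the stack.
      setImmediate(next);
    });
  }

  next();
}

// HYPOTHETICAL stand-in for the real extractor, so the sketch runs on its own.
// It tracks how many extractions are in flight at once.
let active = 0;
let maxActive = 0;
function fakeExtract(file, cb) {
  active += 1;
  maxActive = Math.max(maxActive, active);
  setTimeout(function () {
    active -= 1;
    cb(null, ['text of ' + file]);
  }, 5);
}

extractSequentially(['a.pdf', 'b.pdf', 'c.pdf'], fakeExtract, function (err, results) {
  if (err) throw err;
  console.log(results.length + ' files processed, max in flight: ' + maxActive);
});
```

In real use, the stub would be replaced by the actual extraction function, and the file list would come from something like `fs.readdir`. The point of the design is that the next file is only started from inside the previous file's callback, rather than from a `forEach` loop that launches everything at once.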
@rdpoor are you still seeing this issue in the latest version?
Sorry to report that I switched away from pdf-text-extract, so I cannot say whether 1.3.1 fixes the original issue. But I have provided a test case... :)
Possibly related behavior after the first 600 or so files:
~/Documents/pa-en-export master *% ⟠ node index.js
/Users/ia/Documents/pa-en-export/node_modules/pdf-text-extract/index.js:120
child.stdout.setEncoding('utf8')
^
TypeError: Cannot read property 'setEncoding' of undefined
at streamResults (/Users/ia/Documents/pa-en-export/node_modules/pdf-text-extract/index.js:120:15)
at pdfTextExtract (/Users/ia/Documents/pa-en-export/node_modules/pdf-text-extract/index.js:109:5)
at files.forEach.file (/Users/ia/Documents/pa-en-export/index.js:21:9)
at Array.forEach (native)
at fs.readdir (/Users/ia/Documents/pa-en-export/index.js:18:11)
at FSReqWrap.oncomplete (fs.js:123:15)
When I call `pdf_text_extract()` with one file at a time, all works well. But when I hit it with a directory of PDF files (still one at a time, just fast), I get nondeterministic behavior where the resulting text is truncated.
Following is an example. Notice that calling `reportOneFile(...)` correctly processes and reports on individual files. But `reportDirectory(...)` reports that the processed text is either zero length, 8191, or (once in a while) the correct length. Subsequent calls to `reportDirectory()` produce different results:
Here is the code in its entirety:
P.S.: Regrettably I cannot post the .pdfs themselves in a gist as they contain personal customer data. But I've verified this fails on other directories full of pdfs. If you want a gist, let me know.