How to process a txtUnstructured image/document with paragraphAsOneLine? #94

Open
felipedaraujo opened this issue Oct 20, 2020 · 0 comments

I successfully ran the JavaScript example in this repo, and now I am trying to use the parameter txtUnstructured:paragraphAsOneLine, but so far I haven't had any luck.

After line 87, I tried each of the options below, one at a time, and none of them worked for me. Could you guide me on how to use this parameter correctly?

settings.language = "English"; // Can be comma-separated list, e.g. "German,French".
settings.exportFormat = "txtUnstructured";

// Alternative 1 - Didn't work
// settings["txtUnstructured:paragraphAsOneLine"] = "true";

// Alternative 2 - Didn't work
// settings["txtUnstructured:paragraphAsOneLine"] = true;

// Alternative 3 - Didn't work
// settings.txtUnstructured = { paragraphAsOneLine: true };

// Alternative 4 - Didn't work
// settings.txtUnstructured = { paragraphAsOneLine: "true" };

// Alternative 5 - Didn't work
// settings.paragraphAsOneLine = "true";

// Alternative 6 - Didn't work
// settings.paragraphAsOneLine = true;

https://cloud-westus.ocrsdk.com is the service target I am using.

My ultimate goal is to convert a PDF to plain text the same way finereaderonline.com does: merging multiple columns into a single column and ignoring footers and page numbers.

Thanks in advance.
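
For reference, here is a minimal sketch of what I expect the request to look like. It assumes that processing options are sent as URL query parameters on the processImage request, which I have not confirmed for this parameter, and that the sample may only serialize the fields it already knows about. The helper name buildProcessImageUrl and the /processImage path are hypothetical here; they are not taken from the repo's code.

```javascript
// Hypothetical helper (not part of the sample): serializes every key of
// `settings` into the query string, including keys that contain a colon.
function buildProcessImageUrl(baseUrl, settings) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(settings)) {
    // Booleans are sent as the strings "true" / "false".
    params.append(key, String(value));
  }
  // Assumption: the endpoint path is /processImage on the service host.
  return `${baseUrl}/processImage?${params.toString()}`;
}

const settings = {
  language: "English",
  exportFormat: "txtUnstructured",
  "txtUnstructured:paragraphAsOneLine": true,
};

const url = buildProcessImageUrl("https://cloud-westus.ocrsdk.com", settings);
console.log(url);
```

Note that URLSearchParams percent-encodes the colon in the parameter name; the server would need to decode it, which standard query-string parsing does. If the sample builds its URL from a fixed list of fields, none of the six alternatives above would ever reach the service, which would explain why they have no effect.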
