Even when only metadata is requested with parser.from_file(x, service='meta'), Tika still extracts the content. For files that need OCR this can take a long time.
parser.from_file(x, service='meta')
Tika offers some solutions here for turning off OCR. Since tika-python talks to a Tika Server, the last one, a request header, can be used:
parser.from_file(x, service='meta', headers={"X-Tika-OCRskipOcr": "true"})
This also works with service='all'. In that case the call still returns any content that can be extracted without OCR.
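The workaround can be wrapped in a small helper so the header is not repeated at every call site. This is only a sketch: skip_ocr_headers and parse_without_ocr are hypothetical names, not part of tika-python, and it assumes a reachable Tika Server and that the parsed result is a dict with a "metadata" key.

```python
def skip_ocr_headers(extra=None):
    """Build request headers that ask the Tika Server to skip OCR.

    `extra` is an optional dict of additional headers (hypothetical
    convenience argument, not something tika-python requires).
    """
    headers = {"X-Tika-OCRskipOcr": "true"}
    if extra:
        headers.update(extra)
    return headers


def parse_without_ocr(path, service="meta"):
    """Parse `path` via tika-python with OCR turned off.

    service="meta" requests metadata only; service="all" also returns
    whatever content is available without OCR.
    """
    # Imported lazily so skip_ocr_headers() is usable without tika installed.
    from tika import parser

    return parser.from_file(path, service=service, headers=skip_ocr_headers())
```

Usage would look like parsed = parse_without_ocr("scan.pdf") followed by parsed.get("metadata"), where "scan.pdf" stands in for any file that would otherwise trigger OCR.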