The reader self creates data out of thin air? #420

kinshukkaura · 2024-09-29T18:52:08Z

Describe the bug
The reader creates data out of thin air for multiple pages. It creates tables with information that does not appear anywhere on the PDF page.

Files
ppfas-mf-factsheet-for-August-2024.pdf

Job ID
7f63cc55-1a75-450d-aea0-3a6aa3c648ba

Screenshots

Client:

Frontend (cloud.llamaindex.ai)
Python Library
API

Options
Using the accurate method, with all other fields default/empty.

Additional context
Add any additional context about the problem here.

hexapode · 2024-09-30T01:16:53Z

I had a look at your job, and you used our default mode (Accurate). This document works well with premium mode (see the attached markdown; the one exception is a chart misclassified as an image).
ppfas-mf-factsheet-for-August-2024.pdf.md

However, premium mode is more expensive because more compute is involved. Alternatively, you can try our fast mode, which lays out the text in a way an LLM can understand (but does not extract the tables).
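For anyone switching modes from the Python library, here is a minimal sketch of how the three modes could be mapped to parser options. The flag names `premium_mode` and `fast_mode` are assumptions about the `llama_parse.LlamaParse` constructor, and `parser_kwargs` is a hypothetical helper, so check both against the version you have installed.

```python
from typing import Any, Dict

# Assumed flag names: `premium_mode` and `fast_mode` are taken to be boolean
# keyword arguments of the `llama_parse.LlamaParse` constructor. Verify them
# against your installed version before relying on this.
MODE_FLAGS: Dict[str, Dict[str, Any]] = {
    "accurate": {},                      # default mode: no extra flag
    "premium": {"premium_mode": True},   # more compute, higher cost
    "fast": {"fast_mode": True},         # text layout only, no table extraction
}


def parser_kwargs(mode: str, result_type: str = "markdown") -> Dict[str, Any]:
    """Return keyword arguments for a parser configured for `mode`."""
    if mode not in MODE_FLAGS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}"
        )
    # Merge the output format with the (possibly empty) mode-specific flag.
    return {"result_type": result_type, **MODE_FLAGS[mode]}
```

Under those assumptions, the result would be passed along the lines of `LlamaParse(api_key=..., **parser_kwargs("premium"))`. Keeping the mapping in one place makes it easy to re-run the same file in another mode and compare the extracted tables.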

kinshukkaura · 2024-09-30T02:10:57Z

Thanks. I believe I would have to use the premium mode.
Is there any reason why the model hallucinates in the default mode (Accurate) but not in the other modes? Could adjusting the parsing instructions (prompt) help in any way?

kinshukkaura added the bug Something isn't working label Sep 29, 2024

hexapode self-assigned this Sep 30, 2024
