The reader self creates data out of thin air? #420

Open
kinshukkaura opened this issue Sep 29, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@kinshukkaura

Describe the bug
The reader creates data out of thin air for multiple pages. It produces tables containing information that does not appear anywhere on the corresponding PDF page.

Files
ppfas-mf-factsheet-for-August-2024.pdf

Job ID
7f63cc55-1a75-450d-aea0-3a6aa3c648ba

Screenshots
image
image

Client:

  • Frontend (cloud.llamaindex.ai)
  • Python Library
  • API

Options
Using the accurate method, with all other fields default/empty.

Additional context
Add any additional context about the problem here.

@kinshukkaura kinshukkaura added the bug Something isn't working label Sep 29, 2024
@hexapode hexapode self-assigned this Sep 30, 2024
@hexapode
Member

I had a look at your job, and you used our default mode (Accurate). This document works well with premium mode (see the attached markdown; the one exception is a chart misclassified as an image).
ppfas-mf-factsheet-for-August-2024.pdf.md

However, premium mode is more expensive because more compute is involved. Alternatively, you can try our fast mode, which lays out the text in a way an LLM can understand (but does not extract the tables).
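
For anyone switching modes from the Python library, here is a minimal sketch of how the three modes mentioned in this reply might map to client options. The helper `parser_options` is hypothetical, and the flag names `premium_mode` and `fast_mode` are assumptions about the `llama_parse` client that should be checked against the installed version.

```python
def parser_options(mode: str) -> dict:
    """Return keyword arguments for the chosen parsing mode.

    Hypothetical helper. The flag names below are assumptions about the
    llama_parse client, not something confirmed in this thread.
    """
    if mode == "accurate":
        # Default mode: no extra flags.
        return {}
    if mode == "premium":
        # More compute, higher cost; reported above to handle this document well.
        return {"premium_mode": True}
    if mode == "fast":
        # Text layout only; tables are not extracted.
        return {"fast_mode": True}
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(parser_options("premium"))
    # Assumed usage (requires the llama_parse package and an API key):
    # from llama_parse import LlamaParse
    # parser = LlamaParse(result_type="markdown", **parser_options("premium"))
    # documents = parser.load_data("ppfas-mf-factsheet-for-August-2024.pdf")
```

Keeping the mode choice in one place makes it easy to rerun the same file in each mode and compare the output against the PDF.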

@kinshukkaura
Author

Thanks. I believe I will have to use premium mode.
Is there any reason why the model hallucinates in the default mode (Accurate) but not in the other modes? Could adjusting the parsing instructions (prompt) help in any way?

3 participants
@hexapode @kinshukkaura and others